Data management in the AI era: key insights for companies

TL;DR: AI and new regulations are making data centralization obsolete. Companies need decentralized architectures to govern, protect, and move data without legal risks or excessive costs.

What Happened?

For years, companies built their data strategies assuming data could flow freely to a central repository where it was refined with AI. However, that world has changed. The combination of three factors—the massive data volumes demanded by AI, growing governance and sovereignty regulations, and high egress costs—is forcing organizations to rethink their data architecture.

As TechRadar notes, the "data is the new oil" metaphor is no longer sufficient: centralization has become a critical bottleneck, similar to relying on a single chokepoint in global supply chains. The COVID-19 pandemic already showed how disruption in the flow of essential resources can paralyze economies; analogously, concentrating data in one place exposes companies to availability, compliance, and cost risks. According to a 2024 Gartner report, 60% of organizations that centralize data face scalability and regulatory compliance issues, compared to 30% of those using distributed architectures.

Why It Matters

AI not only consumes huge volumes of data to train models but also requires constant updates and inference execution, multiplying the amount of data generated and moved. At the same time, regulations like Europe's GDPR and the EU AI Act (which entered into force in August 2024, with obligations phasing in over the following years) impose strict restrictions on data residency and potential leakage into language models. In the US, a patchwork of federal and state laws, such as California's CCPA and the Virginia Consumer Data Protection Act, adds further complexity. The EU AI Act, in particular, classifies AI systems by risk and requires high-risk systems, such as those used in hiring or credit scoring, to meet transparency, human oversight, and non-discrimination requirements. Under Article 99 of the law, fines for the most serious violations can reach €35 million or 7% of global annual turnover, whichever is higher.

Ignoring this new landscape is not an option: executives cannot afford to drop out of the AI race, but they also cannot afford legal risks or runaway costs. A 2024 McKinsey study estimates that companies that fail to adapt to these regulations could face compliance costs up to 40% higher than those taking a proactive approach.

Consequences for Companies and Users

Companies that do not adapt will face fines for regulatory non-compliance, loss of customer trust, and unsustainable operational costs. For example, in May 2023 Ireland's Data Protection Commission fined Meta €1.2 billion for violating GDPR by transferring European users' data to the US without adequate safeguards. Users, on the other hand, will see greater protection of their data, but also potential delays in the adoption of AI services if companies fail to balance innovation and compliance. Additionally, data fragmentation across jurisdictions may limit the ability to train global models, giving an advantage to those implementing federated or edge computing architectures. According to a 2024 IDC report, 45% of companies are already adopting decentralized data strategies to overcome these barriers, and this figure is expected to reach 70% by 2027.
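
To make the federated idea concrete, here is a minimal sketch of federated averaging in Python. It is an illustration under stated assumptions, not a production recipe: the two regions, their toy records, the one-parameter model y ≈ w * x, and all function names are hypothetical. The point it shows is that each jurisdiction trains on its own records and shares only a model weight and a record count, never the raw data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RegionData:
    """Records held inside one jurisdiction (hypothetical toy data)."""
    name: str
    xs: list[float]
    ys: list[float]


def local_update(w: float, data: RegionData, lr: float = 0.01, steps: int = 20) -> float:
    """Run a few gradient-descent steps on y ≈ w * x using only local records."""
    n = len(data.xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(data.xs, data.ys)) / n
        w -= lr * grad
    return w


def federated_average(updates: list[tuple[float, int]]) -> float:
    """Combine local weights, weighting each by its number of records."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(regions: list[RegionData], rounds: int = 30) -> float:
    """Coordinator loop: only (weight, record count) pairs cross borders."""
    w = 0.0
    for _ in range(rounds):
        updates = [(local_update(w, region), len(region.xs)) for region in regions]
        w = federated_average(updates)
    return w


# Toy data generated from y = 2x, so the shared weight should land near 2.
regions = [
    RegionData("eu", xs=[1.0, 2.0, 3.0], ys=[2.0, 4.0, 6.0]),
    RegionData("us", xs=[1.0, 2.0], ys=[2.0, 4.0]),
]
global_weight = train(regions)
print(f"global weight: {global_weight:.4f}")
```

Real deployments would add secure aggregation and privacy protections on the shared updates, since model weights can themselves leak information; the sketch leaves those out to keep the data flow visible.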

What Readers Should Know

Decentralized architectures: Instead of moving all data to a central lake, companies should consider approaches like data mesh or data fabric, which allow governing data at its source and moving only what is necessary. Data mesh, popularized by Zhamak Dehghani in 2019, proposes that business teams own and manage their data as products, while data fabric integrates data from multiple sources through virtualization and orchestration. Companies like Netflix and PayPal have already implemented data mesh with positive results in agility and compliance.
Egress costs: Moving data between clouds, or out of a cloud to on-premises systems, can drive costs up sharply. Companies should negotiate contracts and evaluate multicloud storage solutions with zero-egress pricing. For example, Google Cloud announced in January 2024 that it would eliminate egress fees for customers migrating their data to other providers, and AWS and Azure followed with similar waivers within months. A 2024 CloudZero report revealed that egress fees can account for up to 30% of a company's total cloud bill.
Integrated governance: Access, privacy, and sovereignty policies must be applied automatically, not as an afterthought. Data catalog and lineage tools help track flow. Solutions like Collibra, Alation, or Apache Atlas enable automated policy enforcement and data usage auditing, reducing compliance risk.
Preparation for the EU AI Act: Companies operating in Europe must ensure their AI models do not leak personal data and meet transparency and human oversight requirements. This involves implementing techniques like differential privacy, federated learning, or robust anonymization. A notable case is OpenAI, which in 2023 had to adjust ChatGPT to comply with GDPR, limiting data retention and offering opt-out options.
Investment in talent: Experts in data governance, regulatory compliance, and distributed architectures are needed, profiles that are currently scarce. According to LinkedIn, demand for data architects with governance experience grew 45% in 2024, while salaries for these roles increased 20% year-over-year.
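
The governance and egress points above can be combined into a single automated check that runs before any data leaves its source. The Python sketch below is one possible shape for such a check, under loud assumptions: the policy table, the region codes, the dataset fields, and the price per gigabyte are all hypothetical placeholders, not legal guidance and not any provider's actual pricing.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical policy table: destinations each source region may send
# personal data to. A real table would come from legal and compliance teams.
ALLOWED_PERSONAL_DATA_TRANSFERS: dict[str, set[str]] = {
    "eu": {"eu"},
    "us": {"us", "eu"},
}


@dataclass(frozen=True)
class Dataset:
    """Catalog entry describing a dataset at its source (hypothetical fields)."""
    name: str
    region: str
    size_gb: float
    contains_personal_data: bool


def may_transfer(ds: Dataset, destination: str) -> bool:
    """Return True if the residency policy permits moving ds to destination."""
    if ds.region == destination or not ds.contains_personal_data:
        return True
    allowed = ALLOWED_PERSONAL_DATA_TRANSFERS.get(ds.region, set())
    return destination in allowed


def estimated_egress_cost(ds: Dataset, destination: str, price_per_gb: float) -> float:
    """Estimate the egress charge; transfers within a region are treated as free."""
    if ds.region == destination:
        return 0.0
    return ds.size_gb * price_per_gb


def plan_transfer(ds: Dataset, destination: str, price_per_gb: float) -> dict:
    """Apply the policy first, then price the move only if it is allowed."""
    allowed = may_transfer(ds, destination)
    cost = estimated_egress_cost(ds, destination, price_per_gb) if allowed else None
    return {"dataset": ds.name, "destination": destination,
            "allowed": allowed, "estimated_cost": cost}


# Usage with made-up datasets and an assumed price of 0.05 per GB.
customers = Dataset("customers", region="eu", size_gb=100.0, contains_personal_data=True)
metrics = Dataset("aggregated_metrics", region="eu", size_gb=100.0, contains_personal_data=False)
print(plan_transfer(customers, "us", price_per_gb=0.05))
print(plan_transfer(metrics, "us", price_per_gb=0.05))
```

Putting the policy check ahead of the cost estimate reflects the article's ordering of concerns: a blocked transfer never reaches the pricing step, and only the data that is both permitted and necessary gets moved.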

"Data centralization was efficient when moving it was cheap and risk-free. Today, it's a liability." — TechRadar

In summary, the AI era demands a profound rethinking of data management. Companies that act now will be better positioned to innovate without compromising compliance or profitability. As the saying often attributed to Peter Drucker goes, "what gets measured gets managed"; in this case, what gets decentralized gets governed better.
