Opinion & Analysis
Why organizations building AI on unresolved data problems are scaling their biggest risk — and what it actually takes to flip the ratio
Written by: Justin Windle | Senior Data Executive, 20+ Years Across Major U.S. Financial Institutions
Updated 6:59 AM EDT, July 1, 2026

The most dangerous data problem isn’t the one you know about. It’s the one your organization is confident doesn’t exist.
Consider two scenes that play out in organizations every day:
Scene one: A leadership team reviews the monthly dashboard. Revenue looks strong. Customer metrics are trending well. The model outputs look reasonable. Confidence is high. Three weeks later, a regulator arrives and begins asking questions about the underlying data: the populations, the lineage, the exceptions, the controls.
The answers aren’t as clean as the dashboard suggested. What looked like a well-governed data environment turns out to be a carefully curated reporting layer sitting on top of something messier than anyone realized.
Scene two: A CFO asks two different teams for the same revenue figure. The numbers come back different. Not dramatically, but different enough that nobody can present either with confidence.
Two weeks of meetings follow. Analysts pulling source data, tracing discrepancies, negotiating definitions. Eventually a number is agreed upon, not because it’s definitively correct, but because the deadline arrived.
The cycle restarts next month. Same organization. Same underlying problem. Different consequences.
I’ve sat in both rooms, many times, across five major financial institutions over 20 years. What I’ve observed is consistent enough to be called a pattern: most organizations believe their data is fine. And they will continue believing it until something external forces a different conclusion.
The arrival of AI at enterprise scale has changed what that moment of reckoning looks like. And it has made the cost of deferring this problem significantly higher than it has ever been.
Most organizations have quietly inverted the purpose of their data function. What I’ve come to call the 80/20 flip describes the condition where 80% of a data team’s time, headcount, and energy goes toward making data usable: reconciling, cleansing, normalizing, debating definitions, resolving conflicts between systems.
Only 20% goes toward using data to make better decisions.
This isn’t laziness or incompetence. It’s the natural outcome of organizations that built data systems to support individual functions rather than enterprise truth. Every team optimized their data for their own purposes:
All three are internally logical. None of them agree with each other when a question crosses functional boundaries.
The result is organizations where some of the most talented analytical minds spend the majority of their time as highly educated data janitors: cleaning, reconciling, and validating rather than analyzing, modeling, and advising.
The work is real and necessary. However, it is compensating for a problem that could be solved once rather than managed indefinitely.
What makes this particularly difficult to address is that the people doing the reconciling get good at it. The process becomes institutionalized. It gets factored into timelines. It becomes a feature of how the organization operates, rather than a symptom of something broken.
By the time leadership notices the pattern, it has been running for years.
Internal reporting systems are optimized for internal consumption. The path from raw data to executive dashboard involves dozens of quiet decisions:
Each decision is reasonable in isolation. Collectively, they create a picture that is coherent, consistent, and confidently wrong in ways the organization cannot see from the inside.
When external scrutiny arrives (a regulator, an auditor, an acquirer doing due diligence, or a board asking questions the dashboard wasn’t designed to answer), the gap between internal confidence and external reality becomes visible. And expensive.
Regulatory findings, customer remediation requirements, earnings restatements, and failed transactions are often not the product of data fraud. They are the accumulated cost of never establishing what the data actually says versus what the reporting system was built to say.
The people who built those systems made reasonable choices. The organization simply never established a mechanism for questioning them.
This is the false confidence problem. And it is almost universally present in organizations that haven’t deliberately solved for enterprise data truth. The leaders who are most certain their data environment is sound are frequently the ones for whom the gap between confidence and reality is largest. This is because their reporting infrastructure is working exactly as designed.
These are observable, recognizable, and nearly universal in organizations that haven’t established authoritative data truth. They don’t require a data audit to identify. Instead, they show up in how people talk, how meetings run, and where time goes.
If your organization has recurring meetings whose primary purpose is resolving why two reports produced different numbers from the same underlying reality, you are paying people to compensate for a data problem that could be solved once rather than managed forever.
The sophistication of the people in that meeting is not the issue. The existence of the meeting is.
When senior leaders regularly qualify data-based answers with definitional caveats, “that number is X if you use the finance definition, but Y if you use the operations definition,” the organization has silently agreed that there is no authoritative answer, only negotiated approximations.
Every decision built on those approximations inherits the uncertainty.
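To make the caveat concrete, here is a minimal sketch of how two teams can compute "revenue" from the same records and both be right by their own rules. The order records, the status values, and both definitions are hypothetical and chosen only for illustration; they are not drawn from any real institution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    amount: float
    status: str  # assumed values: "booked", "shipped", "refunded"


# Hypothetical records shared by both teams.
ORDERS = [
    Order(100.0, "shipped"),
    Order(250.0, "booked"),
    Order(80.0, "refunded"),
    Order(120.0, "shipped"),
]


def revenue_finance(orders):
    """Assumed finance definition: only shipped orders count."""
    return sum(o.amount for o in orders if o.status == "shipped")


def revenue_operations(orders):
    """Assumed operations definition: everything that was not refunded counts."""
    return sum(o.amount for o in orders if o.status != "refunded")


finance_total = revenue_finance(ORDERS)
operations_total = revenue_operations(ORDERS)
gap = operations_total - finance_total

print(f"finance: {finance_total}, operations: {operations_total}, gap: {gap}")
```

Neither function contains a bug. The gap exists because nobody decided which definition is the authoritative one, and no amount of reconciliation effort will close it.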
If the majority of your data team’s capacity is allocated to making data usable rather than using it, and if this has been true for more than one planning cycle, the organization is funding a symptom rather than solving the problem.
The 80% will continue consuming resources indefinitely because the condition generating it has never been addressed at its source.
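The ratio itself is easy to estimate if a team tracks where its hours go. The sketch below assumes a hypothetical weekly time log and a hypothetical split of tasks into "making data usable" and everything else; the task names and hours are invented for illustration.

```python
# Tasks assumed to count as "making data usable" rather than using it.
PREP_TASKS = {"reconciling", "cleansing", "normalizing", "definition_debate"}

# Hypothetical hours for one analyst in one week.
weekly_hours = {
    "reconciling": 14,
    "cleansing": 10,
    "normalizing": 4,
    "definition_debate": 4,
    "analysis": 5,
    "modeling": 3,
}

prep_hours = sum(h for task, h in weekly_hours.items() if task in PREP_TASKS)
total_hours = sum(weekly_hours.values())
prep_share = prep_hours / total_hours

print(f"{prep_share:.0%} of capacity goes to making data usable")
```

A single week proves little. The signal is the same share showing up across more than one planning cycle.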
Here is where the conversation becomes urgent rather than just important.
Every organization currently under pressure to show AI progress is operating against a data foundation built before AI was part of the equation. The reconciliation problem that consumed two weeks of analyst time last month is now the training data for a model that will make thousands of automated decisions per day.
The false confidence that made the regulatory finding possible now extends to every output the model generates: faster, at greater scale, and with less visibility into where the errors originated.
AI doesn’t resolve data problems. It operationalizes them.
AI presents outputs with a fluency and confidence that obscure the quality of the inputs. A dashboard built on bad data looks uncertain: numbers don’t quite reconcile, footnotes qualify the figures, analysts caveat the conclusions. An AI model built on bad training data produces clean, coherent, authoritative-sounding outputs. The signal of data quality problems disappears precisely when the stakes get higher.
The population exclusions and definitional choices that created your reconciliation problem don’t disappear when you train a model; they become embedded in the model’s understanding of reality. A customer segment that was systematically handled differently in your source data because of a definitional choice someone made five years ago will be handled differently by your AI forever, invisibly, at scale.
If your organization already carries data quality issues that a regulator could find on a deep dive — and most do — AI deployment doesn’t reduce that exposure. It extends it across every decision the model touches. The improper customer handling that affected a segment of accounts now potentially affects every account the model processes.
Traditional reporting, for all its flaws, produces a traceable path from source data to output. As AI models become more complex, that traceability decreases. When a regulator asks why a model treated a population of customers a certain way, ‘the model learned it from the training data’ is not an acceptable answer. But it may be the only honest one available if the data foundation was never properly governed.
Flipping the ratio — moving from 80% of effort on data management to 80% on data value — is not a technology project. Organizations that have attempted it by purchasing a new platform, implementing a new governance tool, or hiring a larger data team without changing the underlying conditions have consistently found that the ratio returns to its original state within 18 months.
Not a committee. Not a framework document. A set of explicit, governed decisions about what the data means, who owns it, and which version is correct when systems conflict. This requires organizational authority, someone with the standing to make those decisions stick across functions that have historically operated autonomously.
Frameworks describe the destination. Authority is what gets you there.
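One way to picture "explicit, governed decisions" is as a record that states what a metric means, who owns it, and which system wins when systems conflict. The sketch below is an assumption about how such a record might look, not a description of any particular governance tool; the metric, the owner title, the system names, and the figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    meaning: str
    owner: str
    system_precedence: tuple  # most authoritative system first


def resolve(definition, values_by_system):
    """Return (system, value) from the highest-precedence system that reported."""
    for system in definition.system_precedence:
        if system in values_by_system:
            return system, values_by_system[system]
    raise LookupError(f"No governed source reported {definition.name!r}")


# Hypothetical governed definition.
ACTIVE_CUSTOMERS = MetricDefinition(
    name="active_customers",
    meaning="Customers with at least one open account at period end",
    owner="Head of Retail Banking",
    system_precedence=("core_ledger", "crm", "marketing_db"),
)

# Three systems, three answers (illustrative values).
reported = {"crm": 10_412, "marketing_db": 10_897, "core_ledger": 10_388}

source, value = resolve(ACTIVE_CUSTOMERS, reported)
print(f"{ACTIVE_CUSTOMERS.name} = {value} (from {source}, owner: {ACTIVE_CUSTOMERS.owner})")
```

The code is the trivial part. The precedence order is only worth anything if someone with authority has set it and the functions involved accept it.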
Most data quality problems originate far upstream from where they are discovered. A customer record entered incorrectly at onboarding creates reconciliation problems three systems and six months later.
Fixing the downstream symptom is perpetual. Fixing the upstream cause is a decision about process, accountability, and what the organization is willing to hold people responsible for.
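As a rough illustration of fixing the upstream cause, the sketch below rejects a bad customer record at the point of entry instead of letting it travel downstream. The field names and rules (a non-empty ID, a five-digit postal code, a birth date in the past) are assumptions made for the example, not a recommended rule set.

```python
import re
from datetime import date


def validate_onboarding(record, today):
    """Return a list of problems; an empty list means the record may be saved."""
    problems = []
    if not record.get("customer_id"):
        problems.append("customer_id is missing")
    if not re.fullmatch(r"\d{5}", record.get("postal_code") or ""):
        problems.append("postal_code must be five digits")
    dob = record.get("date_of_birth")
    if dob is None or dob >= today:
        problems.append("date_of_birth must be in the past")
    return problems


# Hypothetical records keyed in at onboarding.
today = date(2026, 7, 1)
good = {"customer_id": "C-1001", "postal_code": "28202", "date_of_birth": date(1985, 3, 14)}
bad = {"customer_id": "", "postal_code": "2820", "date_of_birth": date(2030, 1, 1)}

good_problems = validate_onboarding(good, today)
bad_problems = validate_onboarding(bad, today)

print(good_problems)
print(bad_problems)
```

A check like this costs seconds at entry. The same errors found three systems later cost analyst weeks, which is why the harder question is who is accountable for running it.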
The reconciliation meeting is not an IT problem. The regulatory finding is not a compliance problem. They are business problems with data symptoms. Organizations that assign them to technical teams to solve, without business leadership owning the outcome, are solving for the symptom.
The CDO’s real job is not managing data. It is translating between the technical reality of what the data says and the business consequence of what that means. The job also includes making the case, repeatedly and at the right level, that the cost of not solving this is higher than the cost of solving it.
If an external party with full access to your underlying data asked your most important business questions to the actual data, would the answers match what your leadership team believes to be true?
Most organizations don’t know. And the ones that are most confident they do are frequently the ones for whom the gap is largest.
The 80/20 flip is not an aspiration. It is a diagnostic. If your organization is spending the majority of its data resources proving what the data says rather than using it, the answer to that question is probably not what you think it is.
The good news is that this is a solvable problem. The difficult news is that solving it requires treating it as what it actually is: a business problem, not a data problem. It also requires leadership willing to own it as such.
This was always a business problem worth solving. The arrival of AI has made it a business problem you can no longer defer. The organizations that establish enterprise data truth before they scale AI on top of it will compound their AI investment across every initiative. The organizations that don’t will spend the next decade debugging model outputs instead of trusting them.
The data foundation you build today is not infrastructure. It is the ceiling on everything artificial intelligence can do for your business.