Opinion & Analysis

Data as Infrastructure — Applying the Economics of Ideas to Public Data


Written by: Winston Chang | CTO, Global Public Sector at Snowflake Inc.

Updated 2:32 PM UTC, Fri June 13, 2025


The 2018 Nobel Prize in Economics was co-awarded to Paul Romer for groundbreaking theories that redefined how we think about economic growth. Central to his work is the value of ideas. Romer argued that ideas are the true engine of long-term economic growth: nonrival, infinitely replicable, and uniquely powerful when recombined. An idea can be used over and over without depleting the resource. It scales at near-zero cost. And when combined with other ideas, it creates entirely new value.

Today, this endogenous growth theory applies well to data. Data behaves like an idea. But our policies often treat it like a commodity: scarce, rival, and zero-sum. Governments around the world are racing to implement a modern economic framework based on AI’s potential, but still treat data as secondary instead of the primary input driving innovation. That misjudgment is holding us back.

Romer’s theory gives us a lens and a warning. As our institutions move toward data-centric organization, data becomes the newest infrastructure that governments must get right, or risk falling behind.

Without the right policies, governance, and architectural designs, public data risks being either underutilized, privatized without compensation, or locked in silos and bureaucracy. With the right strategy and activation, it can be the backbone of a new era of sustainable growth and innovation. 

First principles for a data-driven economy

Unlocking the economic potential of data will differ across federal, state, and city governments. It will differ from country to country. So we need to understand first principles, drawing from infrastructure, market design, and public/private dynamics.

1. Treat data as public infrastructure

Data is no longer a byproduct; it’s the backbone of decision-making, model training, and product design. Roads, energy, rail, hospitals, and broadband are all key infrastructure that enabled sectors to function, or even to exist, and raised citizens’ quality of life. And like that infrastructure, data requires intentional investment, maintenance, and clear, equitable rules.

2. Governance and standards determine value

Data is only useful if it’s well-documented, discoverable, and interoperable. Public datasets that lack metadata, schema standards, or clear lineage are functionally invisible. Effective governance frameworks, style and quality controls, lineage, and access policies are essential to make data usable and interoperable.
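As a concrete illustration, here is a minimal sketch of what a discoverable dataset record could look like. The field names and values are hypothetical, not a standard; real catalogs would follow a published schema such as DCAT and richer lineage tracking.

```python
# A minimal, hypothetical metadata record for a public dataset.
# Field names are illustrative placeholders, not a prescribed schema.
dataset_record = {
    "id": "county-property-records-2024",
    "title": "County Property Records, 2024",
    "publisher": "County Assessor's Office",
    "schema": {
        "parcel_id": "string",
        "assessed_value": "decimal(12,2)",
        "last_sale_date": "date",
    },
    "lineage": [
        {"step": "extracted", "source": "assessor_system_of_record", "date": "2024-01-15"},
        {"step": "anonymized", "method": "address_generalization", "date": "2024-01-20"},
    ],
    "access_policy": "open, attribution required",
    "last_updated": "2024-02-01",
}
```

Even a record this small makes the dataset discoverable, tells a consumer what the columns mean, and shows where the data came from, which is the difference between usable infrastructure and a functionally invisible file.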

3. Human capital unlocks value

Data itself holds no inherent value. To borrow from Lincoln, it is of the people (or what people want to measure) and for the people. Scientists, analysts, engineers, and model builders make data actionable, enabling decisions for doctors, patients, leaders, and citizens. While this post doesn’t dive into education and workforce needs, any data-centric strategy must invest in talent alongside infrastructure.

4. Policy is the economic multiplier

Having data or making it available does not automatically create economic value. Without the right policies, data can easily be hoarded, degraded, or misused. Whether data enables growth or stagnation depends entirely on how we design the policies that affect the ecosystems built on or adjacent to the infrastructure. That means governments in particular have a crucial role to play in shaping the markets that form on top of a Sovereign Data Infrastructure.

This is not an argument for central planning nor laissez-faire privatization. It’s a call for complementarity: when governments provide well-governed data infrastructure and clear policy signals, markets can respond with products, innovation, and value creation.

This worked, on the whole, in broadband, transportation, and energy under various regulatory approaches. Government standards and investment mostly catalyzed, but sometimes limited, private-sector growth. We can do the same with data, but we need new tools and rules.

A proposed model: revenue sharing for public data use

Let’s consider practically how this could work. In doing so, we can see how policy shapes a functioning data economy. Below is a simple model that combines open access with market alignment: free use of public data, with revenue sharing on for-profit outcomes.

Note that this isn’t proposed as an answer but as a thought experiment built around one of the biggest pain points in government data: how to share the economic upside between government and industry. We’re also assuming the infrastructure, security, and privacy are implemented without issue.

How it works

  • Public datasets are made freely accessible to all users, including commercial entities.
  • If the data is used in a product or service that generates revenue, the company shares a fixed percentage (e.g., 40%) of revenue attributed to the government data’s contribution.
  • Attribution is based on technical metrics rather than subjective judgments.

Attribution methods

This model reflects how commercial firms already treat cloud hosting, third-party APIs, and software licenses: as predictable, known costs of doing business. To ensure the system is clear, fair, and broadly applicable, different attribution approaches would apply to different product types (a simple sketch of the arithmetic follows the list):

  • For data products (synthetic or transformed)
    1. The percentage of storage size of government-derived data in the final dataset is used as a proxy for contribution.
    2. For example, if a commercial dataset is 40% based on government property records and 60% from proprietary sources, the revenue share would apply to that 40%.
  • For ML models
    1. Attribution is based on the percentage of the training dataset (by volume or token count) that originated from government data.
    2. This works well because, in model training, dataset composition is a measurable and material driver of performance, while compute costs are largely tied to dataset size and model architecture.
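As a back-of-the-envelope illustration, the sketch below applies these two proxies. The function names are invented for illustration, and the 40% share rate is simply carried over from the example above; none of this is a prescribed mechanism.

```python
# Illustrative sketch only: the proxies and the 40% rate come from the
# example in this post, not from any adopted policy.

def attribution_fraction(gov_units: float, total_units: float) -> float:
    """Fraction of a product attributable to government data.

    gov_units / total_units can be storage bytes for a data product or
    token counts for an ML training corpus, per the proxies above.
    """
    if total_units <= 0:
        return 0.0
    return min(gov_units / total_units, 1.0)

def revenue_share_owed(revenue: float, gov_units: float, total_units: float,
                       share_rate: float = 0.40) -> float:
    """Share owed on the slice of revenue attributed to government data."""
    attributed_revenue = revenue * attribution_fraction(gov_units, total_units)
    return attributed_revenue * share_rate

# A dataset that is 40% government property records (by storage size)
# and generates $1M in revenue would owe $160,000 at a 40% share rate.
print(revenue_share_owed(1_000_000, gov_units=400, total_units=1_000))
```

The same arithmetic covers both product types; the only thing that changes is the unit of measurement, bytes for data products and tokens for training corpora.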

Thought experiment

How do these rules incentivize innovation? How might they be gamed or exploited? What second- and third-order effects could occur once an ecosystem develops?

Commercial actors will use government data if the rules are clear, the data is good, and the costs are predictable. Firms naturally try to minimize costs, but if public data is critical to the product or service, they’ll treat the revenue share as a cost of goods sold. This model aligns with their incentives because:

  • The dataset is often essential to product accuracy or trustworthiness — there’s no substitute for government-generated data in fields like weather, economics, energy, or health.
  • Revenue share becomes a known fixed cost, like infrastructure or licensing.
  • The data itself confers no competitive market advantage; companies still compete on their product, interface, and value to the customer.
  • The credibility of public data is also important to citizens and regulated sectors.

Fairness might require that other elements be built into the framework to prevent asymmetric advantages (a rough sketch of how these could combine follows the list):

  • Materiality thresholds to avoid penalizing exploratory or research use, which has multiplier value in generating new ideas.
  • Override mechanisms for cases where storage size isn’t representative of actual impact, so attribution doesn’t slow development in emerging sectors.
  • Tiered rates based on revenue size to avoid burdening early-stage startups and small businesses.
  • Transparent reporting mechanisms, with randomized audits instead of heavy-handed enforcement.
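To see how these safeguards could stack, here is a minimal sketch layered on top of the attribution arithmetic above. The 5% materiality threshold, the tier boundaries, and the sub-40% rates are hypothetical placeholders, not recommendations.

```python
# Hypothetical placeholders: the threshold, tier boundaries, and the
# lower-tier rates below are invented for illustration.
MATERIALITY_THRESHOLD = 0.05      # ignore contributions under 5% of the product
TIERED_RATES = [                  # (annual revenue ceiling, share rate)
    (1_000_000, 0.10),            # early-stage / small business tier
    (50_000_000, 0.25),           # mid-market tier
    (float("inf"), 0.40),         # large enterprise tier
]

def effective_rate(annual_revenue: float) -> float:
    """Pick the share rate for the company's revenue tier."""
    for ceiling, rate in TIERED_RATES:
        if annual_revenue <= ceiling:
            return rate
    return TIERED_RATES[-1][1]

def adjusted_share(annual_revenue: float, attribution: float) -> float:
    """Apply the materiality threshold, then the tiered rate."""
    if attribution < MATERIALITY_THRESHOLD:
        return 0.0  # exploratory or research use falls below the threshold
    return annual_revenue * attribution * effective_rate(annual_revenue)
```

Under these placeholder numbers, a startup with $500,000 in revenue and 30% attribution would owe $15,000, while a large enterprise with the same attribution pays the full 40% rate; an override mechanism would simply replace the attribution input when storage size misrepresents actual impact.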

Second- and third-order effects? I leave those up for debate. This isn’t a proposed path, but one that demonstrates where policy levers can exist and underscores how critical the government’s role is in the next generation of data-led innovation.

Designing for growth

Paul Romer taught us that growth depends on how we use ideas, not just how many we have. The same is true for data. Without the right infrastructure and policy, data stays locked in silos or becomes a source of rent-seeking. Governments already sit on vast reservoirs of valuable data. No government I’ve spoken to questions whether to share it; the question is how. How do we design infrastructure that flourishes? How will markets respond? How will entrepreneurs use it? Now is the time for governments and their institutions (at least those that survived) to do what they’ve done well throughout history: develop a Sovereign Data Infrastructure.

How? Thoughtfully. 

About the Author:

Winston Chang is CTO, Global Public Sector at Snowflake Inc. He is an expert in data-driven organizational transformation, AI/ML, and innovation in public sector ecosystems. His more than two decades of work encompass startups, IT modernization, fashion branding, AI/ML/blockchain prototyping, structured finance, military service, and more.

Chang volunteers his time with the NIST MEP Advisory Board and the Eisenhower Fellowship network. His engagement with both organizations supports global bridge-building and strengthens US economic drivers. Winston graduated from the United States Military Academy and holds a personal mission to help government and educational institutions leverage data for maximum societal impact.
