A new industry survey published this week found that nearly 90% of enterprise artificial intelligence projects stall before reaching full deployment, as companies grapple with spiralling infrastructure costs and a shortage of specialised computing hardware. The research, conducted by consulting firm McKinsey across 1,500 organisations in North America, Europe, and Asia, paints a stark picture of the gap between AI ambition and execution. Three major bottlenecks emerged: insufficient data processing capacity, unmanageable energy consumption, and a lack of engineers who can bridge research and production systems.

The Scaling Gap Widens

Despite record investment in AI capabilities, with global spending on AI systems expected to reach $154 billion this year according to IDC projections, most deployments remain confined to pilot programmes. The disconnect stems partly from the fact that AI models behave differently at scale: a system that performs accurately for 10,000 users often degrades significantly when it must handle requests from millions of users simultaneously.

Companies Reveal Why 90% of AI Projects Fail to Scale — Infrastructure Costs to Blame — Education

"The jump from prototype to production is where most initiatives die," said Priya Sundaram, chief technology officer at Cloudflare, during a panel discussion in San Francisco last month. Her company processes over 45 million HTTP requests per second, yet she described the engineering effort required to maintain AI response times at that volume as "unlike anything we planned for."

Computing Power: The Primary Constraint

Graphics processing units, the specialised chips that train and run large language models, remain in critically short supply. Nvidia's H100 GPU, a workhorse for modern AI training, carries a price tag of approximately $30,000 per unit. Large-scale AI deployments typically require thousands of these processors operating in concert, pushing hardware costs into tens of millions of dollars before a single model goes live.
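The cost claim above is simple multiplication. As a minimal sketch, assuming the approximate $30,000 unit price quoted here and illustrative fleet sizes of one to three thousand GPUs (the fleet sizes are assumptions, not figures from the survey), the arithmetic looks like this in Python:

```python
# Back-of-envelope GPU purchase cost. The unit price is the approximate
# H100 figure cited in the article; the fleet sizes below are illustrative.
H100_UNIT_PRICE_USD = 30_000


def fleet_cost_usd(num_gpus: int, unit_price_usd: int = H100_UNIT_PRICE_USD) -> int:
    """Return the cost of the GPUs alone (no servers, networking, or power)."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    return num_gpus * unit_price_usd


for gpus in (1_000, 2_000, 3_000):
    print(f"{gpus:,} GPUs: ${fleet_cost_usd(gpus):,}")
# 1,000 GPUs: $30,000,000
# 2,000 GPUs: $60,000,000
# 3,000 GPUs: $90,000,000
```

Even the smallest of these hypothetical fleets reaches $30 million before servers, networking, and electricity are counted.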

Waiting times for new GPU allocations stretch to nine months at major cloud providers, industry sources confirmed. This bottleneck has forced some companies to redesign their AI architectures entirely, shifting from always-on model hosting to on-demand inference that activates only when users request predictions.
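The shift from always-on hosting to on-demand inference can be sketched as a wrapper that loads a model only when the first request arrives and releases it after a period of inactivity. This is a minimal illustration, not any provider's actual design: the class `OnDemandModel`, the `idle_seconds` parameter, and the toy loader are hypothetical, and a real system would allocate GPU capacity rather than call a Python function.

```python
import time
from typing import Callable, Optional

Model = Callable[[str], str]  # hypothetical: a "model" is any text-to-text callable


class OnDemandModel:
    """Load a model on the first request; release it after idle_seconds without use."""

    def __init__(self, loader: Callable[[], Model], idle_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._model: Optional[Model] = None
        self._last_used = 0.0
        self.load_count = 0  # how many cold starts have occurred

    @property
    def is_loaded(self) -> bool:
        return self._model is not None

    def predict(self, request: str) -> str:
        if self._model is None:
            # Cold start: acquire the model only now that someone has asked.
            self._model = self._loader()
            self.load_count += 1
        self._last_used = self._clock()
        return self._model(request)

    def release_if_idle(self) -> bool:
        """Drop the model if it has sat unused too long; return True if released."""
        if self._model is not None and self._clock() - self._last_used >= self._idle_seconds:
            self._model = None
            return True
        return False


# Usage with a fake clock so the idle timeout can be simulated instantly.
now = [0.0]
server = OnDemandModel(lambda: str.upper, idle_seconds=60, clock=lambda: now[0])
print(server.is_loaded)          # False: nothing is loaded before a request
print(server.predict("hello"))   # HELLO (this call triggers the load)
now[0] = 120.0
print(server.release_if_idle())  # True: idle for 120 s, beyond the 60 s limit
print(server.is_loaded)          # False
```

The trade-off is latency: the first request after a release pays the full cold-start cost of loading the model, in exchange for not holding scarce GPUs while nobody is asking for predictions.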

The Energy Question

Running AI systems at scale consumes extraordinary amounts of electricity. Training a single large language model can use as much electricity as 1,000 households consume in a year. Data centres operated by Microsoft, Google, and Amazon collectively consumed 71 terawatt-hours of electricity in 2023, a figure that analysts project will double by 2026 as AI workloads expand.
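A doubling from 71 TWh in 2023 to roughly 142 TWh by 2026 implies a steep compound growth rate. The sketch below derives it, assuming the increase is spread evenly over the three years (the even-growth assumption is ours, not the analysts'):

```python
# Implied compound annual growth if consumption doubles between 2023 and 2026.
BASE_YEAR, TARGET_YEAR = 2023, 2026
BASE_TWH = 71.0  # 2023 figure for Microsoft, Google, and Amazon combined


def implied_annual_growth(multiple: float, years: int) -> float:
    """Constant yearly rate r such that (1 + r) ** years == multiple."""
    return multiple ** (1 / years) - 1


years = TARGET_YEAR - BASE_YEAR
rate = implied_annual_growth(2.0, years)
print(f"Implied growth: {rate:.1%} per year")  # Implied growth: 26.0% per year
for offset in range(1, years + 1):
    # Assumes smooth compounding; real consumption will not grow this evenly.
    print(BASE_YEAR + offset, f"{BASE_TWH * (1 + rate) ** offset:.1f} TWh")
```

In other words, the projection requires consumption to grow by roughly a quarter every year for three consecutive years.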

Some utilities in Virginia, home to a dense cluster of data centre campuses, have begun warning that power grid capacity could constrain further growth in the region by 2027 unless new transmission infrastructure is built.

What the Leaders Are Doing Differently

A small cohort of companies has managed to scale AI successfully, and their approaches share common threads. Meta released its Llama model weights publicly, allowing external researchers to contribute improvements and distribute the training burden across the developer community. This open-source strategy cut the company's internal compute costs by an estimated 40%, according to filings with the Securities and Exchange Commission.

Salesforce adopted a different tactic, embedding AI features directly into existing customer relationship management workflows rather than building standalone products. This integration-first approach reduced the scale requirements for its Einstein AI platform, since predictions happen within existing business processes rather than requiring separate infrastructure.

The Talent Shortage Compounds the Problem

Technical talent capable of deploying AI at scale remains scarce. Machine learning engineers with production experience command salaries averaging $245,000 annually in the United States, according to compensation data compiled by Levels.fyi. Hiring is also highly selective, with some companies reporting acceptance rates below 2% for senior ML positions.

Universities have struggled to produce graduates with the right blend of skills. Traditional computer science programmes emphasise algorithm design and theory, while industry demands engineers who can simultaneously manage distributed systems, optimise hardware utilisation, and debug models in production environments.

Regulatory Pressure Adds Complexity

The European Union's AI Act, most of whose obligations apply from 2026, requires companies deploying high-risk AI systems to maintain detailed audit trails of model decisions. Complying with these provisions demands additional logging infrastructure and human oversight processes that further complicate scaling efforts. Privacy regulations in California and Brazil impose similar requirements, creating a patchwork of compliance obligations for companies operating globally.
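The audit-trail requirement translates into engineering work along the following lines. This is a hypothetical sketch: the AI Act does not prescribe this record format, and the field names, the model identifier `credit-model-v3`, and the choice to hash inputs are illustrative assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(model_version: str, model_input: str, decision: str,
                 reviewer: Optional[str] = None) -> str:
    """Serialise one model decision as a JSON line for an append-only log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        # One possible design choice: keep a hash of the input rather than the raw text.
        "input_sha256": hashlib.sha256(model_input.encode("utf-8")).hexdigest(),
        "decision": decision,
        "human_reviewer": reviewer,  # None when no person reviewed the decision
    }
    return json.dumps(record, sort_keys=True)


def append_audit_line(path: str, line: str) -> None:
    """Append a record; opening in 'a' mode never overwrites earlier entries."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(line + "\n")


line = audit_record("credit-model-v3", "example applicant record", "declined", reviewer="analyst-7")
print(line)
```

Every prediction now produces a stored record, which is why compliance adds storage and engineering load that grows with traffic.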

Compliance teams at several major financial institutions told the Financial Times they have allocated dedicated engineering squads solely to documentation and audit support for AI systems, diverting resources from model development.

What Comes Next

Industry observers point to several emerging solutions that could ease the scaling crunch. Custom silicon designed specifically for AI workloads, rather than general-purpose graphics processors, promises better performance per watt. Google's Tensor Processing Units and Amazon's Trainium chips are prominent examples, though neither has yet achieved the ecosystem support that Nvidia commands.

Smaller model architectures that achieve comparable accuracy with dramatically fewer parameters represent another avenue. Mistral AI, a Paris-based startup, demonstrated last autumn that models with 7 billion parameters could match the performance of systems ten times their size on standard benchmarks. If such efficiency gains hold at scale, the infrastructure requirements for deployment shrink correspondingly.
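Part of the reason smaller models shrink infrastructure needs can be seen from weight storage alone. A minimal sketch, assuming 16-bit weights (2 bytes per parameter) and accelerators with 80 GB of memory each, such as the 80 GB H100; it ignores activations, caches, and serving overhead, so real requirements are higher:

```python
import math

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
ACCELERATOR_MEMORY_GB = 80  # assumption: one 80 GB card


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_cards_for_weights(num_params: float, precision: str = "fp16") -> int:
    """Smallest number of accelerators whose combined memory fits the weights."""
    return math.ceil(weight_memory_gb(num_params, precision) / ACCELERATOR_MEMORY_GB)


for label, params in (("7B model", 7e9), ("70B model", 70e9)):
    print(f"{label}: {weight_memory_gb(params):.0f} GB of weights, "
          f"at least {min_cards_for_weights(params)} card(s)")
# 7B model: 14 GB of weights, at least 1 card(s)
# 70B model: 140 GB of weights, at least 2 card(s)
```

Under these assumptions, a model one tenth the size needs one tenth the memory for its weights, so it can fit on a single card instead of being split across several.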

Watch for OpenAI's next major release, expected before the end of the quarter. Sources familiar with the company's roadmap suggest the forthcoming model will prioritise inference efficiency over raw capability gains, a shift that would signal the industry's recognition that scaling has become the central challenge.


— newspaperarena.com Editorial Team
Author
Politics and Policy Correspondent with a background in international law. Specialises in electoral systems, governance reform, and the rise of populism across continents.