The AI chip gold rush

The current scramble for AI accelerators resembles historical gold rushes in every particular, except that instead of prospectors with pickaxes you have data centres with procurement budgets, and instead of actual gold they’re mining for computational capability measured in FLOPs per watt. Like all gold rushes, it involves tremendous excitement, questionable economics, and the nagging suspicion that the people selling shovels are making more reliable profits than those digging for fortune.

Nvidia has become the supplier of choice for AI infrastructure, with their GPUs treated as though they were rare earth minerals essential for civilisation’s continuation. Companies announce GPU acquisitions with the solemnity previously reserved for mergers or new product launches. Cloud providers accumulate vast clusters of AI accelerators like dragons hoarding treasure, except dragons traditionally sleep on their hoards rather than running them at full capacity while frantically seeking customers to justify the capital expenditure.

The question occupying economists, investors, and anyone forced to explain their company’s hardware budget is whether this represents genuine demand for computational capability to solve real problems or speculative positioning based on the assumption that whoever controls the most AI hardware controls the future. The answer appears to be “yes,” which is distinctly unhelpful for decision-making but accurately reflects the confused state of current AI infrastructure investment.

Supply constraints and manufactured scarcity

Nvidia dominates the AI accelerator market from the comfortable position of a monopolist who doesn’t technically have a monopoly: alternatives exist, but they’re considerably less attractive for most AI workloads. Their H100 GPUs became so sought after that lead times stretched to months, availability became a competitive advantage, and companies with GPU allocations were courted like nobility with spare invitations to exclusive events.

Whether this scarcity represents genuine supply constraints or carefully managed production to maintain pricing power is a question that depends on whether you’re Nvidia’s CFO or someone trying to buy their products. Manufacturing capacity for advanced semiconductors is legitimately limited. TSMC produces chips for numerous customers and cannot instantly expand production to meet surging AI demand. This is a real constraint, grounded in the physical limitations of semiconductor fabrication.

However, scarcity also benefits suppliers by maintaining high prices and preventing customers from accumulating enough inventory to pause purchasing during market downturns. If every AI company could buy unlimited GPUs whenever they wanted, prices would face downward pressure and Nvidia’s profit margins would suffer. Maintaining some degree of scarcity, even if unintentional, serves supplier interests nicely.

The situation resembles Ankh-Morpork’s occasional shortages of essential goods, which are sometimes genuine consequences of supply disruptions and sometimes the result of merchants discovering that controlled scarcity increases prices more than it decreases sales volume. Determining which is which requires information that suppliers are understandably reluctant to provide transparently.

Genuine workload growth versus speculative accumulation

Some AI chip demand is unquestionably real. Training large language models requires enormous computational resources. GPT-4 training allegedly consumed tens of thousands of GPUs running for months. Claude, Gemini, and other frontier models require similar resources. These training runs have genuine computational requirements that cannot be reduced arbitrarily without compromising model capability. The hardware demand for training frontier AI models is real and substantial.

Inference, running trained models to generate outputs, also requires significant computational resources at scale. ChatGPT serves millions of users making billions of requests. Each request requires GPU time to generate responses. The aggregate computational requirement is substantial and growing as more users adopt AI services. This is genuine demand driven by actual usage rather than speculation.
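For a sense of scale, here is a back-of-envelope sketch in Python. Every figure in it is an illustrative assumption rather than a number disclosed by any model developer or service provider, but the arithmetic shows why both training and inference demand adds up quickly.

```python
# Back-of-envelope scale check. Every figure is an illustrative assumption,
# not a published number for any particular model or service.

# Training: a hypothetical frontier run on 20,000 GPUs for 90 days.
training_gpus = 20_000
training_days = 90
training_gpu_hours = training_gpus * training_days * 24
print(f"Training: {training_gpu_hours:,} GPU-hours")  # 43,200,000

# Inference: a hypothetical service handling 100 million requests per day,
# each consuming roughly 2 GPU-seconds of accelerator time.
requests_per_day = 100_000_000
gpu_seconds_per_request = 2
inference_gpu_hours_per_day = requests_per_day * gpu_seconds_per_request / 3600

# GPUs needed to serve that load around the clock at an assumed 50% utilisation.
gpus_for_inference = inference_gpu_hours_per_day / 24 / 0.5
print(f"Inference: {inference_gpu_hours_per_day:,.0f} GPU-hours per day, "
      f"roughly {gpus_for_inference:,.0f} GPUs at 50% utilisation")
```

Even with deliberately modest assumptions, the training run consumes tens of millions of GPU-hours and the inference service keeps thousands of accelerators permanently busy. Neither figure is speculative in nature; both follow from workloads that demonstrably exist.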

However, not all GPU accumulation serves current workloads. Companies announce plans to build massive AI infrastructure often before identifying specific applications that require such scale. The logic appears to be “AI is important, therefore we need vast AI infrastructure, therefore we must acquire GPUs urgently.” This is speculative accumulation based on assumed future needs rather than current requirements.

Cloud providers are particularly prone to this behaviour. They’re building AI infrastructure in anticipation of customer demand that may or may not materialise at predicted scales. If AI adoption grows as rapidly as expected, they’ll need every GPU they can acquire. If growth disappoints, they’ll have expensive infrastructure generating inadequate revenue to justify capital expenditure. The providers are essentially betting on AI demand trajectories, which makes them speculators with very expensive chips.

Startups acquiring GPU clusters before having paying customers represent another form of speculation. The reasoning is that AI capabilities require certain computational thresholds, so acquiring hardware early positions the company to compete. This might be strategic positioning or it might be spending investor capital on expensive hardware before validating that the business model works. Distinguishing between these interpretations requires information that startups naturally present optimistically.

Economics of AI infrastructure investment

The capital costs of AI infrastructure are staggering. A single Nvidia H100 GPU costs approximately €25,000 to €35,000 depending on configuration and availability. Training clusters require hundreds or thousands of GPUs, plus networking, cooling, power infrastructure, and operational costs. Building frontier AI capabilities requires capital expenditure measured in hundreds of millions of euros. Only well-funded organisations can participate, which concentrates AI capability among large tech companies, well-funded startups, and wealthy nations.
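A rough sketch of that arithmetic, using assumed figures consistent with the ranges above, shows how quickly the numbers compound.

```python
# Rough capital cost of a hypothetical 4,000-GPU training cluster.
# All figures are illustrative assumptions within the ranges quoted above.
gpu_unit_cost_eur = 30_000      # midpoint of the assumed €25,000 to €35,000 range
gpus_in_cluster = 4_000
gpu_capex = gpu_unit_cost_eur * gpus_in_cluster

# Networking, cooling, power distribution and facility fit-out add a large
# fraction on top of the accelerators themselves; 50% is assumed here.
overhead_fraction = 0.5
total_capex = gpu_capex * (1 + overhead_fraction)

print(f"GPU spend:   €{gpu_capex / 1e6:,.0f}M")    # €120M
print(f"Total capex: €{total_capex / 1e6:,.0f}M")  # €180M
```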

The economics only work if the AI capabilities generated justify these costs through revenue, cost savings, or strategic value. For established tech companies with existing revenue streams, AI infrastructure investments can be justified as necessary to remain competitive even if direct ROI is unclear. For startups, the investment requires belief that AI capabilities will generate sufficient value to recoup hardware costs plus all other expenses.

Utilisation rates matter enormously. GPUs sitting idle represent capital not generating returns. Training runs might use GPUs intensively for weeks or months, but between training runs the hardware sits waiting. Inference workloads might not fully utilise available computational capacity. Cloud providers address this through multi-tenancy, but companies operating their own infrastructure must either accept periods of underutilisation or find additional workloads to maintain efficiency.
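The effect is easy to illustrate. Assuming a €30,000 GPU amortised over three years of ownership (both assumptions, chosen for round numbers), the cost of each hour of useful work rises sharply as utilisation falls.

```python
# How utilisation changes the effective cost of a GPU-hour.
# Assumptions: €30,000 per GPU, amortised over three years of ownership.
gpu_cost_eur = 30_000
hours_owned = 3 * 365 * 24   # three-year amortisation window

for utilisation in (0.9, 0.5, 0.2):
    useful_hours = hours_owned * utilisation
    cost_per_useful_hour = gpu_cost_eur / useful_hours
    print(f"{utilisation:.0%} utilisation -> €{cost_per_useful_hour:.2f} per useful GPU-hour")
```

A cluster that runs at 20% utilisation pays roughly four and a half times as much per useful hour as one running at 90%, before counting power or staff.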

The depreciation schedule is brutal. AI accelerators become obsolete quickly as new generations offer substantially better performance per watt and per euro. An H100 cluster that commands a premium today will be outclassed by whatever Nvidia releases next year, which makes multi-year ROI calculations uncertain. You’re investing capital in assets that depreciate rapidly, both in accounting terms and in practical capability relative to newer alternatives.

Operational costs compound capital expenses. Power consumption for large GPU clusters is measured in megawatts. Cooling infrastructure must dissipate heat from thousands of processors running continuously. Facilities must provide reliable power, networking, and physical security. Staff must maintain hardware, optimise software, and handle incidents. The total cost of ownership exceeds hardware acquisition costs substantially.
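Putting depreciation and operating costs together, a sketch of annual total cost of ownership for the hypothetical cluster above, again using assumed figures, makes the point.

```python
# Annual total cost of ownership for the hypothetical 4,000-GPU cluster above.
# Every figure is an assumed, illustrative value, not vendor or operator data.
gpus = 4_000
gpu_unit_cost_eur = 30_000
useful_life_years = 3    # aggressive, reflecting rapid obsolescence

annual_depreciation = gpus * gpu_unit_cost_eur / useful_life_years

# Power: assume roughly 1 kW per GPU including cooling and facility overhead,
# running around the clock at €0.15 per kWh.
kw_per_gpu = 1.0
electricity_eur_per_kwh = 0.15
annual_power_cost = gpus * kw_per_gpu * 24 * 365 * electricity_eur_per_kwh

# Staff, networking, maintenance and facilities, lumped into one assumed figure.
annual_other_opex = 10_000_000

annual_tco = annual_depreciation + annual_power_cost + annual_other_opex
print(f"Depreciation: €{annual_depreciation / 1e6:.0f}M per year")
print(f"Power:        €{annual_power_cost / 1e6:.1f}M per year")
print(f"Other opex:   €{annual_other_opex / 1e6:.0f}M per year")
print(f"Annual TCO:   €{annual_tco / 1e6:.0f}M per year")
```

On these assumptions the cluster costs on the order of €55 million a year to own and run, and most of that is the hardware quietly losing value whether or not it is doing anything useful.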

Bubble indicators and warning signs

Several characteristics of current AI chip demand resemble historical technology bubbles. Prices disconnected from fundamental economics, speculation driven by fear of missing out, and widespread belief that current trends will continue indefinitely are all present. This doesn’t prove a bubble exists, but it suggests caution is warranted.

The assumption that AI capabilities scale linearly with computational resources is questionable. Larger models generally perform better, but returns diminish and costs increase super-linearly. GPT-4 is better than GPT-3.5, but whether the improvement is large enough to justify the substantially higher training cost is debatable. The assumption that ever-larger models trained on ever-more hardware will keep delivering proportional improvements may not hold indefinitely.
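A toy model makes the shape of the problem visible. Assume, purely for illustration, that loss falls as a small power of training compute; the exponent below is invented rather than fitted to any real scaling curve, but the pattern is the point: each doubling of compute buys a smaller absolute improvement while the bill doubles.

```python
# Toy illustration of diminishing returns: assume loss falls as a small power
# of training compute. The exponent is invented, not an empirical scaling fit.
def loss(compute: float, exponent: float = 0.05) -> float:
    return compute ** -exponent

previous = loss(1.0)
for doublings in range(1, 6):
    compute = 2.0 ** doublings
    current = loss(compute)
    print(f"{int(compute):>2}x compute -> marginal gain this doubling: {previous - current:.4f}")
    previous = current
```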

Infrastructure investment based on extrapolating current growth rates indefinitely resembles the classic bubble pattern of assuming trees grow to the sky. AI adoption is growing rapidly, but assuming this continues without plateau or correction leads to infrastructure buildouts that might exceed actual demand. If AI reaches saturation earlier than predicted, or if current applications prove less valuable than hoped, substantial infrastructure will remain underutilised.

The competitive dynamic where companies feel compelled to match rivals’ infrastructure investments regardless of identified needs resembles arms races more than rational capital allocation. Nobody wants to be left behind if AI transforms industries, but universal overinvestment driven by competitive fear rather than business cases is characteristic of bubble behaviour.

Secondary markets for GPU access have emerged, with companies reselling spare computational capacity and brokers arbitraging GPU availability. This financialisation of access to computational resources is both a sign of genuine scarcity and a characteristic of speculative markets where assets are traded based on anticipated future value rather than current utility.

Distinguishing signal from noise

Some AI chip demand is unquestionably legitimate. Companies with specific AI workloads requiring computational resources they don’t currently have represent genuine demand. Organisations expanding proven AI services to meet growing user bases represent genuine demand. Research institutions exploring AI capabilities represent genuine, if smaller-scale, demand.

Speculative demand is harder to identify externally but likely includes infrastructure buildouts based primarily on not wanting to miss opportunities rather than serving identified needs, GPU accumulation by companies pivoting to AI without clear business models, and infrastructure investments justified by competitive positioning rather than expected ROI.

The challenge for observers and investors is that successful technology platforms often look like bubbles during their growth phase. The internet bubble of the late 1990s was absolutely a bubble, characterised by absurd valuations and unsustainable business models, but it also represented the early phase of genuinely transformative technology. Some internet investments were prescient recognition of revolutionary change; others were speculation in companies that failed spectacularly. Distinguishing between these categories in real time was difficult then and remains difficult for AI infrastructure now.

The practical approach is accepting that current AI chip demand contains both genuine need and speculative excess in proportions that won’t be known until afterwards. Companies should base infrastructure investments on identified workloads and realistic growth projections rather than competitive fear or extrapolating current trends indefinitely. Investors should recognise that some current AI infrastructure spending represents valuable capability building and some represents capital destruction that won’t become obvious until the market corrects. Suppliers should enjoy their moment of unprecedented demand while remembering that all gold rushes eventually end, usually leaving behind substantial infrastructure that seemed essential at the time but proves excessive afterwards.

The AI chip gold rush will continue until it doesn’t. Whether participants strike gold or simply spend fortunes buying expensive shovels depends partly on AI technology delivering on its promises and partly on whether infrastructure buildouts match actual demand trajectories. The fortunate will have timed their investments well, the unfortunate will have expensive hardware generating inadequate returns, and the shovel sellers will have already banked their profits regardless of how the prospectors fare. This is how gold rushes have always worked, whether the precious resource being sought is actual gold or merely the computational substrate for artificial intelligence.