The infrastructure nobody is building

The Patrician has observed that Ankh-Morpork excels at building infrastructure that generates immediate returns while consistently failing to build infrastructure that will be desperately needed in ten years but provides no obvious benefit today. Investment in decorative fountains vastly exceeds investment in sewage treatment capacity, which is perfectly rational until the sewage situation becomes a crisis, at which point everyone expresses surprise that insufficient preparation was made despite entirely predictable growth in population and corresponding sewage production. The city has been surprised by this outcome several times. It remains surprised each time.

Technology infrastructure follows similar patterns. Enormous investment flows into data centres, AI accelerators, and networking equipment serving current demand, while investment in the unglamorous infrastructure that will be critical in the coming years remains inadequate. This is not because anyone disagrees about what will be needed. Reports identify the gaps regularly. Technical committees produce assessments. Concerned engineers submit internal memos. The reports gather dust while investment continues flowing toward whatever generates immediate returns or captures current enthusiasm.

The Patrician notes that universal awareness of what is coming makes the collective failure to prepare either more forgivable, because everyone is doing it, or less forgivable, because there is no excuse for being surprised. He has not decided which. Both seem accurate.

The power problem arriving on schedule

Data centre power consumption is growing faster than electrical grid capacity in regions where data centres are concentrated. The arithmetic is straightforward. The timeline is short. The required infrastructure investments are not being made at the necessary scale.

AI infrastructure requires substantially more power per rack than conventional computing. A GPU-dense rack can consume 40 to 80 kilowatts compared to 5 to 10 kilowatts for conventional servers. Utilities that planned for gradual growth are receiving requests for tens to hundreds of megawatts of new connections. Building the required substations and distribution lines requires planning approvals, construction time, and capital investment measured in hundreds of millions of euros. The timelines for utility infrastructure typically exceed the timelines for data centre construction, which creates situations where data centres exist but cannot operate at full capacity because the power has not arrived.
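The per-rack figures above scale quickly to the connection sizes utilities are now being asked for. The following is a rough sketch only: the facility size of 1,000 racks is a hypothetical number chosen for round arithmetic, the function name is illustrative, and the result covers IT load alone, ignoring cooling and other overheads.

```python
# Rough sketch: facility-level power from the per-rack ranges quoted above.
# RACKS is a hypothetical facility size, not a figure from any real site.
RACKS = 1_000

CONVENTIONAL_KW = (5, 10)   # kW per conventional rack (range from the text)
GPU_DENSE_KW = (40, 80)     # kW per GPU-dense rack (range from the text)


def facility_mw(racks, kw_range):
    """Return the (low, high) IT load in megawatts for a given rack count."""
    low_kw, high_kw = kw_range
    return (racks * low_kw / 1000, racks * high_kw / 1000)


conventional = facility_mw(RACKS, CONVENTIONAL_KW)
gpu_dense = facility_mw(RACKS, GPU_DENSE_KW)

print(f"Conventional: {conventional[0]:.0f} to {conventional[1]:.0f} MW")
print(f"GPU-dense:    {gpu_dense[0]:.0f} to {gpu_dense[1]:.0f} MW")
```

Under these assumptions the same building moves from a 5 to 10 megawatt connection to a 40 to 80 megawatt one, which is the difference between a routine request and a new substation.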

Data centres routinely claim to operate on renewable energy through procurement mechanisms that are legitimate accounting but are not the same as physically running on renewable energy. The temporal mismatch between constant data centre demand and variable renewable generation requires storage that does not exist at the necessary scale. The Patrician observes that power infrastructure has lead times measured in years while enthusiasm for projects requiring that power has lead times measured in quarters, and that this gap creates predictable crises that everyone will describe as unpredictable.
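The difference between matching on paper and matching hour by hour can be shown with a toy day. This is a deliberately simplified sketch under loud assumptions: the 10 megawatt constant demand, the flat 20 megawatt daytime generation profile, and both function names are hypothetical, and no storage is modelled.

```python
# Toy example: one day, constant demand, daytime-only generation.
# All figures are hypothetical and chosen so total volumes are equal.
DEMAND_MW = 10.0
demand = [DEMAND_MW] * 24
generation = [20.0 if 6 <= hour < 18 else 0.0 for hour in range(24)]


def volume_matched_share(demand, generation):
    """Share of demand covered when only total energy volumes are compared."""
    return min(1.0, sum(generation) / sum(demand))


def hourly_matched_share(demand, generation):
    """Share of demand covered when generation must occur in the same hour."""
    covered = sum(min(d, g) for d, g in zip(demand, generation))
    return covered / sum(demand)


print(f"Matched by volume: {volume_matched_share(demand, generation):.0%}")
print(f"Matched by hour:   {hourly_matched_share(demand, generation):.0%}")
```

In this toy case the facility is fully renewable by volume and half renewable by the clock. The missing half is the storage that does not yet exist.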

Cooling, networking, and skills

Waste heat from computing must be removed continuously or equipment fails. The cooling infrastructure required for AI workloads substantially exceeds what data centres designed for conventional computing provide. Retrofitting liquid cooling into facilities designed for air cooling is expensive and disruptive. New construction can integrate liquid cooling from the beginning. The gap between what exists and what is needed is growing as AI equipment density increases faster than cooling infrastructure is upgraded.

Networking inside data centres must increase in speed to support distributed AI training across thousands of accelerators. Wide area networking between facilities faces growing traffic, both from large models distributed globally and from users reaching AI services. The last-mile connectivity to users remains a bottleneck for bandwidth-intensive applications. These constraints are accumulating while capacity additions proceed at rates that seemed adequate for previous workload trajectories.

The skills gap is the longest-lead-time problem and therefore receives the least attention. Training people with genuine expertise in AI systems, infrastructure operations, and the intersection of both takes years. The gap between demand and supply will persist for a decade even if training programmes improve immediately, because ten years of experience cannot be produced in less than ten years regardless of the urgency of the requirement.

The Patrician’s assessment

None of these gaps are mysteries. The gaps persist because fixing them requires investment now to prevent problems later, and humans are reliably poor at this particular type of decision-making. Organisations optimised for quarterly returns are structurally unsuited to decade-long infrastructure planning, which is not a criticism but a description of how incentives work.

The resolution will be frantic infrastructure building during crises at costs substantially higher than proactive investment would have required. This is the normal pattern. The technology industry will follow it because industries generally do.

The Patrician predicts that we will continue under-investing in necessary infrastructure until crises force investment, that the crises will be attributed to unforeseen circumstances despite being entirely foreseeable, and that the expensive reactive response will be accompanied by earnest promises to plan better next time, which will be forgotten once the immediate crisis passes and attention returns to whatever is exciting rather than what is necessary. He has made similar predictions before. His record on this type of prediction is, he regrets to say, excellent.