Indigo team

Large-scale ops

Where bureaucracy meets big data, and nothing moves quickly—except the bills.

What can possibly go wrong?

The bureaucratic black hole

A simple model update requires 17 approvals, a risk assessment, and a sacrificial offering to the compliance team.

The competition launches a better product while we’re still in “governance review”.
Innovation slows to a crawl, as the safest option is to “just keep the old model running”.
A government report praises our “robust oversight” while ignoring our plummeting market share.

The legacy system labyrinth

The new ML system must integrate with a 20-year-old monolith that runs on COBOL and hope.

Glacial performance, as every prediction requires a pilgrimage through ancient APIs.
Engineers start having recurring nightmares about undocumented edge cases.
A museum curator calls to ask if we’ve considered donating our codebase as a “historical artifact”.

The domino disaster

A minor change in one system triggers a cascade of failures across the entire org.

A catastrophic outage that trends on Twitter before the team even notices.
The post-mortem document becomes a novella.
A regulator uses our incident as justification for sweeping new laws (which will, of course, make everything worse).
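
For anyone who would rather see the dominoes before they fall, here is a minimal sketch of a "blast radius" check: given a map of which services depend on which, it lists everything downstream of a change. The service names and the `DEPENDENTS` map are entirely hypothetical, and a real org would have to build that map from its own service catalogue (assuming it has one).

```python
from collections import deque

# Hypothetical dependency map: each key is depended on by the services in its list.
DEPENDENTS = {
    "feature-store": ["ml-scoring"],
    "ml-scoring": ["checkout", "fraud-alerts"],
    "checkout": ["billing"],
    "fraud-alerts": [],
    "billing": [],
}


def blast_radius(changed: str, dependents: dict[str, list[str]]) -> list[str]:
    """Return every service downstream of `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    affected = []
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in seen:  # guard against cycles and duplicates
                seen.add(service)
                affected.append(service)
                queue.append(service)
    return affected


print(blast_radius("feature-store", DEPENDENTS))
# → ['ml-scoring', 'checkout', 'fraud-alerts', 'billing']
```

In this toy graph a "minor change" to one upstream service reaches every other system, which is roughly the scenario described above.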

The cost spiral

The team spends £100k/month on cloud resources, half of which are running idle “just in case”.

Sudden price hikes as the company tries to recoup losses.
A grim company-wide email about “cost optimisation initiatives” (read: layoffs).
A tech blogger coins the term “ML-induced bankruptcy”.
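
To put a number on "just in case", the arithmetic can be sketched in a few lines. Only the £100k/month total and the "half idle" share come from the scenario above; the function name and the annualised figure are illustrative assumptions.

```python
# Figures from the scenario: £100k/month total, half of it idle.
MONTHLY_CLOUD_SPEND_GBP = 100_000
IDLE_FRACTION = 0.5


def idle_spend(monthly_spend: float, idle_fraction: float) -> dict[str, float]:
    """Return the monthly and yearly cost of resources that sit idle."""
    idle_monthly = monthly_spend * idle_fraction
    return {"idle_monthly": idle_monthly, "idle_yearly": idle_monthly * 12}


waste = idle_spend(MONTHLY_CLOUD_SPEND_GBP, IDLE_FRACTION)
print(f"Idle spend: £{waste['idle_monthly']:,.0f}/month, £{waste['idle_yearly']:,.0f}/year")
# → Idle spend: £50,000/month, £600,000/year
```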

Hallmarks of large-scale ops

The team has more architects than actual builders—every whiteboard is a maze of boxes and arrows.
“Multi-cloud” is both a strategy and a cry for help.
Meetings about meetings are a legitimate part of the workflow.
The phrase “we’re still evaluating the vendor options” has been uttered for 18 months straight.