The twin pillars of ML disappointment¶
In the great and creaking city of Ankh-Morpork, success usually depends on three things: luck, a certain tolerance for interesting smells, and the ability to keep a straight face while the entire system around you declares that everything is going exactly to plan. Modern machine learning is not so different. It rests upon two foundational pillars called Data and Requirements, both of which are as reliable as a guild oath made after midnight, and each of which holds up your MLOps ambitions with the same enthusiasm displayed by a wet cardboard box.
Together they ensure that your models will be flawed if you are lucky, scandalous if you are not, and profoundly misunderstood by the people who commissioned them either way.
Garbage in, gospel out¶
In theory, data is the fuel that powers your clever contraptions. In reality it behaves more like the damp kindling the Night Watch uses when attempting to start the annual solstice bonfire. Everyone gathers round, insists it will work, swears they have done this before, and then watches the thing smoulder resentfully while producing nothing but smoke and disappointment.
The eternal whinge¶
Data scientists have a single truly reliable setting. It is the one where they explain, with the desperation of a man who has seen the edge of the world and found it insufficiently documented, that they do not have enough data. Ten million samples? A mere whiff. More data is always needed. More data is the answer to every question. More data is the oxygen of their existence and you, apparently, are standing on the hose.
There are, to be fair, moments of genuine famine. Trying to train a model on three JPEGs and an optimistic shrug is unlikely to succeed unless your target domain is children’s fridge art. But more often than not, the real problem is not the lack of data. It is the lack of useful data.
A warehouse full of inputs can still leave you parched. Does your dataset represent humanity in all its chaotic, contradictory glory, or does it look more like a damp pile of Tesco receipts and mislabelled cat photos? Ten thousand shots of the same blurry stapler will not teach your model much beyond the unfortunate truth about your procurement department. A billion rows of missing values and timestamps set to the Unix epoch will only teach it despair.
Quantity matters, but not as much as the people chanting for more seem to believe, especially when the last time they inspected a histogram was sometime before the Patrician outlawed unlicensed fortune telling.
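Before joining the chant for more, it is worth a few lines of pandas to audit what you already have. A minimal sketch with stand-in data; the column names and values are illustrative, not anyone's real warehouse:

```python
import pandas as pd

# Stand-in for whatever the warehouse coughs up; columns are illustrative.
df = pd.DataFrame({
    "income":    [52_000, None, None, None, 48_500],
    "signup_ts": pd.to_datetime(["2021-03-01", "1970-01-01", "1970-01-01",
                                 "2022-11-15", "2021-03-01"]),
    "item":      ["stapler", "stapler", "stapler", "stapler", "stapler"],
})

# Share of missing values per column: 0.8 means the column is mostly absence.
print(df.isna().mean().sort_values(ascending=False))

# Timestamps quietly parked at the Unix epoch.
print("epoch rows:", int((df["signup_ts"] == pd.Timestamp(1970, 1, 1)).sum()))

# Distinct values per column: ten thousand shots of the same blurry stapler
# show up here as a cardinality of one.
print(df.nunique())
```

Five minutes of this tells you whether you are holding fuel or damp kindling, which is rather more than another procurement cycle will.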
Data quality, where hope goes to die¶
Data quality is the noble, shimmering ideal that everyone mentions and no one recognises. Try asking for a definition. You will receive answers ranging from “not horrible” to “the thing that did not crash the pipeline last Tuesday”.
Let us review the great trio of disappointment.
Consistency¶
Consistency is the hope that your data will behave sensibly. Like a decently trained dog. What you usually get is a cat that has learned the art of irony. One column says “United Kingdom”. Another says “UK”. A third says “U.K.”. A fourth offers “Europe, sort of” as if the field itself is having an existential crisis. Attempt a join across that mess and you will feel your soul leave your body.
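The cure is unglamorous: decide on one canonical value per concept and map every spelling onto it before any join is attempted. A minimal sketch, with a hypothetical mapping table; the spellings and codes are illustrative:

```python
import pandas as pd

# Hypothetical mapping table: every spelling the field has ever produced,
# collapsed to one canonical code before anyone attempts a join.
CANONICAL = {
    "united kingdom": "GB",
    "uk": "GB",
    "u.k.": "GB",
    "europe, sort of": None,  # existential crises become explicit missing values
}

def normalise_country(raw):
    """Map free-text country labels onto a single canonical code."""
    if raw is None or not raw.strip():
        return None
    return CANONICAL.get(raw.strip().lower())

orders = pd.DataFrame({"order_id": [1, 2, 3, 4],
                       "country": ["United Kingdom", "UK", "U.K.", "Europe, sort of"]})
orders["country_code"] = orders["country"].map(normalise_country)
print(orders)  # four spellings, one join key, and one honest missing value
```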
Correctness¶
Correct data is meant to reflect reality. Instead it often reflects an intern’s dreams during the final hour of a shift. You find “Customer Age: 217” and have to decide whether your system has uncovered an immortal or someone dropped a mug on the keyboard. Meanwhile your fraud model is busy awarding a youth savings account to a lich.
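The least you can do is refuse to believe the lich. A short sketch of plausibility checks; the bounds are illustrative, and the point is to quarantine offenders rather than silently "fix" them:

```python
import numpy as np
import pandas as pd

# Hypothetical customer records; the ids and values are illustrative.
customers = pd.DataFrame({"customer_id": [101, 102, 103],
                          "age": [34, 217, -1]})

# Plausibility bounds: nobody on file should be 217.
AGE_MIN, AGE_MAX = 0, 120

implausible = ~customers["age"].between(AGE_MIN, AGE_MAX)
print(customers[implausible])  # flags the lich and the negative-age traveller

# Set to missing and route for human review instead of quietly overwriting.
customers.loc[implausible, "age"] = np.nan
```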
Completeness¶
Completeness is the cheerful fiction that your dataset contains information rather than an art installation composed mainly of absence. Eighty percent missing values is fine, says someone. Just impute the mean. Now everyone has exactly average income, average height, and the same number of cats. Useful if you are modelling Lego figures. Less so for people.
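You can watch the damage happen in a few lines. A sketch with synthetic income data, assuming 80 percent of it has wandered off:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=10, sigma=0.5, size=1_000))
income[rng.random(1_000) < 0.8] = np.nan  # 80% missing, "it's fine"

# The popular remedy: fill every hole with the observed mean.
imputed = income.fillna(income.mean())

print(f"observed std: {income.std():.0f}")
print(f"imputed std:  {imputed.std():.0f}")  # variance collapses
print(f"share sitting exactly on the mean: {imputed.eq(income.mean()).mean():.0%}")
```

The filled-in series looks complete and has lost most of its variance, which is precisely the kind of lie a downstream model will believe without question.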
Cleaning data is tedious, thankless, and absolutely necessary. Much like sweeping the streets of Ankh-Morpork, no one praises you for doing it, but fail to do so and everything becomes unpleasant in ways that accumulate quietly and smell alarming.
The art of lowering expectations¶
If gathering requirements feels like group therapy held in a broom cupboard, that is because it is. Stakeholders describe an AI system that is ethical, transparent, unbiased, fast, accurate, and cheap. You gently explain that they may select two and should be grateful it is that many.
Requirements documents begin life filled with hope, then slowly become graveyards of abandoned ambition. There is always a line that reads “must handle edge cases”. It will not. There is always “must be explainable”. It will not be, unless the explanation involves three wizards and a chalk circle. There is usually something about avoiding bias. See further down for details, along with a stiff drink.
Once in a while someone produces a 73-page PDF. No one reads it until six months after deployment, when the system has done something outrageous and everyone is trying to work out who approved it.
Success lies not in fulfilling the vision but in managing expectations early enough that the outrage arrives in smaller, more manageable bursts.
Ethics and bias: the minefield¶
Every team insists they do not discriminate. Meanwhile the model is redlining half the city with the efficiency of a Guild accountant on deadline. Bias is not hiding in your dataset. It is the dataset. It arrives baked in, like raisins in a fruitcake no one wanted but everyone receives, year after year.
Common practice is to remove the obviously sensitive columns and congratulate yourself. Race, gender, the usual suspects. Never mind that fields such as postcode and forename are quietly carrying the same information with the determination of a mule that has not been fed in three days.
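You can measure how much the mule is carrying: drop the sensitive column, then see how well a simple model recovers it from what remains. A sketch with synthetic data where the proxy is built in on purpose; `postcode_district` and friends are stand-ins for your real features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic setup: X_clean is the feature matrix *after* dropping the
# sensitive column; `sensitive` is what you dropped. Here the proxy is
# deliberate so the effect is visible.
rng = np.random.default_rng(42)
postcode_district = rng.integers(0, 20, size=2_000)  # stand-in feature
sensitive = (postcode_district < 5).astype(int)      # proxy by construction
X_clean = np.column_stack([postcode_district, rng.normal(size=2_000)])

# If a bog-standard classifier can recover the dropped attribute,
# the information never left the building.
leakage_auc = cross_val_score(
    LogisticRegression(), X_clean, sensitive, cv=5, scoring="roc_auc"
).mean()
print(f"proxy leakage AUC: {leakage_auc:.2f}")  # 0.5 = gone, near 1.0 = still here
```

An AUC near 0.5 means the information really left; anything approaching 1.0 means you removed the label, not the signal.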
If your system subjects “Jamal from East Ham” to a lengthy security review while fast-tracking “Sebastian from Surrey” into preferential customer support, something is amiss and it is not just the ambient fog.
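If you log decisions alongside a properly governed protected attribute, the check itself is almost embarrassingly short. A sketch using the four-fifths rule of thumb, with hypothetical decision data:

```python
import pandas as pd

# Hypothetical decision log: one row per applicant, the outcome the system
# actually produced, and a governed protected attribute for auditing.
decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "B"],
    "approved": [1,    0,   0,   1,   1,   1,   0,   1],
})

rates = decisions.groupby("group")["approved"].mean()
print(rates)

# Disparate impact ratio: the four-fifths rule flags anything below 0.8.
print(f"impact ratio: {rates.min() / rates.max():.2f}")
```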
The traditional remedy is to conduct a fairness audit, which in many organisations means skimming a Medium article titled “Ethical AI for Busy People” just before the quarterly meeting. The truth is that bias is a mirror and few wish to see their own reflection.
Explainability: the corporate fig leaf¶
Explainable AI is the respectable cloak your model wears when going to meet regulators. It looks sincere, it sounds thoughtful, and it conceals the fact that no one truly knows what that model is doing when left to its own devices.
You can talk about Feature 147 and its deviation from baseline all you like. Your users will nod politely then go looking for legal advice. SHAP values and LIME plots are clever, but they do not help the ordinary citizen who wants to know why their mortgage was denied without learning about sigmoid activation or cluster centroids.
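For what it is worth, producing the clever numbers is the easy part. A minimal sketch with the `shap` library on a toy model, standing in for your real features and classifier:

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in data; in practice X would be your real feature matrix.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# SHAP attributes one applicant's score to individual features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])
print(shap_values)
```

The output tells you which column moved the score. Turning that into a sentence a citizen would accept is still entirely your problem.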
In meetings, executives pretend to understand the diagrams, engineers pretend to enjoy producing them, and regulators pretend to find them reassuring. It is all terribly civil and almost completely pointless.
If your explanation requires a whiteboard, three hours, and a sworn oath from the High Priest of Quantitative Mysticism, it is not an explanation. It is performance art.
Regulations: the compliance theatre¶
Onto the stage shuffles the great chorus of acronyms: GDPR, HIPAA, the AI Act, and the rest. Their purpose is to ensure that if you break something important using machine learning, you at least do it in a documented fashion.
You will debate whether log files count as personal data. They do. You will argue about whether storing them in the cloud is legal. It depends. You will comfort yourself with the thought that no one audits these things until a newspaper becomes involved.
Compliance is not about being ethical. It is about being demonstrably correct in your paperwork. It resembles keeping a diary, not for posterity, but in case the Thieves' Guild asks questions.
The golden rule is simple. Document everything. Not because it improves the system, but because when the regulator arrives you will want a paper trail thick enough to absorb your tears.
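A paper trail does not have to be elaborate to exist. A minimal sketch of per-prediction audit logging; the field names are illustrative, not any regulator's schema:

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("model_audit")

def predict_with_receipts(predict_fn, features: dict, model_version: str):
    """Run a prediction and log inputs, output, version, and timestamp."""
    prediction = predict_fn(features)
    audit.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }))
    return prediction

# Stand-in model: approves everyone, which is at least consistently wrong.
predict_with_receipts(lambda f: "approved", {"age": 34, "postcode": "E6"}, "v1.2.3")
```

A real deployment would add request IDs, consent flags, and a retention policy, but even this much gives the regulator something to read and you something to hide behind.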
In AI governance, it is rarely about doing the right thing. It is about proving, in a way that looks official, that you meant to.