An introduction to the colourful cast of characters¶
Machine learning is rather like a mismatched football squad in Ankh-Morpork. You have the by-the-book defenders, whom everyone calls supervised learning, doing what they are told with the rigid enthusiasm of a city watchman counting barrels. Then you have the chaotic strikers, unsupervised learning, darting about with no clear plan, occasionally kicking the ball into the river. And there is always that one player who manages to score own goals with remarkable consistency, otherwise known as reinforcement learning on a bad day.
These digital brainboxes power everything from your sat-nav’s questionable directions to the NHS’s attempts at guessing who will next clog up A&E after indulging at the local curry house. Some operate with the discretion of an MI5 agent, such as federated learning, while others gossip like a Wetherspoons regular after one too many pints, like certain transductive models.
Within this motley crew you will find the know-it-all swots (deductive learners), the lazy students scraping by (semi-supervised learners), and the over-efficient types who borrow other people’s homework (transfer learners). Between them, they can identify your cat photos, recommend dreadful television, and occasionally, when motivated, actually detect something important like cancer cells.
Learning models¶
Supervised learning¶
Supervised learning is like teaching a child with flashcards, except the child is an algorithm and the flashcards are thousands of labelled data points. You show it pictures of cats and dogs, explain which is which, and hope it does not conclude that every four-legged creature is probably a moggy. Its task is to find patterns that map inputs to outputs. Brilliant at memorising, terrible when confronted with novelty, like a Sphynx cat that looks vaguely extraterrestrial.
In practice, your email spam filter is a classic example. Trained on millions of emails marked “spam” or “not spam,” it learns that words like “free” and “Nigerian prince” are red flags. Occasionally, it gets overzealous and files your boss’s urgent email under junk because it contained the phrase “meet me tomorrow.” Meanwhile the actual spam about dubious pills slips through because the algorithm has decided it is probably a medical newsletter.
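For the curious, here is a minimal sketch of the flashcard idea in code, assuming scikit-learn; the handful of emails, the labels and the naive Bayes pipeline are made-up illustrations rather than anybody’s actual spam filter.

```python
# A toy supervised spam filter: labelled examples in, a decision rule out.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: the "flashcards", each one already labelled.
emails = [
    "Claim your FREE prize from a Nigerian prince",
    "Cheap pills, limited offer, act now",
    "Meet me tomorrow to discuss the quarterly report",
    "Minutes from yesterday's team meeting attached",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Bag-of-words features plus a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# The model maps new inputs to the labels it was taught, for better or worse.
print(model.predict(["FREE pills tomorrow"]))            # likely 'spam'
print(model.predict(["Agenda for tomorrow's meeting"]))  # likely 'not spam'
```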
There are privacy and security concerns too. If the training data contains sensitive information, a clever attacker could reverse-engineer the model. A health insurance predictor might accidentally reveal that people aged 30–40 who buy gluten-free pasta are high-risk. The model can also be poisoned with fake labels, turning a loyal spam filter into a willing accomplice for scam emails.
Unsupervised learning¶
Unsupervised learning is the algorithm equivalent of dumping a pile of puzzle pieces in the middle of the Guildhall and saying, “Sort that out.” No guidance, no labels, just raw data and existential dread. The algorithm clusters similar items, simplifies data, or flags odd outliers. Think of it as a detective trying to solve a crime without knowing there has been a crime, only that something statistically unusual has occurred.
Netflix uses this to group users into taste clusters, which is why you end up watching true crime at three in the morning while your recommendations gently suggest melatonin. Banks use it for fraud detection, flagging statistical oddities such as your card being blocked over a single banana bought in Spain. Privacy risks abound, especially when rare behaviour patterns reveal sensitive information, such as purchases indicating pregnancy or medical conditions.
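A minimal sketch of the clustering idea, assuming scikit-learn and some invented viewing-hours data; nothing here is anybody’s actual recommendation recipe.

```python
# A toy unsupervised clustering of viewers by taste, no labels supplied.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical viewing hours per week: [true crime, baking shows, football].
viewers = np.array([
    [9.0, 0.5, 0.0],
    [8.5, 1.0, 0.2],
    [0.3, 7.0, 0.1],
    [0.1, 6.5, 0.4],
    [0.2, 0.3, 8.0],
    [0.0, 0.1, 9.5],
])

# The algorithm is told only "find three groups", never what the groups mean.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(viewers)
print(kmeans.labels_)  # e.g. [0 0 1 1 2 2]: three taste clusters it invented itself
```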
Semi-supervised learning¶
Semi-supervised learning is the lazy student of the class. It learns from a few labelled examples and a mountain of unlabelled data. The labelled examples act as stabilisers, while the unlabelled data careens downhill hoping for the best. Google Photos does this when you label a few pictures of Karen and the algorithm tags your golden retriever as Karen. Speech recognition systems employ it too, turning “Call Mum” into “Ball Bomb” from time to time.
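A rough sketch of the self-training flavour of the idea, assuming scikit-learn; the tiny one-dimensional “face embedding” values and the -1 unlabelled markers are illustrative assumptions, not how Google Photos actually does it.

```python
# A toy semi-supervised setup: a handful of labels, a pile of unlabelled points.
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Hypothetical 1-D "face embedding" values; label -1 means "unlabelled".
X = np.array([[0.1], [0.2], [0.9], [1.0], [0.15], [0.85], [0.3], [0.8]])
y = np.array([0, 0, 1, 1, -1, -1, -1, -1])  # only four labelled examples

# The base classifier labels its own most-confident guesses and retrains.
base = SVC(probability=True, gamma="auto")
model = SelfTrainingClassifier(base).fit(X, y)

print(model.predict([[0.25], [0.95]]))  # guesses for genuinely new points
```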
Risks include the unlabelled data containing private information and incorrect labels propagating errors. Imagine a semi-supervised hate speech detector that learns from bad examples and subsequently flags all political discourse as toxic.
Reinforcement learning¶
Reinforcement learning is like training a dog with treats, except the dog is a robot and the treats are mathematical rewards. The agent tries actions, receives feedback, and adjusts. Tesla’s Autopilot learns from millions of miles driven, occasionally receiving corrections from human drivers. AlphaGo learned by playing itself millions of times, resembling a sleepless Go prodigy.
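A minimal tabular Q-learning sketch of the try-things-and-collect-treats loop; the five-square corridor, the learning rate and the exploration rate are invented purely for illustration, and bear no resemblance to Autopilot or AlphaGo.

```python
# A toy Q-learning loop: try actions, collect rewards, nudge the estimates.
import random

# Hypothetical corridor of 5 squares; a treat (reward) waits at the far end.
N_STATES, ACTIONS = 5, [0, 1]          # 0 = step left, 1 = step right
q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # Explore occasionally, otherwise exploit the current best guess.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Q-learning update: move the estimate toward reward + discounted future value.
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state

print([round(max(row), 2) for row in q])  # learned values rise as you near the treat
```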
RL systems can be manipulated through reward shaping, potentially leading to unsafe behaviour, and can leak data about the environment they inhabit, like your house layout being unintentionally revealed online.
Ensemble learning¶
Ensemble learning is like asking a panel of slightly drunk pundits for their opinion, averaging it, and calling it wisdom. Each model is mediocre alone, but together they are slightly less wrong. Netflix recommendations and NHS risk predictions often rely on this method. Compromise one model with poisoned data and the entire system can go pear-shaped.
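A minimal sketch of the pundit panel, assuming scikit-learn’s VotingClassifier and some synthetic stand-in data; the three models are arbitrary choices for illustration rather than anyone’s production ensemble.

```python
# A toy ensemble: three mediocre models, one averaged opinion.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)  # synthetic stand-in data

# Each "pundit" votes; soft voting averages their predicted probabilities.
panel = VotingClassifier(
    estimators=[
        ("trees", DecisionTreeClassifier(max_depth=3)),
        ("linear", LogisticRegression(max_iter=1000)),
        ("bayes", GaussianNB()),
    ],
    voting="soft",
)
panel.fit(X, y)
print(panel.predict(X[:5]))  # the panel's combined verdict
```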
Transfer learning¶
Transfer learning is standing on the shoulders of giants, where the giants are algorithms that did the hard work first. A model trained to recognise cats can be repurposed to spot tumours. Google Lens and medical imaging systems use this. Risks include hidden biases from the pre-trained model and potential for model stealing, allowing someone to reverse-engineer your work.
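A rough sketch of the borrow-the-homework move, assuming PyTorch and torchvision (and an internet connection to fetch the pre-trained weights): take a network trained on everyday photos, freeze it, and bolt on a new final layer for the new task. The two-class “tumour” example is purely illustrative.

```python
# A toy transfer-learning setup: reuse a pre-trained network, retrain only the top.
import torch.nn as nn
from torchvision import models

# Borrow a ResNet-18 that already learned generic visual features (edges, textures, fur).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the borrowed knowledge so training does not overwrite it.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final classification layer: 2 classes, e.g. "tumour" vs "no tumour".
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

# Only the new layer's parameters will be updated during fine-tuning.
trainable = [p for p in backbone.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "parameters left to fine-tune")
```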
Federated learning¶
Federated learning is the Ankh-Morpork neighbourhood watch of algorithms. Your device learns from your behaviour, whispers the lessons to a central server, and everyone pretends this is not creepy. Data allegedly never leaves your device, but clever attackers can reconstruct sensitive information from model updates. Side channels, metadata, and model poisoning create further risks. Apple’s predictive keyboard and NHS experiments with federated learning demonstrate the balance of decentralised learning and lurking exposure.
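A minimal sketch of one federated-averaging round in plain NumPy; the three “phones”, their private data and the stand-in local update rule are all invented for illustration, and real deployments (Apple’s or the NHS’s) are considerably more elaborate.

```python
# A toy federated-averaging round: devices train locally, only weights travel.
import numpy as np

rng = np.random.default_rng(0)
global_weights = np.zeros(3)  # the shared model everyone starts from

def local_update(weights, local_data):
    """One device nudges the weights using its own data, which never leaves it."""
    gradient = local_data.mean(axis=0) - weights  # stand-in for real local training
    return weights + 0.1 * gradient

# Three hypothetical phones, each with private typing data of its own.
devices = [rng.normal(loc=i, size=(20, 3)) for i in range(3)]

# Each device sends back only its updated weights, never the raw data...
updates = [local_update(global_weights, data) for data in devices]

# ...and the server averages them into the next global model.
global_weights = np.mean(updates, axis=0)
print(global_weights)  # yet clever attackers can still infer things from these numbers
```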
Statistical models¶
Inductive learning¶
Inductive learning generalises from examples. A traffic warden tickets one car and concludes all vehicles must be banned. Credit scoring systems employ this to assess trustworthiness based on habits, with sometimes hilarious or unfair outcomes. Adversarial manipulation of data can fool the system, and sensitive correlations may be revealed.
Deductive learning¶
Deductive learning is strict logical reasoning. Tax calculation software uses it to follow rules, often failing when reality refuses to comply. Privacy risks are low, but biases may be embedded in the rules themselves.
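A minimal sketch of rule-following in code; the two tax bands and thresholds are illustrative assumptions rather than advice from HMRC.

```python
# A toy deductive "learner": no data, just rules applied top-down.
def income_tax(income: float) -> float:
    """Hypothetical two-band tax rules, applied exactly as written."""
    personal_allowance = 12_570          # assumed threshold for illustration
    basic_rate, higher_cutoff = 0.20, 50_270

    taxable = max(0.0, income - personal_allowance)
    basic_band = min(taxable, higher_cutoff - personal_allowance)
    higher_band = max(0.0, taxable - basic_band)
    # Every conclusion follows mechanically from the rules; nothing is learned from examples.
    return basic_band * basic_rate + higher_band * 0.40

print(income_tax(30_000))  # the rules say 3486.0, whether reality agrees or not
```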
Transductive learning¶
Transductive learning skips the general rule entirely: it memorises the training data and reasons directly from those specific examples to the specific cases put in front of it, improvising when genuinely new data appears. It is like cheating by writing the answers on your hand. NHS COVID contact tracing apps once used this to memorise infection patterns, with predictable panics when reality deviated. Privacy risk is high, as the model can leak memorised training data.
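A rough sketch using scikit-learn’s LabelSpreading, where labels flow across specific, already-known points rather than into a general rule; the toy contact positions and infection labels are invented for illustration.

```python
# A toy transductive setup: the "test" points are present at training time,
# and labels spread to them directly rather than via a general rule.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Hypothetical contact-graph positions; -1 marks people whose status is unknown.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([1, -1, -1, 0, -1, -1])  # one known infection, one known all-clear

model = LabelSpreading(kernel="knn", n_neighbors=2).fit(X, y)
# Predictions exist only for these specific points, memorised alongside the data.
print(model.transduction_)  # e.g. [1 1 1 0 0 0]
```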
Human-in-the-loop models¶
Human-centred reinforcement learning¶
Humans add feedback to reinforcement learning, guiding the system with sighs and corrections. ChatGPT uses this approach, known as reinforcement learning from human feedback, with thousands of underpaid reviewers teaching it to avoid absurd answers, though the occasional conspiracy-theory query still slips through. Feedback channels can be abused or compromise privacy.
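A minimal sketch of the idea reduced to a toy bandit, where the reward comes from a stand-in “reviewer” function rather than the environment; it is nothing like the machinery behind ChatGPT, just the feedback loop in miniature.

```python
# A toy human-in-the-loop reward signal: the "treat" comes from a person, not the environment.
import random

answers = ["a sensible answer", "an absurd answer"]
scores = {a: 0.0 for a in answers}
counts = {a: 0 for a in answers}

def human_feedback(answer: str) -> float:
    """Stand-in for an underpaid reviewer: approves sense, sighs at absurdity."""
    return 1.0 if answer == "a sensible answer" else -1.0

for _ in range(100):
    # Mostly pick the best-rated answer so far, occasionally try the other one.
    answer = random.choice(answers) if random.random() < 0.1 else max(scores, key=scores.get)
    counts[answer] += 1
    # A running average of the human's verdicts steers future choices.
    scores[answer] += (human_feedback(answer) - scores[answer]) / counts[answer]

print(scores)  # the sensible answer ends up preferred
```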
Active learning¶
Active learning is the needy intern of algorithms, constantly asking “is this right?” It reduces human workload by querying the most uncertain data points, used in medical imaging, but the queries themselves can reveal sensitive information. Blind spots develop in areas that are never questioned.
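A minimal sketch of uncertainty sampling, with a made-up oracle standing in for the human being pestered; the synthetic data, the starting labels and the query budget are all illustrative assumptions.

```python
# A toy active-learning loop: the model asks a "human" about its least-confident cases.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
oracle = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in for a human who knows the answers

labelled = [int(np.argmax(oracle)), int(np.argmin(oracle))]  # one example of each class to start
for _ in range(10):
    model = LogisticRegression().fit(X[labelled], oracle[labelled])
    probs = model.predict_proba(X)[:, 1]
    uncertainty = np.abs(probs - 0.5)   # near 0 means the model is torn 50/50
    uncertainty[labelled] = np.inf      # never re-ask about points already labelled
    query = int(np.argmin(uncertainty)) # the neediest "is this right?" question
    labelled.append(query)              # the human answers; the label is added

print(f"labels requested: {len(labelled)}, accuracy: {model.score(X, oracle):.2f}")
```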