Small data ops

A plucky little start-up or project, where data scientists work in cosy obscurity—until everything goes sideways.

What could possibly go wrong?

The perils of reinventing the wheel

Multiple team members, brimming with enthusiasm, independently craft identical data pipelines or train eerily similar models.

  • Delays in deployment as the team realises they’ve been duplicating effort for weeks.

  • Wasted time, inflated cloud bills, and morale slowly draining away.

  • Yet another start-up promising “AI-driven solutions” quietly implodes.
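
One cheap antidote to duplicated effort is a shared experiment registry. The sketch below assumes nothing fancier than a JSON file on a shared drive; the function names (`config_key`, `claim_run`) and the registry layout are illustrative, not any particular tool's API.

```python
import hashlib
import json
from pathlib import Path


def config_key(config: dict) -> str:
    """Deterministic short hash of a pipeline configuration."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


def claim_run(config: dict, registry: Path) -> bool:
    """Register a run; return False if this exact config was already run."""
    runs = json.loads(registry.read_text()) if registry.exists() else {}
    key = config_key(config)
    if key in runs:
        return False  # someone on the team got here first
    runs[key] = config
    registry.write_text(json.dumps(runs, indent=2))
    return True
```

A shared file is no substitute for talking to colleagues, but it does turn "we trained the same model twice" from a month-long discovery into an instant refusal.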

The silo effect

Team members work in blissful ignorance of each other’s activities, leading to incompatible models and duplicated datasets.

  • Inconsistent model behaviour depending on which silo’s output happens to reach production.

  • The painful realisation that “agile collaboration” was just a buzzword after all.

  • Another case study in why small teams should communicate more (but probably won’t).

Budgetary surprises

The team discovers that manually reprocessing data and retraining models every other week isn’t as cheap as they thought.

  • Features mysteriously vanish as cost-cutting measures kick in.

  • The CFO starts asking uncomfortable questions about “efficiency”.

  • Another start-up pivots to consultancy after burning through its runway.

The great divergence

Code and data evolve separately, like two species on different islands—until they can no longer interbreed.

  • Models break because the data they were trained on no longer exists in the same form.

  • A frantic scramble to reconcile months of undocumented changes.

  • A cautionary tale about technical debt in next week’s Wired article.
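
Divergence is at least detectable if every model ships with a fingerprint of the data it was trained on. A minimal sketch, assuming the dataset is a single file and using a plain JSON manifest (the helper names here are hypothetical, not a library API):

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 digest of a dataset file's bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def record_lineage(data_path: Path, manifest_path: Path) -> None:
    """Write the dataset fingerprint next to the model artefact."""
    manifest = {"data_file": data_path.name, "sha256": fingerprint(data_path)}
    manifest_path.write_text(json.dumps(manifest, indent=2))


def check_lineage(data_path: Path, manifest_path: Path) -> bool:
    """True if the dataset still matches what the model was trained on."""
    manifest = json.loads(manifest_path.read_text())
    return manifest["sha256"] == fingerprint(data_path)
```

Checking lineage before serving or retraining turns "the data no longer exists in the same form" from a silent failure into a loud one.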

The black box factory

Models are churned out with no auditing, versioning, or reproducibility.

  • Unexplained model decisions that even the team can’t justify.

  • Regulatory bodies start taking an interest (never a good sign).

  • Another argument for “AI transparency” in parliamentary hearings.
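
The floor for auditability is surprisingly low: pin the random seed and capture a run record with the commit, seed, and hyperparameters. A sketch under those assumptions (the function name and record fields are illustrative):

```python
import json
import random
import subprocess
from datetime import datetime, timezone


def training_run_record(params: dict, seed: int) -> dict:
    """Capture just enough metadata to reproduce, and audit, a training run."""
    random.seed(seed)  # seeds for numpy/torch etc. would be set here too
    try:
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        commit = "unknown"  # not in a git repo, or git not installed
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "git_commit": commit,
        "seed": seed,
        "params": params,
    }
```

Serialising that dict with `json.dumps` alongside every model artefact will not satisfy a regulator on its own, but it beats shrugging when asked which code produced which model.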

Hallmarks of small data ops

  • The team is entirely composed of data scientists, all convinced they’ll “scale eventually”.

  • Python is the lingua franca, because why bother with anything else when there’s a library for everything?

  • Data fits neatly on a single laptop—blissfully unaware of the terabyte-shaped future looming ahead.

  • Development starts locally, then stumbles into the cloud when someone realises their laptop can’t handle the load.

  • Heavy reliance on open-source tools, with GitHub issues serving as the de facto support hotline.


Last update: 2025-05-19 20:21