Small data ops

A plucky little start-up or project, where data scientists work in cosy obscurity—until everything goes sideways.

What could possibly go wrong?

The perils of reinventing the wheel

Multiple team members, brimming with enthusiasm, independently craft identical data pipelines or train eerily similar models.

  • Delays in deployment as the team realises they’ve been duplicating effort for weeks.

  • Wasted time, inflated cloud bills, and morale slowly draining away.

  • Yet another start-up promising “AI-driven solutions” quietly implodes.
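
One cheap antidote to duplicated effort is a shared experiment registry. The sketch below assumes nothing fancier than a JSON file on a shared drive; the function names (`config_key`, `claim_run`) and the registry layout are illustrative, not any particular tool's API.

```python
import hashlib
import json
from pathlib import Path


def config_key(config: dict) -> str:
    """Deterministic short hash of a pipeline configuration."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


def claim_run(config: dict, registry: Path) -> bool:
    """Register a run; return False if this exact config was already run."""
    runs = json.loads(registry.read_text()) if registry.exists() else {}
    key = config_key(config)
    if key in runs:
        return False  # someone on the team got here first
    runs[key] = config
    registry.write_text(json.dumps(runs, indent=2))
    return True
```

A shared file is no substitute for talking to colleagues, but it does turn "we trained the same model twice" from a month-long discovery into an instant refusal.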

The silo effect

Team members work in blissful ignorance of each other’s activities, leading to incompatible models and duplicated datasets.

  • Inconsistent model behaviour depending on which silo’s output happens to reach production.

  • The painful realisation that “agile collaboration” was just a buzzword after all.

  • Another case study in why small teams should communicate more (but probably won’t).

Budgetary surprises

The team discovers that manually reprocessing data and retraining models every other week isn’t as cheap as they thought.

  • Features mysteriously vanish as cost-cutting measures kick in.

  • The CFO starts asking uncomfortable questions about “efficiency”.

  • Another start-up pivots to consultancy after burning through its runway.

The great divergence

Code and data evolve separately, like two species on different islands—until they can no longer interbreed.

  • Models break because the data they were trained on no longer exists in the same form.

  • A frantic scramble to reconcile months of undocumented changes.

  • A cautionary tale about technical debt in next week’s Wired article.
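
Divergence is at least detectable if every model ships with a fingerprint of the data it was trained on. A minimal sketch, assuming the dataset is a single file and using a plain JSON manifest (the helper names here are hypothetical, not a library API):

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 digest of a dataset file's bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def record_lineage(data_path: Path, manifest_path: Path) -> None:
    """Write the dataset fingerprint next to the model artefact."""
    manifest = {"data_file": data_path.name, "sha256": fingerprint(data_path)}
    manifest_path.write_text(json.dumps(manifest, indent=2))


def check_lineage(data_path: Path, manifest_path: Path) -> bool:
    """True if the dataset still matches what the model was trained on."""
    manifest = json.loads(manifest_path.read_text())
    return manifest["sha256"] == fingerprint(data_path)
```

Checking lineage before serving or retraining turns "the data no longer exists in the same form" from a silent failure into a loud one.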

The black box factory

Models are churned out with no auditing, versioning, or reproducibility.

  • Unexplained model decisions that even the team can’t justify.

  • Regulatory bodies start taking an interest (never a good sign).

  • Another argument for “AI transparency” in parliamentary hearings.
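
The floor for auditability is surprisingly low: pin the random seed and capture a run record with the commit, seed, and hyperparameters. A sketch under those assumptions (the function name and record fields are illustrative):

```python
import json
import random
import subprocess
from datetime import datetime, timezone


def training_run_record(params: dict, seed: int) -> dict:
    """Capture just enough metadata to reproduce, and audit, a training run."""
    random.seed(seed)  # seeds for numpy/torch etc. would be set here too
    try:
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        commit = "unknown"  # not in a git repo, or git not installed
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "git_commit": commit,
        "seed": seed,
        "params": params,
    }
```

Serialising that dict with `json.dumps` alongside every model artefact will not satisfy a regulator on its own, but it beats shrugging when asked which code produced which model.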

Hallmarks of small data ops

  • The team is entirely composed of data scientists, all convinced they’ll “scale eventually”.

  • Python is the lingua franca, because why bother with anything else when there’s a library for everything?

  • Data fits neatly on a single laptop—blissfully unaware of the terabyte-shaped future looming ahead.

  • Development starts locally, then stumbles into the cloud when someone realises their laptop can’t handle the load.

  • Heavy reliance on open-source tools, with GitHub issues serving as the de facto support hotline.


Last update: 2025-05-19 20:21