Algorithm selection cheat sheet
| Scenario | Recommended approach | Privacy/security threat |
| --- | --- | --- |
| Labeled data + prediction | Supervised learning (RF, SVM) | Low (if data is anonymised) |
| Unlabeled data + patterns | Unsupervised learning (K-Means, PCA) | Low (unless patterns leak sensitive info) |
| Few labels + lots of unlabeled | Semi-supervised learning | Moderate (unlabeled data may contain personally identifiable information, PII) |
| Sequential decisions | Reinforcement learning | High (reward hacking, model poisoning) |
| Multiple related tasks | Multitask learning | Low (unless shared features leak data) |
| Privacy-sensitive distributed data | Federated learning | High (model inversion attacks, backdoors) |
| Small dataset + pre-trained model | Transfer learning | Moderate (pre-trained models may carry biases) |
| Generalising from examples | Inductive learning (ML models) | Low (if training data is secure) |
| Applying fixed rules | Deductive learning (expert systems) | Low (deterministic, no data leakage) |
| Similarity-based prediction | Transductive learning (k-NN) | High (training data memorisation risks) |
| Human feedback for agent training | Human-centered RL | Moderate (human feedback may leak sensitive info) |
| Label-efficient training | [Active learning](https://indigo.tymyrddin.dev/docs/landscape/hitl#active-learning) | Moderate (queries may reveal sensitive data) |
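The first two rows are the most common starting points: the presence or absence of labels is what drives the choice. Below is a minimal sketch of that decision in practice, assuming scikit-learn and synthetic toy data (both illustrative choices, not part of the cheat sheet), with the same model families named in the table.

```python
# Minimal sketch: labels present -> supervised learning (Random Forest);
# no labels -> unsupervised pattern discovery (K-Means).
# Assumes scikit-learn and synthetic data purely for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Labeled data + prediction -> supervised learning
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("supervised accuracy:", clf.score(X_test, y_test))

# Unlabeled data + patterns -> unsupervised learning (labels ignored)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
```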
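The last row links to active learning. A minimal sketch of pool-based uncertainty sampling follows, again assuming scikit-learn and toy data: the model repeatedly asks an oracle to label the point it is least sure about. Each query exposes a specific training point, which is exactly where the "queries may reveal sensitive data" risk in the table arises.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.
# Assumes scikit-learn; the oracle is simulated by the known labels y.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=1)

# Seed the labeled set with a few points from each class.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):                        # query budget of 20 labels
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)    # least-confidence scores
    idx = pool[int(np.argmax(uncertainty))]
    labeled.append(idx)                    # in practice an oracle labels this point
    pool.remove(idx)

model.fit(X[labeled], y[labeled])
# Rough sanity check only; a held-out test set would be used in practice.
print("accuracy with", len(labeled), "labels:", round(model.score(X, y), 3))
```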