Traditional ML — MLE interview prep
A linearized read on the pre-deep-learning ML canon — trees, regression, classification, feature engineering, and the trade-offs between them. The interview territory that hasn't gone away.
Why "traditional" still matters in 2026
The instinct of a junior candidate is that deep learning subsumes everything. The instinct of a senior interviewer is the opposite: deep learning is the right tool for a narrow set of problems (perception, language, very-high-dimensional structured input). For most production ML — tabular fraud detection, churn prediction, pricing, ranking on small catalogues, demand forecasting — the right answer in 2026 is still gradient-boosted trees with thoughtful feature engineering, regularized linear models, or a hybrid.
The reason isn't nostalgia. On tabular data with thousands of features and millions of rows, GBDTs and well-tuned linear models routinely match or beat deep models on accuracy, train in minutes instead of hours, are interpretable to non-ML stakeholders, and don't require GPUs. Most companies aren't Meta or Google. They're optimizing fraud, churn, or pricing on a tabular dataset, and the model to beat is almost always a GBDT such as XGBoost or a calibrated logistic regression. Knowing why that's true, and where it stops being true, is what the interview tests.
What an interviewer is really testing
Four orthogonal skills. Strength in one doesn't predict the others.
| Skill | What good looks like | What weak looks like |
|---|---|---|
| Reason from first principles | Asked "why does L2 regularization help?", you derive it from the bias-variance decomposition or from a Bayesian prior, not from "it's in the loss." You distinguish ridge from lasso by the geometry of their constraint sets: the L1 ball has corners on the axes, so lasso solutions often land on exact zeros, while the round L2 ball only shrinks weights. | You repeat that "regularization prevents overfitting" without being able to say what overfitting is or how penalizing weights addresses it. |
| Pick the right model for the problem | Given a tabular task, you can defend a choice between logistic regression, random forest, and XGBoost in 60 seconds — with reference to feature interactions, monotonicity, missing data, training time, and interpretability requirements. | You default to XGBoost for everything and can't name a case where logistic regression beats it. |
| Feature engineering instinct | You can look at a tabular schema and name 5 features the obvious model would miss (cross-features, target encodings, missingness indicators, time-since-event, lag features) and the traps that come with them, such as target leakage from in-sample target encoding. You know when not to engineer features: GBDTs learn many interactions through their splits, and deep models discover them. | You feed raw columns to the model and rely on hyperparameter tuning to close the gap. |
| Quantitative trade-offs | You know which model trains in minutes vs hours, which scales to billions of rows vs millions, which handles 10⁴ features vs 10⁷, and which gives calibrated probabilities out of the box. You ask "how big is the dataset?" before recommending an algorithm. | Qualitative pros and cons without numbers, and no sense of the regimes where each algorithm dominates. |
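The "derive it from a Bayesian prior" answer in the first row fits in a few lines. The sketch below assumes a linear model with i.i.d. Gaussian noise; the symbols σ² (noise variance), τ² (prior variance) and b (Laplace scale) are notation introduced here, not taken from the lessons.

```latex
\begin{aligned}
&y_i = w^\top x_i + \varepsilon_i,\quad \varepsilon_i \sim \mathcal{N}(0,\sigma^2),\qquad w \sim \mathcal{N}(0,\tau^2 I) \\
&-\log p(w \mid \mathcal{D}) = \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - w^\top x_i\right)^2 + \frac{1}{2\tau^2}\lVert w\rVert_2^2 + \text{const} \\
&\hat{w}_{\text{MAP}} = \arg\min_w \sum_{i=1}^{n}\left(y_i - w^\top x_i\right)^2 + \lambda\lVert w\rVert_2^2,\qquad \lambda = \frac{\sigma^2}{\tau^2} \\
&\hat{w}_{\text{ridge}} = \left(X^\top X + \lambda I\right)^{-1} X^\top y \\
&\text{Laplace prior } p(w_j) \propto e^{-\lvert w_j\rvert / b} \;\Rightarrow\; \arg\min_w \sum_{i=1}^{n}\left(y_i - w^\top x_i\right)^2 + \frac{2\sigma^2}{b}\lVert w\rVert_1 \quad \text{(lasso)}
\end{aligned}
```

Read this way, a tighter prior (smaller τ²) means a larger λ: you trade variance for bias by trusting the data less, which connects the Bayesian view back to the bias-variance one.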
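Target encoding, listed in the feature-engineering row, is the classic trap: if each row's category is encoded with a target mean that includes that row's own label, a high-cardinality column (a user ID, say) simply memorizes the target. Below is a minimal pure-Python sketch of the usual fix, out-of-fold encoding with additive smoothing, assuming a single categorical column and a numeric target. The function names, `n_folds`, and the `smoothing` weight are illustrative choices, not any library's API. scikit-learn 1.3+ ships `TargetEncoder`, which applies a similar cross-fitting scheme in `fit_transform`.

```python
import random
from collections import defaultdict


def naive_target_encode(categories, targets):
    """Leaky version: each row's encoding includes its own label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return [sums[c] / counts[c] for c in categories]


def out_of_fold_target_encode(categories, targets, n_folds=5, smoothing=10.0, seed=0):
    """Encode each row using target statistics computed only on the OTHER folds."""
    n = len(categories)
    # Assign rows to folds at random (roughly equal fold sizes).
    fold_of = [i % n_folds for i in range(n)]
    random.Random(seed).shuffle(fold_of)

    encoded = [0.0] * n
    for k in range(n_folds):
        # Fit per-category sums and counts on rows outside fold k.
        sums, counts = defaultdict(float), defaultdict(int)
        total, m = 0.0, 0
        for i in range(n):
            if fold_of[i] != k:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
                total += targets[i]
                m += 1
        prior = total / m if m else 0.0
        # Apply to rows inside fold k; rare or unseen categories shrink toward the prior.
        for i in range(n):
            if fold_of[i] == k:
                c = categories[i]
                encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded


if __name__ == "__main__":
    ids = [f"user_{i}" for i in range(100)]  # one unique category per row
    ys = [i % 2 for i in range(100)]
    print(naive_target_encode(ids, ys) == ys)  # the naive encoding reproduces the label exactly
    print(sorted(set(round(v, 3) for v in out_of_fold_target_encode(ids, ys))))
```

With unique IDs, the naive encoding is a perfect copy of the label, while the out-of-fold version assigns every row its fold's prior, which is the honest answer: an ID seen once carries no transferable signal.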
The lessons
The first lesson establishes the bias-variance frame that the later lessons build on. From there: linear models → trees → ensembles → kernel methods → generative → unsupervised → feature engineering → evaluation → interpretability. Each lesson can be read standalone, but they reinforce each other.
How to use this
- Linearly. Lesson 1 establishes the bias-variance frame that lessons 2–6 use to explain regularization, tree depth, ensemble size, and boosting iterations. Skipping it makes the rest harder.
- Touch every widget. Each lesson has at least one interactive component — a slider, a calculator, a toggle. They exist to let you feel a trade-off that prose alone makes abstract.
- Read the "interview prompts" boxes. Each lesson ends with 4–6 prompts of the kind interviewers actually ask. The answers in the prose tell you what good looks like, not just the facts.
- Don't skip the basics because they're "easy". Senior interviews probe basics with depth, not breadth. "Why does L2 work?" is harder than it looks. "When would you pick logistic regression over XGBoost?" separates good candidates from great ones.