Search Ads & Recommender Systems — MLE interview prep
A linear walkthrough of the production ML stack that powers ranking and ads: first-principles derivations, the trade-offs behind each design choice, and the specific things an interviewer will probe.
Why these two together
Recommender systems and search ads look like two products. Inside, they are the same machine with one extra layer:
recommender feed                        search-ads SERP
────────────────                        ───────────────
[retrieve ]  ◄── billions of items ──►  [retrieve ]
[rank     ]  ◄──── scoring funnel ───►  [rank     ]
[re-rank  ]  ◄──── business logic ───►  [re-rank  ]
                                        [auction  ]  ◄── ads only
                                        [pricing  ]  ◄── ads only
Everything in the three shared stages, along with the modeling behind them (candidate generation, embedding retrieval, CTR/CVR prediction, calibration, debiasing, A/B testing), is common to both products. The auction (with its pricing rule) and the advertiser-side bidder are the two pieces ads add on top. So an interviewer on either side expects you to understand the shared funnel, plus the ads-specific layer if you're targeting an ads team.
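To make "same machine, one extra layer" concrete, here is a toy sketch of the shared funnel with the ads-only step added at the end. Every piece is an illustrative assumption: random vectors stand in for trained two-tower embeddings, a sigmoid of a scaled dot product stands in for the ranking model, a per-author cap stands in for re-rank business logic, and the auction simply orders by bid × pCTR. Exact top-k replaces the ANN index a production system would use, and all function names are hypothetical.

```python
import numpy as np

# Toy corpus (all hypothetical): random vectors stand in for trained item-tower
# embeddings; each item has an author ID and, on the ads path, a bid.
rng = np.random.default_rng(0)
N_ITEMS, DIM = 10_000, 16
item_emb = rng.normal(size=(N_ITEMS, DIM))
author = rng.integers(0, 500, size=N_ITEMS)
bid = rng.uniform(0.1, 2.0, size=N_ITEMS)


def retrieve(user_emb, k=500):
    """Cheap, high-recall stage: top-k items by dot product (exact here, ANN in production)."""
    scores = item_emb @ user_emb
    return np.argpartition(-scores, k)[:k]


def rank(user_emb, cands):
    """Heavier, high-precision stage: stand-in for a CTR model, returns pCTR in (0, 1)."""
    logits = item_emb[cands] @ user_emb / np.sqrt(DIM)
    return 1.0 / (1.0 + np.exp(-logits))


def rerank(cands, pctr, n=20, max_per_author=2):
    """Business logic: fill n slots greedily by pCTR, at most max_per_author per author."""
    kept, per_author = [], {}
    for i in np.argsort(-pctr):
        a = author[cands[i]]
        if per_author.get(a, 0) < max_per_author:
            kept.append(i)
            per_author[a] = per_author.get(a, 0) + 1
            if len(kept) == n:
                break
    return cands[kept], pctr[kept]


def auction(cands, pctr, slots=3):
    """Ads-only layer: order by expected value per impression, bid * pCTR."""
    ev = bid[cands] * pctr
    top = np.argsort(-ev)[:slots]
    return cands[top], ev[top]


def serve(user_emb, ads=False):
    cands = retrieve(user_emb)
    pctr = rank(user_emb, cands)
    cands, pctr = rerank(cands, pctr)
    return auction(cands, pctr) if ads else (cands, pctr)


user = rng.normal(size=DIM)
feed, feed_pctr = serve(user)          # recommender path: 20 items
ads, ads_ev = serve(user, ads=True)    # ads path: same funnel plus auction, 3 slots
```

Only the last call differs between the two products. Swapping in an ANN index for `retrieve` and a real model for `rank` changes each stage's cost, not the shape of the cascade.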
What an interviewer is actually testing
Four skills, and they're orthogonal: being strong at one tells the interviewer nothing about the others.
| Skill | What good looks like | What weak looks like |
|---|---|---|
| Decompose a system into the funnel | Asked "design YouTube Home", you draw retrieve → rank → re-rank in 30 seconds, then go deep on the bottleneck the interviewer steers you toward. You know which metric each stage optimizes and why they can't be the same metric. | You jump into model architectures before naming the funnel. You can't say why two-tower sits at the retrieval stage and DLRM at ranking. |
| Reason about objectives | You translate a product goal ("more daily users") into a proxy label, then critique your own translation: position bias, survivorship bias, selection bias, novelty effects. You distinguish predicting P(click \| shown) from P(click \| shown, position) from causal uplift. | You optimize for raw CTR and don't notice you've created a clickbait feedback loop. |
| Quantitative trade-offs | You can reason in numbers: a 10 ms p99 latency budget, a 1B-item index, a 10⁵ QPS workload, $0.01 of revenue per impression, a 0.5% MDE on an A/B test. You know which design choices spend each of these budgets and which buy them back. | You discuss qualitative pros and cons without ever multiplying. You don't know how big "big" is for the systems you're discussing. |
| Calibration & counterfactual thinking | You know why CTR predictions must be probabilities, not just rankings, the moment money enters the system. You understand IPS, propensity scores, doubly robust estimation, and why "we trained on logged data" is not free. | You treat raw softmax scores as calibrated probabilities. You evaluate a new ranker by comparing its top-K against old logs, ignoring that the old ranker chose what was logged. |
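The "quantitative trade-offs" row asks you to multiply. One worked example is the standard normal-approximation sample size for a two-proportion A/B test. This is a minimal sketch that assumes the 0.5% MDE is a relative lift on a 2% baseline CTR (both illustrative assumptions), a two-sided α of 0.05, 80% power, and independent impressions. `samples_per_arm` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def samples_per_arm(p_base, rel_mde, alpha=0.05, power=0.8):
    """Per-arm sample size for a two-sided two-proportion z-test.

    p_base  : baseline rate in control (e.g. CTR per impression)
    rel_mde : smallest relative lift worth detecting (0.005 = 0.5%)
    Assumes independent Bernoulli trials; clustering by user would inflate this.
    """
    p_treat = p_base * (1 + rel_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    var = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p_treat - p_base) ** 2)


n = samples_per_arm(0.02, 0.005)
print(f"{n:,} impressions per arm")  # roughly 31 million
```

Because n scales with 1/δ², halving the MDE roughly quadruples the traffic you need. Correlation between impressions from the same user inflates the variance further, which is one reason variance reduction comes up in Lesson 08.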
The lessons
Each lesson builds on the previous. The first eight cover the shared machinery; the next two cover ads-specific layers; the last one is an end-to-end design walkthrough that ties everything together.
01
The funnel — retrieve, rank, re-rank
Why production ranking is a multi-stage cascade and not one model. The latency / recall / precision budget at each stage. How recsys and ads systems share this structure.
02
Candidate generation — from CF to two-tower
Collaborative filtering, matrix factorization, two-tower neural retrievers. Why the towers are trained jointly but served separately (item embeddings are precomputed and indexed offline), and how the retrieval loss differs from a ranking loss. Where cold start should be handled.
03
Embeddings & approximate nearest neighbour
Dot product vs cosine vs L2. Maximum-inner-product search, and why inner product isn't a metric. HNSW, IVF, PQ: the three knobs of an ANN index and what each costs. Recall@K vs latency vs RAM.
04
The ranking model — LR to DLRM to DCN-v2
Sparse high-cardinality features, embedding tables, feature crosses. Why GBDTs lost to deep learning in ranking and the specific architecture moves (DLRM, DCN, MaskNet) that won. What "feature engineering" still means.
05
Losses & calibration
Pointwise BCE, pairwise BPR, listwise LambdaRank. When ranking is enough and when calibrated probability matters. Platt, isotonic, temperature, expected calibration error.
06
Negative sampling
Why you can't train on the full softmax over a billion items. In-batch negatives, sampled softmax with the log-Q correction, hard-negative mining. The bias each one introduces and how to fix it.
07
Position bias, selection bias, debiasing
The training data is logs from the old ranker, not the world. Click models, the examination hypothesis, IPS, doubly robust, intervention harvesting. Why on-policy retraining stops working and what to do.
08
Evaluation — offline metrics & A/B tests
NDCG, MAP, AUC, log-loss, calibration error. What each measures and what each misses. A/B test design: power, MDE, variance reduction, network effects, novelty, long-term metrics.
09
The ad auction — GSP, VCG, quality score
Why first-price auctions broke, what GSP fixed, why VCG is truthful, and why Google still ran GSP for a decade. Ranking by bid × expected CTR, the core idea behind quality score. Reserve prices, click value, externalities.
10
Bidding & pacing
Autobidding (tCPA, tROAS, max-conversions) viewed as a Lagrangian dual problem. Budget pacing as a control loop. Why the system's optimization horizon matters — and what breaks when the advertiser, the auction, and the pacer all optimize at once.
11
System design walkthrough
An end-to-end "design Instagram Reels ranking" / "design Google Sponsored Search" walk-through. Where each lesson's content shows up, with the order an interviewer expects you to reason through it.
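Two ideas above can be previewed with a toy generalized second-price (GSP) auction: Lesson 09's ranking of ads by bid × expected CTR, and the earlier point that calibration matters once money is involved. This is a minimal sketch under stated assumptions. Ad rank is exactly bid × pCTR with no other quality terms, click probability does not depend on the slot, and the reserve price is zero. `gsp` and `expected_revenue` are hypothetical helpers, not any ad platform's API.

```python
def gsp(bids, pctr, slots):
    """Generalized second-price auction with quality weighting.

    Ads are ordered by ad rank = bid * pCTR. The winner in each slot pays, per
    click, the minimum bid that would still keep its slot:
    next_ad_rank / own_pCTR. Returns [(ad_index, price_per_click), ...].
    """
    ad_rank = [b * p for b, p in zip(bids, pctr)]
    order = sorted(range(len(bids)), key=lambda i: ad_rank[i], reverse=True)
    outcome = []
    for pos, i in enumerate(order[:slots]):
        if pos + 1 < len(order):
            price = ad_rank[order[pos + 1]] / pctr[i]
        else:
            price = 0.0  # nobody below: pays the reserve, assumed to be zero here
        outcome.append((i, round(price, 4)))
    return outcome


def expected_revenue(outcome, true_ctr):
    """Platform revenue per impression: sum of P(click) * price per click over winners."""
    return sum(true_ctr[i] * price for i, price in outcome)


bids = [2.0, 1.5, 1.0]
true_ctr = [0.010, 0.020, 0.025]
miscal = [0.020, 0.020, 0.025]  # model over-predicts ad 0's CTR by 2x

good = gsp(bids, true_ctr, slots=2)  # [(1, 1.25), (2, 0.8)]
bad = gsp(bids, miscal, slots=2)     # [(0, 1.5), (1, 1.25)]
print(expected_revenue(good, true_ctr), expected_revenue(bad, true_ctr))  # ~0.045 vs ~0.040
```

Scaling every pCTR by the same factor cancels out of both the ordering and the prices. Miscalibrating one ad relative to the others does not cancel: it changes who wins, what each winner pays, and the revenue the platform actually collects.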
How to use this
- Linearly. Each lesson uses vocabulary from the previous one. Lesson 5 (losses) is far more compelling once you've seen the model architectures in Lesson 4. Lesson 9 (auctions) presumes you understand calibrated CTR from Lesson 5.
- Touch every widget. Each lesson has at least one interactive component — a slider, a calculator, a toggle. They exist to let you feel a trade-off that prose alone makes abstract. The system-design lesson uses them to count latency budgets.
- Read the "interview prompts" boxes. Each lesson ends with 4–6 prompts of the kind candidates for this role actually get asked. The answers in the prose show what a good answer looks like, not just the facts.
- Skip the auction half if you're only doing recsys. Lessons 09 and 10 are ads-specific. Lesson 11 has variants for both. The first eight are core for any ranking-team interview.
Companion material in this repo
The neighboring RL lessons cover RLHF/RLVR for LLMs: the same "design a reward function" instinct, applied to a different domain. The system_ml folder covers GPU and distributed-training internals, which is useful if your interview includes systems-flavored questions about training a billion-row embedding table.
A note on intellectual honesty
Recsys and ads are domains where the published research lags production by years and the production tricks are guarded. Names like DLRM and DCN-v2 are open; the actual feature crosses, calibration tweaks, and pacing constants at every major company are not. This curriculum teaches the shape of the problem with enough precision that you can reason about a specific company's choices in an interview, without needing to recite its proprietary recipe.