Search Ads & Recommender Systems — MLE interview prep

A linear read-through of the production ML stack that powers ranking and ads: first-principles derivations, the trade-offs behind every choice, and the specific things an interviewer will probe.

Why these two together

Recommender systems and search ads look like two products. Inside, they are the same machine with one extra layer:

recommender feed                         search-ads SERP
────────────────                         ───────────────
[retrieve]  ◄── billions of items ──►    [retrieve]
[rank    ]  ◄── scoring funnel ─────►    [rank    ]
[re-rank ]  ◄── business logic ─────►    [re-rank ]
                                         [auction ]  ◄── ads only
                                         [pricing ]  ◄── ads only

Everything in the three stages that appear in both columns (candidate generation, embedding retrieval, CTR/CVR prediction, calibration, debiasing, A/B testing) is shared. The auction and the bidder are the two pieces ads adds on top. So an interviewer who works on either side expects you to understand the whole shared funnel, plus the ads-specific layer if you're targeting an ads team.

What an interviewer is actually testing

Four skills, and they're orthogonal: being strong at one tells the interviewer nothing about the others.

Skill	What good looks like	What weak looks like
Decompose a system into the funnel	Asked "design YouTube Home", you draw retrieve → rank → re-rank in 30 seconds, then go deep on the bottleneck the interviewer steers you toward. You know which metric each stage optimizes and why they can't be the same metric.	You jump into model architectures before naming the funnel. You can't say why two-tower is at the retrieval stage and DLRM is at ranking.
Reason about objectives	You translate a product goal ("more daily users") into a proxy label, then critique your own translation: position bias, survivor bias, selection bias, novelty effects. You distinguish predicting P(click | shown) from P(click | shown, position) from causal uplift.	You optimize for raw CTR and don't notice you've created a clickbait feedback loop.
Quantitative trade-offs	You can reason in numbers: a 10 ms p99 latency budget, a 1B-item index, a 10⁵ QPS workload, $0.01 of revenue per impression, a 0.5% MDE on an A/B test. You know which design choices spend each of these budgets and which buy them back.	You discuss qualitative pros and cons without ever multiplying. You don't know how big "big" is for the systems you're discussing.
Calibration & counterfactual thinking	You know why CTR predictions must be probabilities, not just rankings, the moment money enters the system. You understand IPS, propensity, doubly robust, and why "we trained on logged data" is not free.	You treat softmax outputs as probabilities. You evaluate a new ranker by comparing its top-K on old logs.

The lessons

Each lesson builds on the previous. The first eight cover the shared machinery; the next two cover ads-specific layers; the last one is an end-to-end design walkthrough that ties everything together.

The funnel — retrieve, rank, re-rank

Why production ranking is a multi-stage cascade and not one model. The latency / recall / precision budget at each stage. How recsys and ads systems share this structure.
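As a rough illustration of the latency budget this lesson refers to, the sketch below multiplies out a single-model design against a three-stage cascade. Every cost and stage size in it is an assumption chosen for round numbers, not a measurement, and it ignores batching and parallelism.

```python
# Back-of-envelope funnel arithmetic. All constants are illustrative assumptions.
CORPUS_SIZE = 1_000_000_000  # 1B-item index
ANN_LOOKUP_MS = 3.0          # assumed fixed cost of one ANN retrieval call
RANK_COST_US = 10.0          # assumed ranking-model cost per candidate (microseconds)
RERANK_COST_US = 40.0        # assumed re-ranking cost per candidate (microseconds)


def single_model_latency_ms(n_items: int, cost_us: float) -> float:
    """Latency if one model scores every item, one after another."""
    return n_items * cost_us / 1000.0


def cascade_latency_ms(n_ranked: int, n_reranked: int) -> float:
    """Latency of retrieve -> rank -> re-rank executed sequentially."""
    rank_ms = n_ranked * RANK_COST_US / 1000.0
    rerank_ms = n_reranked * RERANK_COST_US / 1000.0
    return ANN_LOOKUP_MS + rank_ms + rerank_ms


brute_force_ms = single_model_latency_ms(CORPUS_SIZE, RANK_COST_US)
cascade_ms = cascade_latency_ms(n_ranked=500, n_reranked=50)

print(f"ranking model over the whole corpus: {brute_force_ms / 3_600_000:.1f} hours")
print(f"three-stage cascade:                 {cascade_ms:.1f} ms")
```

Under these assumptions the cascade fits a 10 ms budget only because each stage shrinks the candidate set before the next, more expensive stage runs.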

Candidate generation — from CF to two-tower

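Collaborative filtering, matrix factorization, two-tower neural retrievers. Why the two towers are trained jointly but kept architecturally separate, meeting only at the final dot product so item embeddings can be precomputed and indexed, and how the loss differs from a ranking loss. Cold start handled at the right layer.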

Embeddings & approximate nearest neighbour

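Dot product vs cosine vs L2. Maximum-Inner-Product Search, and why inner product is not a metric (it violates the triangle inequality, and a vector need not be its own best match). HNSW, IVF, PQ: the three main building blocks of an ANN index and what each costs. Recall@K vs latency vs RAM.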
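A small sketch of why inner-product search behaves differently from a metric search: with unnormalized vectors, an item's best match under dot product can be a different, longer vector, while L2 and cosine both return the item itself. The three toy vectors are made up for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical item embeddings; "long" points roughly the same way as "short"
# but has a much larger norm.
items = {
    "short": [1.0, 0.0],
    "long": [3.0, 1.0],
    "other": [0.0, 2.0],
}
query = items["short"]

best_by_dot = max(items, key=lambda name: dot(query, items[name]))
best_by_l2 = min(items, key=lambda name: l2(query, items[name]))
best_by_cosine = max(items, key=lambda name: cosine(query, items[name]))

print(best_by_dot, best_by_l2, best_by_cosine)
```

Here the dot-product winner is "long" even though the query is "short" itself, which is one reason metric-space index structures cannot be applied to MIPS without a transformation or normalization step.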

The ranking model — LR to DLRM to DCN-v2

Sparse high-cardinality features, embedding tables, feature crosses. Why GBDTs lost to deep learning in ranking and the specific architecture moves (DLRM, DCN, MaskNet) that won. What "feature engineering" still means.

Losses & calibration

Pointwise BCE, pairwise BPR, listwise LambdaRank. When ranking is enough and when calibrated probability matters. Platt, isotonic, temperature, expected calibration error.
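To make "calibrated probability" concrete, here is a minimal sketch of expected calibration error for binary click predictions, using equal-width bins. The bin count and the toy data are assumptions. Both sets of predictions below give every impression the same score, so they rank identically; only the calibration differs.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average of |mean predicted prob - observed click rate| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_pred = sum(p for p, _ in bucket) / len(bucket)
        click_rate = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_pred - click_rate)
    return ece


# Toy logs: 2 clicks out of 10 impressions, i.e. a true CTR of 0.2.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ece_calibrated = expected_calibration_error([0.2] * 10, labels)
ece_overconfident = expected_calibration_error([0.4] * 10, labels)

print(f"calibrated: {ece_calibrated:.3f}  overconfident: {ece_overconfident:.3f}")
```

A ranking-only metric cannot tell these two models apart, but an auction that multiplies the prediction by a bid would charge and rank very differently under each.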

Negative sampling

Why you can't train on the full softmax over a billion items. In-batch negatives, sampled softmax with the log-Q correction, hard-negative mining. The bias each one introduces and how to fix it.
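The log-Q correction can be sketched in a few lines: each logit has the log of its sampling probability subtracted before the softmax, so items that are sampled often (popular items, in the in-batch case) are not over-penalized as negatives. The function name and the sampling probabilities below are hypothetical.

```python
import math


def sampled_softmax_loss(pos_logit, neg_logits, pos_q, neg_qs, correct=True):
    """Cross-entropy over the positive plus sampled negatives.

    pos_q / neg_qs are the (assumed known) probabilities with which each item
    is drawn by the negative sampler. With correct=True, log Q is subtracted
    from every logit before normalizing.
    """
    logits = [pos_logit] + list(neg_logits)
    qs = [pos_q] + list(neg_qs)
    if correct:
        logits = [s - math.log(q) for s, q in zip(logits, qs)]
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(s - peak) for s in logits))
    return log_z - logits[0]


# A rare positive (q = 0.01) against one very popular negative (q = 0.5),
# both with the same raw score.
uncorrected = sampled_softmax_loss(1.0, [1.0], 0.01, [0.5], correct=False)
corrected = sampled_softmax_loss(1.0, [1.0], 0.01, [0.5], correct=True)

print(f"uncorrected: {uncorrected:.4f}  corrected: {corrected:.4f}")
```

When every item is sampled with the same probability the correction shifts all logits equally and leaves the loss unchanged, which is a quick sanity check for an implementation.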

Position bias, selection bias, debiasing

The training data is logs from the old ranker, not the world. Click models, the examination hypothesis, IPS, doubly robust, intervention harvesting. Why on-policy retraining stops working and what to do.
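A toy illustration of IPS under the examination hypothesis (a click requires that the position was examined and the item was relevant). The examination probabilities and the logged counts are invented for the example; in practice the propensities have to be estimated.

```python
# Assumed probability that a user examines each position (the propensity).
EXAM_PROB = {1: 1.0, 2: 0.5, 3: 0.25}

# Synthetic logs as (item, position, clicked). Item A was always shown at
# position 1, item B always at position 3.
logs = (
    [("A", 1, 1)] * 500 + [("A", 1, 0)] * 500
    + [("B", 3, 1)] * 200 + [("B", 3, 0)] * 800
)


def naive_ctr(rows, item):
    """Clicks divided by impressions, ignoring where the item was shown."""
    clicks = [clicked for name, _, clicked in rows if name == item]
    return sum(clicks) / len(clicks)


def ips_relevance(rows, item):
    """Inverse-propensity-scored estimate of P(relevant), per impression."""
    weighted = [
        clicked / EXAM_PROB[position]
        for name, position, clicked in rows
        if name == item
    ]
    return sum(weighted) / len(weighted)


for item in ("A", "B"):
    print(item, naive_ctr(logs, item), ips_relevance(logs, item))
```

Raw CTR ranks A above B simply because the old ranker placed A higher; after reweighting, B looks like the more relevant item. The price is variance: small propensities produce large weights, which is why clipping and doubly robust estimators exist.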

Evaluation — offline metrics & A/B tests

NDCG, MAP, AUC, log-loss, calibration error. What each measures and what each misses. A/B test design: power, MDE, variance reduction, network effects, novelty, long-term metrics.
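For the power and MDE part, a common first approximation for a two-arm test on a rate metric is n ≈ 2 (z_{α/2} + z_β)² p(1 − p) / δ² users per arm. The sketch below evaluates it; the 5% baseline CTR is an assumption, and the 0.5% MDE is read as a relative lift.

```python
import math
from statistics import NormalDist


def samples_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    """Approximate sample size per arm for a two-sided test on a proportion."""
    delta = baseline_rate * relative_mde            # absolute effect to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = baseline_rate * (1 - baseline_rate)  # Bernoulli variance
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * variance / delta ** 2)


n_small_effect = samples_per_arm(baseline_rate=0.05, relative_mde=0.005)
n_large_effect = samples_per_arm(baseline_rate=0.05, relative_mde=0.05)

print(f"0.5% relative MDE: {n_small_effect:,} per arm")
print(f"5% relative MDE:   {n_large_effect:,} per arm")
```

Because the sample size scales with 1/δ², detecting an effect ten times smaller needs roughly a hundred times more traffic, which is the motivation for variance-reduction techniques.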

The ad auction — GSP, VCG, quality score

Why first-price auctions broke, what GSP fixed, why VCG is truthful, and why Google still ran GSP for a decade. Ad rank as bid × quality score, with quality score built on expected CTR. Reserve prices, click value, externalities.
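A minimal sketch of a quality-weighted GSP auction, assuming ads are ranked by bid × predicted CTR and each winner pays per click the smallest bid that would have kept its slot. The advertiser names, bids, and CTRs are invented, and real systems add further quality terms and slot effects that are left out here.

```python
def run_gsp(ads, n_slots, reserve=0.0):
    """ads: list of (name, bid_per_click, predicted_ctr).

    Returns [(name, price_per_click), ...] for the winning slots, top first.
    """
    eligible = [ad for ad in ads if ad[1] >= reserve]
    ranked = sorted(eligible, key=lambda ad: ad[1] * ad[2], reverse=True)

    results = []
    for i, (name, bid, pctr) in enumerate(ranked[:n_slots]):
        # Rank score of the ad immediately below, or 0 if there is none.
        if i + 1 < len(ranked):
            next_score = ranked[i + 1][1] * ranked[i + 1][2]
        else:
            next_score = 0.0
        # Smallest bid that still beats the next ad, floored at the reserve.
        price = max(next_score / pctr, reserve)
        results.append((name, price))
    return results


ads = [
    ("A", 2.00, 0.05),  # rank score 0.10
    ("B", 3.00, 0.02),  # rank score 0.06
    ("C", 1.00, 0.04),  # rank score 0.04
]
for name, price in run_gsp(ads, n_slots=2):
    print(f"{name} pays {price:.2f} per click")
```

Advertiser B bids the most but lands in the second slot because its predicted CTR is low, and every price depends on a ratio of predicted CTRs. That is why miscalibrated CTR predictions translate directly into mispriced clicks.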

Bidding & pacing

Autobidding (tCPA, tROAS, max-conversions) viewed as a Lagrangian dual problem. Budget pacing as a control loop. Why the system's optimization horizon matters — and what breaks when the advertiser, the auction, and the pacer all optimize at once.
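Budget pacing as a control loop can be sketched with a proportional controller on a bid multiplier: each interval compares actual spend with the spend needed to finish the budget exactly at the end of the day, and nudges the multiplier accordingly. The gain, the flat demand curve, and the assumption that spend is linear in the multiplier are all simplifications for illustration.

```python
def pace(budget, demand_per_step, gain=0.5):
    """Simulate a day of spend under a proportional pacing controller.

    demand_per_step[t] is what the campaign would spend in step t with a
    multiplier of 1.0. Returns (total_spent, spend_per_step).
    """
    n_steps = len(demand_per_step)
    multiplier, spent = 1.0, 0.0
    spends = []
    for t, demand in enumerate(demand_per_step):
        remaining = budget - spent
        target = remaining / (n_steps - t)   # even spend over what is left
        spend = min(multiplier * demand, remaining)
        spent += spend
        spends.append(spend)

        # Relative error, clipped so a single step cannot swing the multiplier
        # by more than a factor of (1 +/- gain).
        error = (target - spend) / target if target > 0 else 0.0
        error = max(-1.0, min(1.0, error))
        multiplier = min(1.0, multiplier * (1.0 + gain * error))
    return spent, spends


demand = [20.0] * 24  # hypothetical flat demand over 24 hourly steps
paced_total, paced_spends = pace(budget=100.0, demand_per_step=demand)
unpaced_total, unpaced_spends = pace(budget=100.0, demand_per_step=demand, gain=0.0)

print("unpaced:", [round(s, 1) for s in unpaced_spends[:8]])
print("paced:  ", [round(s, 1) for s in paced_spends[:8]])
```

With the controller switched off (gain=0.0) the budget is gone after five steps and the campaign is dark for the rest of the day; with it on, spend settles near the even-delivery target after a few steps of overshoot.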

System design walkthrough

An end-to-end "design Instagram Reels ranking" / "design Google Sponsored Search" walkthrough. Where each lesson's content shows up, in the order an interviewer expects you to reason through it.

How to use this

Linearly. Each lesson uses vocabulary from the previous one. Lesson 5 (losses) is far more compelling once you've seen the model architectures in Lesson 4. Lesson 9 (auctions) presumes you understand calibrated CTR from Lesson 5.
Touch every widget. Each lesson has at least one interactive component — a slider, a calculator, a toggle. They exist to let you feel a trade-off that prose alone makes abstract. The system-design lesson uses them to count latency budgets.
Read the "interview prompts" boxes. Each lesson ends with 4–6 prompts that candidates for these roles actually get asked. The answers in the prose show what a good answer looks like, not just the facts.
Skip the auction half if you're only doing recsys. Lessons 9 and 10 are ads-specific. Lesson 11 has variants for both. The first eight are core for any ranking-team interview.

Companion material in this repo

The neighboring RL lessons cover RLHF/RLVR for LLMs — the same "design a reward function" instinct, applied to a different domain. The system_ml folder covers GPU and distributed-training internals — useful if your interview includes systems-flavored questions about training a billion-row embedding table.

A note on intellectual honesty

Recsys and ads are domains where the published research lags production by years and the production tricks are guarded. Names like DLRM and DCN-v2 are open; the actual feature crosses, calibration tweaks, and pacing constants at every major company are not. This curriculum teaches the shape of the problem with enough precision that you can reason about a specific company's choices in an interview rather than recite its proprietary recipe.