all_lessons / data_intensive_systems / 03 · query languages lesson 3 / 16 · ~9 min

Query Languages and Access-Pattern Design

Declarative languages say what you want and let the engine choose a plan; access-pattern design says the plan is part of the contract.

First principle

A query language is not just syntax. It is an agreement about who chooses the execution plan: the application, the database optimizer, or the data model itself.

QUESTION | +--> declarative query: describe result, optimizer chooses path | +--> imperative scan: application describes steps | +--> access-pattern table: schema already bakes in the path The more freedom the engine has, the more optimizer complexity you buy. The more you pre-bake the path, the less flexibility you keep.

Declarative queries: the engine owns the plan

SQL is declarative: you describe the result, not the loop. "Give me orders for this user in the last week" does not say whether to scan, use an index, join first, filter first, or reorder operations. The optimizer chooses a plan based on indexes, statistics, and cost estimates.

This is a huge abstraction win. The same query can get faster when you add an index or upgrade the engine. Applications can ask new questions without changing storage layout. But the abstraction has a bill: the optimizer can choose badly if statistics are stale, predicates are correlated, or the cost model is wrong. A declarative language hides the plan until the plan is the problem.

In interviews and production reviews, the senior move is to ask for the query plan when performance surprises you. The language says what. The plan says why.

Access-pattern design: make the hot path explicit

Many scalable stores invert the relationship. Instead of a general optimizer, you design tables around known access patterns: key-value lookups, wide rows, partition keys, sort keys, secondary indexes. The query is cheap because the schema has already placed the bytes where the query needs them.

This is why DynamoDB, Cassandra, and Bigtable-style schemas can feel strange to relational users. You do not start with entities; you start with questions: "get last 50 events for user," "get all model versions by registry state," "get features for entity at event time." Each hot access pattern may become its own table or index.

The benefit is predictable latency at scale. The cost is rigidity. A new question may require a new table, a backfill, or a derived view. Flexibility moves from query time to pipeline time.

MapReduce-style thinking: queries as dataflow

Some queries are too big to be interactive. For batch jobs, a query is a dataflow graph: read partitions, map records, shuffle by key, join or group, write output. The language might look declarative (SQL over Spark), but the execution is a distributed plan with physical costs: scan bytes, shuffle bytes, skew, materialization, retries.

This matters because many ML "queries" are really batch dataflows: construct training examples, join labels to features, rebuild embeddings, compute offline metrics. A declarative interface is still useful, but the designer must see the physical plan underneath. The expensive word is often shuffle, not select.

Trade-offs

ChoiceBuysCosts
Declarative SQLFlexible, optimizer can improve plans, great for ad-hoc questionsPlan can surprise you; indexes/statistics become hidden dependencies
Hand-coded imperative logicPrecise control, easy local reasoningHarder to optimize globally; application owns too much data access logic
Access-pattern schemaPredictable low latency at scaleNew access patterns require new tables or backfills
Distributed dataflowHandles huge scans and recomputationShuffle, skew, and materialization dominate cost
What you can now decide

You should be able to name the contract this mechanism offers, the workload or invariant that justifies it, and the bill it sends somewhere else: read latency, write latency, storage, availability, freshness, or operational complexity.

What breaks if you skip this?

If you do not know who owns the execution plan, nobody owns performance. The database blames the app, the app blames the database, and the real bottleneck sits inside an unseen plan.

Design prompts

  1. For a feature lookup API, would you prefer declarative flexibility or access-pattern rigidity?
  2. Name a query where adding an index makes the app faster without code changes.
  3. Name a query where a declarative batch job hides a huge shuffle.