Query Languages and Access-Pattern Design
Declarative languages say what you want and let the engine choose a plan; access-pattern design says the plan is part of the contract.
A query language is not just syntax. It is an agreement about who chooses the execution plan: the application, the database optimizer, or the data model itself.
Declarative queries: the engine owns the plan
SQL is declarative: you describe the result, not the loop. "Give me orders for this user in the last week" does not say whether to scan, use an index, join first, filter first, or reorder operations. The optimizer chooses a plan based on indexes, statistics, and cost estimates.
This is a huge abstraction win. The same query can get faster when you add an index or upgrade the engine. Applications can ask new questions without changing storage layout. But the abstraction has a bill: the optimizer can choose badly if statistics are stale, predicates are correlated, or the cost model is wrong. A declarative language hides the plan until the plan is the problem.
In interviews and production reviews, the senior move is to ask for the query plan when performance surprises you. The language says what. The plan says why.
Access-pattern design: make the hot path explicit
Many scalable stores invert the relationship. Instead of a general optimizer, you design tables around known access patterns: key-value lookups, wide rows, partition keys, sort keys, secondary indexes. The query is cheap because the schema has already placed the bytes where the query needs them.
This is why DynamoDB, Cassandra, and Bigtable-style schemas can feel strange to relational users. You do not start with entities; you start with questions: "get last 50 events for user," "get all model versions by registry state," "get features for entity at event time." Each hot access pattern may become its own table or index.
The benefit is predictable latency at scale. The cost is rigidity. A new question may require a new table, a backfill, or a derived view. Flexibility moves from query time to pipeline time.
MapReduce-style thinking: queries as dataflow
Some queries are too big to be interactive. For batch jobs, a query is a dataflow graph: read partitions, map records, shuffle by key, join or group, write output. The language might look declarative (SQL over Spark), but the execution is a distributed plan with physical costs: scan bytes, shuffle bytes, skew, materialization, retries.
This matters because many ML "queries" are really batch dataflows: construct training examples, join labels to features, rebuild embeddings, compute offline metrics. A declarative interface is still useful, but the designer must see the physical plan underneath. The expensive word is often shuffle, not select.
Trade-offs
| Choice | Buys | Costs |
|---|---|---|
| Declarative SQL | Flexible, optimizer can improve plans, great for ad-hoc questions | Plan can surprise you; indexes/statistics become hidden dependencies |
| Hand-coded imperative logic | Precise control, easy local reasoning | Harder to optimize globally; application owns too much data access logic |
| Access-pattern schema | Predictable low latency at scale | New access patterns require new tables or backfills |
| Distributed dataflow | Handles huge scans and recomputation | Shuffle, skew, and materialization dominate cost |
You should be able to name the contract this mechanism offers, the workload or invariant that justifies it, and the bill it sends somewhere else: read latency, write latency, storage, availability, freshness, or operational complexity.
If you do not know who owns the execution plan, nobody owns performance. The database blames the app, the app blames the database, and the real bottleneck sits inside an unseen plan.
Design prompts
- For a feature lookup API, would you prefer declarative flexibility or access-pattern rigidity?
- Name a query where adding an index makes the app faster without code changes.
- Name a query where a declarative batch job hides a huge shuffle.