Transactions and Isolation Anomalies
Transactions let an application pretend that certain concurrency and crash failures did not happen. The isolation level decides how convincing that illusion is.
A transaction is an abstraction that bundles reads and writes into a unit whose intermediate states should not leak and whose committed effects should survive.
ACID as engineering promises
Atomicity means all-or-nothing: if a multi-step operation fails halfway, the database discards its partial writes. Consistency means application invariants are preserved, provided the transactions themselves are correct and the declared constraints are enforced. Isolation means concurrent transactions are limited in how they can observe and interfere with each other's intermediate states; at its strongest, the outcome is equivalent to running them one at a time. Durability means committed data survives crashes.
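A minimal sketch of atomicity using Python's standard `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical; the point is that the debit is already applied when the balance check fails, and the rollback removes it.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from `src` to `dst` as one all-or-nothing unit (hypothetical helper)."""
    # `with conn:` commits on normal exit and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        # Check an invariant halfway through the multi-step operation.
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 30)        # succeeds and commits
try:
    transfer(conn, "alice", "bob", 500)   # fails after its debit was applied
except ValueError:
    pass                                  # the partial debit was rolled back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 70, 'bob': 30}
```

The failed transfer leaves no trace: Alice is not left at a negative balance, and Bob is not credited.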
The dangerous letter is I. Isolation is not one thing; it is a ladder of guarantees. Weak isolation is faster and more concurrent but admits anomalies. Strong isolation is easier to reason about but costs coordination, locks, validation, or aborts.
Common anomalies
Dirty reads happen when one transaction reads data that another transaction has written but not yet committed. Dirty writes happen when a transaction overwrites a value written by another transaction that has not yet committed. Read skew appears when a transaction sees the database at different points in time across its reads. Lost updates appear when two clients read the same value, both compute a new value from it, and the later write silently overwrites the earlier one.
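The lost update can be replayed deterministically with a toy versioned row. This is a sketch under assumptions: `Row`, `blind_write`, and `compare_and_set` are hypothetical names, and compare-and-set is only one of several fixes (atomic `UPDATE ... SET x = x + 1` statements and explicit row locks are others).

```python
class Row:
    """A hypothetical single row with a version counter, used to replay interleavings."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.version = 0

    def read(self) -> tuple[int, int]:
        return self.value, self.version

    def blind_write(self, value: int) -> None:
        # Last writer wins; nothing checks what the writer based its value on.
        self.value = value
        self.version += 1

    def compare_and_set(self, expected_version: int, value: int) -> bool:
        # Succeeds only if nobody has written since the caller's read.
        if self.version != expected_version:
            return False
        self.value = value
        self.version += 1
        return True


def lost_update() -> int:
    counter = Row(10)
    a, _ = counter.read()        # client A reads 10
    b, _ = counter.read()        # client B reads 10
    counter.blind_write(a + 1)   # A writes 11
    counter.blind_write(b + 1)   # B writes 11, discarding A's increment
    return counter.value


def with_compare_and_set() -> int:
    counter = Row(10)
    a, va = counter.read()               # A reads 10 at version 0
    b, vb = counter.read()               # B reads 10 at version 0
    counter.compare_and_set(va, a + 1)   # A succeeds: value 11, version 1
    while not counter.compare_and_set(vb, b + 1):
        b, vb = counter.read()           # B's stale write is rejected; re-read and retry
    return counter.value


print(lost_update(), with_compare_and_set())  # 11 12
```

Two increments should yield 12; the blind-write interleaving yields 11 even though each client's logic was correct in isolation.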
Write skew is subtler. Two transactions read the same set of rows, both see that an invariant holds, and then each writes a different row, so that together they break the invariant. Example: two doctors are on call; each transaction sees the other doctor still on call and takes its own doctor off call, leaving nobody on call. Locking only the row being written is not enough, because the invariant ranges over a predicate (the set of on-call doctors), not over a single row.
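The doctors example can be replayed in a few lines. This is a simulation, not a database: each transaction checks the invariant against its own copy of the committed state, and `go_off_call` is a hypothetical name.

```python
# Committed state: doctor -> on-call flag. Invariant: at least one doctor is on call.
committed = {"alice": True, "bob": True}


def go_off_call(snapshot: dict[str, bool], me: str) -> dict[str, bool]:
    """Check the invariant against a snapshot, then return this transaction's writes."""
    if sum(snapshot.values()) >= 2:  # "someone else is still on call, so I can leave"
        return {me: False}
    return {}


# Both transactions read the same snapshot...
writes_alice = go_off_call(dict(committed), "alice")
writes_bob = go_off_call(dict(committed), "bob")

# ...and write different rows, so a per-row conflict check finds nothing to reject.
print(writes_alice.keys().isdisjoint(writes_bob.keys()))  # True
committed.update(writes_alice)
committed.update(writes_bob)

print(committed)                # {'alice': False, 'bob': False}
print(any(committed.values()))  # False: the invariant is broken
```

Each decision was valid against the snapshot it saw; the anomaly exists only in the combination.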
Phantoms occur when a write in one transaction changes the result of a predicate query in another, for example by inserting a row the query would now match. They are why serializable isolation often needs predicate locks, index-range locks, or optimistic validation over the read set.
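Here is a sketch of optimistic validation over a predicate read, using a hypothetical room-booking model. Each transaction records the predicate queries it ran and the results it saw; at commit time it re-runs them against the committed state and aborts if the answer changed, meaning a phantom appeared. `Booking`, `overlapping`, and `OptimisticTxn` are illustrative names, and a real system would also have to make validate-then-write atomic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Booking:
    room: str
    start: int  # hour, inclusive
    end: int    # hour, exclusive
    owner: str


def overlapping(bookings, room: str, start: int, end: int) -> frozenset:
    """The predicate query: bookings for `room` that overlap [start, end)."""
    return frozenset(b for b in bookings if b.room == room and b.start < end and start < b.end)


class OptimisticTxn:
    def __init__(self, db: set) -> None:
        self.db = db                     # shared committed state
        self.snapshot = frozenset(db)    # what this transaction reads from
        self.predicate_reads = []        # (query args, result seen)
        self.inserts = []

    def query(self, room: str, start: int, end: int) -> frozenset:
        result = overlapping(self.snapshot, room, start, end)
        self.predicate_reads.append(((room, start, end), result))
        return result

    def insert(self, booking: Booking) -> None:
        self.inserts.append(booking)

    def commit(self, validate: bool) -> bool:
        if validate:
            # Single-threaded sketch: validation and write are not interleaved here.
            for args, seen in self.predicate_reads:
                if overlapping(self.db, *args) != seen:
                    return False         # a phantom appeared: abort
        self.db.update(self.inserts)
        return True


def run(validate: bool) -> tuple[list[bool], int]:
    db: set = set()
    t1, t2 = OptimisticTxn(db), OptimisticTxn(db)
    for owner, txn in (("t1", t1), ("t2", t2)):
        if not txn.query("A", 9, 10):    # room A looks free from 9 to 10
            txn.insert(Booking("A", 9, 10, owner))
    outcomes = [t1.commit(validate), t2.commit(validate)]
    return outcomes, len(db)


print(run(validate=False))  # ([True, True], 2): double booking
print(run(validate=True))   # ([True, False], 1): second commit aborts
```

No row that `t2` read was modified; what changed was the set of rows matching its predicate, which is why per-row checks miss it.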
Isolation levels as trade-offs
Read committed prevents dirty reads and dirty writes but allows non-repeatable reads. Snapshot isolation gives each transaction a consistent snapshot and avoids many read anomalies; it is typically implemented with MVCC. But snapshot isolation can still allow write skew, because concurrent transactions may read the same snapshot and then update different rows.
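The gap between snapshot isolation and serializability shows up in a toy MVCC store. This is a minimal sketch, not any particular database's implementation: reads come from the snapshot at the transaction's start timestamp, and a first-committer-wins rule aborts a transaction if another transaction committed a write to the same key after that snapshot. `SnapshotStore` and `Txn` are hypothetical names.

```python
class SnapshotStore:
    """Toy MVCC store: key -> list of (commit_ts, value), appended in commit order."""

    def __init__(self, data: dict) -> None:
        self.versions = {k: [(0, v)] for k, v in data.items()}
        self.clock = 0

    def begin(self) -> "Txn":
        return Txn(self, start_ts=self.clock)

    def read_at(self, key: str, ts: int):
        # Latest version committed at or before the snapshot timestamp.
        for commit_ts, value in reversed(self.versions[key]):
            if commit_ts <= ts:
                return value
        raise KeyError(key)


class Txn:
    def __init__(self, store: SnapshotStore, start_ts: int) -> None:
        self.store, self.start_ts, self.writes = store, start_ts, {}

    def read(self, key: str):
        return self.writes[key] if key in self.writes else self.store.read_at(key, self.start_ts)

    def write(self, key: str, value) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        # First-committer-wins: abort on a write-write conflict with a later commit.
        for key in self.writes:
            if self.store.versions[key][-1][0] > self.start_ts:
                return False
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions[key].append((self.store.clock, value))
        return True


# Lost update on one key: the conflict is detected.
store = SnapshotStore({"counter": 10})
t1, t2 = store.begin(), store.begin()
t1.write("counter", t1.read("counter") + 1)
t2.write("counter", t2.read("counter") + 1)
lost_update_commits = [t1.commit(), t2.commit()]  # [True, False]

# Write skew on two keys: no write-write conflict, so both commit.
store = SnapshotStore({"alice": True, "bob": True})
ta, tb = store.begin(), store.begin()
if ta.read("alice") and ta.read("bob"):
    ta.write("alice", False)
if tb.read("alice") and tb.read("bob"):
    tb.write("bob", False)
write_skew_commits = [ta.commit(), tb.commit()]   # [True, True]
after = store.begin()
final = {k: after.read(k) for k in ("alice", "bob")}  # {'alice': False, 'bob': False}
```

Write-conflict detection protects the counter because both transactions wrote the same key; it cannot see the doctors' conflict, which lives in what each transaction read.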
Serializable isolation means the outcome is equivalent to some serial order. Databases implement it with actual serial execution, two-phase locking, or serializable snapshot isolation. The cost appears as blocking, deadlocks, aborts, and reduced concurrency.
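Actual serial execution is the easiest of the three to sketch: run each transaction to completion while holding one global lock. `SerialExecutor` is a hypothetical name, and the sketch runs each transaction on a copy so that a transaction that raises leaves no partial writes. It makes the cost visible too: only one transaction runs at a time, which is why this approach suits short transactions over in-memory data.

```python
import threading
from typing import Callable


class SerialExecutor:
    """Hypothetical executor: one transaction at a time under a single global lock."""

    def __init__(self, state: dict) -> None:
        self.state = state
        self._lock = threading.Lock()

    def run(self, txn: Callable[[dict], None]) -> None:
        with self._lock:
            draft = dict(self.state)  # work on a copy so a failed txn leaves no trace
            txn(draft)
            self.state = draft        # install the result only if txn finished


def go_off_call(me: str) -> Callable[[dict], None]:
    def txn(state: dict) -> None:
        if sum(state.values()) >= 2:  # the same check as in the write-skew example
            state[me] = False
    return txn


executor = SerialExecutor({"alice": True, "bob": True})
threads = [
    threading.Thread(target=executor.run, args=(go_off_call(name),))
    for name in ("alice", "bob")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Which doctor stays on call depends on scheduling, but exactly one always does.
print(sum(executor.state.values()))  # 1
```

The second transaction to acquire the lock sees the first one's committed write, so the write-skew interleaving cannot occur.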
The lesson for distributed systems is sharp: isolation is not a checkbox. It is a performance/correctness contract tied to invariants. Use strong isolation where invariants require it; use weaker isolation where the domain can tolerate anomalies or compensate.
Trade-offs
| Choice | Buys | Costs |
|---|---|---|
| Read committed | Good concurrency, prevents dirty reads and writes | Non-repeatable reads and lost updates still possible |
| Snapshot isolation | Stable reads and MVCC-friendly performance | Write skew and predicate anomalies can remain |
| Serializable | Simplest correctness model | Locks, aborts, validation, and lower concurrency |
| Application-level checks | Can be tailored to domain | Easy to miss races unless backed by database constraints |
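The last row of the table deserves an example: an application-level "is it free?" check races, and a database constraint catches what the check misses. This is a minimal sketch with Python's standard `sqlite3`; the `bookings` table and `slot_is_free` helper are hypothetical, and the race is simulated by running both checks before either insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint puts the invariant in the database, not only in application code.
conn.execute(
    "CREATE TABLE bookings ("
    " room TEXT NOT NULL, slot INTEGER NOT NULL, owner TEXT NOT NULL,"
    " UNIQUE (room, slot))"
)


def slot_is_free(room: str, slot: int) -> bool:
    row = conn.execute(
        "SELECT 1 FROM bookings WHERE room = ? AND slot = ?", (room, slot)
    ).fetchone()
    return row is None


# Both requests run the application-level check before either one inserts.
checks = {owner: slot_is_free("A", 9) for owner in ("t1", "t2")}

results = {}
for owner, free in checks.items():
    if not free:
        results[owner] = "rejected by app check"
        continue
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO bookings VALUES (?, ?, ?)", ("A", 9, owner))
        results[owner] = "booked"
    except sqlite3.IntegrityError:
        results[owner] = "rejected by UNIQUE constraint"

print(results)  # {'t1': 'booked', 't2': 'rejected by UNIQUE constraint'}
```

Both checks passed, so the application alone would have double-booked; the constraint turns the race into a clean error the caller can handle.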
For each choice, you should be able to name the contract it offers, the workload or invariant that justifies it, and where it sends the bill: read latency, write latency, storage, availability, freshness, or operational complexity.
Without isolation, correctness depends on lucky timing. Bugs appear only under concurrency, vanish during debugging, and corrupt state while every individual query looked reasonable.
Design prompts
- Give a write-skew example in a model registry or quota system.
- Why does snapshot isolation not always imply serializable?
- Which invariants in a feature store require transactions?