
System Design: Three Constraints

The Frame

Most system design exists because of three constraints:

  • physics
  • money
  • single source of truth

The details change from system to system, but the pressure usually comes from one of these questions:

  • What can the machine, disk, network, or human process physically not do fast enough?
  • What would be too expensive if every request paid the full cost?
  • Where is the authoritative state, and how do copies avoid becoming dangerous lies?

This is a useful filter before adding architecture. A component should be able to explain which constraint it is handling.

Examples

Example | Primary constraint | Why the design exists | Useful question
Horizontal scaling | Physics | One machine has finite CPU, memory, and network capacity, and is bounded by heat, power, and reliability limits. | If vertical CPU were unlimited, would throughput-driven horizontal scaling still matter as much?
Databases, Redis, caches, and indexes | Money and physics | Durable storage is slower than memory, and putting every useful byte in RAM-like storage is expensive. | If every lookup were cheap RAM-speed hash-table access, how much caching and indexing would remain?
Kafka and message queues | Physics | Requests can arrive faster than downstream work can finish. | If processing were instant, would buffering still be needed for ordinary throughput?

The table is the short version of the note: architecture appears when a constraint becomes visible.

Physics

Physics shows up whenever one machine, one disk, one network link, or one process cannot keep up.

Horizontal Scaling

Horizontal scaling exists because vertical scaling is finite.

If a single CPU could become infinitely fast, many horizontal scaling problems would disappear. Reliability and fault isolation would still matter, but the basic throughput reason for adding more machines would be weaker.

The practical constraint is:

one box cannot serve all useful work forever

So the system splits work across many boxes and pays the coordination cost.
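
A minimal sketch of that split, assuming a naive hash-modulo routing rule. The box names and the pick_box helper are made up for illustration, and real systems often use consistent hashing or a lookup service instead. The last lines show one form of the coordination cost: with this naive rule, adding a box changes where most keys live.

```python
import hashlib
from collections import Counter


def pick_box(key, boxes):
    """Route a key to one box with a stable hash (hypothetical routing rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(boxes)
    return boxes[index]


boxes = ["box-a", "box-b", "box-c"]  # hypothetical machine names
keys = [f"user-{i}" for i in range(3000)]

# Work is spread across boxes instead of landing on one.
load = Counter(pick_box(key, boxes) for key in keys)

# Coordination cost: growing the fleet remaps keys under modulo routing.
bigger = boxes + ["box-d"]
moved = sum(pick_box(key, boxes) != pick_box(key, bigger) for key in keys)

print(dict(load))
print(f"{moved} of {len(keys)} keys change boxes after adding one machine")
```

The same key always lands on the same box, which is what makes the split usable. The remapping count is the part a single box never had to pay.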

Queues

Queues exist because producers and consumers do not always move at the same speed.

If every request could be processed instantly, many message queues would not be needed for buffering. They would still be useful for durability, replay, and loose coupling, but the core pressure is that processing takes real time.

The practical constraint is:

arrival rate can exceed processing rate

So the system accepts work, stores it safely, and lets consumers drain it at a controlled pace.
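
A small simulation of that pressure, assuming discrete ticks and a fixed number of items the consumer can finish per tick. The simulate function and the burst numbers are invented for illustration and do not model any particular queue product.

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick):
    """Return the queue depth at the end of each tick."""
    queue = deque()
    depths = []
    next_id = 0
    for arriving in arrivals_per_tick:
        # Producers enqueue everything that arrived during this tick.
        for _ in range(arriving):
            queue.append(next_id)
            next_id += 1
        # The consumer drains at most its fixed capacity.
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()
        depths.append(len(queue))
    return depths


# Two ticks of burst traffic above a capacity of 5 items per tick.
burst = [2, 2, 10, 10, 2, 2, 2, 2]
depths = simulate(burst, capacity_per_tick=5)
print(depths)  # [0, 0, 5, 10, 7, 4, 1, 0]
```

The backlog grows only while arrivals exceed capacity and drains afterward. If arrivals stay above capacity indefinitely, the depth never comes back down, and a queue only delays the failure.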

Money

Money shows up when the technically simplest solution is too expensive to run at the required scale.

Databases, Indexes, Redis, And Caches

Many storage layers exist because not every read can afford to pay the full cost of slow storage or expensive computation.

If every dataset fit in cheap RAM and every lookup were a direct hash-table read, much of today's indexing and caching would matter far less. In reality, durable storage, memory, network calls, and recomputation all cost money.

The practical constraint is:

the common read must not pay the most expensive path

So the system keeps cheaper copies, indexes, aggregates, or cached answers close to the read path.
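
A minimal read-through cache sketch, assuming an in-memory dictionary stands in for the expensive path. SlowStore and ReadThroughCache are hypothetical names, and the read counter is there only to make the saved cost visible.

```python
class SlowStore:
    """Stands in for the expensive path and counts how often it is hit."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._data[key]


class ReadThroughCache:
    """Serves repeated reads from memory; misses fall through to the store."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._store.get(key)
        return self._cache[key]


store = SlowStore({"user:1": "Ada", "user:2": "Grace"})
cache = ReadThroughCache(store)

for _ in range(100):
    cache.get("user:1")
cache.get("user:2")

print(store.reads)  # 2: one expensive read per distinct key
```

This sketch never expires entries, so it trades correctness risk for cost: once the store changes, the cached answer is a stale copy.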

Single Source Of Truth

The single source of truth constraint appears when a system has many copies but only one answer is allowed to be authoritative.

Caching, replicas, search indexes, static pages, materialized views, and analytics tables all create faster or more convenient reads. They also create a correctness question:

which copy is allowed to decide truth?

If the answer is unclear, the system becomes hard to debug. Different users can see different realities, and every incident becomes an argument about which data is real.

Good design makes this explicit:

  • where truth is written
  • how derived copies are built
  • when stale reads are acceptable
  • how invalid copies are expired, rebuilt, or rejected
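
One way to make those four points concrete is to version every write and let the derived copy compare versions before answering. This is a sketch under assumptions: TruthStore, DerivedCopy, and max_lag are invented names, and it assumes a version check is much cheaper than a full read, which may not hold in a real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    version: int


class TruthStore:
    """Where truth is written: the only place that accepts writes."""

    def __init__(self):
        self._rows = {}

    def write(self, key, value):
        previous = self._rows.get(key)
        version = 1 if previous is None else previous.version + 1
        self._rows[key] = Versioned(value, version)
        return version

    def read(self, key):
        return self._rows[key]

    def version_of(self, key):
        # Assumed to be far cheaper than read() in a real system.
        return self._rows[key].version


class DerivedCopy:
    """A cache, replica, or index: built from truth, never written directly."""

    def __init__(self, truth, max_lag):
        self._truth = truth
        self._rows = {}
        self.max_lag = max_lag  # how many versions behind a read may be
        self.rebuilds = 0

    def rebuild(self, key):
        self._rows[key] = self._truth.read(key)
        self.rebuilds += 1

    def read(self, key):
        copy = self._rows.get(key)
        fresh_enough = copy is not None and (
            self._truth.version_of(key) - copy.version <= self.max_lag
        )
        if not fresh_enough:
            self.rebuild(key)  # reject the invalid copy and rebuild it
        return self._rows[key]


truth = TruthStore()
truth.write("price", "10")
copy = DerivedCopy(truth, max_lag=1)

first = copy.read("price")   # no copy yet, so it is built from truth
truth.write("price", "11")
second = copy.read("price")  # one version behind: acceptable stale read
truth.write("price", "12")
third = copy.read("price")   # two versions behind: rebuilt from truth

print(first.value, second.value, third.value)  # 10 10 12
```

The max_lag setting is where "when stale reads are acceptable" becomes an explicit decision instead of an accident.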

Design Check

Before adding a system component, ask what constraint it is paying for:

  • Load balancer: is this for traffic distribution, availability, or operational routing?
  • Cache: is this reducing read cost, hiding latency, or absorbing bursts?
  • Queue: is this buffering work, decoupling failure, enabling replay, or smoothing throughput?
  • Database replica: is this improving read capacity, availability, locality, or recovery?
  • Search index: is this serving a query shape the source store cannot answer cheaply?

If the component cannot name the constraint, it may be decoration.

The simplest useful design is usually the one where every moving part has a visible job against physics, money, or truth.