System Design Interviews Roadmap


System design interviews are not asking whether you memorized a shopping list of components. They are asking whether you can take an ambiguous product requirement, make the hidden constraints explicit, and choose trade-offs under time pressure.

This roadmap applies an interview lens to the same building blocks covered in the Foundations Roadmap: how to structure an answer, how to justify each component, and how to avoid overbuilding before the problem demands it. Later, concrete use cases such as a rate limiter, URL shortener, news feed, chat, and metrics/logging can plug into this page as practice problems.

Interview answer shape

Use this structure for almost every design:

1. Clarify the product behavior — What are users doing? What is explicitly out of scope?
2. Estimate the load — Requests, writes, reads, storage, fan-out, latency target.
3. Define the API — What contract does the system expose?
4. Pick the data model — What is the source of truth? What must be indexed?
5. Choose the architecture — Stateless services, storage, cache, queue, async workers.
6. Handle scale and failure — Partitioning, replication, retries, idempotency, observability.
7. Call out trade-offs — Why this design, and what would change at 10x?
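
The "estimate the load" step is mostly arithmetic, and it helps to have the formula ready before the interview. Below is a minimal sketch in Python. The function name `estimate_load`, its parameters, and the default peak factor of 3 are illustrative assumptions, not part of this roadmap; in a real interview you would state your own numbers and round aggressively.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class LoadEstimate:
    avg_write_qps: float
    avg_read_qps: float
    peak_read_qps: float
    storage_bytes_per_year: float


def estimate_load(
    daily_active_users: int,
    writes_per_user_per_day: float,
    reads_per_write: float,
    bytes_per_write: int,
    peak_factor: float = 3.0,  # assumed ratio of peak to average traffic
) -> LoadEstimate:
    """Back-of-envelope load numbers from a handful of stated assumptions."""
    writes_per_day = daily_active_users * writes_per_user_per_day
    avg_write_qps = writes_per_day / SECONDS_PER_DAY
    avg_read_qps = avg_write_qps * reads_per_write
    return LoadEstimate(
        avg_write_qps=avg_write_qps,
        avg_read_qps=avg_read_qps,
        peak_read_qps=avg_read_qps * peak_factor,
        # Ignores replication, indexes, and compression on purpose.
        storage_bytes_per_year=writes_per_day * bytes_per_write * 365,
    )
```

For example, `estimate_load(10_000_000, 2, 100, 1_000)` gives roughly 231 writes per second, about 23,000 reads per second on average, about 69,000 reads per second at peak, and 7.3 TB of raw storage per year. Numbers like these are what justify (or rule out) a cache, a queue, or partitioning in the later steps.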

Read in order

1. The interface and request path

API Design — The first thing interviewers can evaluate: naming, versioning, idempotency, pagination, and error shape.
HTTP/3 — The wire-level request behavior behind API latency, multiplexing, and connection reuse.
TLS 1.3 and mTLS — The identity and encryption layer behind HTTPS, gRPC, service-to-service calls, and zero-trust designs.
API Gateway — Auth, rate limiting, routing, and observability at the edge.
GraphQL — A useful answer for client-driven aggregation problems, not a default.
API Protocols Compared — REST, gRPC, GraphQL, WebSockets, SSE, and long-polling as workload-specific choices.
gRPC vs REST — The style trade-off once latency, payload size, streaming, or service-to-service traffic matters.
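
Idempotency and error shape, both named under API Design above, are easier to discuss with a concrete handler in mind. The following is a minimal in-memory sketch, assuming a hypothetical `PaymentService` with a `create_charge` method. The 409 status, the error code string, and the dictionary-based store are illustrative choices; a production system would persist keys in a durable store with an expiry.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class PaymentService:
    """Hypothetical service showing idempotency-key handling."""

    # idempotency key -> (request fingerprint, stored response)
    _seen: Dict[str, Tuple[tuple, dict]] = field(default_factory=dict)
    charges_created: int = 0

    def create_charge(self, idempotency_key: str, account_id: str, amount_cents: int) -> dict:
        fingerprint = (account_id, amount_cents)

        if idempotency_key in self._seen:
            stored_fingerprint, stored_response = self._seen[idempotency_key]
            if stored_fingerprint != fingerprint:
                # Same key, different payload: reject with a consistent error shape.
                return {
                    "status": 409,
                    "error": {
                        "code": "idempotency_key_reused",
                        "message": "Key was already used with a different request body.",
                    },
                }
            # Safe retry: replay the original response without a second side effect.
            return stored_response

        self.charges_created += 1
        response = {
            "status": 201,
            "charge_id": str(uuid.uuid4()),
            "amount_cents": amount_cents,
        }
        self._seen[idempotency_key] = (fingerprint, response)
        return response
```

The point to make in an interview is that a client retry after a timeout returns the stored response rather than creating a second charge, and that reusing a key with a different payload is an error rather than a silent overwrite.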

2. Storage and consistency

Database Concepts — Storage engines, indexes, WAL, and why the storage choice drives the rest of the design.
CAP and PACELC — How to explain consistency trade-offs without reciting a slogan.
Isolation Levels — What correctness guarantee your database actually provides.
MVCC — Snapshot reads, write conflicts, and why “read-heavy” is not a free category.

3. Performance and scale

Caching Techniques — The most common interview optimization, and the fastest way to introduce stale reads.
Consistent Hashing — The partitioning answer behind distributed caches and key-value stores.
Load Balancers — The request-distribution layer behind most scalable designs and many real outage stories.
DNS, GeoDNS, and Anycast — Global routing, failover, and edge selection before traffic reaches your system.
DynamoDB — A concrete cloud key-value design that packages many interview primitives together.
Redis — Cache, counter, queue, rate limiter, and lock manager if you are careful; a footgun if you are not.
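
To make the consistent hashing entry above concrete, here is a small sketch of a hash ring with virtual nodes. The class name `HashRing`, the choice of MD5 as the hash, and the default of 100 virtual nodes per server are assumptions made for illustration; real systems differ in hash function, replica placement, and rebalancing strategy.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # First 8 bytes of MD5 as a 64-bit ring position (used for spread, not security).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # position already taken; collisions are very unlikely
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property worth stating out loud: when a node is added, the only keys that move are the ones the new node takes over, so a cache cluster loses a fraction of its entries instead of nearly all of them, as it would with `hash(key) % n`.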

4. Async systems and workflows

Kafka — Logs, partitions, consumer groups, backpressure, and when async turns a request path into a pipeline.
Event Sourcing & CQRS — When the history is the data model.
Change Data Capture — Moving changes through the system without dual writes.
Choreography vs Orchestration — Who owns the workflow when multiple services participate?

5. Production-quality answers

Fan-Out Strategies — The read/write trade-off behind feeds, notifications, and social products.
Backend for Frontend — When one API per client is clarity, and when it is just another service.
Multi-Tenancy — The cost model behind B2B SaaS design.
Observability — The difference between a whiteboard design and a system someone can operate.
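
The read/write trade-off behind fan-out strategies can be sketched in a few lines. The example below is a simplified in-memory model, assuming a hypothetical `FeedService` and a follower-count threshold for "celebrity" accounts. It pushes posts into follower inboxes at write time for ordinary authors and pulls posts at read time for authors with many followers. It does not handle authors who cross the threshold in the downward direction, storage limits, or ranking.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Post = Tuple[int, str, str]  # (timestamp, author, text)


class FeedService:
    def __init__(self, celebrity_threshold: int = 1000) -> None:
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.following: Dict[str, Set[str]] = defaultdict(set)
        self.posts_by_author: Dict[str, List[Post]] = defaultdict(list)
        self.inbox: Dict[str, List[Post]] = defaultdict(list)
        self.threshold = celebrity_threshold

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def _is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.threshold

    def publish(self, ts: int, author: str, text: str) -> None:
        post = (ts, author, text)
        self.posts_by_author[author].append(post)
        if not self._is_celebrity(author):
            # Fan-out on write: pay at publish time, one inbox append per follower.
            for follower in self.followers[author]:
                self.inbox[follower].append(post)

    def timeline(self, user: str, limit: int = 10) -> List[Post]:
        # A set removes duplicates if an author became a celebrity after pushing.
        merged = set(self.inbox[user])
        for author in self.following[user]:
            if self._is_celebrity(author):
                # Fan-out on read: pay at read time, one lookup per followed celebrity.
                merged.update(self.posts_by_author[author])
        return sorted(merged, reverse=True)[:limit]
```

Pure push makes reads cheap but turns one post by a heavily followed account into a very large number of writes; pure pull does the opposite. The hybrid shown here is one common way to argue the trade-off, not the only valid answer.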

Practice use cases

These are not published yet. When they are added, each should link back to the primitives above instead of becoming a memorized template.

Rate limiter
URL shortener
News feed / timeline
Chat / messaging
Metrics and logging
Notification system
Search autocomplete

Where to go next

Foundations — revisit the Foundations Roadmap if any building block above feels memorized rather than understood.
AI Systems — use the AI Systems Roadmap when the design problem involves RAG, model serving, retrieval, or evaluation.


First Principles Engineering
