Important numbers for system design
Characteristics of fundamental components
Key Metrics and Approximations
1. Latency
Latency is the time taken to complete an operation. Knowing the approximate latency of each component makes it easier to spot performance bottlenecks.
Operation | Approximate Latency |
---|---|
L1 Cache Access | ~0.5 nanoseconds |
L2 Cache Access | ~7 nanoseconds |
RAM Access | ~100 nanoseconds |
Disk I/O (SSD) | ~100 microseconds |
Disk I/O (HDD) | ~10 milliseconds |
Network Round Trip (Same Region) | ~0.5 milliseconds |
Network Round Trip (Different Region) | ~100 milliseconds |
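For quick mental math, it helps to keep these figures in one place. Below is a minimal Python sketch (the dictionary and function names are illustrative, not from any library) that encodes the table above and compares two operations:

```python
# Approximate latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "l1_cache": 0.5,
    "l2_cache": 7,
    "ram": 100,
    "ssd_io": 100_000,                # ~100 microseconds
    "hdd_io": 10_000_000,             # ~10 milliseconds
    "rtt_same_region": 500_000,       # ~0.5 milliseconds
    "rtt_cross_region": 100_000_000,  # ~100 milliseconds
}

def slowdown(op: str, baseline: str = "l1_cache") -> float:
    """How many times slower `op` is than `baseline`."""
    return LATENCY_NS[op] / LATENCY_NS[baseline]

print(f"RAM vs L1 cache: {slowdown('ram'):.0f}x slower")
print(f"Cross-region RTT vs RAM: {slowdown('rtt_cross_region', 'ram'):.0f}x slower")
```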
2. Data Transfer Rates
Understanding the speed at which data can be read or written helps in capacity planning and throughput estimation.
Medium | Transfer Rate |
---|---|
L1 Cache | ~1 TB/s |
RAM | ~25 GB/s |
SSD | ~500 MB/s |
HDD | ~200 MB/s |
1 Gbps Network | ~125 MB/s |
10 Gbps Network | ~1.25 GB/s |
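A transfer-time estimate is simply size divided by rate. The following sketch, using the rates from the table above, approximates how long moving 1 GB takes on each medium:

```python
# Transfer rates from the table above, in bytes per second.
RATE_BPS = {
    "ram": 25e9,           # ~25 GB/s
    "ssd": 500e6,          # ~500 MB/s
    "hdd": 200e6,          # ~200 MB/s
    "net_1gbps": 125e6,    # ~125 MB/s
    "net_10gbps": 1.25e9,  # ~1.25 GB/s
}

ONE_GB = 1e9  # bytes

for medium, rate in RATE_BPS.items():
    print(f"1 GB over {medium}: {ONE_GB / rate:.2f} s")
```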
3. Storage Sizes
Approximating storage needs is vital for designing databases, file systems, and other storage solutions.
Storage Metric | Approximation |
---|---|
Byte | 8 bits |
KB (Kilobyte) | 10^3 bytes (1,000 bytes)
MB (Megabyte) | 10^6 bytes (1,000,000 bytes)
GB (Gigabyte) | 10^9 bytes (1,000,000,000 bytes)
TB (Terabyte) | 10^12 bytes (1,000,000,000,000 bytes)
4. Network Metrics
Networks play a critical role in distributed systems. These numbers help estimate bandwidth usage and response times.
Metric | Approximation |
---|---|
Packet Size (TCP/UDP) | ~1.5 KB |
HTTP Request Size | ~1 KB |
HTTP Response (HTML) | ~10 KB |
HTTP Response (Image) | ~100 KB |
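These payload sizes can be combined to approximate the bandwidth cost of serving a page. The sketch below assumes a hypothetical page made of one HTML document plus ten images; the page composition is an assumption, not from the table:

```python
# Approximate payload sizes from the table above, in bytes.
HTML_RESPONSE = 10_000    # ~10 KB
IMAGE_RESPONSE = 100_000  # ~100 KB

# Hypothetical page: one HTML document plus ten images.
page_bytes = HTML_RESPONSE + 10 * IMAGE_RESPONSE

gbps_link = 125_000_000  # a 1 Gbps link moves ~125 MB/s
pages_per_second = gbps_link / page_bytes
print(f"Page weight: ~{page_bytes / 1000:.0f} KB")
print(f"A 1 Gbps link can serve ~{pages_per_second:.0f} such pages/sec")
```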
How to Use These Numbers in System Design
1. Back-of-the-Envelope Calculations
- Example: Estimating the read throughput of a cache.
- L2 Cache Access: ~7 nanoseconds per read, so a single core can serve on the order of 140 million cache reads per second.
- Assume a high-speed network can send ~1.25 GB/s (10 Gbps); for remote reads, the network, not the cache, is usually the limit.
- Combining these latencies gives an estimate of system responsiveness (see the sketch below).
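A minimal sketch of this calculation, assuming strictly serial reads on one core and an illustrative ~1 KB cached item size:

```python
L2_LATENCY_S = 7e-9    # ~7 ns per L2 cache read
NET_RATE_BPS = 1.25e9  # ~1.25 GB/s on a 10 Gbps link
ITEM_SIZE = 1_000      # assumed cached item size: ~1 KB

cache_reads_per_sec = 1 / L2_LATENCY_S        # serial reads on one core
net_items_per_sec = NET_RATE_BPS / ITEM_SIZE  # items the network can move

print(f"Local L2 reads:  ~{cache_reads_per_sec:,.0f}/sec")
print(f"Network-limited: ~{net_items_per_sec:,.0f} items/sec")
```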
2. Data Storage Planning
- Example: Designing a photo storage service.
- Assume 10 million users upload 5 MB photos daily.
- Daily Storage Requirement: 10,000,000 users × 5 MB = 50 TB per day (see the sketch below).
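The same arithmetic as a runnable sketch; the yearly extrapolation is an added illustration, not part of the original example:

```python
USERS = 10_000_000
PHOTO_MB = 5  # one 5 MB photo per user per day

daily_tb = USERS * PHOTO_MB / 1_000_000  # MB -> TB (decimal units)
print(f"Daily storage: {daily_tb:.0f} TB")
print(f"Yearly storage: {daily_tb * 365 / 1000:.2f} PB")
```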
3. Network Bandwidth Estimation
- Example: A chat application with 1 million users, each sending 50 messages per second.
- Message Size: assume ~1 KB per message (comparable to the HTTP request size above).
- Total Data Transfer: 1,000,000 × 50 × 1 KB = 50 GB/s, far more than a single 10 Gbps link (~1.25 GB/s) can carry, so traffic must be spread across many servers (see the sketch below).
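A sketch of the same estimate, including how many 10 Gbps links the aggregate traffic would saturate (the ~1 KB message size remains an assumption):

```python
USERS = 1_000_000
MSGS_PER_USER_PER_SEC = 50
MSG_BYTES = 1_000  # assumed ~1 KB per message

total_bps = USERS * MSGS_PER_USER_PER_SEC * MSG_BYTES
link_bps = 1.25e9  # one 10 Gbps link moves ~1.25 GB/s

print(f"Total traffic: {total_bps / 1e9:.0f} GB/s")
print(f"10 Gbps links needed: {total_bps / link_bps:.0f}")
```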
4. Choosing the Right Technology
- Use latency numbers to decide between:
- In-memory caching for low-latency reads.
- Disk-based storage for cheaper but slower reads.
- Use network and storage trade-offs to evaluate replication strategies.
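To make the first trade-off concrete, the sketch below compares one million strictly serial reads served from RAM versus SSD, using the latency figures above (real systems parallelize and batch, so treat this as intuition, not a benchmark):

```python
READS = 1_000_000

RAM_LATENCY_S = 100e-9  # ~100 ns per RAM access
SSD_LATENCY_S = 100e-6  # ~100 microseconds per SSD read

print(f"{READS:,} serial RAM reads: {READS * RAM_LATENCY_S:.1f} s")
print(f"{READS:,} serial SSD reads: {READS * SSD_LATENCY_S:.0f} s")
```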
Cheat Sheet for Quick Reference
Latency
- CPU Cycle: ~0.3 nanoseconds (one cycle at ~3 GHz)
- L1 Cache: ~0.5 nanoseconds
- RAM: ~100 nanoseconds
- SSD Read: ~100 microseconds
- HDD Seek: ~10 milliseconds
- Inter-Region Network RTT: ~100 milliseconds
Throughput
- RAM: ~25 GB/s
- SSD: ~500 MB/s
- HDD: ~200 MB/s
- 1 Gbps Network: ~125 MB/s
Storage
- 1 KB: 10^3 bytes (1,000 bytes)
- 1 MB: 10^6 bytes (1,000,000 bytes)
- 1 GB: 10^9 bytes (1,000,000,000 bytes)
- 1 TB: 10^12 bytes (1,000,000,000,000 bytes)
Characteristics of available high-level components
Component | Key Metrics | Numbers to Know | When to Scale/Sharding Considerations | Implications for System Design |
---|---|---|---|---|
Caching | Memory, Latency, Throughput | - Memory: Up to 1TB on memory-optimized instances. - Latency: Reads < 1ms (same region), Writes 1-2ms (cross-region). - Throughput: > 100k requests/sec per instance. | - Dataset size > 1TB. - Throughput exceeds 100k ops/sec. - Consistent sub-0.5ms latency is required. | - Caching entire datasets eliminates the need for selective caching. - Bottlenecks shift from memory size to throughput or network bandwidth. - Simplifies caching strategies with a "cache everything" approach. |
Databases | Storage, Latency, Throughput, Connections | - Storage: Up to 64 TiB (128 TiB in Aurora). - Latency: Reads 1-5ms (cached), 5-30ms (disk). Writes 5-15ms. - Throughput: Reads ~50k TPS, Writes 10-20k TPS. | - Dataset size > 50 TiB. - Write throughput consistently > 10k TPS. - Backup windows become operationally impractical. - Geographic replication needed. | - Single-node databases handle most use cases without sharding. - Premature sharding often unnecessary for systems under 50 TiB. - Sharding decisions should be driven by data volume, backup limitations, or geographic distribution. |
Application Servers | CPU, Memory, Network Bandwidth | - Connections: > 100k concurrent connections. - CPU: 8-64 cores. - Memory: 64-512GB (up to 2TB for memory-optimized instances). - Network: Up to 25 Gbps bandwidth. | - CPU utilization > 70%. - Latency exceeds SLAs. - Concurrent connections > 15k per instance. - Network bandwidth nearing 20 Gbps. | - Focus on optimizing CPU usage; memory limits are rarely reached. - In-memory caching and computations can leverage high memory availability. - Cloud platforms enable rapid scaling with containerized apps (startup time 30-60 seconds). |
Message Queues | Throughput, Latency, Storage | - Throughput: Up to 1M messages/sec per broker. - Latency: 1-5ms (in-region). - Message Size: 1KB-10MB. - Storage: Up to 50TB/broker (long retention possible). | - Throughput > 800k messages/sec. - Partition count > 200k per cluster. - Consistently growing consumer lag impacts real-time processing. - Geographic redundancy required. | - Sub-5ms latency allows synchronous use in workflows. - Retention enables event sourcing, real-time analytics, and data integration. - High throughput and storage make message queues reliable data highways for scalable systems. |
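The scaling triggers in the table can be expressed as a simple checklist. Below is a hedged sketch for the database row (the threshold values come from the table; the function and metric names are hypothetical):

```python
# Sharding/scaling triggers for a database, taken from the table above.
DB_MAX_SIZE_TIB = 50
DB_MAX_WRITE_TPS = 10_000

def should_consider_sharding(size_tib: float, write_tps: float,
                             needs_geo_replication: bool) -> bool:
    """Returns True when any table-derived trigger fires."""
    return (size_tib > DB_MAX_SIZE_TIB
            or write_tps > DB_MAX_WRITE_TPS
            or needs_geo_replication)

print(should_consider_sharding(size_tib=12, write_tps=4_000,
                               needs_geo_replication=False))  # False
print(should_consider_sharding(size_tib=80, write_tps=4_000,
                               needs_geo_replication=False))  # True
```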
Key Takeaways
- Caching:
  - Utilize memory-optimized instances to cache large datasets, eliminating the need for complex selective caching strategies.
  - Throughput and network bandwidth often become bottlenecks before memory capacity does.
- Databases:
  - Avoid premature sharding; modern databases handle massive datasets and transactions efficiently.
  - Sharding should only be introduced when driven by specific needs like geographic distribution or operational constraints.
- Application Servers:
  - Modern servers are capable of handling a vast number of concurrent connections, with CPU being the usual bottleneck.
  - Leverage memory for local caching or session handling to improve performance, as memory limits are rarely reached.
- Message Queues:
  - With sub-5ms latency and high throughput, queues can be integrated into synchronous workflows for reliable, scalable systems.
  - Long-term storage capabilities make queues suitable for event sourcing and real-time analytics.