Back-of-the-envelope estimation or calculation
According to Alex Xu, latency numbers and availability numbers are part of the scalability basics needed for this kind of estimation. To get the calculations right, we first need to know the data volume units, which are based on powers of 2. A byte is a sequence of 8 bits.
Part 1 - Explaining the data volume unit
Power of 2 | Approximate value | Full name | Short name |
---|---|---|---|
10 | 1 thousand | 1 kilobyte | 1 KB |
20 | 1 million | 1 megabyte | 1 MB |
30 | 1 billion | 1 gigabyte | 1 GB |
40 | 1 trillion | 1 terabyte | 1 TB |
50 | 1 quadrillion | 1 petabyte | 1 PB |
… | … | … | … |
See the first few powers of 2 on Wikipedia: Power of two - Wikipedia
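As a minimal sketch of how the table above can be applied, the snippet below formats a raw byte count with the power-of-2 units. The function name `human_readable` and the `UNITS` list are illustrative assumptions, not something taken from the source.

```python
# Power-of-2 data volume units: each step up is a factor of 2**10 = 1024.
# UNITS and human_readable are hypothetical names used for illustration.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def human_readable(num_bytes: int) -> str:
    """Format a byte count using power-of-2 units (1 KB = 2**10 bytes)."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


if __name__ == "__main__":
    for power in (10, 20, 30, 40, 50):
        print(f"2**{power} bytes = {human_readable(2**power)}")
    # The "approximate value" column rounds 2**10 = 1024 down to 1 thousand.
    print(2**10, "vs", 10**3)
```

The approximation in the table (1 KB is roughly 1 thousand bytes) is usually accurate enough for back-of-the-envelope work, even though 2^10 is 1024 rather than 1000.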
Part 2 - Latency numbers we should know
ns = nanosecond, us = microsecond, ms = millisecond.
1 ns = 10^-9 seconds; 1 us = 10^-6 seconds = 1000 ns; 1 ms = 10^-3 seconds = 1000 us = 1 000 000 ns.
Operation and Time by Google
Operation name | Time (1) |
---|---|
L1 cache reference | 0.5 ns |
Branch mispredict | 5 ns |
L2 cache reference | 7 ns |
Mutex lock/unlock | 100 ns |
Compress 1K bytes with Zippy | 10 000 ns = 10 us |
Disk seek | 10 000 000 ns = 10 ms |
Send packet CA (California) → Netherlands → CA | 150 000 000 ns = 150 ms |
… | … |
(1) Google reveals the length of typical computer operations in 2010.
Google built a tool to visualize the numbers above, from which we can draw the following conclusions:
Memory is fast but the disk is slow.
Avoid disk seeks if possible.
Simple compression algorithms are fast.
Compress data before sending it over the internet.
With data centers in different regions, it could take time to send data between them.
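To make these conclusions concrete, the sketch below stores the latency figures from the table in a dictionary and compares them. The names `LATENCY_NS`, `format_ns` and `times_slower` are assumptions made for illustration; the numbers are the 2010 figures quoted above.

```python
# Latency figures from the table above, in nanoseconds (2010 numbers).
# LATENCY_NS, format_ns and times_slower are hypothetical helper names.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 100,
    "Compress 1K bytes with Zippy": 10_000,
    "Disk seek": 10_000_000,
    "Send packet CA -> Netherlands -> CA": 150_000_000,
}


def format_ns(ns: float) -> str:
    """Render a nanosecond value in the largest unit that fits (ns, us, ms)."""
    if ns >= 1_000_000:
        return f"{ns / 1_000_000:g} ms"
    if ns >= 1_000:
        return f"{ns / 1_000:g} us"
    return f"{ns:g} ns"


def times_slower(slow: str, fast: str) -> float:
    """Return how many times slower one operation is than another."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    for name, ns in LATENCY_NS.items():
        print(f"{name}: {format_ns(ns)}")
    ratio = times_slower("Disk seek", "L1 cache reference")
    print(f"A disk seek is {ratio:,.0f}x slower than an L1 cache reference")
```

With these figures, a single disk seek costs as much as 1000 Zippy compressions of 1 KB, which is why avoiding seeks and compressing before sending are both reasonable rules of thumb.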
Part 3 - Availability numbers we should know
High availability is the capacity of a system to be continuously operational for a desirably long period of time. A service level agreement (SLA) is a commonly used term for service providers: it is the agreement that defines the level of uptime the service will deliver.
Cloud providers such as Google, Amazon and Microsoft set their SLAs at 99.9% or above.
Availability % | Downtime per day | Downtime per week | Downtime per month | Downtime per year |
---|---|---|---|---|
99% | 14.40 min | 1.68 hours | 7.31 hours | 3.65 days |
99.9% | 1.44 min | 10.08 min | 43.83 min | 8.77 hours |
99.99% | 8.64 seconds | 1.01 min | 4.38 min | 52.60 min |
99.999% | 864.00 milliseconds | 6.05 seconds | 26.30 seconds | 5.26 min |
99.9999% | 86.40 milliseconds | 604.80 milliseconds | 2.63 seconds | 31.56 seconds |
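The downtime figures follow directly from the availability percentage: allowed downtime = period length * (1 - availability). The sketch below recomputes them for any availability and period. It assumes a 365.25-day year and an average month of one twelfth of that, which is the convention that appears to match the rows above; `PERIOD_SECONDS` and `downtime_seconds` are illustrative names.

```python
# Allowed downtime = period length * (1 - availability).
# Assumption: a year is 365.25 days and a month is 1/12 of a year.
SECONDS_PER_DAY = 24 * 3600

PERIOD_SECONDS = {
    "day": SECONDS_PER_DAY,
    "week": 7 * SECONDS_PER_DAY,
    "month": 365.25 / 12 * SECONDS_PER_DAY,
    "year": 365.25 * SECONDS_PER_DAY,
}


def downtime_seconds(availability_percent: float, period: str) -> float:
    """Return the allowed downtime in seconds for the given period."""
    return PERIOD_SECONDS[period] * (1 - availability_percent / 100)


if __name__ == "__main__":
    for availability in (99, 99.9, 99.99, 99.999, 99.9999):
        row = ", ".join(
            f"{period}: {downtime_seconds(availability, period):.2f} s"
            for period in PERIOD_SECONDS
        )
        print(f"{availability}% -> {row}")
```

Each additional "nine" divides the allowed downtime by ten, so moving from 99.9% to 99.99% cuts the yearly budget from hours to minutes.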
Part 4 - An example - Twitter QPS and storage requirements
Assumptions
300 million monthly active users
50% of users use it daily
Each user posts 2 tweets per day on average
10% contain media
Data is stored for 5 years
Estimations
--) QPS (queries per second) estimate:
– DAU (daily active users) = 300 million * 50% = 150 million
– Tweets QPS = 150 million * 2 tweets / 24 hours / 3600 seconds = approx. 3500
– Peak QPS = 2 * QPS = approx. 7000
--) Media storage:
– Average tweet size: tweet_id (64 bytes), text (140 bytes), media (1 MB)
– Media storage: 150 million * 2 * 10% * 1 MB = 30 TB per day.
– 5-year media storage: 30 TB * 365 * 5 = approx. 55 PB.
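The estimation above can be reproduced with a few lines of arithmetic. The following is a minimal sketch under the stated assumptions; the function name `estimate_twitter`, its parameter names and the returned keys are hypothetical, and the unit conversions use the approximate decimal values from Part 1 (1 TB is about 1 million MB, 1 PB is about 1000 TB).

```python
# Back-of-the-envelope estimate for the Twitter example.
# estimate_twitter and its parameters are hypothetical names for illustration.
SECONDS_PER_DAY = 24 * 3600


def estimate_twitter(
    monthly_active_users: int = 300_000_000,
    daily_active_ratio: float = 0.5,
    tweets_per_user_per_day: int = 2,
    media_ratio: float = 0.10,
    media_size_mb: float = 1.0,
    retention_years: int = 5,
    peak_factor: int = 2,  # assumption: peak traffic is twice the average
) -> dict:
    dau = monthly_active_users * daily_active_ratio
    tweets_per_day = dau * tweets_per_user_per_day
    average_qps = tweets_per_day / SECONDS_PER_DAY
    peak_qps = peak_factor * average_qps

    # Approximate conversions: 1 TB ~ 1 million MB, 1 PB ~ 1000 TB.
    media_tb_per_day = tweets_per_day * media_ratio * media_size_mb / 1_000_000
    media_pb_total = media_tb_per_day * 365 * retention_years / 1000

    return {
        "dau": dau,
        "average_qps": average_qps,
        "peak_qps": peak_qps,
        "media_tb_per_day": media_tb_per_day,
        "media_pb_total": media_pb_total,
    }


if __name__ == "__main__":
    for key, value in estimate_twitter().items():
        print(f"{key}: {value:,.2f}")
```

The exact results (about 3472 QPS and 54.75 PB) are rounded to 3500 and 55 PB in the text, which is the level of precision a back-of-the-envelope estimate is meant to give. Changing a single assumption, for example `media_ratio`, shows quickly how sensitive the storage figure is to it.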