Back-of-the-envelope estimation or calculation

According to Alex Xu, a good back-of-the-envelope estimate relies on a few scalability basics: powers of two, latency numbers and availability numbers. To get correct results, we first need to know the data volume units, which are expressed as powers of 2. A byte is a sequence of 8 bits.

Part 1 - Explaining data volume units

Power of 2 | Approximate value | Full name    | Short name
2^10       | 1 thousand        | 1 kilobyte   | 1 KB
2^20       | 1 million         | 1 megabyte   | 1 MB
2^30       | 1 billion         | 1 gigabyte   | 1 GB
2^40       | 1 trillion        | 1 terabyte   | 1 TB
2^50       | 1 quadrillion     | 1 petabyte   | 1 PB

For more powers of 2, see the Wikipedia article "Power of two".
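The round decimal values drift further from the exact powers of 2 as the exponent grows. A minimal Python sketch (the names `UNITS` and `unit_table` are hypothetical, chosen for illustration) computes the exact byte count of each unit and how far it is from the approximation:

```python
# Sketch: exact size of each binary unit vs. the round decimal value
# used in back-of-the-envelope estimates.
UNITS = [
    (10, "KB", 10**3),   # ~1 thousand
    (20, "MB", 10**6),   # ~1 million
    (30, "GB", 10**9),   # ~1 billion
    (40, "TB", 10**12),  # ~1 trillion
    (50, "PB", 10**15),  # ~1 quadrillion
]


def unit_table():
    """Return (power, short name, exact bytes, approximation, relative error) rows."""
    rows = []
    for power, name, approx in UNITS:
        exact = 2 ** power
        # Relative amount by which the round number underestimates the exact size.
        error = (exact - approx) / approx
        rows.append((power, name, exact, approx, error))
    return rows


if __name__ == "__main__":
    for power, name, exact, approx, error in unit_table():
        print(f"1 {name} = 2^{power} = {exact:,} bytes, approx. {approx:,} ({error:.1%} off)")
```

The gap is 2.4% for 1 KB and grows to about 12.6% for 1 PB, which is usually acceptable for a rough estimate.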

Part 2 - Latency numbers we should know

ns = nanosecond, us = microsecond, ms = millisecond.

1 ns = 10^-9 seconds; 1 us = 10^-6 seconds = 1000 ns; 1 ms = 10^-3 seconds = 1000 us = 1 000 000 ns.
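These conversions can be made mechanical. The helper below is a hypothetical Python sketch, `humanize_ns`, that picks the largest unit keeping the value at or above 1:

```python
# Sketch: express a duration given in nanoseconds in the largest unit
# (s, ms, us, ns) that keeps the value >= 1.
NS_PER_UNIT = [("s", 10**9), ("ms", 10**6), ("us", 10**3), ("ns", 1)]


def humanize_ns(ns: float) -> str:
    """Convert nanoseconds to a readable string, e.g. 10_000_000 -> '10 ms'."""
    for unit, factor in NS_PER_UNIT:
        if ns >= factor:
            return f"{ns / factor:g} {unit}"
    # Values below 1 ns (such as an L1 cache reference) stay in ns.
    return f"{ns:g} ns"
```

For example, `humanize_ns(150_000_000)` returns `"150 ms"`.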

Typical operation times (Google)

Operation name                                 | Time (1)
L1 cache reference                             | 0.5 ns
Branch mispredict                              | 5 ns
L2 cache reference                             | 7 ns
Mutex lock/unlock                              | 100 ns
Compress 1K bytes with Zippy                   | 10 000 ns = 10 us
Disk seek                                      | 10 000 000 ns = 10 ms
Send packet CA (California) → Netherlands → CA | 150 000 000 ns = 150 ms

(1) Typical computer operation times published by Google in 2010.

These numbers have also been visualized with an interactive tool. From them, we can draw the following conclusions:

  • Memory is fast but the disk is slow.

  • Avoid disk seeks if possible.

  • Simple compression algorithms are fast.

  • Compress data before sending it over the internet, if possible.

  • Data centers are usually located in different regions, and it takes time to send data between them.
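To make these conclusions concrete, a short Python sketch can express each operation from the table above as a multiple of a cheaper one. The dictionary `LATENCY_NS` and the function `relative_to` are hypothetical names; the figures are the 2010 values listed above.

```python
# Latency figures from the table above (Google, 2010), in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 100,
    "Compress 1K bytes with Zippy": 10_000,
    "Disk seek": 10_000_000,
    "Send packet CA -> Netherlands -> CA": 150_000_000,
}


def relative_to(baseline: str) -> dict:
    """Return how many times longer each operation takes than `baseline`."""
    base = LATENCY_NS[baseline]
    return {op: ns / base for op, ns in LATENCY_NS.items()}


if __name__ == "__main__":
    for op, ratio in relative_to("L1 cache reference").items():
        print(f"{op:<38} {ratio:>14,.0f}x")
```

A disk seek costs as much time as 20 million L1 cache references, and one California-Netherlands round trip lasts as long as compressing 15,000 blocks of 1 KB, which is why compressing before sending is cheap by comparison.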

Part 3 - Availability numbers we should know

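High availability is the ability of a system to be continuously operational for a desirably long period of time. A service level agreement (SLA) is a commonly used term for service providers: it is an agreement between a service provider and its customers that formally defines the level of uptime the service will deliver.

Cloud providers such as Google, Amazon and Microsoft set their SLAs at 99.9% or above. The more nines, the less downtime is allowed: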

Availability % | Downtime per day    | Downtime per week  | Downtime per month | Downtime per year
99%            | 14.40 min           | 1.68 hours         | 7.31 hours         | 3.65 days
99.9%          | 1.44 min            | 10.08 min          | 43.83 min          | 8.77 hours
99.99%         | 8.64 seconds        | 1.01 min           | 4.38 min           | 52.60 min
99.999%        | 864.00 milliseconds | 6.05 seconds       | 26.30 seconds      | 5.26 min
99.9999%       | 86.40 milliseconds  | 604.80 milliseconds | 2.63 seconds      | 31.56 seconds
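Every cell of this table follows from one formula: allowed downtime = (1 - availability) × length of the period. The Python sketch below assumes a 365.25-day year and an average month of 365.25 / 12 days, which reproduces the figures above (for example, 8.77 hours per year at 99.9%). The names `SECONDS_PER` and `downtime_seconds` are hypothetical.

```python
# Period lengths in seconds. Assumption: a 365.25-day year (leap years
# averaged in) and an average month of 365.25 / 12 days.
SECONDS_PER = {
    "day": 24 * 3600,
    "week": 7 * 24 * 3600,
    "month": 365.25 / 12 * 24 * 3600,
    "year": 365.25 * 24 * 3600,
}


def downtime_seconds(availability_pct: float, period: str) -> float:
    """Maximum downtime in seconds allowed by an availability over a period."""
    return (1 - availability_pct / 100) * SECONDS_PER[period]


if __name__ == "__main__":
    for pct in (99, 99.9, 99.99, 99.999, 99.9999):
        cells = [f"{downtime_seconds(pct, p):,.2f} s" for p in SECONDS_PER]
        print(f"{pct}%: " + " | ".join(cells))
```

For 99.999%, the sketch gives about 26.3 seconds of downtime per month.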

Part 4 - An example - Twitter QPS and storage requirements

Assumptions

  • 300 million monthly active users

  • 50% of users use Twitter daily

  • Users post 2 tweets per day on average

  • 10% of tweets contain media

  • Data is stored for 5 years

Estimations

--) QPS (queries per second) estimate:

– DAU (daily active users) = 300 million * 50% = 150 million

– Tweets QPS = 150 million * 2 tweets / 24 hours / 3600 seconds = approx. 3500

– Peak QPS = 2 * QPS = approx. 7000 (assuming peak traffic is about twice the average)
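The same arithmetic can be written as a small Python sketch. The constants come from the assumptions listed above, except `PEAK_FACTOR`, which is the rule-of-thumb "peak = 2 × average" multiplier and should be treated as an assumption; the function name `estimate_qps` is hypothetical.

```python
# Inputs from the assumptions above.
MONTHLY_ACTIVE_USERS = 300_000_000
DAILY_ACTIVE_RATIO = 0.5        # 50% of users are active each day
TWEETS_PER_USER_PER_DAY = 2
PEAK_FACTOR = 2                 # assumption: peak traffic = 2x the average
SECONDS_PER_DAY = 24 * 3600


def estimate_qps():
    """Return (DAU, average tweet QPS, peak tweet QPS)."""
    dau = MONTHLY_ACTIVE_USERS * DAILY_ACTIVE_RATIO
    qps = dau * TWEETS_PER_USER_PER_DAY / SECONDS_PER_DAY
    return dau, qps, PEAK_FACTOR * qps


if __name__ == "__main__":
    dau, qps, peak = estimate_qps()
    print(f"DAU = {dau:,.0f}, QPS = {qps:,.0f}, peak QPS = {peak:,.0f}")
```

Unrounded, the sketch gives about 3,472 tweets per second on average and about 6,944 at peak, which the estimate above rounds to 3500 and 7000.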

--) Media storage:

– Average tweet size: tweet_id (64 bytes), text (140 bytes), media (1 MB)

– Media storage: 150 million * 2 * 10% * 1 MB = 30 TB per day.

– 5-year media storage: 30 TB * 365 * 5 = approx. 55 PB.
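A Python sketch of the storage estimate follows. As in the calculation above, it counts only media and ignores tweet_id and text. It assumes 1 MB = 10^6 bytes, consistent with the decimal approximations of Part 1, and uses integer arithmetic so the results are exact. The name `media_storage_bytes` is hypothetical.

```python
# Only media is counted, as in the estimate above; tweet_id and text are ignored.
DAU = 150_000_000
TWEETS_PER_USER_PER_DAY = 2
MEDIA_PERCENT = 10            # 10% of tweets contain media
MEDIA_SIZE_BYTES = 10**6      # 1 MB, decimal approximation (assumption)
RETENTION_DAYS = 365 * 5      # 5 years, leap days ignored

TB = 10**12
PB = 10**15


def media_storage_bytes():
    """Return (media bytes per day, media bytes over the retention period)."""
    media_tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY * MEDIA_PERCENT // 100
    per_day = media_tweets_per_day * MEDIA_SIZE_BYTES
    return per_day, per_day * RETENTION_DAYS


if __name__ == "__main__":
    per_day, total = media_storage_bytes()
    print(f"{per_day / TB:g} TB per day, {total / PB:g} PB over 5 years")
```

The unrounded result is 54.75 PB, which the estimate above rounds to 55 PB.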