Step 1 - Understand the problem and establish design scope
Characteristics of unique IDs ? | IDs = unique & sortable |
---|---|
Does ID increment by 1 for each new record ? | ID increments by time, but not necessarily by 1. |
IDs contain numerical values ? | Contain numerical values. |
ID length requirement ? | IDs should fit into 64-bit. |
Scale of the system ? | Able to generate 10000 IDs per second. |
Step 2 - High-level design
Traditional approach - Non-Distributed System
...
Shared counter and timestamp function are limited because generate issues :
multiple independent servers can generate the same ID;
the same ID generates for 2 consecutives requests.
Approaches in Distributed system6
...
Uuid/Guid | MongoDB | Centralize MySQL (Flick - Ticket Server) | MySQL : Cluster | Twitter SnowflakeSnowflake | Sonyflake | Baiduuid Generator | |
---|---|---|---|---|---|---|---|
Ordered | No | Yes, possible | Yes | Yes | Yes, roughly | Yes, roughly | Yes, roughly |
Scale | Yes | Yes | No | Yes, roughly | Yes | Yes | Yes |
Length (Bits) | 128 | 96 | 64 | 64 | 64 | 63 | 64 |
Uniqueness | Rare | No, possible | No | No | No | No | No |
Simplicity | Yes | Yes | Yes | No | No | No | No |
Geo-latency | Fast | Fast | Slow | Fast, possible | Fast | Fast | Fast |
Indexable | Hard | Yes | Yes | Yes | Yes | Yes | Yes |
...
Each Web server could contain an ID Generator & the Web Server is responsible for generating IDs independently.
...
Pros & Cons for step 1
Pros : generating is simple (no coordination between servers and so, no synchronization issues); easy to scale in each web servers; uniqueness;
Cons : IDs are 128 bits long & we need 64 bits; IDs do not go up with time; IDs could be non-numeric.
CENTRALIZE MYSQL (FLICKR) - TICKET SERVER
Ticket Server will generate distributed primary keys. The idea is to use a centralized auto_increment features in a single DB server (Ticket Server).
...
Pros & Cons for step 1
Pros: Numeric IDs. Easy to implement and works for small to medium scale apps.
Cons: single point of failure, but we can scale, but will introduce more challenges.
MySQL CLUSTER - Multi-Master Replication
Uses the databases' auto_increment feature. Instead of increasing the next ID by 1, we increase it by K where K is the number of DB servers in use.
Next ID = the previous ID + 2.
...
Pros & Cons for step 1
Pros: simple to implement.
Cons: hard to scale with multiple data centers; IDs do not go up with time across multiple servers; it does not scale well when a server os added or removed.
TWITTER SNOWFLAKE
Twitter snowflake is a dedicated service for generating 64-bit unique identifiers used in distributed computing for objects within Twitter such as Tweets, Direct Messages, Lists, etc. These IDs are unique 64-bit unsigned integers, which are based on time.
...
Epoch timestamp in a millisecond — 41 bits (gives us 69 years for any custom epoch)
Configured machine/node/shard Id — 10 bits (gives us up to total of 2 i.e 1024 Ids)
Sequence number — 12 bits (A local counter per machine that sets to zero after every 4096 values)
In the beginning, the extra one reserved bit is set as 0 to make the overall number positive.
Pros & Cons for step 1
Pros: TIMESTAMP is sortable, fit into 64 bits, high availability is easily supported.
Cons: N/A
Info |
---|
Clock synchronization : all ID generation servers could have the same clock to generate the timestamp, which might not be accurate in distributed systems. In reality, system clocks can drastically skew in distributed systems. |
...
The lifetime (174 years) is longer than that of Snowflake (69 years)
It can work in more distributed machines (²¹⁶) than Snowflake (²¹⁰)
It can generate ²⁸ IDs per 10 msec at most in a single machine/thread (slower than Snowflake)
Step 3 - Design deep dive
Alex Xu chooses the Twitter Snowflake approach because of the requirements established above.
...
41 bits make up the timestamp section : Timestamp grows with time and is sortable by time. See page References & Glossary.
...
Sequence Number
It's 12 bits which gives 2^ 12 = 4096 combinations. This field is 000000000000 unless more than 1 ID is generated in ms on the same server.
Step 4 - Pros & Cons
Pros & Cons : see the table above.
...