Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Step 1 - Understand the problem and establish design scope

Characteristics of unique IDs ?

IDs = unique & sortable

Does ID increment by 1 for each new record ?

ID increments by time, but not necessarily by 1.

IDs contain numerical values ?

Contain numerical values.

ID length requirement ?

IDs should fit into 64-bit.

Scale of the system ?

Able to generate 10000 IDs per second.

Step 2 - High-level design

Traditional approach - Non-Distributed System

...

Shared counter and timestamp function are limited because generate issues :

  • multiple independent servers can generate the same ID;

  • the same ID generates for 2 consecutives requests.

Approaches in Distributed system6

...

Uuid/Guid

MongoDB

Centralize MySQL (Flick - Ticket Server)

MySQL : Cluster

Twitter SnowflakeSnowflake

Sonyflake

Baiduuid Generator

Ordered

No

Yes, possible

Yes

Yes

Yes, roughly

Yes, roughly

Yes, roughly

Scale

Yes

Yes

No

Yes, roughly

Yes

Yes

Yes

Length (Bits)

128

96

64

64

64

63

64

Uniqueness

Rare

No, possible

No

No

No

No

No

Simplicity

Yes

Yes

Yes

No

No

No

No

Geo-latency

Fast

Fast

Slow

Fast, possible

Fast

Fast

Fast

Indexable

Hard

Yes

Yes

Yes

Yes

Yes

Yes

...

Each Web server could contain an ID Generator & the Web Server is responsible for generating IDs independently.

...

Pros & Cons for step 1

  • Pros : generating is simple (no coordination between servers and so, no synchronization issues); easy to scale in each web servers; uniqueness;

  • Cons : IDs are 128 bits long & we need 64 bits; IDs do not go up with time; IDs could be non-numeric.

CENTRALIZE MYSQL (FLICKR) - TICKET SERVER

Ticket Server will generate distributed primary keys. The idea is to use a centralized auto_increment features in a single DB server (Ticket Server).

...

Pros & Cons for step 1

  • Pros: Numeric IDs. Easy to implement and works for small to medium scale apps.

  • Cons: single point of failure, but we can scale, but will introduce more challenges.

MySQL CLUSTER - Multi-Master Replication

Uses the databases' auto_increment feature. Instead of increasing the next ID by 1, we increase it by K where K is the number of DB servers in use.

Next ID = the previous ID + 2.

...

Pros & Cons for step 1

  • Pros: simple to implement.

  • Cons: hard to scale with multiple data centers; IDs do not go up with time across multiple servers; it does not scale well when a server os added or removed.

TWITTER SNOWFLAKE

Twitter snowflake is a dedicated service for generating 64-bit unique identifiers used in distributed computing for objects within Twitter such as Tweets, Direct Messages, Lists, etc. These IDs are unique 64-bit unsigned integers, which are based on time.

...

  • Epoch timestamp in a millisecond — 41 bits (gives us 69 years for any custom epoch)

  • Configured machine/node/shard Id — 10 bits (gives us up to total of 2 i.e 1024 Ids)

  • Sequence number — 12 bits (A local counter per machine that sets to zero after every 4096 values)

  • In the beginning, the extra one reserved bit is set as 0 to make the overall number positive.

Pros & Cons for step 1

  • Pros: TIMESTAMP is sortable, fit into 64 bits, high availability is easily supported.

  • Cons: N/A

Info

Clock synchronization : all ID generation servers could have the same clock to generate the timestamp, which might not be accurate in distributed systems. In reality, system clocks can drastically skew in distributed systems.

...

  • The lifetime (174 years) is longer than that of Snowflake (69 years)

  • It can work in more distributed machines (²¹⁶) than Snowflake (²¹⁰)

  • It can generate ²⁸ IDs per 10 msec at most in a single machine/thread (slower than Snowflake)

Step 3 - Design deep dive

Alex Xu chooses the Twitter Snowflake approach because of the requirements established above.

...

41 bits make up the timestamp section : Timestamp grows with time and is sortable by time. See page References & Glossary.

...

Sequence Number

It's 12 bits which gives 2^ 12 = 4096 combinations. This field is 000000000000 unless more than 1 ID is generated in ms on the same server.

Step 4 - Pros & Cons

Pros & Cons : see the table above.

...