Note: the framework proposed in this space (Alex Xu) is not applied exactly to this design: Getting started - a framework to propose...

Introduction

A key-value store (KVS), also referred to as a key-value database (KVD), is a non-relational database.

  • A unique identifier is stored as a key, together with its associated value.

  • Keys must be unique, and a value is accessed only through its key.


Part 1 - Understand the problem and establish design scope

  • Size of a KV pair: small, less than 10 KB.

  • Ability to store big data: yes.

  • High availability: the system responds quickly, even during failures.

  • High scalability: the system can be scaled to support a large data set.

  • Automatic scaling: servers are added / deleted automatically based on traffic.

  • Tunable consistency: yes.

  • Low latency: yes.

Part 2 - KVS on a Single Server

Even with optimisations (data compression, keeping frequently used data in memory and the rest on disk), a single server quickly reaches its capacity limit; a distributed KV store is required for big data.
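As a baseline before those optimisations, a single-server KVS is essentially an in-memory hash table. A minimal sketch (class and method names such as SingleServerKVS, put and get are assumptions, not code from this page):

    from typing import Optional

    # Minimal single-server KVS sketch: an in-memory hash table.
    class SingleServerKVS:
        def __init__(self) -> None:
            self._table: dict[str, str] = {}  # everything lives in memory

        def put(self, key: str, value: str) -> None:
            self._table[key] = value

        def get(self, key: str) -> Optional[str]:
            return self._table.get(key)

    store = SingleServerKVS()
    store.put("last_logged_in", "2024-06-01")
    print(store.get("last_logged_in"))  # 2024-06-01

Memory is the limiting factor here: once the data set no longer fits on one machine, even with compression and spilling to disk, the store has to be distributed.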

Part 3 - KVS in a Distributed Environment

Real-world distributed systems

Partitions cannot be avoided in real-world distributed systems, so we have to choose between consistency and availability (the CAP theorem).

  • If consistency is chosen over availability (because of strict consistency requirements), the system may become unavailable during a partition.

  • If availability is chosen over consistency, the system keeps accepting reads even though it may return stale data.

How to store & fetch data? (data partition)

We can use the consistent hashing technique. Using this technique gives two advantages (see the sketch after the list below):

  1. Automatic scaling: servers can be added and removed automatically depending on the load.

  2. Heterogeneity: number of virtual nodes (or replicas) for a server is proportional to the server capacity.
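A minimal consistent-hashing sketch (names such as HashRing, add_server and get_server are assumptions, not code from this page) showing virtual nodes and key lookup by walking the ring clockwise:

    import bisect
    import hashlib

    def _hash(key: str) -> int:
        # Map any string to a position on the ring.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self) -> None:
            self._ring = []    # sorted positions of virtual nodes
            self._owner = {}   # virtual-node position -> physical server

        def add_server(self, server: str, weight: int = 100) -> None:
            # Heterogeneity: more virtual nodes for higher-capacity servers.
            for i in range(weight):
                pos = _hash(f"{server}#vn{i}")
                bisect.insort(self._ring, pos)
                self._owner[pos] = server

        def remove_server(self, server: str) -> None:
            # Automatic scaling: only keys owned by this server's virtual
            # nodes are remapped when it leaves the ring.
            for pos in [p for p, s in self._owner.items() if s == server]:
                self._ring.remove(pos)
                del self._owner[pos]

        def get_server(self, key: str) -> str:
            # Walk clockwise from the key's position to the first virtual node.
            idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
            return self._owner[self._ring[idx]]

    ring = HashRing()
    for s in ("server-0", "server-1", "server-2"):
        ring.add_server(s)
    print(ring.get_server("user:42"))  # e.g. server-1

The weight parameter is how heterogeneity is expressed: the number of virtual nodes given to a server is proportional to its capacity.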

How to replicate data? (data replication)

To achieve high availability and reliability, data must be replicated asynchronously over N servers, where N is a configurable parameter.

Replication preserves the data held by a node if that node fails. To replicate a key-value pair, we walk clockwise on the hash ring from the key's position and pick the next 2 (or 3, depending on the replication factor / number of copies) distinct server nodes, then save the copies onto these nodes. With virtual nodes, the chosen nodes must map to distinct physical servers.
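A small illustrative sketch of picking the replica set (names such as replica_servers and VIRTUAL_NODES are assumptions): walk clockwise on the ring and skip virtual nodes belonging to physical servers already chosen.

    import bisect
    import hashlib

    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    SERVERS = ["s0", "s1", "s2", "s3"]
    VIRTUAL_NODES = 50
    # Ring of (position, physical server) pairs, sorted by position.
    RING = sorted(
        (_hash(f"{s}#vn{i}"), s) for s in SERVERS for i in range(VIRTUAL_NODES)
    )

    def replica_servers(key: str, n: int = 3) -> list[str]:
        # Walk clockwise from the key's position and keep the first n
        # DISTINCT physical servers (extra virtual nodes of a server already
        # chosen are skipped).
        start = bisect.bisect(RING, (_hash(key), ""))
        chosen: list[str] = []
        for offset in range(len(RING)):
            _, server = RING[(start + offset) % len(RING)]
            if server not in chosen:
                chosen.append(server)
            if len(chosen) == n:
                break
        return chosen

    print(replica_servers("user:42", n=3))  # e.g. ['s2', 's0', 's3']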

How to handle consistency?

Since data is replicated over several nodes, the replicas must be kept synchronized. The issue is that the quorum configuration is a typical trade-off between latency and consistency.

Configuration of W (write quorum), R (read quorum) and N (number of replicas)

A write must be acknowledged by W replicas, and a read must wait for responses from R replicas. If W + R > N, strong consistency is guaranteed, because at least one replica involved in the read holds the latest write.
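A tiny illustrative check of that rule (the helper name is an assumption, not from this page):

    # Quorum rule sketch: with N replicas, a write waits for W acks and a read
    # waits for R responses; W + R > N means the read and write sets overlap.
    def is_strongly_consistent(n: int, w: int, r: int) -> bool:
        return w + r > n

    print(is_strongly_consistent(n=3, w=2, r=2))  # True  (typical balanced setup)
    print(is_strongly_consistent(n=3, w=1, r=1))  # False (fast, but reads may be stale)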

How to handle failures?
