Tip |
---|
The framework proposed in this space (Alex Xu) is not strictly applied to this design: Getting started - a framework to propose... |
Introduction
A key-value store (KVS), also referred to as a key-value database (KVD), is a non-relational database.
A unique identifier is stored as a key together with its associated value.
Keys must be unique, and a value is accessed through its key.
Part 1 - Understand the problem and establish design scope
Requirement | Answer |
---|---|
Size of a KV pair | Small: less than 10 KB |
Ability to store big data? | Yes |
High availability: system responds quickly even during failures? | Yes |
High scalability: system can be scaled to support a large data set? | Yes |
Automatic scaling: adding / deleting servers based on traffic? | Yes |
Tunable consistency? | Yes |
Low latency? | Yes |
Part 2 - KVS in Single Server

Even with optimisations (data compression, keeping frequently used data in memory and the rest on disk), a single server quickly reaches its capacity limit; a distributed KV store is required for big data.
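For illustration, a minimal sketch of the single-server case: the store is essentially an in-memory hash table behind put/get (the class and method names here are hypothetical, not from the book):

```python
class SingleServerKVStore:
    """Minimal single-server key-value store: a hash table behind put/get."""

    def __init__(self):
        self._data = {}  # everything lives in memory; capacity is bounded by RAM

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)


store = SingleServerKVStore()
store.put("user:42:last_login", b"2024-01-01T00:00:00Z")
print(store.get("user:42:last_login"))
```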
Part 3 - KVS in Distributed Environment

Real-world distributed systems
Partitions cannot be avoided, so we have to choose between consistency and availability!
If consistency is chosen over availability (strict consistency requirements), the system could become unavailable during a partition.
If availability is chosen over consistency, the system keeps accepting reads even though it may return stale data.
How to store & fetch data ? (data partition)
We can use the consistent hashing technique, which gives two advantages (a minimal sketch follows the list):
Automatic scaling: servers could be added and removed automatically depending on the load.
Heterogeneity: number of virtual nodes (or replicas) for a server is proportional to the server capacity.
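A minimal sketch of a consistent-hash ring with virtual nodes, assuming an MD5-based hash and illustrative names like `HashRing` (this is not a specific product's API):

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map any string onto the ring's integer space.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring; each server gets `weight` virtual nodes."""

    def __init__(self):
        self._ring = []    # sorted list of (hash, server) tuples
        self._hashes = []  # parallel sorted list of hashes, for bisect

    def add_server(self, server: str, weight: int = 100) -> None:
        # More virtual nodes for bigger servers => heterogeneity.
        for i in range(weight):
            h = _hash(f"{server}#vnode{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._ring.insert(idx, (h, server))

    def remove_server(self, server: str) -> None:
        kept = [(h, s) for h, s in self._ring if s != server]
        self._ring = kept
        self._hashes = [h for h, _ in kept]

    def get_server(self, key: str) -> str:
        # Walk clockwise: first virtual node at or after the key's hash.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing()
for s in ("s0", "s1", "s2"):
    ring.add_server(s)
print(ring.get_server("user:42"))
```

Adding or removing a server only remaps the keys in its neighbouring arc of the ring, which is what makes automatic scaling cheap.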
How to replicate data ? (data replication)
To achieve high availability and reliability, data must be replicated asynchronously over N servers, where N is a configurable parameter.
Replication preserves the data held by a node. To replicate a key-value pair, we fetch the next 2 (or 3, depending on the replication factor / number of copies) distinct server nodes from the hash ring and save the copies onto these nodes, as sketched below.
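A hedged sketch of that selection step: walk the ring clockwise from the key's position and collect distinct physical servers, skipping extra virtual nodes of a server already chosen (the helper names and toy ring are illustrative assumptions):

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def replica_servers(ring: list[tuple[int, str]], key: str, n: int) -> list[str]:
    """Walk clockwise from the key's hash and collect the next `n`
    DISTINCT physical servers (virtual nodes of an already-chosen
    server are skipped)."""
    hashes = [h for h, _ in ring]
    start = bisect.bisect(hashes, _hash(key))
    chosen: list[str] = []
    for i in range(len(ring)):
        server = ring[(start + i) % len(ring)][1]
        if server not in chosen:
            chosen.append(server)
            if len(chosen) == n:
                break
    return chosen


# A toy ring: 3 servers with 2 virtual nodes each.
ring = sorted((_hash(f"{s}#v{i}"), s) for s in ("s0", "s1", "s2") for i in range(2))
print(replica_servers(ring, "user:42", n=3))  # e.g. ['s1', 's0', 's2']
```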
How to handle consistency ?
Since data is replicated, it must be synchronized across replicas. The configuration of W, R and N is a typical tradeoff between latency and consistency.

Configuration of W (write quorum), R (read quorum) and N (number of replicas) | Effect |
---|---|
if W or R > 1 | Better consistency, but slower: the coordinator must wait for the response from the slowest replica or node. |
if W + R > N | Strong consistency is guaranteed, because at least one overlapping node has the latest data. |
if R = 1 and W = N | Optimized for fast reads. |
if W = 1 and R = N | Optimized for fast writes. |
if W + R <= N | Strong consistency is NOT guaranteed. |
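A small sanity-check sketch of the quorum rules in the table (the function name is an illustrative assumption):

```python
def quorum_properties(n: int, w: int, r: int) -> str:
    """Classify an (N, W, R) configuration per the rules above."""
    if w + r > n:
        note = "strong consistency (read and write sets overlap)"
    else:
        note = "strong consistency NOT guaranteed (sets may not overlap)"
    if r == 1 and w == n:
        note += "; optimized for fast reads"
    if w == 1 and r == n:
        note += "; optimized for fast writes"
    return note


# N = 3 replicas: W = 2, R = 2 gives W + R = 4 > 3, so the sets overlap.
print(quorum_properties(n=3, w=2, r=2))
print(quorum_properties(n=3, w=1, r=1))  # fastest, but may read stale data
```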
See the consistency models on the page References & Glossary for Key-Value store.
Versioning is the solution to resolve inconsistency: each data modification is treated as a new, immutable version of the data.
A conflict arises when 2 servers hold the same value but changes are performed at the same time on both servers (conflicting values).

Example of resolving a conflicting value by incrementing a vector clock:
if D3([Sx,2],[Sy,1]) and D4([Sx,2],[Sz,1]) conflict,
then D5([Sx,3],[Sy,1],[Sz,1]) → value reconciled and written by Sx.
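A minimal sketch of that reconciliation with dict-based vector clocks (the helper names are assumptions for illustration):

```python
def increment(clock: dict[str, int], server: str) -> dict[str, int]:
    """Return a copy of the vector clock with `server`'s counter bumped."""
    out = dict(clock)
    out[server] = out.get(server, 0) + 1
    return out


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Element-wise maximum: the reconciled clock dominates both inputs."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in a.keys() | b.keys()}


d3 = {"Sx": 2, "Sy": 1}              # D3([Sx,2],[Sy,1])
d4 = {"Sx": 2, "Sz": 1}              # D4([Sx,2],[Sz,1])
d5 = increment(merge(d3, d4), "Sx")  # reconciled and written by Sx
print(d5)                            # {'Sx': 3, 'Sy': 1, 'Sz': 1} == D5
```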