
The framework proposed in this space (Alex Xu) is not applied exactly to this design: see Getting started - a framework to propose...

Introduction

A key-value store (KVS), also referred to as a key-value database (KVD), is a non-relational database.

  • A unique identifier is stored as a key together with its associated value.

  • Keys must be unique, and a value is accessed through its key.


Part 1 - Understand the problem and establish design scope

Size of a KV pair?

Small: less than 10 KB.

Ability to store big data?

Yes.

High availability: the system responds quickly, even during failures?

Yes.

High scalability: the system can be scaled to support a large data set?

Yes.

Automatic scaling: adding / deleting servers based on traffic?

Yes.

Tunable consistency?

Yes.

Low latency?

Yes.

Part 2 - KVS on a Single Server

Even with optimisations (data compression, keeping frequently used data in memory and the rest on disk), a single server fills up quickly: a distributed KV store is required for big data.
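Conceptually, a single-server KVS is just a hash table that keeps everything in memory. A minimal sketch (the class name is illustrative, not from the source):

    class SingleServerKVS:
        """In-memory key-value store: a thin wrapper around a hash table."""

        def __init__(self) -> None:
            self._store: dict[str, bytes] = {}

        def put(self, key: str, value: bytes) -> None:
            # Overwrites any existing value for the key.
            self._store[key] = value

        def get(self, key: str) -> bytes | None:
            # Returns None when the key is absent.
            return self._store.get(key)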

Part 3 - KVS in a Distributed Environment

Real-world distributed systems

Partitions cannot be avoided, so we have to choose between consistency and availability!

  • If consistency is chosen over availability (strict consistency requirements), the system may become unavailable during a partition.

  • If availability is chosen over consistency, the system keeps accepting reads even though it may return stale data.

How to store & fetch data ? (data partition)

We can use the consistent hashing technique. This technique gives two advantages (a sketch follows the list):

  1. Automatic scaling: servers could be added and removed automatically depending on the load.

  2. Heterogeneity: number of virtual nodes (or replicas) for a server is proportional to the server capacity.
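A minimal sketch of a consistent-hash ring with virtual nodes, assuming an MD5-based hash; the names (HashRing, add_server) are illustrative, not from the source:

    import bisect
    import hashlib

    def ring_hash(key: str) -> int:
        # Position on the ring: an integer derived from MD5.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, vnodes_per_unit: int = 100) -> None:
            self._keys: list[int] = []              # sorted vnode positions
            self._ring: list[tuple[int, str]] = []  # (position, server)
            self.vnodes_per_unit = vnodes_per_unit

        def add_server(self, server: str, capacity: int = 1) -> None:
            # Heterogeneity: a server's virtual-node count is
            # proportional to its capacity.
            for i in range(capacity * self.vnodes_per_unit):
                pos = ring_hash(f"{server}#vnode{i}")
                idx = bisect.bisect(self._keys, pos)
                self._keys.insert(idx, pos)
                self._ring.insert(idx, (pos, server))

        def remove_server(self, server: str) -> None:
            # Only the keys owned by this server's vnodes move elsewhere.
            kept = [(p, s) for p, s in self._ring if s != server]
            self._ring = kept
            self._keys = [p for p, _ in kept]

        def get_server(self, key: str) -> str:
            # The first virtual node clockwise from the key's position.
            idx = bisect.bisect(self._keys, ring_hash(key)) % len(self._keys)
            return self._ring[idx][1]

Adding or removing a server only remaps the keys owned by its virtual nodes, which is what makes automatic scaling cheap.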

How to replicate data ? (data replication)

To achieve HA and reliability, data must be replicated asynchronously over N servers, where N is a configurable parameter.

Replication preserves the data held by a node. To replicate a key-value pair, we fetch the next 2 (or 3, depending on the replication factor / number of copies) distinct server nodes on the hash ring and save the copies onto these nodes.
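A sketch of replica placement on the same ring; get_replicas is an illustrative extension of the HashRing above, not from the source:

    def get_replicas(ring: HashRing, key: str, n: int = 3) -> list[str]:
        # Walk clockwise from the key's position and keep the first n
        # distinct physical servers; virtual nodes of a server already
        # chosen are skipped so the copies land on different machines.
        start = bisect.bisect(ring._keys, ring_hash(key))
        replicas: list[str] = []
        for i in range(len(ring._ring)):
            _, server = ring._ring[(start + i) % len(ring._ring)]
            if server not in replicas:
                replicas.append(server)
                if len(replicas) == n:
                    break
        return replicas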

How to handle consistency ?

Since data is replicated, it must be synchronized. The issue is that the configuration of W, R and N is a typical tradeoff between latency and consistency.

CONFIGURATION OF W (WRITE QUORUM SIZE), R (READ QUORUM SIZE) AND N (NUMBER OF REPLICAS OR NODES)

If W or R > 1:

better consistency, but slower, because the coordinator must wait for the response of the slowest replica or node.

If W + R > N:

strong consistency is guaranteed, because at least one overlapping node has the latest data.

If R = 1 and W = N:

optimized for a fast read.

If W = 1 and R = N:

optimized for a fast write.

If W + R <= N:

strong consistency is NOT guaranteed.
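The overlap rule is plain arithmetic. A quick check of the configurations above (the function name is illustrative; the logic uses nothing beyond the definitions of N, W and R):

    def overlap_guaranteed(n: int, w: int, r: int) -> bool:
        # When w + r > n, every read quorum shares at least one node with
        # every write quorum, so the read sees the latest written value.
        return w + r > n

    assert overlap_guaranteed(n=3, w=2, r=2)      # strong consistency
    assert overlap_guaranteed(n=3, w=3, r=1)      # fast reads (R = 1)
    assert overlap_guaranteed(n=3, w=1, r=3)      # fast writes (W = 1)
    assert not overlap_guaranteed(n=3, w=1, r=1)  # stale reads possible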

See the consistency models on the page References & Glossary for Key-Value store.

Versioning is the solution for resolving inconsistency. Versioning means treating each data modification as a new, immutable version of the data.

A conflict arises when two servers hold the same value and changes are then performed at the same time on both servers (conflicting values).

Example of a conflicting increment, resolved with a vector clock:

If D3([Sx,2],[Sy,1]) and D4([Sx,2],[Sz,1]) coexist, neither descends from the other, so they conflict;

then D5([Sx,3],[Sy,1],[Sz,1]) → value reconciled and written by Sx.
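A vector-clock sketch for the example above, representing each clock as a dict of [server, version] pairs; the function names are illustrative, not from the source:

    def descends(a: dict, b: dict) -> bool:
        # a descends from b if every counter in b is <= the matching one in a.
        return all(a.get(server, 0) >= count for server, count in b.items())

    def conflict(a: dict, b: dict) -> bool:
        # Neither version descends from the other: a true conflict that
        # must be reconciled by the client.
        return not descends(a, b) and not descends(b, a)

    d3 = {"Sx": 2, "Sy": 1}
    d4 = {"Sx": 2, "Sz": 1}
    assert conflict(d3, d4)

    # After the client reconciles and Sx writes the merged version:
    d5 = {"Sx": 3, "Sy": 1, "Sz": 1}
    assert descends(d5, d3) and descends(d5, d4)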

How to handle failures ?

Step 1 - Detect failures with the Gossip Protocol method

  • Each node maintains a node membership list (containing member IDs & heartbeat counters).

  • Each node periodically increments its own counter.

  • Each node periodically sends heartbeats to a set of random nodes, which in turn propagate them to another set of nodes.

  • Once nodes receive heartbeats, the membership list is updated with the latest info.

  • If a member's heartbeat counter has not increased for more than a predefined period, the member is considered offline.
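An in-process sketch of the gossip mechanics described above; the names (Member, gossip, merge) are illustrative, not from the source:

    import random
    import time

    class Member:
        def __init__(self, node_id: str) -> None:
            self.node_id = node_id
            # membership list: node_id -> (heartbeat counter, last update time)
            self.members = {node_id: (0, time.monotonic())}

        def tick(self) -> None:
            # Periodically increment our own heartbeat counter.
            count, _ = self.members[self.node_id]
            self.members[self.node_id] = (count + 1, time.monotonic())

        def gossip(self, peers: list, fanout: int = 2) -> None:
            # Send our membership list to a few random peers.
            for peer in random.sample(peers, min(fanout, len(peers))):
                peer.merge(self.members)

        def merge(self, incoming: dict) -> None:
            # Keep the highest heartbeat counter seen for each member.
            for node_id, (count, _) in incoming.items():
                local = self.members.get(node_id, (-1, 0.0))
                if count > local[0]:
                    self.members[node_id] = (count, time.monotonic())

        def offline(self, node_id: str, timeout: float = 5.0) -> bool:
            # A member whose counter has not advanced within the timeout
            # is considered offline.
            _, last_seen = self.members.get(node_id, (0, 0.0))
            return time.monotonic() - last_seen > timeout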

Step 2 - Handle the failures

What if a node is temporarily unavailable? Implement the SLOPPY QUORUM method, sketched below.
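A sloppy-quorum sketch, assuming the HashRing from Part 3 and an illustrative healthy() check (neither is from the source): instead of insisting on the first N servers of the ring, the coordinator picks the first N healthy ones, so reads and writes keep succeeding while a node is down.

    def sloppy_replicas(ring: HashRing, key: str, n: int, healthy) -> list[str]:
        # Like get_replicas, but offline servers are skipped; writes that
        # land on a stand-in server are handed back once the real owner
        # recovers (hinted handoff).
        start = bisect.bisect(ring._keys, ring_hash(key))
        chosen: list[str] = []
        for i in range(len(ring._ring)):
            _, server = ring._ring[(start + i) % len(ring._ring)]
            if server not in chosen and healthy(server):
                chosen.append(server)
                if len(chosen) == n:
                    break
        return chosen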

What if a node is permanently unavailable? Implement the ANTI-ENTROPY PROTOCOL method to keep replicas in sync.

It involves comparing each piece of data across replicas and updating each replica to the newest version. A Merkle tree is used to detect inconsistencies and to minimise the amount of data transferred.
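A Merkle-tree comparison sketch, assuming the key space is split into buckets that are hashed bottom-up; the names are illustrative, not from the source:

    import hashlib

    def bucket_hash(bucket: dict) -> str:
        # Hash a bucket's sorted key-value pairs.
        data = "".join(f"{k}={v}" for k, v in sorted(bucket.items()))
        return hashlib.sha256(data.encode()).hexdigest()

    def merkle_root(bucket_hashes: list[str]) -> str:
        # Build parent hashes pairwise until a single root remains.
        level = list(bucket_hashes)
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])   # duplicate the odd one out
            level = [hashlib.sha256((a + b).encode()).hexdigest()
                     for a, b in zip(level[::2], level[1::2])]
        return level[0]

    # Two replicas compare roots first; only if the roots differ do they
    # descend the tree to find the buckets (and keys) that need syncing.
    replica_a = [{"k1": "v1"}, {"k2": "v2"}]
    replica_b = [{"k1": "v1"}, {"k2": "stale"}]
    ha = [bucket_hash(b) for b in replica_a]
    hb = [bucket_hash(b) for b in replica_b]
    assert merkle_root(ha) != merkle_root(hb)
    diff = [i for i, (x, y) in enumerate(zip(ha, hb)) if x != y]
    assert diff == [1]   # only bucket 1 must be transferred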

What tasks are performed by the nodes?

Write path

Read path
