Note: the framework proposed in this space (Alex Xu) is not applied exactly to this design: Getting started - a framework to propose...

Introduction

A key-value store (KVS), also referred to as a key-value database (KVD), is a non-relational database.

  • A unique identifier is stored as a key, together with its associated value.

  • Keys must be unique, and a value is accessed only through its key.


Part 1 - Understand the problem and establish design scope

  • Size of a KV pair: small, less than 10 KB.

  • Ability to store big data: yes.

  • High availability: the system responds quickly, even during failures.

  • High scalability: the system can be scaled to support a large data set.

  • Automatic scaling: servers are added / deleted automatically based on traffic.

  • Tunable consistency: yes.

  • Low latency: yes.

Part 2 - KVS on a Single Server

Even with optimisations (data compression, keeping frequently used data in memory and the rest on disk), a single server quickly reaches its capacity limit; a distributed KV store is required for big data.
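As a baseline before those optimisations, a single-server KVS is essentially an in-memory hash table. A minimal sketch (class and method names such as SingleServerKVS, put and get are assumptions, not code from this page):

    from typing import Optional

    # Minimal single-server KVS sketch: an in-memory hash table.
    class SingleServerKVS:
        def __init__(self) -> None:
            self._table: dict[str, str] = {}  # everything lives in memory

        def put(self, key: str, value: str) -> None:
            self._table[key] = value

        def get(self, key: str) -> Optional[str]:
            return self._table.get(key)

    store = SingleServerKVS()
    store.put("last_logged_in", "2024-06-01")
    print(store.get("last_logged_in"))  # 2024-06-01

Memory is the limiting factor here: once the data set no longer fits on one machine, even with compression and spilling to disk, the store has to be distributed.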

Part 3 - KVS in a Distributed Environment

Real-world distributed systems

Partitions cannot be avoided in real-world distributed systems, so we have to choose between consistency and availability (the CAP theorem).

  • If consistency is chosen over availability (because of strict consistency requirements), the system may become unavailable during a partition.

  • If availability is chosen over consistency, the system keeps accepting reads even though it may return stale data.

How to store & fetch data? (data partition)

We can use the consistent hashing technique. Using this technique gives two advantages (see the sketch after the list below):

  1. Automatic scaling: servers can be added and removed automatically depending on the load.

  2. Heterogeneity: number of virtual nodes (or replicas) for a server is proportional to the server capacity.
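A minimal consistent-hashing sketch (names such as HashRing, add_server and get_server are assumptions, not code from this page) showing virtual nodes and key lookup by walking the ring clockwise:

    import bisect
    import hashlib

    def _hash(key: str) -> int:
        # Map any string to a position on the ring.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self) -> None:
            self._ring = []    # sorted positions of virtual nodes
            self._owner = {}   # virtual-node position -> physical server

        def add_server(self, server: str, weight: int = 100) -> None:
            # Heterogeneity: more virtual nodes for higher-capacity servers.
            for i in range(weight):
                pos = _hash(f"{server}#vn{i}")
                bisect.insort(self._ring, pos)
                self._owner[pos] = server

        def remove_server(self, server: str) -> None:
            # Automatic scaling: only keys owned by this server's virtual
            # nodes are remapped when it leaves the ring.
            for pos in [p for p, s in self._owner.items() if s == server]:
                self._ring.remove(pos)
                del self._owner[pos]

        def get_server(self, key: str) -> str:
            # Walk clockwise from the key's position to the first virtual node.
            idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
            return self._owner[self._ring[idx]]

    ring = HashRing()
    for s in ("server-0", "server-1", "server-2"):
        ring.add_server(s)
    print(ring.get_server("user:42"))  # e.g. server-1

The weight parameter is how heterogeneity is expressed: the number of virtual nodes given to a server is proportional to its capacity.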

How to replicate data? (data replication)

To achieve high availability and reliability, data must be replicated asynchronously over N servers, where N is a configurable parameter.

Replication preserves the data held by a node if that node fails. To replicate a key-value pair, we walk clockwise on the hash ring from the key's position and pick the next 2 (or 3, depending on the replication factor / number of copies) distinct server nodes, then save the copies onto these nodes. With virtual nodes, the chosen nodes must map to distinct physical servers.
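A small illustrative sketch of picking the replica set (names such as replica_servers and VIRTUAL_NODES are assumptions): walk clockwise on the ring and skip virtual nodes belonging to physical servers already chosen.

    import bisect
    import hashlib

    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    SERVERS = ["s0", "s1", "s2", "s3"]
    VIRTUAL_NODES = 50
    # Ring of (position, physical server) pairs, sorted by position.
    RING = sorted(
        (_hash(f"{s}#vn{i}"), s) for s in SERVERS for i in range(VIRTUAL_NODES)
    )

    def replica_servers(key: str, n: int = 3) -> list[str]:
        # Walk clockwise from the key's position and keep the first n
        # DISTINCT physical servers (extra virtual nodes of a server already
        # chosen are skipped).
        start = bisect.bisect(RING, (_hash(key), ""))
        chosen: list[str] = []
        for offset in range(len(RING)):
            _, server = RING[(start + offset) % len(RING)]
            if server not in chosen:
                chosen.append(server)
            if len(chosen) == n:
                break
        return chosen

    print(replica_servers("user:42", n=3))  # e.g. ['s2', 's0', 's3']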

How to handle consistency?

Since data is replicated over several nodes, the replicas must be kept synchronized. The issue is that the quorum configuration is a typical trade-off between latency and consistency.

Configuration of W (write quorum), R (read quorum) and N (number of replicas)

A write must be acknowledged by W replicas, and a read must wait for responses from R replicas. If W + R > N, strong consistency is guaranteed, because at least one replica involved in the read holds the latest write.
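A tiny illustrative check of that rule (the helper name is an assumption, not from this page):

    # Quorum rule sketch: with N replicas, a write waits for W acks and a read
    # waits for R responses; W + R > N means the read and write sets overlap.
    def is_strongly_consistent(n: int, w: int, r: int) -> bool:
        return w + r > n

    print(is_strongly_consistent(n=3, w=2, r=2))  # True  (typical balanced setup)
    print(is_strongly_consistent(n=3, w=1, r=1))  # False (fast, but reads may be stale)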

How to handle failures?
