Content Comparison

The Framework proposed in this space (Alex Xu) is not applied here : Getting started - a framework to propose...

Introduction - What is Hashing ?

A hash function is a function that maps one piece of data—typically describing some kind of object, often of arbitrary size—to another piece of data, typically an integer, known as hash code, or simply hash.

Hashing is widely used in algorithms, data structures, and cryptography.

Hash Functions
Hash functions take variable-length input data and produce a fixed-length output value.

Code Block

language	java

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

public static void main(String[] args) {
    String anotherTest = "java";
    String test = "test";
    String oneMoreTest = "dev";
    System.out.println(test.hashCode()); // output: 3556498
    System.out.println(test.hashCode()); // output: 3556498
    System.out.println(anotherTest.hashCode()); // output: 3254818
    System.out.println(oneMoreTest.hashCode()); // output: 99349
}

Hashing Algo
Hashing is a technique or process of mapping keys, values into the hash table by using a hash function : Data Structure and Algorithms - Hash Table (tutorialspoint.com)
Cryptographic Hash Functions
Cryptographic hash functions are a specialized group of hash functions. They provide an increased level of security. Thus, they are used for cryptography purposes like password verification, data integrity validation, blockchain (cryptocurrencies).

On this page.

Table of Contents

The Rehashing problem : How to balance the load to x servers ?

Why do we need Hashing first ?

But, hashing is a common way to balance the load to servers : serverIndex = hash(key) % N where N is the size of the server pool. (See an example in the page : the hashes for 4 servers - Hash % 4 servers).

But, the problem is when the servers pool is not static anymore ! New servers are added and existing servers could be removed.

If we removed 1 server, modulo changes : Hash % 3 servers. Most of keys are redistributed.

But, when 1 server goes offline, most cache clients will connect to the wrong servers to fetch data. We have an horizontal scalability problem.

The solution : Consistent hashing (CH)

How does CH work ?

CH facilitates the distribution of data across a set of nodes (DB servers) in such a way that minimizes the mapping or reorganization of data when nodes are added or removed.

1- What is CH ? Creating th Hash Key Space.

We need to consider a hash function that generates integer hash values in the following range [0, 2^32-1] and the output of a hash function is: x0, x1, x2,…,xN.

2- Representing the hash space as a hash ring

By connecting both ends, we get a hash ring. So, we can visualize it as a ring.

3- Hash servers : Placing DB servers in the hash ring

We can use the hash function and so, hash the servers on their IP address to map them to different integers.

4- Hash Keys: Determining placement of keys on servers

To find which DB server an incoming key resides on : we assume we have 4 incoming keys (key0, key1, key2, key3) and none of them directly maps the hash value of any 4 servers on our ring. We follow the hand of a clock and insert the key.

5- Adding a server in the ring

When we add a server, we need to add it between server 0 and server 3. We'll need only to remap the keys to the server 4 (new server).

So, we need to remap only k/n keys where k is the number of keys (4) and n is the number of servers (5).

6- Removing a server from the ring

A server might go down and our consistent hashing scheme ensures that it has minimal effect on the number of keys and servers affected. If server 0 goes down, only the keys between server 3 and server 0 will need to be remapped to server 1 : here, we have 2 keys to consider and the rest of the keys are unaffected.

Tip
Consistent hashing has successfully solved the horizontal scalability problem by ensuring that every time we scale up or down, we DO NOT have to re-arrange all the keys or touch all the database servers !

Solution to the issues related to CH : Introduction of replicas or virtual nodes

The server 0 could have 2 replicas and the server 1 will one replica.

Info
Now, the server 0 is responsible for 50% keys and the server 1 as well.

Benefits of HC

Enables Elastic Scaling of cluster of database/cache servers
Facilitates Replication and partitioning of data across servers
Partitioning of data enables uniform distribution which relieves hot spots

Version	Old Version 14	New Version 15
Changes made by	thierry sinassamy	thierry sinassamy
Saved on	Aug 24, 2022	Aug 24, 2022

Versions Compared

Key