
N.1: Consistent hashing ring

Author

Norbert Takács

A single machine is sufficient only when storing a small number of records. Since a single machine has limited resources (RAM, storage, CPU), our database cannot keep all of our records on the same physical machine (node). Once we start storing millions or billions of records, we must expand the storage across multiple machines.

We want to do this with little to no overhead, and with records spread evenly between the nodes. This can be achieved with a consistent hashing distribution scheme, since it works the same way regardless of the number of nodes.

This way, we can add and remove nodes seamlessly without bottlenecking the performance of our database.

What is the ring?

Imagine a scenario where you want to store a record for everyone in your organization. We could use a randomly generated identifier, such as a UUIDv4 (also known as a GUID), as the hash for each user in our database, spread across four nodes.

A random user would have the following JSON structure:

{
  "userId": "51376fb9-de87-4d5c-b78e-ecaeb8058f28",
  "firstName": "Norbert",
  "lastName": "Takacs"
}

We have yet to determine which node is responsible for storing this record, since we have not yet placed the nodes on the ring. Each node gets a random position on the ring based on a hashing function's output.

To create our hashing ring, we must first determine the minimum and maximum values of the hashing algorithm's output. In the case of UUIDv4, the following are the boundary values and their big-number representations.

The "nil" UUID, a special case, is the UUID 00000000-0000-0000-0000-000000000000; that is, all bits set to zero.

The "max" UUID, a special case, is the UUID FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF; that is, all bits set to one.

We can then determine the numeric minValue and maxValue by converting the UUIDs to their numeric representations:

{
  "minHash": "00000000-0000-0000-0000-000000000000",
  "minHashValue": 0,
  "maxHash": "FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF",
  "maxHashValue": 340282366920938463463374607431768211455
}

Our example user record has the numeric value 107955310059342914197630047461575069480, earning it a position of roughly 31.7% on the hashing ring.
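To see how this position is derived, here is a minimal TypeScript sketch (the function names are ours, for illustration) that converts a UUID into its 128-bit numeric value with BigInt and maps it onto the ring as a percentage:

const MAX_UUID_VALUE = (1n << 128n) - 1n; // 340282366920938463463374607431768211455

// Convert a UUID string into its 128-bit numeric value.
function uuidToBigInt(uuid: string): bigint {
  return BigInt("0x" + uuid.replace(/-/g, ""));
}

// Position on the ring as a percentage between 0 and 100.
function ringPosition(uuid: string): number {
  return Number((uuidToBigInt(uuid) * 100000n) / MAX_UUID_VALUE) / 1000;
}

console.log(ringPosition("51376fb9-de87-4d5c-b78e-ecaeb8058f28")); // ~31.725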

Since the numbers are huge, we will use the range from 0 to 100 instead. With four starting nodes on this ring, each node gets a random value when it is added. After translating their hashes to this range, our nodes get the following positions: 7, 33, 60, and 75.

When we write a record, our application hashes the userId, works out which node owns that position on the ring, and stores the record there (see the sketch after this list).

  • N4 will store hashes of 7-32%
  • N1 will store hashes of 33-59%
  • N2 will store hashes of 60-74%
  • N3 will store hashes of 75-6% (wrapping around the ring)
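Continuing the sketch above, a lookup over these example positions could look like this. Each node owns the range from its own position up to (but not including) the next node's position, wrapping around at 100:

// Node positions on the 0-100 ring, as in the example above.
const nodes = [
  { name: "N4", position: 7 },
  { name: "N1", position: 33 },
  { name: "N2", position: 60 },
  { name: "N3", position: 75 },
];

// Find the node whose range contains the given ring position:
// the last node at or below it, or the highest node when wrapping.
function nodeFor(position: number): string {
  const sorted = [...nodes].sort((a, b) => a.position - b.position);
  let owner = sorted[sorted.length - 1]; // wrap-around default (75-6%)
  for (const node of sorted) {
    if (node.position <= position) owner = node;
  }
  return owner.name;
}

console.log(nodeFor(31.7)); // "N4" -- our example user's position
console.log(nodeFor(65));   // "N2"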

With a consistent hashing ring, we can store records in our database without worrying about which node should store which records. We also minimise reorganisation when nodes are added or removed.

- When a node is added, records in its slice of the ring are handled by the new node.

- When a node is removed, its records are taken over by the closest remaining nodes.

This makes scaling up and down easier.
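The sketch above shows why: adding or removing a node only shifts ownership of one slice of the ring (node names and positions here are illustrative):

// Adding a node at position 20 splits N4's former range (7-32%):
// only the records in the 20-32% slice need to move.
nodes.push({ name: "N5", position: 20 });
console.log(nodeFor(15)); // still "N4"
console.log(nodeFor(25)); // now "N5", previously "N4"

// Removing N1 (position 33) hands its former range (33-59%)
// to its predecessor on the ring, N5.
nodes.splice(nodes.findIndex((n) => n.name === "N1"), 1);
console.log(nodeFor(45)); // now "N5", previously "N1"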

Skewed distribution problem

One of the pitfalls of this method of node distribution is the possibility of uneven load distribution. Since node positions are randomly generated, one or more nodes can end up holding most of the data. In our example, if our nodes had gotten the random values 10, 70, 80, and 87, the first node would handle more than half of the records.

Virtual nodes

To solve the problem of skewed distribution, we can implement virtual nodes (vNodes): each node is assigned multiple unique hashes from different hashing functions. In our example, we will use four hashes from four different hashing algorithms for each node.

Each node then holds multiple random points on the hashing ring, allowing for better load distribution. Note that the node positions are still entirely random; they do not follow a pattern on the ring.
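As a rough sketch of how a node could get its four positions, one option is Node.js's built-in crypto module with four different digest algorithms, mirroring the example above (mapping onto the 0-100 ring is simplified here to the first four bytes of each digest):

import { createHash } from "node:crypto";

// One ring position per hashing algorithm, per node.
const algorithms = ["md5", "sha1", "sha256", "sha512"];

function virtualPositions(nodeName: string): number[] {
  return algorithms.map((algorithm) => {
    const digest = createHash(algorithm).update(nodeName).digest();
    // Map the first 4 bytes of the digest onto the 0-100 ring.
    return (digest.readUInt32BE(0) / 0xffffffff) * 100;
  });
}

console.log(virtualPositions("N1")); // four pseudo-random points on the ring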

Known examples of consistent hashing

This list is taken from the Wikipedia article Consistent hashing:

  • Couchbase automated data partitioning
  • OpenStack's Object Storage Service Swift
  • Partitioning component of Amazon's storage system Dynamo
  • Data partitioning in Apache Cassandra
  • Data partitioning in Voldemort
  • Akka's consistent hashing router
  • Riak, a distributed key-value database
  • Gluster, a network-attached storage file system
  • Akamai content delivery network
  • Discord chat application
  • Load balancing gRPC requests to a distributed cache in SpiceDB
  • Chord algorithm
  • MinIO object storage system

Join the discussion on GitHub