Consistent hashing with Scala

tags:

tech
scala

2012-02-27

I recently got into Riak, and subsequently Consistent Hashing, and found the referenced Java implementation quite a delightful and informative read.

As any hacker would, I thought I’d implement it in my current favourite language: Scala. Half-way through I realised that Scala’s TreeMap is as slow as a geriatric slug on a salted snowed-in street. So, I did the import java.util.{TreeMap => JTreeMap} thing and got cracking:

The gist can be found here: https://gist.github.com/1927001

How many virtual nodes?

Systems using consistent hashing needs the “virtual nodes” parameter tuned so as to find a sweet spot where you get good mixing of cached objects amongst nodes (i.e. all nodes store a somewhat equal amount of stuff).

The gist above, when run, will generate a file called “dat” in the current directory, which can be fed into gnuplot to show the standard deviation (as a percentage of the mean) for certain sizes of vnode.

As per the lexemetech.com article, a value not higher than 5% or 10% is preferable, and for 10 nodes, this amounts to about 100 or 200 vnodes. Let’s verify this.

gnuplot

Using a terminal, run gnuplot in the same location as “dat”, and you’ll see this prompt: gnuplot>

Now, copy/paste this entire bit of code into the terminal:

https://gist.github.com/opyate/1927001#gnuplot

Here’s the plot:

The plot

As you can see, for a large amount of vnodes, the percentage tends to 1%.

You can see this more easily by inputting this into the gnuplot prompt:

https://gist.github.com/opyate/1927001#gnuplot-amend

According to my plot, I start to get a low SD around 50 vnodes (5 times my number of nodes), but then this could just be because my hashing function mixes better than the one used by lexemetech.

« Programmer’s Bill Of Rights 2012 Semantic Web Deep Dive »