Consistent Hashing
It would be nice if, when a cache machine was added, it took its fair
share of objects from all the other cache machines. Equally, when a
cache machine was removed, it would be nice if its objects were
shared between the remaining machines. This is exactly what
consistent hashing does - consistently maps objects to the same cache
machine, as far as is possible, at least.
The basic idea behind the consistent hashing algorithm is to hash both
objects and caches using the same hash function. The reason to do
this is to map the cache to an interval, which will contain a number of
object hashes. If the cache is removed then its interval is taken over
by a cache with an adjacent interval. All the other caches remain
unchanged.
Demonstration
Let's look at this in more detail. The hash function actually maps
objects and caches to a number range. This should be familiar to
every Java programmer - the hashCode method on Object returns an int,
which lies in the range -2^31 to 2^31-1. Imagine mapping this range into a
circle so the values wrap around. Here's a picture of the circle with a
number of objects (1, 2, 3, 4) and caches (A, B, C) marked at the
points that they hash to (based on a diagram from Web Caching with
Consistent Hashing by David Karger et al):
To find which cache an object goes in, we move clockwise round the
circle until we find a cache point. So in the diagram above, we see
objects 1 and 4 belong in cache A, object 2 belongs in cache B and
object 3 belongs in cache C. Consider what happens if cache C is
removed: object 3 now belongs in cache A, and all the other object
mappings are unchanged. If then another cache D is added in the
position marked it will take objects 3 and 4, leaving only object 1
belonging to A.
This works well, except the size of the intervals assigned to each
cache is pretty hit and miss. Since it is essentially random it is
possible to have a very non-uniform distribution of objects between
caches. The solution to this problem is to introduce the idea of "virtual
nodes", which are replicas of cache points in the circle. So whenever
we add a cache we create a number of points in the circle for it.
You can see the effect of this in the following plot which I produced by
simulating storing 10,000 objects in 10 caches using the code
described below. On the x-axis is the number of replicas of cache
points (with a logarithmic scale). When it is small, we see that the
distribution of objects across caches is unbalanced, since the standard
deviation as a percentage of the mean number of objects per cache
(on the y-axis, also logarithmic) is high. As the number of replicas
increases the distribution of objects becomes more balanced. This
experiment shows that a figure of one or two hundred replicas
achieves an acceptable balance (a standard deviation that is roughly
between 5% and 10% of the mean).
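For reference, a simulation along these lines can be sketched in a few lines of Java (the key names, the replica counts tried, and the MD5-based hash below are arbitrary illustrative choices, not the exact code behind the plot):

import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

public class BalanceSimulation {

  // Hash a string to an int by folding the first four bytes of its MD5 digest.
  static int hash(String key) throws Exception {
    byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8"));
    return ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16) | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
  }

  public static void main(String[] args) throws Exception {
    int numCaches = 10, numObjects = 10000;
    for (int replicas : new int[] { 1, 10, 100, 1000 }) {
      // Build the circle: each cache contributes 'replicas' points.
      SortedMap<Integer, Integer> circle = new TreeMap<Integer, Integer>();
      for (int c = 0; c < numCaches; c++)
        for (int r = 0; r < replicas; r++)
          circle.put(hash("cache-" + c + "-" + r), c);
      // Assign each object to the first cache point clockwise from its hash.
      int[] counts = new int[numCaches];
      for (int o = 0; o < numObjects; o++) {
        SortedMap<Integer, Integer> tail = circle.tailMap(hash("object-" + o));
        counts[circle.get(tail.isEmpty() ? circle.firstKey() : tail.firstKey())]++;
      }
      // Report the standard deviation as a percentage of the mean objects per cache.
      double mean = (double) numObjects / numCaches, sumSq = 0;
      for (int count : counts) sumSq += (count - mean) * (count - mean);
      System.out.printf("replicas=%4d  stddev/mean=%.1f%%%n", replicas,
          100 * Math.sqrt(sumSq / numCaches) / mean);
    }
  }
}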
Implementation
For completeness here is a simple implementation in Java. In order for
consistent hashing to be effective it is important to have a hash
function that mixes well. Most implementations
of Object's hashCode do not mix well - for example, they typically produce
a restricted number of small integer values - so we have
a HashFunction interface to allow a custom hash function to be used.
MD5 hashes are recommended here.
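The HashFunction interface itself is not shown in the listing; a minimal version, together with an MD5-backed implementation of the sort recommended above, might look like this (the interface's exact shape and the MD5HashFunction class are my own illustration):

import java.security.MessageDigest;

public interface HashFunction {
  int hash(Object key);
}

// One possible MD5-backed HashFunction: hash the key's string form and fold
// the first four bytes of the digest into an int.
class MD5HashFunction implements HashFunction {
  public int hash(Object key) {
    try {
      byte[] d = MessageDigest.getInstance("MD5").digest(key.toString().getBytes("UTF-8"));
      return ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16) | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}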
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash<T> {
  private final HashFunction hashFunction;
  private final int numberOfReplicas;
  private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

  public ConsistentHash(HashFunction hashFunction, int numberOfReplicas, Collection<T> nodes) {
    this.hashFunction = hashFunction;
    this.numberOfReplicas = numberOfReplicas;
    for (T node : nodes) add(node);
  }
The circle is represented as a sorted map of integers, which represent
the hash values, to caches (of type T here).
When a ConsistentHash object is created each node is added to the circle
map a number of times (controlled by numberOfReplicas). The location of
each replica is chosen by hashing the node's name along with a
numerical suffix, and the node is stored at each of these points in the
map.
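Following that description, the add and remove methods could look like this (a sketch continuing the ConsistentHash class above, using the node's toString value as its name):

  public void add(T node) {
    for (int i = 0; i < numberOfReplicas; i++) {
      // Each replica is placed by hashing the node's name plus a numerical suffix.
      circle.put(hashFunction.hash(node.toString() + i), node);
    }
  }

  public void remove(T node) {
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.remove(hashFunction.hash(node.toString() + i));
    }
  }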
To find a node for an object (the get method), the hash value of the
object is used to look in the map. Most of the time there will not be a
node stored at this hash value (since the hash value space is typically
much larger than the number of nodes, even with replicas), so the
next node is found by looking for the first key in the tail map. If the
tail map is empty then we wrap around the circle by getting the first
key in the circle.
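Putting that into code, the get method (which completes the class) might be:

  public T get(Object key) {
    if (circle.isEmpty()) {
      return null;
    }
    int hash = hashFunction.hash(key);
    if (!circle.containsKey(hash)) {
      // Walk clockwise: take the first cache point at or after this hash,
      // wrapping around to the start of the circle if necessary.
      SortedMap<Integer, T> tailMap = circle.tailMap(hash);
      hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
    }
    return circle.get(hash);
  }
}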
Usage
So how can you use consistent hashing? You are most likely to meet it
in a library, rather than having to code it yourself. For example, as
mentioned above, memcached, a distributed memory object caching
system, now has clients that support consistent hashing.
Last.fm's ketama by Richard Jones was the first, and there is now
a Java implementation by Dustin Sallings (which inspired my
simplified demonstration implementation above). It is interesting to
note that it is only the client that needs to implement the consistent
hashing algorithm - the memcached server is unchanged. Other
systems that employ consistent hashing include Chord, which is a
distributed hash table implementation, and Amazon's Dynamo, which
is a key-value store (not available outside Amazon).
Posted by Tom White at 17:26
Labels: Distributed Systems, Hashing
14 comments:
morrita said...
Good article!
I've made a Japanese translation of your article, which is available at
http://www.hyuki.com/yukiwiki/wiki.cgi?ConsistentHashing .
If you have any trouble, please let me know.
Thank you for your work.
1 December 2007 at 06:06
Marcus said...
Cool! I'm as we speak creating a distributed caching and searching
system which uses JGroups for membership. The biggest problem I
faced was this exact thing. What to do on the
member-joined/left events and for the system to be able to know
at all times to which node to send what command :)
marcusherou said...
Hi. Do you have any clue of how to create an algorithm which
tracks the history of joins/leaves of members and delivers the same
node for the same key if it previously looked it up? Perhaps I'm
explaining this in bad terms, but something like an (in memory or
persistent) database in conjunction with a consistent hash.
perhaps:
public Address getAddress(key) {
    if (lookedUpMap.containsKey(key)) {
        return (Address) lookedUpMap.get(key);
    } else {
        Address a = get(key);
        lookedUpMap.put(key, a);
        return a;
    }
}
One consequence of this bug is that as nodes come and go you may
slowly lose replicas. A more serious consequence is that if two
clients notice changes to the set of nodes in a different order (very
possible in a distributed system), clients will no longer agree on
the key to node mapping. For example, suppose nodes x and y have
replicas that collide as above, and node x fails around the same
time that new (replacement?) node y comes online. In response,
suppose that ConsistentHash client A invokes add(y) ... remove(x)
and that another client B does the same but in the reverse order.
From now on, the set of keys corresponding to the clobbered
replica will map to node y at B, while A will map them somewhere
else.
Christophe said...
I have written a small test app that tells me how many nodes
should be relocated in the event of a node addition, by comparing
the same dataset over 2 hashsets.
http://pastebin.com/f459047ef
31 May 2009 at 01:40
chingju said...
great article, and the Japanese translation from morrita is cool!!!
19 September 2009 at 03:19
Solution
We also need a data structure to maintain page numbers in cache in the order of their access time.
One way to do that is to keep a timestamp field for each record, but we still need to sort them which
cannot be done in O(1) time. Alternatively, we can use a linked list to keep all records, and move the
newly visited one to the head of the list. To get O(1) time complexity for updating such a linked list,
we need a doubly linked list.
If it is already in the cache, move the node to the head of the linked list;
If it is not in the cache, insert it to the head of the linked list and update the current capacity of the
cache. If the cache is full, remove the last node of the linked list. (So, we also need a tail pointer. :)
public static class LruCacheImpl implements LruPageCache {
private int capacity = 0;
private int maxCapacity = 10;
private DListNode head = null;
private DListNode tail = null;
private HashMap<Integer, DListNode> map = new HashMap<Integer, DListNode>();
/** {@inheritDoc} */
@Override
public void setMaxCapacity(final int limit) {
if (limit < 1) {
throw new IllegalArgumentException("Max capacity must be positive.");
}
maxCapacity = limit;
}
/** {@inheritDoc} */
@Override
public int loadPage(final int page) {
    DListNode cur = map.get(page);
    if (cur != null) {
        // cache hit: unlink the node and move it to the head of the list
        if (cur.pre != null) cur.pre.next = cur.next;
        if (cur.next != null) cur.next.pre = cur.pre;
        if (tail == cur) tail = cur.pre;
        if (head == cur) head = cur.next;
        insertToHead(cur);
        print();
        return cur.val;
    }
    // cache miss: insert a new node at the head and evict the tail if full
    cur = new DListNode(page);
    insertToHead(cur);
    map.put(page, cur);
    if (capacity == maxCapacity) {
        removeTail(); // removeTail() must also remove the evicted page from the map
    } else {
        ++capacity;
    }
    print();
    return cur.val;
}
/** Add the given node to the head of the linked list. */
private void insertToHead(final DListNode cur) {
cur.next = head;
cur.pre = null;
if (head != null) head.pre = cur;
head = cur;
if (tail == null) tail = cur;
}
Note: Java provides a LinkedHashMap class, which is a hash map backed by a doubly linked list. It is
essentially what we have done here, except that LinkedHashMap does not evict entries by default (you get a
capacity limit by overriding removeEldestEntry). So, in the real world, don't reinvent the wheel!
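To illustrate, a capacity-limited LRU cache on top of LinkedHashMap only needs the access-order constructor and an overridden removeEldestEntry (a minimal sketch; the class name and sizing are arbitrary):

import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedHashMapLruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxCapacity;

    public LinkedHashMapLruCache(final int maxCapacity) {
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxCapacity = maxCapacity;
    }

    @Override
    protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the capacity is exceeded.
        return size() > maxCapacity;
    }
}

With access order enabled, every get and put moves the touched entry to the most recently used position, and the map silently drops the eldest entry whenever the size limit is exceeded.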
Next up in the toolbox series is an idea so good it deserves an entire article all to itself: consistent
hashing.
Let’s say you’re a hot startup and your database is starting to slow down. You decide to cache some
results so that you can render web pages more quickly. If you want your cache to use multiple servers
(scale horizontally, in the biz), you’ll need some way of picking the right server for a particular key. If
you only have 5 to 10 minutes allocated for this problem on your development schedule, you’ll end up
using what is known as the naïve solution: put your N server IPs in an array and pick one using key %
N.
I kid, I kid — I know you don’t have a development schedule. That’s OK. You’re a startup.
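In code, the naïve scheme is about as short as it sounds (a sketch; the server IPs are made up):

import java.util.Arrays;
import java.util.List;

public class NaiveSelector {

    // The naïve approach: pick a server by taking the key modulo the number of servers.
    static String pickServer(final List<String> servers, final int key) {
        return servers.get(Math.abs(key % servers.size()));
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        System.out.println(pickServer(servers, 42)); // same server every time, until N changes
    }
}

Every key maps to a fixed slot in the array, which is exactly why changing N reshuffles almost every key.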
Anyway, this ultra simple solution has some nice characteristics and may be the right thing to do. But
your first major problem with it is that as soon as you add a server and change N, most of your cache
will become invalid. Your databases will wail and gnash their teeth as practically everything has to be
pulled out of the DB and stuck back into the cache. If you’ve got a popular site, what this really means
is that someone is going to have to wait until 3am to add servers because that is the only time you can
handle having a busted cache. Poor Asia and Europe — always getting screwed by late night server
administration.
You’ll have a second problem if your cache is read-through or you have some sort of processing
occurring alongside your cached data. What happens if one of your cache servers fails? Do you just fail
the requests that should have used that server? Do you dynamically change N? In either case, I
recommend you save the angriest tweets about your site being down. One day you'll look back and laugh.
As I said, though, that might be OK. You may be trying to crank this whole project out over the
weekend and simply not have time for a better solution. That is how I wrote the caching layer for
Audiogalaxy searches, and that turned out OK. The caching part, at least. But if I had known about it at
the time, I would have started with a simple version of consistent hashing. It isn’t that much more
complicated to implement and it gives you a lot of flexibility down the road.
The technical aspects of consistent hashing have been well explained in other places, and you’re crazy
and negligent if you use this as your only reference. But, I'll try to do my best. Consistent hashing is a
technique that addresses these problems:
Given a resource key and a list of servers, how do you find a primary, secondary, tertiary (and on down
the line) server for the resource?
If you have different size servers, how do you assign each of them an amount of work that corresponds
to their capacity?
How do you smoothly add capacity to the system without downtime? Specifically, this means solving
two problems:
o How do you avoid dumping 1/N of the total load on a new server as soon as you turn it on?
The basic trick is to map the output range of your hash function onto the edge of a circle, like a
clock face. Sure, this will make it more complicated when you try to explain it to your boss, but bear
with me:
Now imagine hashing resources into points on the circle. They could be URLs, GUIDs, integer IDs, or
any arbitrary sequence of bytes. Just run them through a good hash function (eg, SHA1) and shave off
everything but 8 bytes. Now, take those freshly minted 64-bit numbers and stick them onto the circle:
Finally, imagine your servers. Imagine that you take your first server and create a string by appending
the number 1 to its IP. Let’s call that string IP1-1. Next, imagine you have a second server that has
twice as much memory as server 1. Start with server #2’s IP, and create 2 strings from it by appending
1 for the first one and 2 for the second one. Call those strings IP2-1 and IP2-2. Finally, imagine you
have a third server that is exactly the same as your first server, and create the string IP3-1. Now, take
all those strings, hash them into 64-bit numbers, and stick them on the circle with your resources:
Can you see where this is headed? You have just solved the problem of which server to use for
resource A. You start where resource A is and head clockwise on the ring until you hit a server. If that
server is down, you go to the next one, and so on and so forth. In practice, you’ll want to use more
than 1 or 2 points for each server, but I’ll leave those details as an exercise for you, dear reader.
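Here is a rough Java sketch of that ring, hashing both resources and per-server point strings with SHA-1 and keeping the first 8 bytes of the digest (the IPs, point counts, and class name are illustrative, not from the article):

import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

public class HashRing {

    // The ring: 64-bit points mapped to server identifiers.
    private final SortedMap<Long, String> ring = new TreeMap<Long, String>();

    // Hash a string to a point on the ring: SHA-1, keeping only the first 8 bytes.
    static long point(final String key) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(key.getBytes("UTF-8"));
        return ByteBuffer.wrap(digest, 0, 8).getLong();
    }

    // A server with more capacity gets more points on the ring.
    public void addServer(final String ip, final int points) throws Exception {
        for (int i = 1; i <= points; i++) {
            ring.put(point(ip + "-" + i), ip);
        }
    }

    // Walk clockwise from the resource's point to the next server point, wrapping around.
    public String serverFor(final String resourceKey) throws Exception {
        SortedMap<Long, String> tail = ring.tailMap(point(resourceKey));
        return ring.get(tail.isEmpty() ? ring.firstKey() : tail.firstKey());
    }

    public static void main(String[] args) throws Exception {
        HashRing ring = new HashRing();
        ring.addServer("10.0.0.1", 1); // server #1: IP1-1
        ring.addServer("10.0.0.2", 2); // server #2 has twice the memory: IP2-1, IP2-2
        ring.addServer("10.0.0.3", 1); // server #3: IP3-1
        System.out.println(ring.serverFor("resource-A"));
    }
}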
Now, allow me to use bullet points to explain how cool this is:
Assuming you’ve used a lot more than 1 point per server, when one server goes down, every other
server will get a share of the new load. In the case above, imagine what happens when server #2 goes
down. Resource A shifts to server #1, and resource B shifts to server #3 (Note that this won’t help if all
of your servers are already at 100% capacity. Call your VC and ask for more funding).
You can tune the amount of load you send to each server based on that server’s capacity. Imagine this
spatially – more points for a server means it covers more of the ring and is more likely to get more of
the load.
You could have a process try to tune this load dynamically, but be aware that you’ll be stepping close
to problems that control theory was built to solve. Control theory is more complicated than consistent
hashing.
If you store your server list in a database (2 columns: IP address and number of points), you can bring
servers online slowly by gradually increasing the number of points they use. This is particularly
important for services that are disk bound and need time for the kernel to fill up its caches. This is one
way to deal with the datacenter variant of the Thundering Herd Problem.
Here I go again with the control theory — you could do this automatically. But adding capacity usually
happens so rarely that just having somebody sitting there watching top and running SQL updates is
probably fine. Of course, EC2 changes everything, so maybe you’ll be hitting the books after all.
If you are really clever, when everything is running smoothly you can go ahead and pay the cost of
storing items on both their primary and secondary cache servers. That way, when one server goes
down, its secondary already has a warm copy of the data. Whatever you do,
consider what happens when machines fail. If the answer is “we crush the databases,” congratulations:
you will get to observe a cascading failure. I love this stuff, so hearing about cascading failures makes
me smile.
Finally, you may not know this, but you use consistent hashing every time you put something in your
cart at Amazon.com. Their massively scalable data store, Dynamo, uses this technique. Or if you use
Last.fm, you’ve used a great combination: consistent hashing + memcached. They were kind enough
to release their changes, so if you are using memcached, you can just use their code without dealing
with these messy details. But keep in mind that there are more applications to this idea than just
simple caching. Consistent hashing is a powerful idea for anyone building services that have to scale.
Would sticky sessions enabled on a load balancer help with the caching issue “If you want your cache
to use multiple servers (scale horizontally, in the biz), you’ll need some way of picking the right server
for a particular key”?
localhost
March 18, 2008 at 12:45 am
Al
March 23, 2008 at 3:07 am
Peter,
Sticking a users session to a server, web, application or other, isn’t going to help in determining what
particular server within your cache cluster has the key you want.
Al.
How To Split Randomly But Unevenly - PHP Code For Load UNBalancing (Utopia
Mechanicus)
Pingback on Apr 3rd, 2008 at 1:52 am
David
April 7, 2008 at 11:24 am
Thanks for sharing important logic like this that most people don’t think of until they are in a big mess.
Mike Zintel
April 15, 2008 at 9:34 pm
Paul Annesley
April 21, 2008 at 5:20 pm
Your clear explanation and illustrations inspired me to write an open source implementation for PHP,
as I couldn’t see anything decent around that fit the bill. I’ve put it on Google Code
Oh, there’s a java version in there too along with the libketama C library that is used for the PHP
extension
-= Linkage 2007.02.18 =-
Pingback on Jan 26th, 2009 at 7:37 am
Ross
March 26, 2010 at 3:44 am
Hej Tom,
I got half excited about your article… hits most of the squares… However your circle shows the servers
nicely equidistant on the 360 but if we leave their placement to the hash function then they could
actually all appear in a very acute part of the circle so that the services or resources could mostly be