Ben Laurie blathering

Distributed Hash Tables and the Long Tail

I spent some time with my friend Ben Hyde recently, and we got talking about distributed hash tables, and his favourite topic, power law distributions. Apparently if you are part of, say, a file-sharing network, and you happen to be the node that has the hash for some fantastically popular file, then you suffer a lot of pain: everyone requesting that file has to talk to you to find out where to get it from and this kills your ‘net connection.

So, I had this idea, which was probably not original, but since Ben thought it might work and the blogosphere is the new peer-reviewed journal, here it is.

At each node that is “responsible” for a hash, measure the traffic to that hash (i.e. number of requests). Take the log of the traffic and combine it with the hash, giving a new hash. Then the nodes that serve that hash should be determined by the new hash. The higher the log of the traffic, the more nodes should serve it. When participating nodes detect that the traffic has changed sufficiently, they should (obviously) hand off to the resulting new hash.

To search for a hash in this scheme, clients should start with the highest possible traffic and pick a random node (or two or three) to query that would serve the hash at that traffic level. If this fails, decrease the log and try again.

This should (at the cost of more global load) reduce local load.

It adds some complication, of course, and probably increases the chances of a false negative.

1 Comment

  1. […] I said it probably wasn’t original, and I was right. Beehive from Cornell is a concrete implementation of something very like the technique I described. It’s been used for various interesting projects, including P2P DNS, something that’s made possible, or even plausible, by DNSSEC. […]

    Pingback by Links » Distributed Hash Tables Revisited — 19 Feb 2006 @ 22:48

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress