Oct 6, 2009

Simple Simhashing

A friend and coworker of mine, Ryan Moulton, just wrote a very nifty article called "Simple Simhashing". If you like software algorithms, it is most certainly worth a read. Simple simhashing is an algorithm that takes any "thing" and produces a simhash, such that the simhashes of two "things" are the same with a probability that tracks how similar those two "things" are. The really nifty part is that the algorithm is completely local and its properties are provable in an easy-to-understand way. This algorithm is actually used at Google in a few places, although I'll leave it to you to guess how.
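To make the "same hash with probability relative to similarity" idea concrete, here is a minimal sketch of one well-known scheme with that property: min-hashing over character shingles. This is my own illustration and not necessarily the construction in Ryan's article. The function names, the shingle length `k`, the number of hash functions, and the choice of BLAKE2 as the underlying hash are all assumptions made for the example.

```python
import hashlib


def shingles(text, k=4):
    """Break text into the set of its overlapping k-character substrings."""
    text = text.lower()
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(seed, feature):
    """Hypothetical helper: a seeded 64-bit hash built from BLAKE2b."""
    data = f"{seed}:{feature}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes=64):
    """For each seeded hash function, keep the smallest hash over all features.

    Each slot depends only on this one feature set, so the signature can be
    computed without looking at any other document.
    """
    return [min(_hash(seed, f) for f in features) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of slots where the two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact set similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = shingles("the quick brown fox jumps over the lazy dog")
    b = shingles("the quick brown fox jumped over the lazy dog")
    print("exact:    ", jaccard(a, b))
    print("estimated:", estimated_similarity(minhash_signature(a),
                                             minhash_signature(b)))
```

For any single slot, the chance that two sets share the same minimum is their Jaccard similarity (the size of the intersection over the size of the union), so the fraction of matching slots estimates that similarity, and more slots give a tighter estimate.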