Hash Functions: the modulo prime myth?

Update: new test data and rebuttal of critics here

Hash functions are used all over programming. From cryptography, hash tables, to error checking.

Many hash functions have the following form.

  1. Generate a pseudo random number from data using your magical function.
  2. Force this number into a narrow boundary with modulus. (ie. mod 8 forces all numbers to be from 0 to 7)

One of practices that bothered me was the use of prime numbers to limit the bucket size.

Why should we round down our buckets to the nearest prime? Doesn’t less buckets mean more collisions?

I searched for the answer. But no one knew for sure. Over and over again, they essentially repeated: “because Knuth said so“, or “because primes are naturally more randomy“. So I decided to look again for a more conclusive answer.

Read the rest of this entry »

27 Comments


Computer Language Trends

You probably know about the Tiobe software index.

But does website volume really tell how popular a language is? I bring you charts of languages by job listings.

functional

Here are the functional languges. As you can see here, F# is starting to grow. But Erlang still surpasses all of them by far.

Read the rest of this entry »

2 Comments


When Does Infinity Equal -2?

Remember in school, when your math friend showed you how 1 = 2?
Now you can amuse them too, except it doesn’t involve any math errors like dividing by 0, or the old

If you are a math major point your condescending look elsewhere.

Here we start with S, an infinite sequence. And somehow we get -2.

The math experts told me in an unexcited manner: the explanation is here

http://en.wikipedia.org/wiki/P-adic_number.

Leave a Comment


Morton Codes

Morton codes (or Z-order curves) are a way to hash coordinates in 1 dimension. This is useful for problems such as “which stores are 10 miles away from me?” without searching through all the stores in existence.

Why Not Use A 2d Array?

You might ask, why not just store the information in a 2d array in C or Java? The problem is caching.

      y=0              y=1         y=2            y=3           ....    y=99
[1,2,3,4...99] [1,2,3,4...99] [1,2,3,4...99]  [1,2,3,4...99]    ....

How a 99×99 2d array could be stored in memory.

Read the rest of this entry »

Leave a Comment


A Better Python Reload

Have you ever wanted to rewrite your program while it was running? According to wikipedia, this feature is called live coding.

This is a big deal for applications such as internet servers and the telecom industry. Restarting the program every time you make a change is unnecessary downtime when you would like to have +99.9% uptime.

Some languages were made with this feature in mind, such as Erlang and ColdC. However for Python, it is not as easy.

Read the rest of this entry »

5 Comments