[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[David E. Wallace: Re: A few questions on PCL ]

------- Forwarded Message

Return-Path: wallace%hpldew@hplabs.HP.COM
Message-Id: <8812151744.AA09867@hpldew.HP.COM>
To: larus%ginger.Berkeley.EDU%Berkeley.EDU%hplabs@hplabs.HP.COM
Subject: Re: A few questions on PCL 
In-Reply-To: Your message of Wed, 14 Dec 88 09:08:53 -0800.
Date: Thu, 15 Dec 88 09:44:34 PST
From: David E. Wallace <wallace%hpldew@hplabs.HP.COM>

> There is lot of published work on non-chained hashing schemes.
> Unfortunaely, I'm too busy and lazy to look it up.  However, there is
> one scheme that I remember from a compiler class that should work well
> here.  It is called quadratic linear rehashing.  Assume we want a hash
> value between 0 and S.  Let H be the initial hash value computed from
> the object.  Then compute the sequence of hash values:
> 	H_i = (H + i*i) mod S
> If S is prime, then this scheme tends to spread the hash probes over
> the entire table (note the resemblence to pseudo-random number
> generators).  Of course, this is an expensive operation on machines
> without hardware fixnum multiple and divide operations (do people run
> Lisp on such machines?).
> /Jim

It looks like it's something of a moot point for PCL, but why do you need
hardware multiply and divide (except for the initial hashing)?
Use the fact that i*i = (i-1)*(i-1) + 2*i-1, and you can compute the
sequence H_i using only addition, subtraction, and comparison.
In pseudo-C:

probe = H;
incr = 1;

while (hash_table[probe] != EMPTY && hash_table[probe] != key && incr < S)
    probe += incr;
    incr += 2;
    if (probe >= S) probe -= S;

/* Now if hash_table[probe] == key, success, else failure */

Since probe and incr are kept in the range 0...S-1, their sum must fall
in the range 0...2*S-2, so no more than one subtraction is ever needed
to normalize probe.  This version may be faster even on machines with
hardware multiply and divide, if addition and subtraction are sufficiently
faster.  One of the weaknesses of quadratic hashing is that you can
only reach locations offset from your starting point by a quadratic residue
(mod S), so you will only try about half the possible locations in the table
before wrapping around.  The wrap-around corresponds to the point where incr
wraps around S.  The above code uses this condition to terminate the loop,
since no new probes will be generated beyond this point.

Feel free to forward this to the mailing list if you think the
point is of general interest.

Dave W.

------- End of Forwarded Message