[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: hash table access time

Note: This is a response to Dan Hoey concerning issues
he raised about my submission to SLUG.  I am sending a copy
to SLUG, becasue it clarifies what I said before.


Thanks for your info about hash-tables.

> Was the element a SYMBOL, CONS, BIGNUM, FLONUM, FIXNUM, ...?  It may
> take different times to compute their hash codes.  Were there
> collisions in the hash table?  Was the association list CDR-coded?
> All of these affect the timing; if you want to get serious about
> benchmarking you've got to understand the problem.

I was using SYMBOLs or FIXNUMs as keys, and the results
were about the same for either type.
There were no collisions in the hash table,
and the alists were not cdr-coded in my first test.
I just did the same test with cdr-coded alists,
and the result was that the threshold increased
from 19 to 25.

I did not mean to present the last word in benchmarking.
Instead, I just wanted to point out that the overhead associated
with simple uses of hash tables is quite significant.

> Actually, if you take the average over ALL objects, the chance that
> ASSOC will return non-NIL is vanishingly small; it is necessary to
> search all of the ALIST, plus the time it takes to fall off the end.
> In other words, the average depends on the ``hit ratio'' (proportion of
> successful ASSOC's) which you are taking to be 1.

In my application the hit ratio is high enough that it makes sense
to think of the avg. case as a little more than half of the length
of the alist.

> Well, there's this big numerical operation you have to do up front,
> computing the hash code.  Then there's whatever you had to do to lock
> out GC.  I guess I'm not that surprised.

Perhaps the best suggestion I've received so far to get around
the cost of numerical operations and GC-related overhead with hash tables
is to put a unique serial number inside the structure being hashed,
and simply use that as a hash code.  I haven't tried it yet,
but that seems to make sense if you want speed, and can give up
a little space.

 -- Bob Kasper