[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

btree's



    Date: Mon, 6 Mar 89 16:05 CST
    From: vaughan@MCC.COM (Paul Vaughan)

	    We've got an application that makes heavy use of hash tables,
    but the tables sometimes get very large (> 10000 entries).  This reduces
    performance dramatically because of the paging characteristics.  So, we
    are wondering
	    1) Would it behoove us to use btree tables, rather than hash
    tables.
	    2) Does anyone have a btree implementation that they are willing
    to share.
	    3) Does anyone have any other suggestions for improving
    performance.

The hash table implementation uses double hashing and behaves pretty
well even for large tables.  Tables with tens of thousands of entries
aren't very large, and I wouldn't expect you to have problems unless your
keys are complex and your hash and comparison functions have poor paging
behavior.  Some things you might investigate are:

 - reduce collisions with a hash function tuned for your keys
 - use the :store-hash-code option for tables if your hash or
   comparison functions are complex.  This trades table space for
   reduced compute time and better paging behavior.
 - localize the portions of your data structures needed for
   hashing and comparison
 - even if you can't predict your table sizes in advance, maybe
   you could use the :rehash-size and :rehash-threshold options to
   improve the table system's default mechanisms
 - try using page-in-table and page-out-table at appropriate points

Btrees would eliminate problems with table growing (as would extensible
hash tables), but might not generally improve paging behavior, depending
on your access patterns and the quality of your hash functions.  We'd
like to extend the table software to include these representations, but
I don't think that's likely to happen soon.

Under separate cover I could send you a B*-tree implementation I have
that's adapted from those used internally by the Genera garbage
collector and virtual memory system.  However, it would probably require
a fair amount of work to make it suitable for your use.

	    The tables we make come in all sizes from 5 entries to 20000 entries,
    but I'm not sure of the distributions.  There are probably ways to
    estimate the potential size of the table before build it, but we haven't
    worried about it much.  Right now we just rely on the growing table
    mechanism that Symbolics provides.  Does anyone know if Symbolics has
    any plans to include sophisticated facilities (such as btrees) in the
    (already extremely useful) generalized table facility.

      -- Paul Vaughan