[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Issue: HASH-TABLE-KEY-MODIFICATION (Version 1)



re:    Proposal (HASH-TABLE-KEY-MODIFICATION:SPECIFY):
       . . . 

I fear this is far too vague to be a cleanup proposal, however well-
motivated it may have been.  Indeed, there was a discussion of this topic 
on Common-Lisp@SU-AI about a year ago (as well as a few related msgs quite 
recently), but the upshot of all the discussion was merely to "enlighten" 
more of the community about the techniques of hashing in general.  I think 
we should limit our efforts in this arena at giving advice to editors 
(such as Steele, and Chapman, etc.) on how to explain the consequences of 
some typical hashing techniques.


First, there is no generally agreed-upon notion of what a "component" is;
so the part where you say:

    In EQ and EQL hash tables, components of a key may be freely modified
    with no effect on the table.
    In EQUAL hash tables, it is an error to modify a component of a key.

isn't saying anything usable; or if it is, then it is something already
understood via channels outside this proscription.  Furthermore, it could
bog us down for months if we tried to specify "component" exactly.   For
example, is the bit-field (byte 3 5) of a fixnum X a "component"  of X?


Second, the "rule":

    If implementations define additional acceptable values for the :TEST
    argument to MAKE-HASH-TABLE, the rule is that it is an error to modify
    a component of a key if and only if that component affects the test
    function.  If that component cannot affect the test function, the hash
    table's operation must not be affected by modifying the component.

is completely inadequate.  You might as well say that it is an error to 
modify the components of list elements for any list that is passed to a 
sequence function with a :test argument of EQUAL.  What you probably meant
to say, in the hash-table context, is:

    ... it is an error to modify an object used as key in a hash table
    iff such modification affects either  (1) the outcome of equivalence  
    test used by the hash-table, or (2) the structure of the collision
    chains built up in the hash table [note that SXHASH, or a substitute 
    therefor, may be involved in the structure of collision chains].

[There might have to be a line saying that the :test component of a hash-
table is always an equivalence relation.]   However, CLtL neither specifies
that hash-tables must use some "collision chain" technique -- alists appear
to be a fully conforming implementation (albeit sloooow) -- nor does is even
specify that any such collision tecnhique must be based on the output of 
SXHASH.  About the most it says is that hashing should be "fast".

Unfortunately, nowhere does the discussion of HASH-TABLE-KEY-MODIFICATION
even mention collision chains.


An implementational note *might* be worth mentioning in the document
standard about how the historical intent of EQ/EQL tables was in fact 
to limit the information obtained from an object to merely the pointer 
itself (i.e., it's address), and that this usually meant faster hashing 
(in MacLisp, for example).  But with the spread of copying GC's (stop-
and-copy, generational, etc) this kind of dependency limitation forces 
some very odd behaviour on the memory-management subsystem, such as the 
need for rehashing after GC's, etc.  I really know of cases where an EQL 
table was grossly slower than an EQUAL table, simply because it forced so 
much more attention from the memory-management subsystem.

Typically, only very "introspective" code needs to know whether, say, two 
strings are EQL rather EQUAL; and very often even experienced users make 
the mistake of thinking that EQ means speed and EQUAL means sloth.
Certainly no one using Common Lisp should prefer an EQ/EQL table over
an EQUAL one, unless he is actually concerned with the implementational
identity of objects [such as in system code doing circularity checks,
or "constants" coalescing, or macromemo-izing, etc.].


I think the gist of these last two paragraphs is something that the 
standards editors could try to work into the next chapter on Hash-Tables
(and possibly something about "collision chains" being a frequent part
of hash-tables).  This would be preferable to having the cleanup sub-
committee try to solve the unsolvable problem of anticipating every 
possible consequence of every possible implementation technique for 
hash-tables.



-- JonL --