[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Issue: HASH-TABLE-KEY-MODIFICATION (Version 1)
- To: Moon@STONY-BROOK.SCRC.Symbolics.COM
- Subject: Issue: HASH-TABLE-KEY-MODIFICATION (Version 1)
- From: Jon L White <email@example.com>
- Date: Mon, 19 Sep 88 14:36:41 PDT
- Cc: CL-Cleanup@SAIL.STANFORD.EDU, GLS@Think.com, CL-Editorial@SAIL.STANFORD.EDU
- In-reply-to: David A. Moon's message of Mon, 12 Sep 88 20:25 EDT <19880913002519.8.MOON@EUPHRATES.SCRC.Symbolics.COM>
re: Proposal (HASH-TABLE-KEY-MODIFICATION:SPECIFY):
. . .
I fear this is far too vague to be a cleanup proposal, however well-
motivated it may have been. Indeed, there was a discussion of this topic
on Common-Lisp@SU-AI about a year ago (as well as a few related msgs quite
recently), but the upshot of all the discussion was merely to "enlighten"
more of the community about the techniques of hashing in general. I think
we should limit our efforts in this arena at giving advice to editors
(such as Steele, and Chapman, etc.) on how to explain the consequences of
some typical hashing techniques.
First, there is no generally agreed-upon notion of what a "component" is;
so the part where you say:
In EQ and EQL hash tables, components of a key may be freely modified
with no effect on the table.
In EQUAL hash tables, it is an error to modify a component of a key.
isn't saying anything usable; or if it is, then it is something already
understood via channels outside this proscription. Furthermore, it could
bog us down for months if we tried to specify "component" exactly. For
example, is the bit-field (byte 3 5) of a fixnum X a "component" of X?
Second, the "rule":
If implementations define additional acceptable values for the :TEST
argument to MAKE-HASH-TABLE, the rule is that it is an error to modify
a component of a key if and only if that component affects the test
function. If that component cannot affect the test function, the hash
table's operation must not be affected by modifying the component.
is completely inadequate. You might as well say that it is an error to
modify the components of list elements for any list that is passed to a
sequence function with a :test argument of EQUAL. What you probably meant
to say, in the hash-table context, is:
... it is an error to modify an object used as key in a hash table
iff such modification affects either (1) the outcome of equivalence
test used by the hash-table, or (2) the structure of the collision
chains built up in the hash table [note that SXHASH, or a substitute
therefor, may be involved in the structure of collision chains].
[There might have to be a line saying that the :test component of a hash-
table is always an equivalence relation.] However, CLtL neither specifies
that hash-tables must use some "collision chain" technique -- alists appear
to be a fully conforming implementation (albeit sloooow) -- nor does is even
specify that any such collision tecnhique must be based on the output of
SXHASH. About the most it says is that hashing should be "fast".
Unfortunately, nowhere does the discussion of HASH-TABLE-KEY-MODIFICATION
even mention collision chains.
An implementational note *might* be worth mentioning in the document
standard about how the historical intent of EQ/EQL tables was in fact
to limit the information obtained from an object to merely the pointer
itself (i.e., it's address), and that this usually meant faster hashing
(in MacLisp, for example). But with the spread of copying GC's (stop-
and-copy, generational, etc) this kind of dependency limitation forces
some very odd behaviour on the memory-management subsystem, such as the
need for rehashing after GC's, etc. I really know of cases where an EQL
table was grossly slower than an EQUAL table, simply because it forced so
much more attention from the memory-management subsystem.
Typically, only very "introspective" code needs to know whether, say, two
strings are EQL rather EQUAL; and very often even experienced users make
the mistake of thinking that EQ means speed and EQUAL means sloth.
Certainly no one using Common Lisp should prefer an EQ/EQL table over
an EQUAL one, unless he is actually concerned with the implementational
identity of objects [such as in system code doing circularity checks,
or "constants" coalescing, or macromemo-izing, etc.].
I think the gist of these last two paragraphs is something that the
standards editors could try to work into the next chapter on Hash-Tables
(and possibly something about "collision chains" being a frequent part
of hash-tables). This would be preferable to having the cleanup sub-
committee try to solve the unsolvable problem of anticipating every
possible consequence of every possible implementation technique for
-- JonL --