[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Issue: HASH-TABLE-KEY-MODIFICATION (Version 1)
- To: Jon L White <jonl@lucid.com>
- Subject: Issue: HASH-TABLE-KEY-MODIFICATION (Version 1)
- From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>
- Date: Mon, 19 Sep 88 18:39 EDT
- Cc: CL-Cleanup@SAIL.STANFORD.EDU, GLS@Think.com, CL-Editorial@SAIL.STANFORD.EDU
- In-reply-to: <8809192136.AA14744@bhopal>
Date: Mon, 19 Sep 88 14:36:41 PDT
From: Jon L White <jonl@lucid.com>
....
First, there is no generally agreed-upon notion of what a "component" is;
so the part where you say:
In EQ and EQL hash tables, components of a key may be freely modified
with no effect on the table.
In EQUAL hash tables, it is an error to modify a component of a key.
isn't saying anything usable; or if it is, then it is something already
understood via channels outside this proscription.
It's true that "component" is a vague notion. As elaborated on later in
the proposal, "component" should be defined in terms of the test function.
Maybe this initial vague language should just be abandoned.
Furthermore, it could
bog us down for months if we tried to specify "component" exactly. For
example, is the bit-field (byte 3 5) of a fixnum X a "component" of X?
Who cares, since Common Lisp does not provide any way to modify fixnums.
Second, the "rule":
If implementations define additional acceptable values for the :TEST
argument to MAKE-HASH-TABLE, the rule is that it is an error to modify
a component of a key if and only if that component affects the test
function. If that component cannot affect the test function, the hash
table's operation must not be affected by modifying the component.
is completely inadequate. You might as well say that it is an error to
modify the components of list elements for any list that is passed to a
sequence function with a :test argument of EQUAL.
That's precisely what I would have said, if sequence functions were
permitted to create auxiliary data structures that encache information
about their arguments from one call to the next. Since they are not
(although this is only implied by CLtL), there is no need to impose
such a restriction on their callers.
What you probably meant
to say, in the hash-table context, is:
... it is an error to modify an object used as key in a hash table
iff such modification affects either (1) the outcome of equivalence
test used by the hash-table, or (2) the structure of the collision
chains built up in the hash table [note that SXHASH, or a substitute
therefor, may be involved in the structure of collision chains].
[There might have to be a line saying that the :test component of a hash-
table is always an equivalence relation.] However, CLtL neither specifies
that hash-tables must use some "collision chain" technique -- alists appear
to be a fully conforming implementation (albeit sloooow) -- nor does is even
specify that any such collision tecnhique must be based on the output of
SXHASH. About the most it says is that hashing should be "fast".
Unfortunately, nowhere does the discussion of HASH-TABLE-KEY-MODIFICATION
even mention collision chains.
I vehemently disagree with this, and I think you're being completely
wrong-headed. The point of the proposal was to state what are the
requirements on Common Lisp programs so that they will be portable
to all implementations of hash tables. The internal details of hash
tables in some particular implementation are not relevant; furthermore,
discussing them can only mislead users into writing non-portable
programs. What I meant to say was precisely what I did say. I agree
that an editor could find more precise language than "affect the
test function" by which I mean "change the answer returned by the
test function."
An implementational note *might* be worth mentioning in the document
standard about how the historical intent of EQ/EQL tables was in fact
to limit the information obtained from an object to merely the pointer
itself (i.e., it's address), and that this usually meant faster hashing
(in MacLisp, for example). But with the spread of copying GC's (stop-
and-copy, generational, etc) this kind of dependency limitation forces
some very odd behaviour on the memory-management subsystem, such as the
need for rehashing after GC's, etc. I really know of cases where an EQL
table was grossly slower than an EQUAL table, simply because it forced so
much more attention from the memory-management subsystem.
Typically, only very "introspective" code needs to know whether, say, two
strings are EQL rather EQUAL; and very often even experienced users make
the mistake of thinking that EQ means speed and EQUAL means sloth.
Certainly no one using Common Lisp should prefer an EQ/EQL table over
an EQUAL one, unless he is actually concerned with the implementational
identity of objects [such as in system code doing circularity checks,
or "constants" coalescing, or macromemo-izing, etc.].
I think the gist of these last two paragraphs is something that the
standards editors could try to work into the next chapter on Hash-Tables
(and possibly something about "collision chains" being a frequent part
of hash-tables).
I agree that it might be interesting to have a book that discusses the
performance tradeoffs in hash tables. I don't think that has anything to
do with the issue at hand, which is to define what characteristics of hash
tables may or may not be assumed by portable programs.
This would be preferable to having the cleanup sub-
committee try to solve the unsolvable problem of anticipating every
possible consequence of every possible implementation technique for
hash-tables.
If you thought that's what the HASH-TABLE-KEY-MODIFICATION issue was
about, you misunderstood it. It's about defining whether portable
programs may or may not perform side effects on objects used as
keys of hash tables, and which side-effects can be performed. I
can see it needs to be written up in a form that is less easy
to misunderstand. If I have time, which is not too likely, I'll
do that, otherwise I'll just forget it.