[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Issue: HASH-TABLE-KEY-MODIFICATION (Version 1)



    Date: Mon, 19 Sep 88 14:36:41 PDT
    From: Jon L White <jonl@lucid.com>

    ....
    First, there is no generally agreed-upon notion of what a "component" is;
    so the part where you say:

	In EQ and EQL hash tables, components of a key may be freely modified
	with no effect on the table.
	In EQUAL hash tables, it is an error to modify a component of a key.

    isn't saying anything usable; or if it is, then it is something already
    understood via channels outside this proscription.  

It's true that "component" is a vague notion.  As elaborated on later in
the proposal, "component" should be defined in terms of the test function.
Maybe this initial vague language should just be abandoned.

							Furthermore, it could
    bog us down for months if we tried to specify "component" exactly.   For
    example, is the bit-field (byte 3 5) of a fixnum X a "component"  of X?

Who cares, since Common Lisp does not provide any way to modify fixnums.

    Second, the "rule":

	If implementations define additional acceptable values for the :TEST
	argument to MAKE-HASH-TABLE, the rule is that it is an error to modify
	a component of a key if and only if that component affects the test
	function.  If that component cannot affect the test function, the hash
	table's operation must not be affected by modifying the component.

    is completely inadequate.  You might as well say that it is an error to 
    modify the components of list elements for any list that is passed to a 
    sequence function with a :test argument of EQUAL.

That's precisely what I would have said, if sequence functions were
permitted to create auxiliary data structures that encache information
about their arguments from one call to the next.  Since they are not
(although this is only implied by CLtL), there is no need to impose
such a restriction on their callers.

    What you probably meant
    to say, in the hash-table context, is:

	... it is an error to modify an object used as key in a hash table
	iff such modification affects either  (1) the outcome of equivalence  
	test used by the hash-table, or (2) the structure of the collision
	chains built up in the hash table [note that SXHASH, or a substitute 
	therefor, may be involved in the structure of collision chains].

    [There might have to be a line saying that the :test component of a hash-
    table is always an equivalence relation.]   However, CLtL neither specifies
    that hash-tables must use some "collision chain" technique -- alists appear
    to be a fully conforming implementation (albeit sloooow) -- nor does is even
    specify that any such collision tecnhique must be based on the output of 
    SXHASH.  About the most it says is that hashing should be "fast".

    Unfortunately, nowhere does the discussion of HASH-TABLE-KEY-MODIFICATION
    even mention collision chains.

I vehemently disagree with this, and I think you're being completely 
wrong-headed.  The point of the proposal was to state what are the
requirements on Common Lisp programs so that they will be portable
to all implementations of hash tables.  The internal details of hash
tables in some particular implementation are not relevant; furthermore,
discussing them can only mislead users into writing non-portable
programs.  What I meant to say was precisely what I did say.  I agree
that an editor could find more precise language than "affect the
test function" by which I mean "change the answer returned by the
test function."

    An implementational note *might* be worth mentioning in the document
    standard about how the historical intent of EQ/EQL tables was in fact 
    to limit the information obtained from an object to merely the pointer 
    itself (i.e., it's address), and that this usually meant faster hashing 
    (in MacLisp, for example).  But with the spread of copying GC's (stop-
    and-copy, generational, etc) this kind of dependency limitation forces 
    some very odd behaviour on the memory-management subsystem, such as the 
    need for rehashing after GC's, etc.  I really know of cases where an EQL 
    table was grossly slower than an EQUAL table, simply because it forced so 
    much more attention from the memory-management subsystem.

    Typically, only very "introspective" code needs to know whether, say, two 
    strings are EQL rather EQUAL; and very often even experienced users make 
    the mistake of thinking that EQ means speed and EQUAL means sloth.
    Certainly no one using Common Lisp should prefer an EQ/EQL table over
    an EQUAL one, unless he is actually concerned with the implementational
    identity of objects [such as in system code doing circularity checks,
    or "constants" coalescing, or macromemo-izing, etc.].


    I think the gist of these last two paragraphs is something that the 
    standards editors could try to work into the next chapter on Hash-Tables
    (and possibly something about "collision chains" being a frequent part
    of hash-tables).

I agree that it might be interesting to have a book that discusses the
performance tradeoffs in hash tables.  I don't think that has anything to
do with the issue at hand, which is to define what characteristics of hash
tables may or may not be assumed by portable programs.

    This would be preferable to having the cleanup sub-
    committee try to solve the unsolvable problem of anticipating every 
    possible consequence of every possible implementation technique for 
    hash-tables.

If you thought that's what the HASH-TABLE-KEY-MODIFICATION issue was
about, you misunderstood it.  It's about defining whether portable
programs may or may not perform side effects on objects used as
keys of hash tables, and which side-effects can be performed.  I
can see it needs to be written up in a form that is less easy
to misunderstand.  If I have time, which is not too likely, I'll
do that, otherwise I'll just forget it.