[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Issue EQUALP-GENERIC



> Date: Tue, 7 Mar 89 20:20 EST
> From: David A. Moon <Moon@STONY-BROOK.SCRC.Symbolics.COM>

> CLtL says that objects that have components are only EQUALP if the two
> objects are of the same type.  Perhaps this vague phrase means that CLASS-OF
> returns the same (EQ) class for both of them.  Who enforces this?  And who
> enforces that numbers are an exception to this rule?  Does every method
> enforce this, or is there some kind of shared code?

Good point.  Offhand, I can't think of a good way to share the code via any of
the standard method-combination mechanisms.  Semi-minor point: arrays are
another exception to this rule.  I say semi-minor, because that means we have
two exceptions, which starts making the rule seem somewhat feeble.

> I couldn't figure out how use-pointer is to be used.  Are we saying anything
> about what value a hash table supplies for this argument?  Up at the top you
> seemed to imply that hash tables supply T.  Then what use is the NIL value?

A NIL argument is intended to allow users to use this function in much the same
way they presently use SXHASH.  Since users don't have (portable) access to the
mechanisms by which it is possible to tell whether a hash-code has been
invalidated by some state change in the system, for them to be able to use this
function directly they need to be able to turn off any dependencies on such
state changes.

> The depth thing is a new feature since CLtL p.81 says EQUALP is permitted to
> be non-terminating if given circular arguments. 

Yes, but ... I'll have more to say about this at the end of this message.

> The second value is not well enough specified.  ... You haven't said anything
> about how to combine the second value.  You may be assuming that the second
> value is T or NIL and the appropriate combining function is OR, but you
> haven't said that. 

Yes, that is what I intended, and you're right that I should have been more
explicit about it.  However ...

> In fact a simple binary flag isn't really good enough, because many
> implementations have multiple levels of memory (volatility, ephemeralness,
> generations) and efficiency dictates that the second value indicate which
> level was depended upon. 

Absolutely right.  I had even thought about this when I was analysing the
problem.  Unfortunately, I forgot about it when I was writing the proposal, and
when I dug out the manuals I have for other Lisp implementations to check my
work against outside sources, none showed up this problem.

> Possibly it would work to require the second value to be an integer and use
> MAX as the combining function.  Or perhaps there should be a specific
> combining function for this purpose with an implementation-dependent
> behavior.  Even that isn't good enough, because a user who writes an
> equalp-hash method that actually computes a hash code, without recursing into
> components, has to know what to return as the second value.  We could say
> that NIL or 0 or some named constant means the first value does not depend on
> any memory locations.

Using MAX isn't right either.  Consider an implementation in which memory is
broken up into regions which are collected seperately (no generations, just
smaller regions to be dealt with on any one gc).  In such an implementation you
might want to return an integer in which bit positions are associated with
regions, and the appropriate combining function is LOGIOR.  So I think it
pretty much has to be an implementation-specific function.  A named constant
seems the right way to go for user-computed hash-codes.

> Now wait a minute, I think being independent of the core image and being
> independent of objects changing their address are two different issues.
> There can be other things besides memory addresses that are dynamically
> allocated on a per-core-image basis.  Character codes in systems with
> user-defined character registries are a classic example.

Right again.  I forgot that things like character codes might change in strange
ways that have nothing to do with memory location.  I think use-pointer should
probably mean can't change within the implementation, for compatibility with
SXHASH. 

> It's pretty unclear what the EQUALP-HASH method on T would do with a second
> argument of NIL.  What else can it use but the pointer (and perhaps the
> class)?  I don't see what it can do but return a constant value for all
> objects.  Maybe it would be better off signalling an error.  Or maybe this is
> another argument against the use-pointer feature.

No, because of the desire to provide functionality similar to SXHASH.

A problem which you didn't comment on (probably because others already have) is
the restriction I included on adding and removing methods.  I've pretty much
convinced myself that this could be removed with the addition of a minor amount
of machinery.  Specifically, add a function which can be used to determine
whether the definitions of EQUALP or EQUALP-HASH have changed since the last
time you checked.  (I'm not going to try to specify it precisely just now.
When I write another version of the proposal, I'll do so then.) 

Regarding the depth argument to EQUALP-HASH.  I keep waffling on this.  My
concern is that if the object to be hashed is circular, and all the methods
that are going to be invoked on the circular parts will continue walking the
circular structure, then EQUALP-HASH will hang.  The reason this is potentially
a problem, even though EQUALP is allowed to hang in such circumstances, is that
EQUALP might legitimately be optimized to do an EQL test 'up front'.  Thus,
without the depth argument you could potentially try to hash some object and be
unable to, despite the fact that you could find it again if you knew where to
look, because the EQL optimization would pick it up.

Basically, the question is, do the semantics of EQUALP (and EQUAL) include the
idea of doing an EQL test up front or not.  If the EQL test is an
'optimization', then it is arguably bogus.  However, I suspect that most people
would expect it to be legit, even though the descriptions in CLtL (which don't
say anything it) should hang when comparing any descended circular structure to
itself (and most people would consider the latter behavior rather bizarre).
Needless to say, this situation seems to be bugging me (probably excessively).

On a more general note, I agree with you that finishing this job may be hard,
and may not be possible under the present circumstances.  I had two reasons for
going to the trouble. 

First, there seemed to me to be a significant number of people who were
interested in seeing such a proposal at least attempted.  To paraphrase an
argument that Kent Pitman sometimes uses, better to get a proposal out where
people can actually look at it and judge it on its merits, even if the
consensus ends up being to punt because its too hard or ill-defined a problem,
than to pocket veto.  That way we at least have a record that shows that we
gave the problem serious consideration when somebody later complains about what
we've done.

Secondly, I'm really much more interested in the possiblity of specifying how
to do hash-tables with general user-specified test and hash functions.  I used
this proposal as a concrete example to think about, and I feel like I'm much
closer to a good solution than I would have been just trying to think about the
problem abstractly.  Even if we can't get such a thing into the standard due to
time constraints, if we or anyone else can come up with such a specification
that is widely acceptable, I think a lot of people would be made happier.

So I'll keep working on it, and hopefully you and others will continue to
provide such useful comments.  Thanks.

kab
-------