[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: SXHASH



I was under the impression that MACSYMA kept files on disk that contained
SXHASH values, and that this was the main reason for imposing the rule
that the implementation of SXHASH not change.  If this is not or is no longer
true, then there is no reason not to "fix" SXHASH, other than the possibly
considerable initial inconveniences you noted with regard to FASL files.
Rotating the CDR instead of the CAR would help.  If you assume the ROT
merely divides the hash value by 2 (which it tends to do for leaves
of the tree being hashed which are symbols with short print names --
a ROT by 1 would be better for this, but less good for fixnums),
and if you assume the addition never overflows (these are pessimal
assumptions), then two leaves are currently (resp. under proposed change)
distinguished iff the paths from them to the root follow
the same number of CAR (resp. CDR) chains, *regardless* of the
number of CDR (resp. CAR) chains.  Now in a list, every leaf
except the terminating () is one CAR chain away from the root
(and some number of CDRs).
It would seem better to make the "weight" of a leaf at the root depend
on both the number of CARs and number of CDRs.  Therefore I would
propose to rotate both the CAR and the CDR, preferably by amounts
that are relativelylarge and relatively prime.  So I would suggest
a function like (+ (ROT THE-CAR 11.) (RTHE-CDR 7)) or something.

WHat we really want is to find three operators $, %, and #,
the first two being unary and the third binary.  The function
would then be (# ($ THE-CAR) (% THE-CDR)).  WHat properties should
these have?  Ideally, $ and % would not distribute over #,
or if they did then at least $ and % should not commute ($%X shouldn't
equal %$X -- of course, the ROT suggestion above doesn't satisfy this).
The non-commutation property means that the CDAR and CADR of a hashed
tree are distinguished (by which I mean that exchanging those two
subtrees will alter the hash if the subtrees have different hashes).
Perhaps someone would care to pursue a theoretical approach further.