[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

union performance bug in symbolics CL

    Date: Fri, 16 Aug 1991 15:50 EDT
    From: tk@AI.research.att.com (Thomas Kirk)

    the implementation of union in symbolics CL has a nasty performance bug.
    i haven't analyzed the algorithm, but it appears that the number of 
    pairwise comparisons done in this implementation of union is quadratic 
    in terms of the lengths of the arguments; the length of the second argument 
    is in the square term. 

It has a worst-case behavior of L2(L1+(L2-1)/2) comparisons, where L1
and L2 are the lengths of the first and second arguments.  Worst case is
when the two sets are disjoint, the first argument is non-null, and the
second argument contains no duplicates.

I just looked at the code and see why this happens.  For each element of
the second argument it compares it with all the elements of the result,
and adds it to the result if the comparison fails.  Note that it is
comparing with the result, not the original first argument.  So, as the
result grows, the remaining elements of the second argument are compared
to more elements; it's essentially performing a REMOVE-DUPLICATES
operation on the second argument, which is O(L2**2).

    ouch. this can be done with N * M comparisons.

Changing it to use an N*M algorithm is pretty easy.  One difference is
that duplicates in both arguments would be duplicated in the result;
currently, only duplicates in the first argument are duplicated in the
result.  Both behaviors are permitted by the specification, and it's not
clear that it's worth paying an extra factor of L2 for this partial
duplicate suppression; a comment in the code even mentions that the use
of the first argument to initialize the result is done to take advantage
of the allowance for duplicates in the result.