Performance (would you believe BYTE?)
- To: info-mcl
- Subject: Performance (would you believe BYTE?)
- From: michaelg@Xenon.Stanford.EDU (Michael Greenwald)
- Date: 3 Dec 92 22:36:54 GMT
- Distribution: comp
- Newsgroups: comp.lang.lisp.mcl
- Organization: CS Department, Stanford University, California, USA
- Sender: news@CSD-NewsHost.Stanford.EDU
This is a non-urgent comment on performance of some MCL code. (I'm
not an MCL user, so perhaps this is already well-documented.)
Someone was flaming in the hall here about how slow Lisp was, and how
he'd always use C -- he'd given Lisp its chance. The application he
described sounded like it should have been reasonably fast, but
apparently this was taking minutes when the entire operation should
have taken a couple of seconds. I was curious, so I decided to look
into it.
Apparently he learned Lisp for some class here, and later used MCL on
a Mac. He took me to a Mac, and showed me a program that did
something like the following:
He took an array of bits that was stored in some Macintosh data
structure, and read it out using something like sys:%p-contents-offset
(I think the actual call was something like %read-word or %get-word, I
won't swear to it). His routine was supposed to read it out N bits at
a time (N being a parameter, not necessarily a power of 2), but he
didn't know how to do that from Lisp. So he read it out a byte (or a
word, or an integer -- whatever) at a time, and packed it into a bignum
(!!) using something like:
(DOTIMES (I NBYTES)
  (SETQ BITS (DPB (%READ-BYTE PTR (- NBYTES I 1))
                  (BYTE 8 (* I 8))
                  BITS)))
(I'm not sure of the exact form, but that was the general shape.) He
then extracted the bits in a second pass using LDB and BYTE, and put the
numbers into an array for processing. (It was some data-compression
or analysis routine that he didn't go into detail on.)
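As best I can reconstruct it, the overall pattern was something like
the following sketch. (The names are my guesses, not his actual code;
%READ-BYTE stands in for whatever low-level accessor he really used.)

(defun bytes->bignum (ptr nbytes)
  ;; Pass 1: pack NBYTES raw bytes into one bignum, low byte first.
  (let ((bits 0))
    (dotimes (i nbytes bits)
      (setq bits (dpb (%read-byte ptr (- nbytes i 1))
                      (byte 8 (* i 8))
                      bits)))))

(defun bignum->fields (bits n count)
  ;; Pass 2: extract COUNT N-bit fields from BITS into an array.
  ;; Each call to BYTE here conses a fresh byte specifier.
  (let ((result (make-array count)))
    (dotimes (i count result)
      (setf (aref result i) (ldb (byte n (* i n)) bits)))))

Note that N need not be a power of 2, which is presumably why he went
through a bignum in the first place: LDB with an arbitrary-width byte
specifier hides all the word-boundary bookkeeping.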
This was >slow<.
Anyway, when I tried this for a large NBYTES, it eventually raised an
error, because DPB was getting a vector instead of a number. (I
don't know if this was a bug, or if he somehow corrupted memory
getting the bits into the Macintosh data structure. I assume
everything he did on the Mac end was kosher, but who knows?)
Apparently, when the value of I got large enough, (BYTE 8 (* I 8))
generated some humongoid number.
Sure enough, the implementation of BYTE on MCL conses some amazingly
large numbers. Using ASH sped up his application (still not fast
enough, but by then he was bored and had disappeared). This was on MCL
2.0b3, and I understand that it is past beta now, although I doubt the
implementation of BYTE has changed.
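For reference, the ASH rewrite amounts to replacing each
(LDB (BYTE n pos) bits) with a shift-and-mask, so no byte-specifier
object (or humongoid intermediate) is built per iteration. Roughly
(again, my reconstruction, not his code):

(defun extract-field (bits n i)
  ;; Equivalent to (ldb (byte n (* i n)) bits), written as a
  ;; right shift followed by a mask of N low-order ones.
  (logand (ash bits (- (* i n)))
          (1- (ash 1 n))))

The mask (1- (ash 1 n)) is a constant per call and could be hoisted
out of the extraction loop entirely.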
I tried to argue that extracting the numbers out of an array of bits
wasn't that much different from what he'd have to do in C (two
operations when the extraction crossed a word boundary, for example).
But he did have a point that numerical manipulation of bignums
shouldn't cons so much more than it has to. (I still think his
>application< was wrong -- why into a bignum? -- but his complaint
about BYTE was right on the money.)
(And, BTW, is there some better way to extract unaligned bits from a
long bitstring? (Again, I have no pressing need for the answer, I'm
just curious)).
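One obvious alternative (a sketch only, assuming the Macintosh data
can first be copied into an ordinary Lisp byte vector) is to pull each
N-bit field out of the raw bytes directly, never building a bignum at
all. A straightforward bit-at-a-time version, least-significant bit
first within each byte:

(defun unaligned-field (bytes bit-pos n)
  ;; Extract the N-bit field starting at bit BIT-POS of the
  ;; byte vector BYTES, handling fields that straddle byte
  ;; boundaries.  ACC accumulates the field one bit at a time.
  (let ((acc 0))
    (dotimes (k n acc)
      (multiple-value-bind (byte-idx bit-idx)
          (floor (+ bit-pos k) 8)
        (setf acc (dpb (ldb (byte 1 bit-idx) (aref bytes byte-idx))
                       (byte 1 k)
                       acc))))))

A production version would move whole bytes at a time (at most two
LDB/DPB pairs per field, when the field crosses a boundary), which is
exactly the two-operation cost the C version would pay.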
- Prev by Date:
Re: Toplevel question
- Next by Date:
Re: scrolling-view & clip regions
- Previous by thread:
Need voice recognition setup and experiences, please
- Next by thread:
speedup for array ops - summary
- Index(es):