
LISPM I/O performance hysteria



    Date: Wed, 24 Jan 90 13:34 PST
    From: Montgomery Kosma <kosma@ALAN.kahuna.decnet.lockheed.com>
	Although you seem to treat it as all one issue, you've
	really raised TWO sets of issues.

    Well, I suppose that if one were to really look at optimizing the disk
    routines, you would definitely have to separate out the physical IO time
    from the processing overhead from things like READ.  However, I don't
    use READ, 

Well, the original reporter I began responding to was using
READ, but apparently addressees and responses have become so
tangled I can't untangle them.  Sorry.  My responses are valid,
but not always directed at the person who spoke last, it appears.

	      although it is not clear that the stuff I'm using is the most
    efficient in the world either.  So, what is the most efficient way to
    read in a text file?  

Read it and do what with it?

			  Obviously, in my codes at least, I can reduce that
    overhead by being smart about the way I do things (book 5 could be
    better...).  On the other hand, things as basic as reading a file into
    ZMACS take FAR too long (and I'm sure that the people who wrote ZMACS
    knew lots more about optimizing file IO than I do).  

I'm sure they do NOW!  When Zwei was written, however...
Not to mention that the tools available were very different
back then.  I'm not sure just when Zwei was written, but I'm
sure it was more than a decade ago.

	But let's keep some perspective here:  Your program spent
	maybe 15 minutes doing IO.  It spent maybe 30 minutes parsing
	with READ.  And it spent MANY HOURS doing SOMETHING ELSE.

    Okay, first of all, I'm not sure what you mean by "my" program.  Perhaps
    you have me confused with somebody else?  

That seems probable.  I think you may have inserted yourself
by replying to a reply I made to someone else, and I missed
the fact that there was an additional participant with an
additional set of complaints.  Or if you were the original
person, I may have been responding to someone who responded
to you, but got the addressees wrong.  Whatever; sorry for
the confusion, and I hope all the relevant stuff gets read
by the relevant persons and that it's helpful when it does
get there.

					      I'm not trying to pin down a
    specific file IO example; I'm talking about everything from codes that
    I've written to reading a text file into ZMACS.  

    For example, one particular code which I've got running here does
    approximately 15 to 30 minutes of file IO (NOT using READ, incidentally)
    for approximately 2 minutes of compute time on the Connection Machine.

This isn't exactly a surprising result.  But the first step to
speeding it up is to figure out what this "15 to 30 minutes of
file IO" means.  What exactly are you doing during that time,
what are the subtasks involved, and what are the alternatives?
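
One way to do that breakdown, as a minimal sketch in portable Common
Lisp (nothing Symbolics-specific is assumed).  The names TIME-PHASE,
COUNT-CHARACTERS and COUNT-LINES are invented for illustration; the
idea is only to time raw character transfer separately from the
line-at-a-time version that conses a string per line.

```lisp
;; Hypothetical helper: run THUNK and return its value plus the
;; elapsed wall-clock seconds, so each subtask can be timed alone.
(defun time-phase (thunk)
  (let* ((start (get-internal-real-time))
         (value (funcall thunk)))
    (values value
            (/ (- (get-internal-real-time) start)
               internal-time-units-per-second))))

;; Phase 1: pull the characters off the stream and do nothing else.
(defun count-characters (stream)
  (loop for ch = (read-char stream nil nil)
        while ch
        count ch))

;; Phase 2: the same input, but allocating a fresh string per line.
(defun count-lines (stream)
  (loop for line = (read-line stream nil nil)
        while line
        count line))

;; Usage on an in-memory stream; a real run would wrap each phase in
;; WITH-OPEN-FILE on the data file instead.
(with-input-from-string (s (format nil "1 2 3~%4 5 6~%"))
  (time-phase (lambda () (count-characters s))))
```

If the two phases take about the same time, the cost is in getting
the bytes; if the second is much slower, it is in consing and GC.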

	So, sure, IO is a problem, but it is not >>YOUR<< main problem.
	At least, not yet.

    IO *IS* a problem and is a main problem for me.  
(meaning you, not whoever I thought I was responding to...)

If your time really is going to IO, and not to parsing or to consing
temporary strings (as in READ-LINE), then yes, IO is your main
problem.  Break it down and analyze it.
								And that's
	    reading/writing TEXT (HUMAN READABLE) FILES!!! 

    It SHOULDN'T have anything to do with anything, but when people propose
    solutions like "binary files are much faster" 
Storing numbers in binary files is faster than parsing and unparsing
them to and from text.
						  or "using the FEP IO
    system" 
That's the FEP *FILE* system, which has almost nothing to do with the
FEP.  (It's called that because the FEP knows how to read it, in addition
to the Lisp machines).  It's a perfectly good filesystem that's faster
than LMFS, and can hold text just as well as binary data.

	    or "using dump-forms-to-file" I simply cannot accept that as a
    possible solution to my problem!  Text files *shouldn't* have to be so
    much less efficient.  
It is *INHERENTLY* less efficient to convert your data to and from
a textual representation.  Symbolics' READ function may be less
efficient than it has to be, but READ is *INHERENTLY* less efficient
than a more specialized parser like READ-INTEGER, and READ-INTEGER
is *INHERENTLY* less efficient than AREF.  I think your "*shouldn't*"
is just wishful thinking.
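
The three levels can be put side by side in a rough sketch.  This is
portable Common Lisp, and SCAN-UNSIGNED-INTEGER is a made-up name
standing in for "a more specialized parser"; it is not the Symbolics
READ-INTEGER, and it handles only unsigned decimal digits.

```lisp
;; Hypothetical specialized parser: scans decimal digits in place,
;; with no readtable dispatch and no intermediate objects.
;; Returns the integer and the index just past its last digit
;; (0 and START if there are no digits at START).
(defun scan-unsigned-integer (string start end)
  (let ((value 0)
        (i start))
    (loop while (and (< i end)
                     (digit-char-p (char string i)))
          do (setf value (+ (* value 10)
                            (digit-char-p (char string i))))
             (incf i))
    (values value i)))

;; The same number obtained three ways, from most to least work.
(let ((text "1234 56")
      (binary (vector 1234 56)))
  (list (read-from-string text)            ; general reader
        (scan-unsigned-integer text 0 7)   ; specialized scan
        (aref binary 0)))                  ; no conversion at all
;; => (1234 1234 1234)
```

The general reader has to consider every kind of object the text
might denote; the scan only has to look at digits; AREF does not
look at text at all.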

			  And I know that it has to do with processing
    overhead, not transfer speeds (bytes is bytes) but my point is that
    OVERALL disk IO for whatever types of files should be more efficient
    than it is.

Well, calling what you're talking about "OVERALL disk IO"
turns things into such a jumble that nobody will know what
anybody is talking about, let alone be able to do anything
about it.

	This has nothing to do with anything.  "TEXT" is just bytes.
	We're all talking about bytes when we're talking about
	transfer speeds.

    Yes but I'm thinking about more than just raw transfer speeds.  Maybe
    I'm being too vague to just talk about the "feel" of the system reading a
    file.
Yes, certainly.  Contrast that statement with "It takes too long
to read in a lisp source file into a Zmacs buffer with c-X c-F.
It's especially annoying all the time it spends playing with
completion tables and sectionizing the buffer at the end."

The time spent doing READ for compiling a file (the usual use of
READ) doesn't bother me.  The compiler is so much slower than READ
(which slowness DOES bother me) that READ's share doesn't matter.


    good point, but the main reason is to facilitate data interchange,
    although human-readability is necessary at times for checking numbers in
    a file (not typically the whole file, but maybe the first few entries,
    or looking for specific data in the file).

There are plenty of tools to look at binary files.
(But not distributed with Symbolics software.)

	What's funky about 8-bit bytes?  It happens to be the

    there's nothing funky about 8-bit bytes.  You miss my point, I think.
    When wanting to write floating point numbers on a VAX and read them in
    on the symbolics, as far as I'm concerned my only reasonable option is
    to write the data into a TEXT file and to read it in as a TEXT file.
    Whether this is 8 bits or 7 bits or whatever doesn't matter.   The
    FUNKYNESS (?) I'm talking about is in the binary representation of
    floating point values, which I want to stay away from.

Well, the person I started out responding to was writing groups of
integers, not floating point.  It certainly requires a bit more
care and thought about how to write floating-point numbers in a
portable way.  That's why Common Lisp provides the
INTEGER-DECODE-FLOAT function.  In fact, doing the equivalent of
INTEGER-DECODE-FLOAT is far more reliable about really transferring
the exact floating point number across architectures, language IO
packages, etc. than any other technique.  Not everybody follows
the same rules for how to output floating point numbers, although
IEEE standards are making it better...

So if you're DESIGNING file formats for portable communication
between programs, you should keep that in mind, but if the file
formats already exist, again, the point is moot.
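
A minimal sketch of the INTEGER-DECODE-FLOAT approach, in portable
Common Lisp.  FLOAT-TO-INTEGERS, INTEGERS-TO-FLOAT and the PROTOTYPE
parameter are names invented for this example.  It assumes a radix-2
float format on both ends, and that the reading side's format has at
least as many significand bits and as much exponent range as the
writing side's.

```lisp
;; Split a float into three integers that can be written as text or
;; binary with no rounding: significand, exponent (a power of the
;; float radix, assumed here to be 2), and sign (1 or -1).
(defun float-to-integers (x)
  (multiple-value-bind (significand exponent sign)
      (integer-decode-float x)
    (list significand exponent sign)))

;; Rebuild the float from the three integers.  PROTOTYPE selects the
;; float format of the result (double-float by default).
(defun integers-to-float (significand exponent sign
                          &optional (prototype 1.0d0))
  (* sign (scale-float (float significand prototype) exponent)))

;; Round trip: the result is = to the original, bit for bit.
(apply #'integers-to-float (float-to-integers 0.1d0))
```

Only integers cross the file boundary, so neither side's decimal
float printer or reader is involved.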

    I don't call READ.  What I've been doing is reading in everything as
    strings and doing 

[Your sentence ended there... doing what?]

Via :READ-STRING or READ-LINE?  That's where all your time is
going... consing strings and GCing them.

    Evaluation of (ZWEI:FIND-FILE #P"KARL:>foo.foo") took 16.414146 seconds
    of elapsed time ...

    This works out to around 14K bytes/second, an abominable transfer rate,
    if you don't factor in overhead.  I don't know what zwei:com-read-file
    is doing internally, and I don't really care--it doesn't really matter.
    What DOES matter is that reading this file into ZMACS takes 16 to 17
    seconds, while reading the same file into emacs on my amiga takes 1 to 2
    seconds, and on our sparcstation loading it into gmacs, it's basically
    instantaneous (even I was surprized).

This is certainly a legitimate complaint, and you won't get
any quibbles from me about it.  This is entirely Symbolics'
responsibility.

However, if *YOU* (for various values of *YOU*, apparently)
are using bad techniques in your code, that's *YOUR*
responsibility.  I try to separate out the two cases, and in
the first case, I can only agree and provide additional
information about why or measurements, or similar stuff.  In
the second case, I try to turn you from criticizing Symbolics to
identifying ways to solve the problem.  (This frequently involves
recursively separating out the parts that are your fault and
the parts that are Symbolics' fault.)

	I pointed out that it was really more like 40 minutes, even if you
	use the same poor techniques I argued against using.  I did not say
	40 minutes was great.  I only said it was better than what YOU reported.

    okay, so what's the BEST it could be???  Give me a piece of code which
    can read integers and floats from a text file FAST and I'll gladly use
    it!

Submit a P.O., and I'll gladly do so.  I should be doing
something billable instead of dispensing free advice.

    I admit that I haven't looked carefully at file IO for some time, but
    a couple months ago when I was in the midst of my file IO problems, I
    carefully read book 5 and tried to do it the best I could with what
    information I had.  

It might be useful to analyze where book 5 let you down, and
tell Symbolics.

			Probably our IO code can be made better, but I have
    no idea how much...I just looked at one piece of code which is actually
    using read-line, not read, and then we later go ahead and pull out
    fields as reals or ints based on our knowledge of the data file's
    format.  

READ-LINE is going to cons a lot more than READ!  It just may
be your WORST option.  I'm not even going to think about
benchmarking it for you to show you, because GC time is highly
sensitive to machine configuration, but in theory it could
be almost unboundedly slow.  It should be a simple change to
your code to make *ONE* string, and then use :STRING-LINE-IN
to fill it.
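
The shape of that change, as a rough sketch in portable Common Lisp.
:STRING-LINE-IN itself is a Symbolics stream operation and is not
used here; READ-LINE-INTO is a made-up stand-in that fills the one
buffer character by character, and SUM-FIRST-FIELDS is a made-up
example consumer.  The point is only that the buffer is allocated
once, not once per line.

```lisp
;; One buffer, allocated once and reused for every line.
(defun make-line-buffer (&optional (size 256))
  (make-array size :element-type 'character
                   :fill-pointer 0 :adjustable t))

;; Hypothetical stand-in for :STRING-LINE-IN: overwrite BUFFER with
;; the next line of STREAM.  Returns BUFFER, or NIL at end of file.
(defun read-line-into (buffer stream)
  (setf (fill-pointer buffer) 0)
  (loop for ch = (read-char stream nil nil)
        do (cond ((null ch)
                  ;; End of file: succeed only if a partial line was read.
                  (return (and (plusp (fill-pointer buffer)) buffer)))
                 ((char= ch #\Newline)
                  (return buffer))
                 (t
                  ;; Grows the buffer only if a line exceeds its size.
                  (vector-push-extend ch buffer)))))

;; Example consumer: add up the leading integer on each line,
;; parsing straight out of the shared buffer.
(defun sum-first-fields (stream)
  (let ((buffer (make-line-buffer))
        (total 0))
    (loop while (read-line-into buffer stream)
          do (incf total (or (parse-integer buffer :junk-allowed t) 0)))
    total))
```

Each line must be fully consumed before the next call, since the
next call overwrites the buffer.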

	     On the other hand, ZMACS reading a file is probably as
    efficient as it's going to get, 

No, this assumption is completely wrong.  First, Zmacs must
put all of that text into a data structure, not just parse it
(and perhaps you don't even keep the parsed data around, but
pass it off to the Connection Machine).

				    and so that may be a better basis for
    comparison (at least semi-quantitative comparison).