
LISPM I/O performance hysteria

[This is mostly a restatement of my position and a plea for
more rigor in this discussion.  If you're not directly involved
you may wish to hit Delete now...  I don't think I can send
this directly, so I'm sending it to SLUG instead.]

From the tone of your message, I infer that something in
my message has rubbed you the wrong way and led you to
want to entrench yourself.  I apologize if that's so.
Let me try to make my intent more clear.

Although you seem to treat it as all one issue, you've
really raised TWO sets of issues.

    Date: Fri, 19 Jan 90 18:44:51 CST
    From: "kosma%ALAN.kahuna.decnet.lockheed.com %ALAN.kahuna.DECNET.LOCKHEED.COM"@warbucks.ai.sri.com
	Date: Wed, 17 Jan 90 15:06 PST
	From: sobeck@RUSSIAN.SPA.Symbolics.COM
	If you are willing to work with local FEP files, you can achieve performence of 
	about 250K Bytes/sec(reading or writing), not counting the time required to construct 
	      ^^^^^^^^^^^^^^  !!!
	the data structures.

    This is **exactly** the kind of benchmark I've been talking about!!  If
    My amiga (total system cost of about $3000) does disk i/o that peaks out
    higher than that!! (with a stupid seagate drive, even).  I've seen
    Micropolis drives peak out at over 600 KBytes/sec.  

The performance issue you're really complaining about has
nothing whatsoever to do with this.  It's not the peak
transfer rate that you're running into, it's the incredible
amount of overhead you run into if you read your data via READ.

Agreed, 250K is slow; the disk driver's overhead is rather
excessive, and the 3600 series' design predates such niceties
as VLSI disk controller chips.  The disks are slower than
more modern ones.  The list goes on.  On the average,
throughout the system, I'd say Symbolics IO is about 2.5x as
slow as it should be, for the hardware.  And in some cases,
the hardware is slower than it ought to be for the era in
which it was designed.

That's averaged over everything from the disk driver to
the highest-level IO; the 2.5 figure varies widely depending
on just which part of the IO system you look at.

But let's keep some perspective here:  Your program spent
maybe 15 minutes doing IO.  It spent maybe 30 minutes parsing
with READ.  And it spent MANY HOURS doing SOMETHING ELSE.

So, sure, IO is a problem, but it is not >>YOUR<< main problem.
At least, not yet.

							And that's
    reading/writing TEXT (HUMAN READABLE) FILES!!! 

This has nothing to do with anything.  "TEXT" is just bytes.
We're all talking about bytes when we're talking about
transfer speeds.

Is your point that you WANT humans to be able to read the
data with text tools?  If so, you can just say so; it would
be a legitimate point.  Obviously, you'll pay a high price
for this convenience in performance and disk space; my
personal experience is that it's usually not worth it for
large amounts of data.  There's something inherently NOT
human readable about that much data...

    When I'm dealing with large amounts of numerical data, the last thing I
    want to do is to use some funky binary format.  Typically I get geometry
    files off of a UNIX system or an IBM mainframe, process them on VMS to
    get volume descriptions, then load them into the symbolics and crunch on
    the connection machine.  The only way to do file interchange between
    different pieces of code on different systems is to use ASCII files

What's funky about 8-bit bytes?  It happens to be the
industry's most reliable, most portable, most standard, and
most EFFICIENT format.  (Unless you're using network mail as
a transfer medium...)  Especially when you get IBM mainframes
in there; they like to talk EBCDIC, you know.  ASCII is
certainly *NOT* the *ONLY* way to get data between systems or
applications, and it's certainly *NOT* what I would recommend
to you, if you have control over the applications in
question.  Obviously, if you don't, then my recommendation is
moot, so just say so and we can ignore it.

[BTW, I consider the only truly portable file type between
Common Lisps to be :ELEMENT-TYPE '(UNSIGNED-BYTE 7).
:ELEMENT-TYPE 'STANDARD-CHAR comes in second place, with the
aid of file translation software as needed.]
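For concreteness, here is a minimal sketch of straight binary file IO
in Common Lisp.  It is not from the original post: it uses
:ELEMENT-TYPE '(UNSIGNED-BYTE 8) rather than the stricter
'(UNSIGNED-BYTE 7) portability claim above, the function names are
hypothetical, and READ-SEQUENCE/WRITE-SEQUENCE are ANSI CL, newer than
some 1990 implementations (a READ-BYTE/WRITE-BYTE loop does the same
job on older ones).

```lisp
;; Hypothetical helpers, assuming ANSI CL.  One call moves the whole
;; buffer; no per-element stream overhead, no parsing, no consing
;; beyond the result array.
(defun write-byte-file (path bytes)
  (with-open-file (out path :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede)
    (write-sequence bytes out)))

(defun read-byte-file (path)
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((buffer (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence buffer in)
      buffer)))
```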

    which **SHOULD** run at least one or two hundred K/sec.  

I'll buy those figures, for reasonably coded applications.

Calling READ, however, does not qualify.  I don't care if
you're talking about Symbolics, or Franz on a Sun 4; I would
NEVER EVER want to load 5 Mbytes of data into any Lisp via
READ if I could avoid it.
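For the record, here is the kind of alternative I mean: a sketch, not
from the original post, that pulls integers out of a text file with
PARSE-INTEGER a line at a time instead of calling READ.  PARSE-INTEGER
is standard Common Lisp; the helper names are made up for
illustration, and it assumes lines of blank-separated integers.

```lisp
;; Hypothetical helpers.  PARSE-INTEGER with :JUNK-ALLOWED skips
;; leading whitespace and returns NIL when no integer remains, so the
;; inner loop terminates at end of line.
(defun parse-line-integers (line)
  "Collect the blank-separated integers on LINE, left to right."
  (let ((result '()) (pos 0))
    (loop
      (multiple-value-bind (n next)
          (parse-integer line :start pos :junk-allowed t)
        (unless n (return (nreverse result)))
        (push n result)
        (setf pos next)))))

(defun read-integer-file (path)
  "Return a list of all the integers in the file at PATH."
  (with-open-file (in path)
    (loop for line = (read-line in nil nil)
          while line
          nconc (parse-line-integers line))))
```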

							     I think this is
    totally reasonable and that the Symbolics I/O times are incredibly poor.
    I couldn't believe somebody (in another slug message) talking about 40
    minutes to read a 5 MB file like it was acceptable!!!!  Pure garbage!!

Sure, if you take the slowest technique with the most
overhead, you can spend 40 minutes.  You missed my point.  It
does NOT take 40 minutes to read a 5MB file.  YOU CAN WASTE
30 MINUTES by doing all sorts of silly things, like checking
for lists, and symbols, and rationals, and arrays, and
read-time evals, readmacros, readtable syntax, *READ-BASE*,
and Symbolics extended characters, and Japanese and ...

And it still doesn't add up to the times you originally reported.
THAT was my point, not that 40 minutes was wonderful.

Look, in *ANY* Lisp, if you want to input data reasonably efficiently,
read a line or a block at a time rather than a character at a time.

And if your language implementation provides you with a way
to do so without consing (Common Lisp doesn't; Symbolics
*DOES*), avoid doing character-at-a-time IO.  This holds true
whether you are programming in Lisp, C, Pascal, or TECO.

Actually, a lot of systems don't really provide any way to do
character at a time IO; you have to do a block at a time, and
pull the characters out yourself.  There's good reason for
this: in many operating systems the overhead for system calls
makes character-at-a-time IO prohibitively expensive.
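The block-at-a-time style looks like this in portable Common Lisp.
This is a sketch, not from the original post: the function name is
hypothetical, and READ-SEQUENCE is ANSI CL, newer than some 1990
implementations.

```lisp
;; One READ-SEQUENCE call fills the buffer; characters are then
;; examined in the buffer with no further per-character stream
;; operations (and hence no per-character call overhead).
(defun count-newlines (path)
  (with-open-file (in path)
    (let ((buffer (make-string 8192))
          (count 0))
      (loop for end = (read-sequence buffer in)
            while (plusp end)
            do (dotimes (i end)
                 (when (char= (char buffer i) #\Newline)
                   (incf count))))
      count)))
```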

I don't mind complaint sessions about Symbolics' IO; there's
plenty of grounds for rattling their cage about IO.  But
let's try to keep it real, OK?  Separate the issues of coding
style from IO.  You can complain about READ being slower than
READ on some other system, but you haven't done that.  You
haven't presented any data for such a conclusion.  I don't
know if it is or it isn't.  I didn't investigate that far,
and you didn't either.  Nobody else in this discussion has
yet measured READ exclusive of IO on different systems, either.

You originally reported that it took many hours to read the data.
I pointed out that it was really more like 40 minutes, even if you
use the same poor techniques I argued against using.  I did not say
40 minutes was great.  I only said it was better than what YOU reported.

Look, all I'm trying to do here is introduce a bit of rigor
into this discussion, and a bit of basic software engineering
and efficient programming.

I'm trying to separate things out:

On one side I want to gather all the LEGITIMATE complaints
about Symbolics IO, built on a LEGITIMATE understanding of
how Symbolics compares with the rest of the industry.  If
Symbolics is really only 5X slower than our consensus
reasonable value R, I don't want to waste time arguing
about why it's 100X slower.

On the other side, I'm perfectly happy to discuss why somebody's
application runs 100X slower than they expected: say, a factor of
2 from Symbolics IO, a factor of 5 from READ being slower than it
should be, and another factor of 10 from using READ on a
:ELEMENT-TYPE 'CHARACTER stream when you should be doing
something else.

What I'm NOT happy with is the current discussion.  After trying
to separate out the various issues, you seem to be trying to put
them back together into a fairly non-productive "But Symbolics IO
is SLOW!", to maximize how much complaint you can lay at Symbolics'
doorstep.  I think that wastes our time, and is likely to alienate
Symbolics, as well.

Here's my synopsis of the discussion so far:

NOBODY has argued that Symbolics IO is not slow.

We've argued that it's not as slow as you originally complained.

We've pointed out that you've made it even slower by poor
implementation choices.

We've measured how slow it is.

We've discussed WHY it's slow.

We've heard from Symbolics about how they're making it less slow.

We've speculated as to why your timings are so much worse than
even our measurements of your techniques.  (I'm curious to know
if these speculations prove helpful; I hope they do).

Anyway, I'll be very interested, after you make your program
either use a binary file or, if you feel you can't do that, take
some care about buffer boundaries and use PARSE-INTEGER, when
you come back and tell us:  "It takes X seconds, and that's too
slow."  That will be good and useful, and I hope you do it.

Or you may come back and say that it takes too long to transfer
your data over the net.

More likely, you'll do both!

It might be useful if we could come to some consensus as to what
IO performance level we think Symbolics should have.  Ideally, that
should be referenced to other existing hardware, languages, and
implementations, and prioritized by how important we (slug) think
the different areas are.  READ's performance vs character-at-a-time
vs local buffered string-character vs network vs ...

Also, if you want to complain that what you had to do was too
hard, and you'd like tool X to make it easier, that might be
interesting, too.

Anyway, sorry for the length of this missive.  I've spent too much
time trimming out unnecessary flamage; I don't have time left to
trim out unnecessary verbiage!