
Why is READ-CHAR so blindingly slow?



    Date: Fri, 24 Feb 89 10:19 PST
    From: Robert S. Kirk <rsk@SAMSON.cadr.dialnet.symbolics.com>

	Date: Fri, 24 Feb 89 10:23 EST
	From: Reti@RIVERSIDE.SCRC.SYMBOLICS.COM
	    Date: Thu, 23 Feb 89 07:38 PST
	    From: rsk@SAMSON.cadr.dialnet.symbolics.com (Robert S. Kirk)

	    I'm writing a special file parser and discovered that READ-CHAR is
	    *much* slower than READ-LINE, even though it does little consing.
	    Theoretically, READ-CHAR creates its own buffer, so it should be only
	    slightly slower than READ-LINE due to extra function invocations.

	MANY extra function invocations, plus a lot of extra error checking.

	    I thought I could be smart by creating my own buffer and filling it with
	    READ-CHAR.  Oh, well!

	It seems a waste to be creating a buffer of your own when the underlying stream
	is most likely buffered anyway.  

    I am assuming that the underlying stream is buffered, and that is why
    READ-CHAR should be very fast since it should only have to fetch a char
    from the buffer.

    Creating my own static buffer conses much less than using READ-LINE,
    since READ-LINE has to cons a string each time it's called.

	Have you tried processing the data in the
	system's buffers by using :READ-INPUT-BUFFER and :ADVANCE-INPUT-BUFFER?  I
	think you'll find that is the most efficient way to process file data.

    Well, I guess I forgot to state that I need to stay with pure Common
    Lisp for compatibility with VAX Lisp and Lucid Lisp.  
This seems to me to be a deficiency in Common Lisp.  Even the best implemented
READ-CHAR and READ-LINE cannot provide an efficient way to do I/O because
there is necessarily an overhead associated with going into and out of the I/O
system code.  By using READ-CHAR you are paying that overhead once per
character; using READ-LINE you are paying it once per line but at the
cost of extra consing overhead you don't want.  The primitive you want
is "Fill this buffer for me" (implemented on the 36xx and Ivory as
:STRING-IN), which Common Lisp doesn't provide.  This incurs the I/O
system overhead only once per buffer of yours, at the cost of copying
overhead.
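
For readers on a later implementation: the ANSI Common Lisp standard, which
postdates this thread, added READ-SEQUENCE, which is essentially the "fill
this buffer for me" primitive described above.  What follows is a minimal
sketch, assuming an ANSI-conforming implementation.  The function name
COUNT-CHAR-IN-STREAM and the default buffer size are illustrative choices,
not anything from the original discussion.

```lisp
;; Sketch only: assumes ANSI Common Lisp (READ-SEQUENCE is not in CLtL1).
;; COUNT-CHAR-IN-STREAM is a hypothetical name used for illustration.
(defun count-char-in-stream (char stream &optional (buffer-size 4096))
  "Count occurrences of CHAR in STREAM, entering the I/O system once per buffer."
  (let ((buffer (make-string buffer-size))   ; one reusable buffer, consed once
        (count 0))
    (loop
      ;; READ-SEQUENCE returns the index of the first element it did not fill.
      (let ((end (read-sequence buffer stream)))
        (dotimes (i end)
          (when (char= (schar buffer i) char)
            (incf count)))
        ;; A short read means the stream is exhausted.
        (when (< end buffer-size)
          (return count))))))
```

For example, (with-input-from-string (s "a,b,,c") (count-char-in-stream #\, s 4))
scans the input in two buffer fills rather than six READ-CHAR calls.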
								    
The approach I suggested last time (:READ-INPUT-BUFFER) has potentially
the least overhead, since you pay the I/O system entry overhead once per
the I/O system's buffer and you don't have to copy the data.  However,
this has some consequences: not all streams are required to implement
:READ-INPUT-BUFFER, so it wouldn't be as generic as you'd like, and you
don't get to specify the buffer sizes and locations, which makes some
types of applications more awkward to code.

In my experience, you often can only get one of compatibility and
performance in applications that are intended to be portable.  For those
parts where performance matters, I generally conditionalize the code and
provide tuned paths for each of the implementation targets (with a
generic default).  In your case, I'd do that at the level of :STRING-IN
functionality, not READ-CHAR.
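
The conditionalization described above might look roughly like this.  It is
a sketch under stated assumptions, not tested Symbolics code: FILL-BUFFER is
a hypothetical name, the :GENERA feature name is assumed, and the argument
order and return value shown for :STRING-IN are assumptions that should be
checked against the stream documentation.  The generic default uses only
READ-CHAR, so it stays within pure Common Lisp.

```lisp
;; Sketch only.  FILL-BUFFER is a hypothetical name.  The #+GENERA branch
;; assumes a :GENERA feature and that :STRING-IN takes (eof-option buffer
;; start end) and returns the next index to fill; verify both before use.
(defun fill-buffer (stream buffer &optional (start 0) (end (length buffer)))
  "Fill BUFFER from STREAM between START and END.
Return the index just past the last character stored."
  #+genera
  (values (send stream :string-in nil buffer start end))
  #-genera
  ;; Generic default: pure Common Lisp, one READ-CHAR call per character.
  (do ((i start (1+ i)))
      ((>= i end) i)
    (let ((ch (read-char stream nil nil)))
      (when (null ch)               ; end of file: stop early
        (return i))
      (setf (char buffer i) ch))))
```

The parser then calls FILL-BUFFER everywhere and scans the buffer itself, so
only this one function needs a tuned path per implementation.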
							 If this was not the
    case I would meta-. my way down into system code and hack up something
    which interacted with the internal stream buffers.  Your suggestions
    above would be considered.

    I may end up making my own version of READ-CHAR which uses Symbolics
    internals on LispMs and the real READ-CHAR on other machines.
This may help some, but not as much as processing the data in the actual buffer.

Right now, READ-CHAR has to (each time) check for NIL and T stream args
and default correctly, worry about whether or not to echo, and deal with
the various options for handling EOF before it can get down to invoking
the generic :TYI function on the stream to actually get the character.

If you called the :TYI method directly, you'd avoid all of the above overhead,
but you'd still have the overhead inherent in the :TYI method itself.  This varies
from stream to stream, but even the simplest version still has to check each time
to see if the current buffer has been exhausted (plus, of course, the function call
itself).
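
To make the two layers of per-character work concrete, here is a toy model
in portable Common Lisp.  It is not the Symbolics implementation: every name
(TOY-STREAM, TOY-TYI, TOY-READ-CHAR and the two special variables) is
hypothetical, and the echo handling is left out.  It only shows where the
checks described above would sit.

```lisp
;; Toy model, not Symbolics source.  All names here are hypothetical.
(defstruct toy-stream
  (buffer "" :type string)      ; current buffer contents
  (index 0)                     ; next unread position in BUFFER
  (refill #'(lambda () nil)))   ; returns the next buffer string, or NIL at EOF

(defun toy-tyi (stream)
  "Model of the per-stream method: return the next character, or NIL at EOF."
  (loop
    ;; The buffer-exhausted check that runs on every single call.
    (when (< (toy-stream-index stream) (length (toy-stream-buffer stream)))
      (return (prog1 (char (toy-stream-buffer stream) (toy-stream-index stream))
                (incf (toy-stream-index stream)))))
    ;; Buffer used up: ask for the next one, or report EOF.
    (let ((next (funcall (toy-stream-refill stream))))
      (when (null next)
        (return nil))
      (setf (toy-stream-buffer stream) next
            (toy-stream-index stream) 0))))

(defvar *toy-standard-input* (make-toy-stream))
(defvar *toy-terminal-io* (make-toy-stream))

(defun toy-read-char (&optional stream (eof-error-p t) eof-value)
  "Model of the READ-CHAR layer: argument defaulting and EOF handling per call."
  (let* ((s (cond ((null stream) *toy-standard-input*)  ; NIL stream argument
                  ((eq stream t) *toy-terminal-io*)     ; T stream argument
                  (t stream)))
         (ch (toy-tyi s)))
    (cond (ch ch)
          (eof-error-p (error "End of file on ~S" s))
          (t eof-value))))
```

Calling TOY-TYI directly skips the defaulting and EOF-option logic but still
pays for the exhaustion check and the function call, which is the point made
above.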

    Is there any point in telling the system something special about the
    stream when I open it so that less checking is done?
If I understand your question correctly, there is no way to do this.