
Fast reading of tab-delimited data -- question



I need to read in some fairly large tables of data, in tab-delimited
format.  Most of the entries will be numeric (fixnum or float), though a
few are text.  I have written some Lisp code (using with-open-file and
read-char and/or read-line) to do this, but it's really(!) slow.  I was
looking at mac-file-io and boyer-moore in the examples folder to see how
this could be done faster, but I am stumped by the following.
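
For concreteness, what I have now is essentially the following (simplified;
parse-field is just a stand-in for the real per-field handling, not the
actual code):

(defun parse-field (line start end)
  ;; Stand-in: hand one tab-delimited field to the reader.
  (read-from-string line nil nil :start start :end end))

(defun read-table (pathname)
  ;; Read the file line by line, splitting each line on tabs.
  (with-open-file (in pathname :direction :input)
    (loop for line = (read-line in nil nil)
          while line
          collect (loop with len = (length line)
                        for start = 0 then (1+ tab)
                        for tab = (or (position #\Tab line :start start) len)
                        collect (parse-field line start tab)
                        until (>= tab len)))))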

Mac-file-io's fsRead gives me a buffer of raw bytes.  If I want to call
something like read or parse-integer on these, the obvious way is to build
a string whose elements are the characters produced by code-char on the raw
bytes.  This, however, gives up most of the efficiency gained by doing the
low-level file I/O.  Looking at the sources on the CD, it appears that
read-from-string (for example) creates a stream from which the reader then
reads.  I can't find the code for the reader itself on the CD, so I can't
tell what operations it actually performs on this stream.  If it's only
read-char, then presumably I could create a stream that feeds characters
straight from the byte buffer, which I imagine is roughly what read itself
does on a file.  Is there a simple way to do this, or has anyone already
done it?
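
To be concrete about the consing I want to avoid, the code-char conversion
above would be roughly this (if the fsRead buffer is a Mac pointer rather
than a Lisp vector, the aref would become whatever the right pointer byte
accessor is):

(defun buffer-to-string (buffer count)
  ;; Copy COUNT raw bytes into a fresh Lisp string, one code-char at a
  ;; time; this per-buffer allocation is the cost I'd like to avoid.
  (let ((string (make-string count)))
    (dotimes (i count string)
      (setf (char string i) (code-char (aref buffer i))))))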

Thanks.  --Pete Szolovits

P.S.  I realize this is confused/confusing.  What I think I'm trying to
accomplish is:
1.  Find chunks of bytes delimited by tabs/newlines/eof, working at the
lowest-level representation possible.
2.  Having the raw bytes and their boundaries, call the Lisp reader to
interpret successive substrings of those bytes, without having to re-read
the file or cons up fresh Lisp strings from the raw bytes just so they can
be turned back into an internal stream.  (A sketch of the kind of thing I
mean is below.)
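
Roughly, I'd like to end up with something like this, except run over the
raw buffer itself instead of over a string copy of it (parse-field is the
same stand-in as in the code above):

(defun parse-buffer (string)
  ;; Step 1: scan once for tab/newline boundaries (adjust the line
  ;; terminator for whatever the files actually contain).
  ;; Step 2: parse each field in place via :START/:END, so no
  ;; per-field substrings are consed.
  (let ((end (length string))
        (rows '())
        (row '())
        (start 0))
    (flet ((finish-field (limit)
             (push (parse-field string start limit) row)
             (setf start (1+ limit))))
      (dotimes (i end)
        (case (char string i)
          (#\Tab (finish-field i))
          (#\Newline (finish-field i)
                     (push (nreverse row) rows)
                     (setf row '()))))
      ;; The last line may lack a trailing newline.
      (when (or (< start end) row)
        (finish-field end)
        (push (nreverse row) rows))
      (nreverse rows))))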