Fast reading of tab-delimited data -- question
- To: info-mcl@digitool.com
- Subject: Fast reading of tab-delimited data -- question
- From: psz@mit.edu (Peter Szolovits)
- Date: Wed, 28 Dec 1994 15:39:57 -0500
- Sender: owner-info-mcl@digitool.com
I need to read in some fairly large tables of data, in tab-delimited
format. Most of the entries will be numeric (fixnum or float), though a
few are text. I have written some Lisp code (using with-open-file and
read-char and/or read-line) to do this, but it's really(!) slow. I was
looking at mac-file-io and boyer-moore in the examples folder to see how
this could be done faster, but I am stumped by the following.

Mac-file-io's fsRead gives me a buffer of raw bytes. If I want to call
something like read or parse-integer on these, the obvious way is to make a
string whose elements are the characters obtained by applying code-char to
the raw bytes. This, however, throws away most of the efficiency gained by
doing the low-level file i/o. Looking at the sources on the CD, it appears that
read-from-string (for example) creates a stream from which the reader then
reads. I can't find the code for the reader itself on the CD, so I can't
tell what operations it actually performs on this stream. If it's only
read-char, then presumably I could create a stream that feeds characters
just from the byte-buffer. I presume that this is something like what read
does on a file. Is there a simple way to do this, or has anyone already
done it?

Thanks. --Pete Szolovits

P.S. I realize this is confused/confusing. What I think I'm trying to
accomplish is:
1. Find chunks of bytes delimited by tabs/newlines/eof using the lowest
possible level representation.
2. Having the raw bytes and boundaries, call the Lisp reader to interpret
successive substrings of those bytes, without having to re-read the file or
dynamically make Lisp strings (just to be turned into an internal stream
again) from the raw bytes.
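
For the integer fields, steps 1 and 2 might look roughly like the sketch
below. It is portable Common Lisp, not anything specific to MCL or
mac-file-io, and it sidesteps the Lisp reader entirely by accumulating digits
straight from the byte buffer. The names delimiter-p, map-fields and
parse-integer-from-bytes are made up for illustration, and the buffer is
assumed to be a vector of ASCII codes such as fsRead might fill. Floats are
not handled here; fields that are not plain decimal integers come back as NIL
so the caller can fall back to a slower path for them.

```lisp
;; Assumption: fields are separated by tab (9), linefeed (10) or
;; carriage return (13), the last being the classic Mac line ending.
(defun delimiter-p (byte)
  (or (= byte 9) (= byte 10) (= byte 13)))

(defun map-fields (function buffer &optional (end (length buffer)))
  "Call FUNCTION with the START and END index of every field in BUFFER.
No strings are created; FUNCTION only receives index pairs."
  (let ((start 0))
    (dotimes (i end)
      (when (delimiter-p (aref buffer i))
        (funcall function start i)
        (setf start (1+ i))))
    ;; Last field, if the data does not end with a delimiter.
    (when (< start end)
      (funcall function start end))))

(defun parse-integer-from-bytes (buffer start end)
  "Return the integer spelled by the ASCII bytes BUFFER[START, END),
or NIL if those bytes are not an optionally signed decimal integer."
  (let ((sign 1) (value 0) (i start))
    (when (>= i end)
      (return-from parse-integer-from-bytes nil))
    (case (aref buffer i)
      (45 (setf sign -1) (incf i))      ; ASCII code of -
      (43 (incf i)))                    ; ASCII code of +
    (when (>= i end)
      (return-from parse-integer-from-bytes nil))
    (loop while (< i end)
          do (let ((digit (- (aref buffer i) 48)))   ; 48 is ASCII 0
               (unless (<= 0 digit 9)
                 (return-from parse-integer-from-bytes nil))
               (setf value (+ (* value 10) digit))
               (incf i)))
    (* sign value)))

(defun parse-all-fields (buffer)
  "Return a list with one entry per field: an integer, or NIL."
  (let ((fields '()))
    (map-fields (lambda (start end)
                  (push (parse-integer-from-bytes buffer start end) fields))
                buffer)
    (nreverse fields)))

;; The bytes of "12<tab>-7<tab>abc<linefeed>", written as explicit codes.
(defparameter *sample*
  (coerce '(49 50 9 45 55 9 97 98 99 10) '(vector (unsigned-byte 8))))

(parse-all-fields *sample*)   ; => (12 -7 NIL)
```

Whether this beats feeding the reader from a custom stream would have to be
measured; the point is only that the field boundaries and the integer values
can both be computed without consing a string per field.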