[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: big sort



At 8:53 AM 4/19/95, Mathieu Lafourcade wrote:
>Hi all,
>
>I have to sort a *really* big number of strings. I read these strings from
>files and I want to produce a new file with the strings sorted (you know
>something like some word processors can do). The number of strings is
>several 10000s.
>
>I tried with buffers (I am affraid that lists would be a real memory hog -
>but I should confess that I didn't try). I use a dichotomy to insert a new
>string. It is quite slow (Quadra 900) and it become slower and slower as
>long I add new strings (this was expected but not to that extend).
>
>Does anybody had faced this kind of problem ? any pointer would be appreciated.

50000 strings at 100 characters apiece is 5 megs. 8 bytes apiece for
cons cells is another 400K, so you need 5.4 megs of free space
to do it all in memory (probably less; look at your input file size
to determine how much less than 5 megs you need for the strings)
Unless you don't have that much space, or whoever will be using your
software doesn't, just start up an 8 or 12 meg MCL, and you can use
the builtin SORT function. If that amount of memory is an issue, you'll
have to write your own file based merge sort. This also won't be difficult.
Ask if you need more info about that.

MCL's thermometer example will be helpful to give you some feedback
on how MCL is doing in reading/writing the files:

(require "THERMOMETER")
(file-thermometer)