[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: big sort



At 20:53 4/19/95, Mathieu Lafourcade wrote:
>Hi all,
>
>I have to sort a *really* big number of strings. I read these strings from
>files and I want to produce a new file with the strings sorted (you know
>something like some word processors can do). The number of strings is
>several 10000s.
>
>I tried with buffers (I am affraid that lists would be a real memory hog -
>but I should confess that I didn't try). I use a dichotomy to insert a new
>string. It is quite slow (Quadra 900) and it become slower and slower as
>long I add new strings (this was expected but not to that extend).
>
>Does anybody had faced this kind of problem ? any pointer would be appreciated.
>Thank in advance
>Mathieu


The standard approach to big sorts is to do little sorts and then merge them.
I believe Knuth talks about this (sorry, I don't remember which volume).

I would have guessed the buffers were the wrong data structure, because inserting
in the middle has to move everything beyond it.  I'd try sorting a list of the strings,
e.g. (sort <list-of-strings> #'string-lessp).  Remember, if you need the unsorted list
intact, you need to do (sort (copy-seq <list-of-strings>) #'string-lessp).


>
>---
>Mathieu Lafourcade
>* UTMK (Unit Terjemahan Melalui Komputer) - USM (Universiti Sains Malaysia)
>- 11800 PENANG - MALAYSIA - Tel: (60) 4 860 3002 - Fax: (60) 4 656 3244 -
>lafourca@cs.usm.my
>* GETA (Groupe d'Etude pour la Traduction Automatique) - IMAG (Institut de
>Mathematiques Appliquees de Grenoble) - BP 53 38041 GRENOBLE CEDEX 9,
>FRANCE - Tel : (33) 76 51 43 80 - Fax : (33) 76 51 44 05 -
>mathieu.lafourcade@imag.fr