
Host uptimes



    Date: Tue, 7 Feb 89 14:36 PST
    From: Spock@SAMSON.CADR.DIALNET.SYMBOLICS.COM (Mr. Spock)

        Date: Mon, 6 Feb 89 17:49 PST
        From: TYSON@Warbucks.AI.SRI.COM (Mabry Tyson)

        A while back there was some discussion about host uptimes.  One host here
        has just passed the 1/2 year mark of being up.  (It is in an individual's
        office and he does use it daily but he doesn't program.  Mainly he uses
        ZMAIL, TELNET, and Image-Calc.)  Another host is up to the 5 month mark.
        Our main file server/syshost was up 15 weeks until it crashed today.  Four
        other hosts have been up for 2 months or more.  (We have about 31 hosts.)

        On the negative side, we have one machine that has been down for about
        3 weeks with a bad disk.  (Yes, we are on full maintenance and they are
        working on it, but still...)

    We have a 3600 here in Twain Harte that (except for power outages)
    stayed up for somewhere in the neighborhood of 3 years (RSK, correct me
    if I'm wrong).  Of course the track record has gone down since the
    installation of the IFU board set about 2 years ago but still it's a
    very reliable machine.  If it weren't for the power going out in the
    winter around here we'd probably have some pretty amazing uptimes.

I think he was referring to time between machine crashes, not time between
hardware failures.

Funny that these messages are coming over the network just now. For a while I
have been looking forward to telling everybody that my machine has been up for
11 weeks 6 days 4 hours 19 minutes 17 seconds. I have been waiting for the machine
to crash so I could report a maximum time, but it just refuses to crash. I guess
others have outdone me by far. On the other hand, I do a lot of compute-intensive
work, such as running SPIRE and my own Prolog compiler, not just Telnet and mail.
Sometimes I have to hold its hand during GC by doing several rounds of GC-by-area
to keep it going. Right now my GC thermometer shows that about 95% of memory is
used. I have been here before and have successfully recovered about 30% of memory
with several judicious rounds of GC-by-area.
Michael Greenwald has unofficially told me about a hack called "slow GC" that
can do this somewhat automatically. Could somebody from Symbolics tell me whether
it exists in 7.2 and, if not, whether it will be in a future release?

Specmanship aside, I think Symbolics deserves a round of applause for improving
both the hardware and software reliability of their products. I remember the
days of release 4.5 when my machine would crash several times a day. I still
have nightmares about "Page fault on unallocated VMA" or "Unrecoverable disk
overrun" or "Lisp stopped itself". Once upon a time you couldn't run a job
unattended overnight with more than a 10% chance of it finishing. Now I
regularly run heavy compute-bound tasks that take a whole weekend. Keep
up the good work!
        Jeff