[more about] garbage



    Date: Tue, 15 Dec 87 11:02 EST
    From: Barry Margolin <barmar@Think.COM>

	Date:    Mon, 14 Dec 87 15:31:00 PST
	From: LAU%PLU@ames-io.arpa

	Symbolics Software Support:
  
	  A question about memory usage on a 3640 with 12 MB of main memory.  Within
	  6 hours of idle time, two warning messages about memory space came up.  The
	  difference between the amount of space left in the first message and the
	  second was about 500,000 words.  What is using up the space?  The machine was
	  idle.  The system has prolog and tcp loaded but at the time, no one was using
	  the machine.  This happens so often that the system goes into the fep with a
	  message about:  Running out of swap space.  (usually after a weekend)
  
	  sonie
	  arpanet address:  lau@ames-pluto.arpa

    We saw something like this happen to most of our Symbolics machines
    once.  We have a number of diskless Sun workstations on the network, and
    had a power failure that took many of them down simultaneously.  When
    they came back up they all broadcast a bunch of TFTP requests looking
    for their boot programs.  This caused a large number of TFTP server
    processes to be consed on the Lisp Machines.

    We "solved" this problem by disabling the TFTP server on our Lisp
    Machines.

						    barmar

The following is some information about what may cause this.  We have
had similar problems related to network broadcast protocols, and
discovered some interesting things.   

When a packet requesting a service supported by your lispm is received,
the ethernet interface process normally invokes the server function with
a call to PROCESS-RUN-FUNCTION, spawning a separate server process.
PROCESS-RUN-FUNCTION maintains a simple resource of these processes in
the variable SI:PROCESS-RUN-FUNCTION-SPARE-PROCESSES and reuses them
when possible.  Thus any activity which generates many requests for
services in a short time interval, not allowing server processes to
complete and become reusable, will cause PROCESS-RUN-FUNCTION to create
new processes.  Each new process has to have a stack group, and each
stack group typically gets two stacks allocated.  Each stack appears to
require 32K words of static (non-garbage-collectable) space regardless
of the arguments passed to MAKE-STACK-GROUP.
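
The reuse behavior described above can be sketched with a toy model.
This is Python rather than Lisp, and it is only an illustration under
stated assumptions, not the actual PROCESS-RUN-FUNCTION code: the class
SparePool and the helpers serve_one_at_a_time and serve_burst are
hypothetical names invented for this sketch.

```python
# Toy model of a spare-process pool. Hypothetical names throughout;
# this is NOT the Symbolics implementation, only the reuse idea.

class SparePool:
    def __init__(self):
        self.spare = []    # idle processes waiting to be reused
        self.created = 0   # processes ever created (each needs a stack group)

    def acquire(self):
        # Reuse an idle process when one exists; otherwise create a new one.
        if self.spare:
            return self.spare.pop()
        self.created += 1
        return self.created

    def release(self, process):
        # A server process that has finished goes back on the spare list.
        self.spare.append(process)


def serve_one_at_a_time(pool, requests):
    # Each request completes before the next one arrives.
    for _ in range(requests):
        pool.release(pool.acquire())


def serve_burst(pool, requests):
    # All requests arrive before any server process has completed.
    busy = [pool.acquire() for _ in range(requests)]
    for process in busy:
        pool.release(process)
```

In this model, 50 requests served one at a time leave `created` at 1,
while the same 50 requests arriving as a burst leave it at 50, with 50
entries on the spare list afterwards.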

So if 50 service requests get generated simultaneously for the same
host, 50 times 64Kwords, or 3.2 megawords of static space (minus some
amount depending on availability of currently unused processes in the
list SI:PROCESS-RUN-FUNCTION-SPARE-PROCESSES) will be allocated.  This
would not normally be any problem, except with broadcast requests.
Broadcast requests are always "serviced" by every machine that has the
correct service entry; because of this, many requests can hit one host
at the same or nearly the same time (as in the case Barmar mentioned).  It's
unlikely that a non-broadcast protocol will lead to this happening
except when redundant requests are erroneously generated.
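
The arithmetic above can be checked with a few lines of Python.  This is
a sketch under the figures given in this message (two stacks per stack
group, 32K words per stack, with K taken as 1024, which is an
assumption); the function name static_words_for_burst is hypothetical.

```python
# Rough estimate of static space consumed by a burst of service requests.
# Figures come from the message text; K = 1024 is an assumption.
WORDS_PER_STACK = 32 * 1024
STACKS_PER_STACK_GROUP = 2


def static_words_for_burst(requests, spare_processes=0):
    # Only requests that cannot reuse a spare process need a new stack group.
    new_processes = max(0, requests - spare_processes)
    return new_processes * STACKS_PER_STACK_GROUP * WORDS_PER_STACK


print(static_words_for_burst(50))       # 3276800
print(static_words_for_burst(50, 10))   # 2621440
```

With no spare processes available, 50 simultaneous requests come to
3,276,800 words, which is the "3.2 megawords" figure quoted above; ten
spare processes would bring that down to 2,621,440 words.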

The moral is: be very careful how and when you use broadcast (I think
this point is documented; unfortunately other vendors may not have read
Symbolics' documentation). 

It is easy to find out whether this is what's biting you just by looking
at the value of SI:PROCESS-RUN-FUNCTION-SPARE-PROCESSES.  If this list
is long, the problem has occurred (usually this list has fewer than ten
elements).  I have noticed that the list can shrink periodically by some
means I haven't identified, so a later burst of requests may have to
create new processes all over again; this condition can therefore
explain ongoing growth in static space allocation.

  -- William D. Gooch