
Request for help with paging



    Date: Fri, 17 Mar 89 14:45:06 EST
    From: whuts!davel@att.att.com

    Sorry about the delay; mailing glitch.
    From: jasper.scrc.symbolics.com!DLA (David L. Andre)
    >    Date: Tue, 14 Mar 89 20:08:41 EST
    >    From: whuts!davel@att.att.com
    >
    >    We have written a system which appears to have a grave difficulty with
    >    thrashing in the paging system.  When freshly loaded, the system
    >    appears to spend perhaps 10% of its process time in the paging system,
    >    but after about four hours of continuous running, paging system time
    >    increases dramatically to 80% or so.

    >Sounds to me like the working set of your application increases over time.
    >Perhaps you have a data structure which references lots of objects which would
    >otherwise be garbage?

    Please reread the starred paragraph below.  Actually, our system
    references fewer objects with time (since it takes more time to reference
    them, and our system is constrained to return a result within a fixed 
    time).

Please reread my paragraph above.  Just because your system actively references
fewer objects with time does not mean that it doesn't passively reference more
objects with time.  Being a GC person rather than an application programmer, my
use of "reference" above means "passively reference".

A passive reference is one which exists, even though your program may very well
choose not to look at the value.  GC doesn't know to throw away such references,
so it keeps objects both actively and passively referenced.  However, it may end
up interspersing actively referenced objects with passively referenced ones.
This can cause the working set to grow dramatically, since the objects which the
program references commonly become split among many pages.  From your
description, this is what I assume is happening.

The solution is to examine your code to see if any such passive references
exist.  If you find such beasts, assist the GC by removing them when they're no
longer needed.  For example, assume you have a data-structure which includes a
slot used for holding temporary structures in a particular computation.  In this
case, you could set that slot to NIL after the computation is complete, and then
GC would reclaim the object rather than keeping it.  (In fact, GC usually keeps
such an object next to the object that references it, for "locality".)
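
As a minimal sketch of clearing such a slot, assuming a hypothetical structure
SOLVER with a SCRATCH slot (neither name comes from your system), something
along these lines drops the passive reference as soon as the computation ends:

```lisp
;; Hypothetical structure: SCRATCH holds temporary data for one computation.
(defstruct solver
  (result nil)
  (scratch nil))

(defun run-computation (s input)
  "Sum the numbers in the list INPUT, using the SCRATCH slot of S as workspace."
  ;; Build the temporary structure and hang it off the long-lived object.
  (setf (solver-scratch s)
        (make-array (length input) :initial-contents input))
  (unwind-protect
      (setf (solver-result s) (reduce #'+ (solver-scratch s)))
    ;; Drop the passive reference, even on a non-local exit, so GC can
    ;; reclaim the temporary array instead of copying it along with S.
    (setf (solver-scratch s) nil)))
```

The UNWIND-PROTECT is a design choice, not a requirement; a plain SETF after the
computation does the same job if the computation never throws out.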


    *					  These results have been confirmed
    *    with both the TIME macro and the Metering system.  According to the
    *    metering system, there is no small subset of functions responsible for
    *    most of the paging.  So far, we have found that our efforts to increase
    *    locality of reference by using many areas have yielded minimal
    *    improvement.

    >I am curious as to why you think fragmenting memory with lots of areas will improve
    >locality of reference?

    The idea, from my cursory reading of book 8, is to improve locality
    of reference by putting objects *which reference each other frequently*
    into the same area.  We believe we had done that.  No, of course just
    randomly assigning objects to areas wouldn't improve paging performance.
   
The problem with this strategy is that it forces some references to be nonlocal,
while only allowing other references to be local.  Commonly, the trade-off
causes poorer paging performance.

My Lisp Pointers article discusses this in more detail.  When you originally
approached your salesman (Kevin Cheiff) with this problem the week before you
sent the message, I mailed him a copy to give to you.  You will be getting it
soon, if you haven't received it already.

    >	   We had expected the problem to be solved by the use of areas;
    >    we have a physical memory of 6MW, each area is smaller than 
    >    4MW or so for the first five hours of the run, and references between 
    >    areas should be uncommon.  For reference, we are using a 3675 with a swap
    >    space of 150MW, but using a smaller swap space (75MW) seems to have no
    >    effect on performance.
    >    1. What is the paging scheme used by the 3600-series (e.g.,
    >    least-recently-used), anyway? Software support doesn't know and has had
    >    some difficulties getting a hold of a developer who does.
    >    2. Has anyone had similar difficulties with paging?  How did you solve
    >    them?
    >    3. We are particularly alarmed by the fact that increasing the
    >    number of areas (and, we believe, increasing locality of reference)
    >    has had so little impact.  Does anyone know of a way to find out
    >    how much physical memory is devoted to each area?  Is there any way
    >    to measure locality of reference directly, such as counting the
    >    number of pointer references across area boundaries?

    >Have you tried using the metering interface to see what is causing the page
    >faults?  Have you tried documented functions to examine areas such as ROOM and
    >DESCRIBE-AREA?  You can discover things about cross-area references using
    >mouse-middle and SYS:%AREA-NUMBER.

    Please reread the starred paragraph above.

I stand corrected.  Sorry for not being more careful in my response.

    We have also used room, etc.  I am at a loss as to how using ROOM will
    tell us anything.  As far as I can tell, ROOM can show how many words
    are in each area. That is, it offers no more information than
    the Peek system's Areas screen. We need to know what is in *physical* 
    memory, and how structures are organized by *page*.

    Since I posted this request, I found out about the undocumented
    function SI:WITH-PAGE-TRACE, which appears to be somewhat useful, 
    although I would appreciate some assistance in using it and interpreting
    the results.

Generally, the metering interface offers all the capabilities of the page-trace
tool (which was a prototype for the metering interface).  Its documentation is
the source.  If you don't read source, try (si:page-trace-eval '(si:print-herald))
and then (si:page-trace-report).  Then start playing with the optional arguments
of each function.

I dug up some old hacks of mine which you might find useful:

;;; -*- Mode: Lisp; Package: SI; Base: 8 -*-

(DEFUN DESCRIBE-AREA-PAGES (&AUX (TPAGED-IN 0) (TCLUSTERS 0) (TTOTAL 0))
  (DO ((MAX (N-AREAS))
       (I 0 (1+ I)))
      ((>= I MAX))
    (MULTIPLE-VALUE-BIND (PAGED-IN CLUSTERS TOTAL)
	(PAGES-IN-AREA I)
      (FORMAT T "~%~A:~35T~5D resident in ~5D cluster~:[s~; ~] out of ~5D (~3D%, ~2,,6$ pages//cluster)"
	      (AREA-NAME I) PAGED-IN CLUSTERS (= CLUSTERS 1) TOTAL
	      (IF (ZEROP TOTAL) 0 (ROUND (* 100. PAGED-IN) TOTAL))
	      (IF (ZEROP CLUSTERS) 0 (// PAGED-IN (FLOAT CLUSTERS))))
      (SETQ TTOTAL (+ TTOTAL TOTAL)
	    TPAGED-IN (+ TPAGED-IN PAGED-IN)
	    TCLUSTERS (+ TCLUSTERS CLUSTERS))))
  (FORMAT T "~2%Total:~35T~5D resident in ~5D cluster~:[s~; ~] out of ~5D (~3D%, ~2,,6$ pages//cluster)"
	  TPAGED-IN TCLUSTERS (= TCLUSTERS 1) TTOTAL
	  (IF (ZEROP TTOTAL) 0 (ROUND (* 100. TPAGED-IN) TTOTAL))
	  (IF (ZEROP TCLUSTERS) 0 (// TPAGED-IN (FLOAT TCLUSTERS)))))

(DEFUN PAGES-IN-AREA (AREA &AUX (SUM 0) (CLUSTERS 0) (TOTAL 0))
  (DO-AREA-REGIONS (REGION AREA)
    (DO ((VPN (LDB %%VMA-PAGE-NUM (REGION-ORIGIN REGION)) (1+ VPN))
	 (LIMIT (LDB %%VMA-PAGE-NUM (+ (REGION-ORIGIN REGION)
				       (1- (REGION-CREATED-PAGES REGION)))))
	 (LAST NIL))
	 ((> VPN LIMIT))
      (COND ((STORAGE::PAGE-RESIDENT-P VPN)
	     (INCF SUM)
	     (COND ((NOT LAST)
		    (INCF CLUSTERS)
		    (SETQ LAST T))))
	    (LAST (SETQ LAST NIL)))
      (INCF TOTAL)))
  (VALUES SUM CLUSTERS TOTAL))

(DEFUN PRINT-OBJECTS-IN-REGION (REGION-NUMBER &OPTIONAL AFTER-ADDRESS)
  (LET ((AREA-NUMBER (REGION-AREA REGION-NUMBER)))
    (MAP-OVER-OBJECTS-IN-REGION REGION-NUMBER
      #'(LAMBDA (ADDRESS HEADER LEADER SIZE)
	  LEADER
	  (WHEN (OR (NULL AFTER-ADDRESS)
		    (>= ADDRESS AFTER-ADDRESS))
	    ;; Kludge generic functions.
	    (WHEN (AND (LISTP HEADER)
		       (EQL SIZE (DEFSTORAGE-SIZE GENERIC-FUNCTION))
		       (EQL AREA-NUMBER FLAVOR::*FLAVOR-STATIC-AREA*))
	      (SETQ HEADER (%MAKE-POINTER DTP-GENERIC-FUNCTION HEADER)))
	    (FORMAT T #+3600 "~%~O:  ~S" #+IMach "~%~\SI:ADDRESS\:  ~S"
		    ADDRESS HEADER))))))

(DEFUN PRINT-OBJECTS-AROUND (OBJECT &OPTIONAL (N 5))
  (FLET ((FIND-STRUCTURE-EXTENT (POINTER)
	   (MULTIPLE-VALUE-BIND (HEADER LEADER SIZE)
	       (%FIND-STRUCTURE-EXTENT POINTER)
	     ;; Kludge generic functions.
	     (WHEN (AND (LISTP HEADER)
			(EQL SIZE (DEFSTORAGE-SIZE GENERIC-FUNCTION))
			(EQL (%AREA-NUMBER HEADER) FLAVOR::*FLAVOR-STATIC-AREA*))
	       (SETQ HEADER (%MAKE-POINTER DTP-GENERIC-FUNCTION HEADER)))
	     (VALUES HEADER LEADER SIZE))))
    (MULTIPLE-VALUE-BIND (HEADER LEADER SIZE)
	(FIND-STRUCTURE-EXTENT OBJECT)
      (LET* ((REGION (%REGION-NUMBER LEADER))
	     (REGION-ORIGIN (REGION-ORIGIN REGION))
	     (REGION-LIMIT (+ REGION-ORIGIN (REGION-FREE-POINTER REGION))))
	(LABELS ((RECURSE-BACK (HEADER LEADER N)
		   (WHEN (AND (PLUSP N) (%POINTER-LESSP REGION-ORIGIN LEADER))
		     (%MULTIPLE-VALUE-CALL-N RECURSE-BACK
		       (FIND-STRUCTURE-EXTENT (%MAKE-POINTER-OFFSET DTP-LOCATIVE LEADER -1)) 2
		       (1- N) 1))
		   (FORMAT T #+3600 "~%~O:  ~S" #+IMach "~%~\SI:ADDRESS\:  ~S"
			   (%POINTER LEADER) HEADER)))
	  (FORMAT T "~&~S, region ~O" (AREA-NAME (REGION-AREA REGION)) REGION)
	  (RECURSE-BACK HEADER LEADER N)
	  (LOOP REPEAT N 
		WHILE (> (%POINTER-DIFFERENCE REGION-LIMIT LEADER) SIZE)
		DO (MULTIPLE-VALUE (HEADER LEADER SIZE)
		     (FIND-STRUCTURE-EXTENT (%MAKE-POINTER-OFFSET DTP-LOCATIVE LEADER SIZE)))
		   (FORMAT T #+3600 "~%~O:  ~S" #+IMach "~%~\SI:ADDRESS\:  ~S"
			   (%POINTER LEADER) HEADER)))))))
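
The "cluster" figure reported by PAGES-IN-AREA is the number of maximal runs of
consecutive resident pages.  The sketch below shows the same counting logic in
portable Common Lisp, assuming the residency information is supplied as a plain
list of booleans in address order; the function name and that list
representation are illustrative assumptions, standing in for the walk over
regions and the call to STORAGE::PAGE-RESIDENT-P.

```lisp
(defun count-resident-clusters (resident-flags)
  "RESIDENT-FLAGS is a list with one generalized boolean per page, in address order.
Returns three values: resident pages, clusters of adjacent resident pages,
and total pages."
  (let ((sum 0) (clusters 0) (total 0) (in-run nil))
    (dolist (resident-p resident-flags)
      (cond (resident-p
             (incf sum)
             ;; A resident page that follows a non-resident one starts a new cluster.
             (unless in-run
               (incf clusters)
               (setq in-run t)))
            (t (setq in-run nil)))
      (incf total))
    (values sum clusters total)))
```

A low pages-per-cluster ratio (resident pages divided by clusters) suggests that
the resident pages of an area are scattered rather than contiguous.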