[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Bil Lewis <BIL@su-sierra.arpa>: [Bil Lewis <BIL@SU-SIERRA.ARPA>: Array references in CL]]



[ Looks like this MSG got stuck here at Stanford.  Hope I'm not
repeating it.]

}i-------
                ---------------

Return-Path: <@SRI-AI.ARPA,@R20.UTEXAS.EDU:BIL@su-sierra.arpa>
Received: from SRI-AI.ARPA by su-sierra.arpa with TCP; Fri 20 Jun 86 01:03:57-PDT
Received: from R20.UTEXAS.EDU by SRI-AI.ARPA with TCP; Tue 17 Jun 86 11:26:39-PDT
Received: from su-sierra.arpa by R20.UTEXAS.EDU with TCP; Tue 17 Jun 86 12:50:45-CDT
Date: Tue 17 Jun 86 10:48:07-PDT
From: Bil Lewis <BIL@su-sierra.arpa>
Subject: [Bil Lewis <BIL@SU-SIERRA.ARPA>: Array references in CL]
To: SLUG@R20.UTEXAS.EDU
Message-ID: <12215580557.30.BIL@su-sierra.arpa>

Mail-From: BIL created at 17-Jun-86 10:45:18
Date: Tue 17 Jun 86 10:45:17-PDT
From: Bil Lewis <BIL@SU-SIERRA.ARPA>
Subject: Array references in CL
To: bil@SU-SIERRA.ARPA
Message-ID: <12215580040.30.BIL@su-sierra.arpa>




	On the question of optimal array access in CL:

During SLUG, I took Dave Plumber aside to demonstrate the
array-thrashing problem about which so much has been said.  I'd tried
a test at home & didn't like the results (i.e., no difference).  In
as much as it took Dave a couple of trials before he (& I) really
realized what was going on, I thought it useful to mention the
results I have gotten subsequently.  If he & I can make that mistake,
so can others.

The basic idea is that if you have an array which is larger than your
available physical memory (taking into account other data & program
you might require simultaniously), then if you access elements in the
wrong order you MAY thrash.

Consider the worst case: On my machine with 1m word physical (4mb), I
create an array 256 x 4096 = 1mW.  Please notice that a swapping page
is exactly 256 36-bit words (32 bits data + 4 bits of tag).  I am
also assuming that I have 4096 pages of physical, which may not be
correct (I don't know how the tag bits are counted), but the argument
remains correct even if that number is slightly off.

		Array layout in Virtual

-----------	-----------	-----------	-----------	
|  page 0 |	|  page 1 |	|  page 2 |	|  page 3 |	...page 4095
-----------	-----------	-----------	-----------	
|ele 0,0  |	|ele 0,1  |	|ele 0,2  |	|ele 0,3  |	
| ...     |	| ...     |	| ...     |	| ...     |	
|ele 255,0|	|ele 255,1|	|ele 255,2|	|ele 255,3|	
-----------	-----------	-----------	-----------	


When I tell my machin}ie to access this array by column, it touches
element 0,0 then 0,1 etc.  By the time it's gotten to element 0,4095
the entire array has been paged in.  Pity of the matter is that if I
need just one page for my code (actually a good number are wired in,
around 500?), then to page in page 4095, it has to page OUT page 0.
Which is the very next page to be touched. So we thrash. BADLY.  [In
the table below, over 1e+6 page faults occured!]

The trick to this (the trick both Dave & I missed), is that if we}i'd
choosen an array with different dimensions, or if the page size had
been different, we could have gotten totally different results.  When
I change the array dimensions to 512x2048 I get a mere 4,104 faults;
when I create a square array, 1000x1000, it's only 3,764.

The paging algorythm used on the Lispms is a version of LRU (I am
told).  It's really more like "Not-very-recently-used," although for
this particular example it comes out to exactly LRU.  

It runs: Mark a newly paged-in page "2," and start a "paging-pointer"
from Physical 0.  When another page is needed, scan through physical
with the "paging-pointer," looking for a page marked "0" and replace
that one.  While scanning, decrement the counter on each page passed
over.  Continue round-robin.  

I don't know if any special respect is shown for "dirty" pages. [I'd
love to know for the sake of knowing.  If anyone would like to
volunteer that information...]

The conclusion is that IF your array exceeds physical memory, AND the
number of columns is greater than the number of PAGES of physical,
AND you sequentially address all elements varying column first, THEN
you will thrash.  Addressing an array row first never loses.

A final point remains a mystery to me and I would love it if someone
could explain it to me (for the sake of understanding):  Notice in
the table that the array 256x3500 suffered 0 faults.  The array
256x3600 suffered around 3,600; 256x3700 around 3,700; but 256x3800
thrashed (greater than 50,000 faults, perhaps 800,000).  Why?

-Bil

				DATA

Access by
Column/
Row	Array Size		 -------- TIME -------  ------ Averages ------
  Elements Dimensions #Faults   Total  Faults   CPU     CPU     #Faults	 Total

C 1e+4   1e+2, 1e+2   0	 	9.2e-2	0	9.2e-2	9.2e-6	0	9.2e-6
R 1e+4   1e+2, 1e+2   0	 	9.2e-2	0	9.2e-2	9.2e-6	0	9.2e-6
C 1e+6   1e+3, 1e+3   3,764	5.9e+1  4.3e+1	1.6e+1	1.6e-5	3.7e-3  5.9e-5
R 1e+6   1e+3, 1e+3   1,115	2.9e+1	1.4e+1	1.5e+1	1.5e-5	1.1e-3  2.9e-5
C 4e+6   2e+3, 2e+3   15,633	2.4e+2  1.8e+2  6.0e+1  1.5e-5	4.0e-3  4.0e-5
R 4e+6   2e+3, 2e+3   15,225	2.5e+2  1.8e+2  6.0e+1  1.5e-5	4.0e-3  4.0e-5
C 1e+6   1e+1, 1e+5   39,041	6.5e+2	6.2e+2	3.0e+1	3.0e-5	3.9e-2  6.5e-4
R 1e+6   1e+1, 1e+5   3,914	6.6e+1	4.9e+1	1.7e+1	1.7e-5  3.9e-3  6.6e-5
C 1e+6   1e+5, 1e+1   2,479	6.6e+1	4.9e+1  1.7e+1	1.7e-5  2.5e-3  6.6e-5
R 1e+6   1e+5, 1e+1   1,421	3.4e+1	1.9e+1  1.5e+1	1.5e-5  1.4e-3  3.4e-5
C 1e+6   256,  4096   1e+6	1.6e+4  1.6e+4  1.8e+2? 1.8e-4  1       1.6e-2
R 1e+6   256,  4096   4,104	6.9e+1  5.2e+1  1.7e+1  1.7e-5  4.1e-3  6.9e-5
C 1e+6   4096,  256   4,104	6.5e+1  4.8e+1  1.7e+1  1.7e-5  4.1e-3  6.5e-5
R 1e+6   4096,  256   3,891	6.9e+1  5.2e+1  1.7e+1  1.7e-5  3.9e-3  6.9e-5
C 7e+5   256,  3000   0	 	1.2e+1	0	1.2e+1  1.8e-5  0	1.8e-5
R 7e+5   256,  3000   0	 	1.2e+1	0	1.2e+1  1.8e-5  0	1.8e-5
C 7e+5   256,  3500   0
R 7e+5   256,  3500   0
C 7e+5   256,  3600   3,600
R 7e+5   256,  3600   3,600
C 7e+5   256,  3800   thrash	>1e+3
R 7e+5   256,  3800   untested {_(& uninteresting)
C 7e+5   3000,  256   0	 	1.2e+1	0	1.2e+1  1.8e-5  0  	1.8e-5
R 7e+5   3000,  256   0	 	1.2e+1	0	1.2e+1  1.8e-5  0	1.8e-5
C 1e+6   512,	2048  4,104	8.7e+1  7.0e+1  1.7e+1  1.7e-5  4.1e-3	8.7e-5
R 1e+6   512,   2048  2,048	3.5e+1  1.9e+1  1.6e+1  1.6e-5  2.0e-3	3.5e-5



			The (basic) function

(let ((ARRAY (make-array `(,ROW ,COL))))
    (without-interrupts 
      (time
	(loop for I from 0 below ROW do
	      (loop for J from 0 below COL do
		    (aref ARRAY I J))))
--------

-------
-------
-------