CLX Display Lock in Multiprocessing Environment (Allegro CL)

We're developing a simulation environment for cooperating expert systems which
makes heavy use of multiprocessing. From time to time the program runs into an
error that blocks several processes in the state "CLX Display Lock". The
blocked processes perform I/O to X Windows for animation and user interface
event handling. Although the error is reproducible under certain conditions
and we have repeatedly tried to trap it by inspecting the processes after they
blocked, we know neither what causes it nor what the "CLX Display Lock" state
denotes.

We're using Sun-4 machines and SPARCstations, SunOS 4.1.1, X11R4 and Allegro CL
4.0.1 with the TI CLX (Common Lisp - X Windows) interface.

To be more precise: after some user interaction a certain part of the display
freezes, which indicates the well-known "CLX Display Lock". The problem is not
deterministic: sometimes 3 attempts are needed to reproduce it, sometimes 30.
Output of the top-level :processes command looks roughly like this:

	[1c] <cl:USER> :pro
	"Simulation Clock" is CLX Display Lock.
	"Process-1" is active.
	"Process-2" is active.
	"Process-3" is active.
	"Process-4" is Servicing a Keyboard interrupt signal.
	"Simulation Process" is CLX Display Lock.
	"Event-Handling Process" is CLX Display Lock.
	"TCP Listener Socket Daemon" is waiting for a connection.
	"Initial Lisp Listener" is waiting for terminal input.

The affected processes are not always the same ones, but they are usually
those which perform I/O to the window system.

Inspecting the locked processes shows slot WAIT-FUNCTION bound to #<Function 
(:INTERNAL MP::PROCESS-LOCK-1 0) @ ...> and slot WAIT-ARGS bound to a list of 2 
elements which are usually the process itself and a MP:PROCESS-LOCK structure 
instance. Inspecting the PROCESS-LOCK shows the slots NAME ("CLX Buffer Lock"), 
LOCKER (usually NIL), and WAITING (a list of other processes - often one of 
them appearing twice in the list).

Am I getting this right? It looks like a kind of deadlock: the lock is seized
to provide exclusive access to some resource, and the processes in the WAITING
list are waiting for the lock to become free, which is why they are in the
state "CLX Display Lock".
But then, why does it happen? What is the resource that needs exclusive access?
The display (X server)? A buffer? Which buffer? And how should we interpret
PROCESS-LOCK-LOCKER being NIL? Is this a lock that some processes are waiting
for but that no process has seized?

The CLX functions on top of the evaluation stacks of the affected processes
don't look problematic [e.g. (xlib:draw-line ...), (xlib::wait-for-event ...)
or (xlib::change-window-attribute ...)], so we wouldn't expect them to cause
the problem.

We've tried several ways to protect segments of code from interleaved
execution with other processes. With or without mp:without-scheduling, with or
without mp:with-process-lock, the error didn't disappear.
We've also put an explicit xlib:with-display around a critical piece of code,
with minor effects: the process that had appeared twice in the WAITING list no
longer locked (but only this one); instead it appeared in the slot
PROCESS-LOCK-LOCKER for those processes that still did lock. Doing the same
with the other processes threw us back to where we were before (the three
processes doing window system I/O were in "CLX Display Lock" again).
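
For reference, the xlib:with-display wrapping mentioned above has roughly the
following shape. This is only a sketch and not our actual code: DRAW-FRAME,
its arguments and the coordinates are placeholder names made up for
illustration.

```lisp
;; Sketch only. DRAW-FRAME and the coordinates are hypothetical placeholders;
;; DISPLAY, WINDOW and GCONTEXT are assumed to be already-created CLX objects.
(defun draw-frame (display window gcontext)
  ;; XLIB:WITH-DISPLAY gives the current process exclusive access to the
  ;; display object while the body runs, so the requests issued here are not
  ;; interleaved with requests generated by other Lisp processes.
  (xlib:with-display (display)
    (xlib:draw-line window gcontext 0 0 100 100)
    ;; Send the buffered requests to the X server before releasing the lock.
    (xlib:display-force-output display)))
```

Each process that draws would call such a function instead of issuing the
xlib: calls directly.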

Any clues, somebody?

BTW: Does anybody know of any literature on the CLX interface (not X Windows)
other than the "CLX Programmer's Reference"?


Olaf Schreck				Daimler Benz Research Institute Berlin
email: schreck@b21.uucp       or        ...!mcsun!unido!b21!schreck (overseas)