CLX Display Lock in Multiprocessing Environment (Allegro CL)
We're developing a simulation environment for cooperating expert systems which
makes heavy use of multiprocessing. From time to time the program runs into an
error that blocks several processes with a state of "CLX Display Lock". The
processes that are blocked perform I/O to X Windows for animation and user
interface event handling. Though the error is reproducible under certain
conditions and we have repeatedly tried to trap it by inspecting the processes
after they blocked, we're not sure what causes it or what the "CLX Display
Lock" state denotes.
We're using Sun-4s and SPARCstations, SunOS 4.1.1, X11R4 and Allegro CL 4.0.1
with the TI CLX (Common Lisp - X Windows) interface.
To be more precise: after some user interaction, a certain part of the display
freezes, indicating the well-known "CLX Display Lock". The problem is not
deterministic: sometimes 3 attempts are needed to reproduce it, sometimes 30.
Output of the top-level :processes command looks roughly like this:
[1c] <cl:USER> :pro
"Simulation Clock" is CLX Display Lock.
"Process-1" is active.
"Process-2" is active.
"Process-3" is active.
"Process-4" is Servicing a Keyboard interrupt signal.
"Simulation Process" is CLX Display Lock.
"Event-Handling Process" is CLX Display Lock.
"TCP Listener Socket Daemon" is waiting for a connection.
"Initial Lisp Listener" is waiting for terminal input.
Also, the affected processes are not always the same, but they are usually
those that perform I/O to the window system.
Inspecting the locked processes shows slot WAIT-FUNCTION bound to #<Function
(:INTERNAL MP::PROCESS-LOCK-1 0) @ ...> and slot WAIT-ARGS bound to a list of 2
elements which are usually the process itself and a MP:PROCESS-LOCK structure
instance. Inspecting the PROCESS-LOCK shows the slots NAME ("CLX Buffer Lock"),
LOCKER (usually NIL), and WAITING (a list of other processes - often one of
them appearing twice in the list).
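For anyone who wants to look at the same data without clicking through the
inspector, a small helper along these lines dumps the lock state. This is only
a sketch: the accessor names are assumed from the slot names the inspector
shows (NAME, LOCKER, WAITING) and are written with a double colon because they
may be internal to the MP package; describe-process-lock itself is a
hypothetical name.

```lisp
;; Sketch only. The MP:: accessors are assumed from the slot names the
;; inspector shows (NAME, LOCKER, WAITING) and may not be exported.
(defun describe-process-lock (lock &optional (stream *standard-output*))
  "Print who holds LOCK and who waits for it, flagging duplicate waiters."
  (let ((locker  (mp::process-lock-locker lock))
        (waiting (mp::process-lock-waiting lock)))
    (format stream "~&Lock ~S~%  locker:  ~A~%"
            (mp::process-lock-name lock)
            (if locker (mp:process-name locker) "NIL"))
    ;; Report each waiter once, noting how often it occurs in the list.
    (dolist (p (remove-duplicates waiting))
      (let ((n (count p waiting)))
        (format stream "  waiting: ~A~@[ (listed ~D times)~]~%"
                (mp:process-name p)
                (and (> n 1) n))))))
```

Given a blocked process, the lock would be the second element of its WAIT-ARGS
list, as described above; how to read that slot programmatically (for example
an accessor such as mp::process-wait-args) is likewise an assumption.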
Am I getting this right? It seems to be a kind of deadlock: the lock is seized
to provide exclusive access to some resource, and the processes in the WAITING
list wait for the lock to become free, which is why they are in the "CLX
Display Lock" state.
But then, why does it happen? What is the resource that needs exclusive access?
The display (X server)? A buffer? Which buffer? And how should we interpret
PROCESS-LOCK-LOCKER being NIL: a lock that some processes are waiting for but
that no process has seized?
The CLX functions on top of the evaluation stacks of the affected processes
don't look problematic [e.g. (xlib:draw-line ...), (xlib::wait-for-event ...)
or (xlib::change-window-attribute ...)], so we wouldn't expect them to cause
the problem.
We've tried several ways to keep segments of code from being interleaved with
the execution of other processes. With or without mp:without-scheduling, with
or without mp:with-process-lock, the error didn't disappear.
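To make the kind of protection meant here concrete, it has roughly the
following shape. This is a sketch, not the actual code from the simulation:
*draw-lock* and draw-step are hypothetical names, and the drawing call merely
stands in for whatever window system I/O a process performs.

```lisp
;; Sketch only: *draw-lock* and draw-step are hypothetical names.
;; An application-level lock serializes the drawing code of all processes.
(defvar *draw-lock* (mp:make-process-lock :name "Animation Draw Lock"))

(defun draw-step (display window gcontext x1 y1 x2 y2)
  "Draw one line and flush the output buffer while holding *draw-lock*."
  (mp:with-process-lock (*draw-lock*)
    (xlib:draw-line window gcontext x1 y1 x2 y2)
    (xlib:display-force-output display)))
```

Note that such a lock only helps if every process doing window system I/O goes
through it; a process that calls CLX directly still competes for CLX's own
lock.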
We also put an explicit xlib:with-display around a critical piece of code, with
minor effect: the process that had appeared twice in the WAITING list no longer
blocked (but only that one); instead, it showed up in the slot
PROCESS-LOCK-LOCKER for the processes that still blocked. Doing the same in the
other processes threw us back to where we were before (the three processes
doing window system I/O were in "CLX Display Lock" again).
Any clues, somebody?
BTW: Does somebody know about any literature on the CLX interface (not X
Windows) other than the "CLX Programmer's Reference"?
Thanx.
================================================================================
Olaf Schreck Daimler Benz Research Institute Berlin
email: schreck@b21.uucp or ...!mcsun!unido!b21!schreck (overseas)