Re: Interrupt bug



Rob.MacLachlan@LISP-PMAX2.SLISP.CS.CMU.EDU writes:
> I have found a reproducible bug with the Hemlock OOB process control stuff.
> This might possibly relate to Scott's problem with slaves dying on
> interrupts.
> 
> To reproduce:
>     (without-interrupts (loop))
> 
> Then do a c-c b, then "kill -ILL" the slave from a shell.  When you quit
> from the break loop, you will get an error in the without-interrupts
> unwind-protect when it is trying to handle the OOB:
>     Error in function EXTENSIONS::SIGURG-HANDLER.
>     Error recving oob data on 6: Invalid argument
> 
> Maybe this is just because the OOB data is being lost due to the other I/O
> on the socket.  If so, it would be nice to handle this condition more
> gracefully.
> 
> But actually, I wouldn't be doing this "kill -ILL" shit if it weren't for
> the fact that the system often gets wedged so that it can't be interrupted
> normally.  In fact, I wonder if there isn't a problem with
> without-interrupts, since this is uninterruptable:
>     (loop (without-interrupts (dotimes (i 1000))))
> 
>   Rob

If you hit the Lisp process with a signal while it's inside a
without-interrupts, it sets the signal mask to block additional
signals until it handles the pending signal at the end of the
without-interrupts.  Unfortunately, it resets the signal mask *after*
running the signal handler, so if that signal handler throws to top
level, the signal mask never gets reset.  I haven't fixed this yet,
'cause I'm not sure what the semantics should be.  I'll work harder
on it though.
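
To make the ordering problem concrete, here is a toy model of it.
This is only a sketch, not the actual runtime code: *signal-mask* is a
stand-in variable for the real process signal mask, the TOP-LEVEL
catch tag stands in for the real top-level loop, and every function
name here is made up for illustration.  The unwind-protect variant is
just one possible behavior, not a decision about what the semantics
should be.

```lisp
;;; Toy model only.  *SIGNAL-MASK* stands in for the process signal
;;; mask; all names below are hypothetical, not taken from the system.
(defvar *signal-mask* :unblocked)

(defun handler-that-throws ()
  ;; Models a signal handler that unwinds to top level.
  (throw 'top-level :aborted))

(defun run-pending-buggy (handler)
  ;; Ordering described above: run the handler, then reset the mask.
  (setf *signal-mask* :blocked)
  (funcall handler)
  ;; Never reached if HANDLER exits non-locally.
  (setf *signal-mask* :unblocked))

(defun run-pending-protected (handler)
  ;; One possible alternative: reset the mask in a cleanup form,
  ;; so it runs even when HANDLER throws.
  (setf *signal-mask* :blocked)
  (unwind-protect
       (funcall handler)
    (setf *signal-mask* :unblocked)))

(defun mask-after (runner)
  ;; Run RUNNER with the throwing handler under a TOP-LEVEL catch
  ;; and report the state the mask is left in.
  (setf *signal-mask* :unblocked)
  (catch 'top-level
    (funcall runner #'handler-that-throws))
  *signal-mask*)
```

In this model, (mask-after #'run-pending-buggy) returns :BLOCKED, which
is the wedged state, while (mask-after #'run-pending-protected)
returns :UNBLOCKED.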

Fixing this might also fix the OOB stuff, but I'm not sure.  If it
doesn't, I'll look into it.

-William