Re: ICMP problems
Date: Wed, 19 Feb 1992 18:13 EST
The following concerns some problems with the Symbolics' implementation of
the ICMP portion of IP-TCP. I'm looking for anyone who might have had (and
solved) similar problems.
The code with problems is in the file SYS:IP-TCP;ICMP.LISP.2624 and is the
implementation of such things as ICMP-SEND-ECHO (the host pinger) and
SEND-MISLEADING-REDIRECT (an answer to a ping that says try another place
:SEND-MISLEADING-REDIRECT is completely unused. I think it's some kind
of debugging tool, to allow you to manually send an ICMP Redirect
message. What did you base your description of it on?
The redirect message can be generated by a host that KNOWS your routing is wrong and
wants to tell you what it thinks is right.
ICMP messages are sent with a sequence number that comes back
in the reply. The Symbolics implementation increments this number without
inhibiting process switching. Ergo, it is possible for two processes on
the same host to originate pings with the same sequence number.
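The race described above, and one way to close it, can be sketched in a few lines. This is an illustration in Python rather than the Symbolics Lisp code: the class name EchoSequence and the helper collect are hypothetical, and a lock stands in for whatever primitive the Lisp system offers for inhibiting process switching. The wrap at 65535 follows the 16-bit sequence field.

```python
import threading


class EchoSequence:
    """Wrapping 16-bit sequence counter whose increment is guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def next(self):
        # The read-modify-write runs under the lock, so two pingers can
        # never be handed the same number between wraparounds.
        with self._lock:
            self._value = self._value + 1 if self._value < 65535 else 0
            return self._value


def collect(n_threads=8, per_thread=1000):
    """Draw sequence numbers from several threads and return them all."""
    seq = EchoSequence()
    results = [[] for _ in range(n_threads)]

    def worker(out):
        for _ in range(per_thread):
            out.append(seq.next())

    threads = [threading.Thread(target=worker, args=(r,)) for r in results]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [v for r in results for v in r]
```

With the lock in place, every number drawn by the concurrent workers is distinct as long as fewer than 65536 are outstanding at once.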
Yes, that would be a problem if you were pinging simultaneously from two
processes. Why would you do that?
I am doing that. I have a background process for each host of interest. Each
process tries to keep track of state data for its assigned host. The code
for doing it this way is trivial compared to putting it all in one place.
Actually, a somewhat more serious synchronization problem is in the
:ECHO-REPLY and :GET-ECHO-REPLY methods. It's possible for *both*
pingers to see the first reply and both of them would then try to remove
it from the list of replies, possibly screwing up royally due to the use
of the destructive DELETE function. In fact, this problem exists even
without the common sequence numbers; it could happen any time two ping
responses are received close together.
I believe that the ping receiver actually looks for the return of its supposedly
unique sequence number. The variable ADDRESS in :ECHO-REPLY is actually a
sequence number. You are correct that both could do the delete.
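The hazard with the shared list of replies can be avoided by making "find my reply and remove it" a single step that only one process can be inside at a time. The following is a minimal Python sketch of that idea, not the Symbolics code: the class EchoReplies and its methods add and claim are hypothetical names, and the lock is an assumed stand-in for an atomic update of the list.

```python
import threading


class EchoReplies:
    """Shared list of received echo replies, safe for several pingers."""

    def __init__(self):
        self._replies = []  # entries are (sequence, receive_time)
        self._lock = threading.Lock()

    def add(self, sequence, receive_time):
        # Called by the receiver when an echo reply arrives.
        with self._lock:
            self._replies.append((sequence, receive_time))

    def claim(self, sequence):
        """Remove and return the first reply for `sequence`, or None."""
        # Search and removal happen under one lock, so two pingers
        # cannot both see the same reply and both try to delete it.
        with self._lock:
            for i, reply in enumerate(self._replies):
                if reply[0] == sequence:
                    return self._replies.pop(i)
            return None
```

A pinger that loses the race simply gets None back and keeps waiting, instead of modifying a list that another process is in the middle of changing.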
Further, the receipt of a ping only looks to see if the same sequence number
is in a return packet. This causes a problem since the answer to a ping
might be a SEND-MISLEADING-REDIRECT from a host other than the one that
was pinged. In other words, the Symbolics receiver ignores message type.
That's completely wrong. Look at (FLAVOR:METHOD :RECEIVE-IP-PACKET
ICMP-PROTOCOL). After validating the checksum, the first thing it does
is dispatch on the message type.
What's true is that it ignores the source address, so if you're pinging
two hosts simultaneously, responses from one host might be received by
the pinger of the other host.
Only if the sequence numbers are the same. (See above comment.)
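A receiver that also checks the source address would not hand one host's reply to another host's pinger even when sequence numbers collide. As a rough Python sketch (the function match_reply and the layout of the outstanding table are assumptions made for illustration, not the structure used in ICMP.LISP), outstanding pings can be keyed on the pair of destination address and sequence number:

```python
def match_reply(outstanding, source, sequence):
    """Return the send time of the ping this reply answers, or None.

    `outstanding` maps (destination_address, sequence) to the send time.
    A reply only matches if it comes from the host that was pinged AND
    carries that ping's sequence number; the matched entry is removed.
    """
    return outstanding.pop((source, sequence), None)
```

For example, with two pings outstanding under the same sequence number, a reply from 10.0.0.2 is credited only to the ping sent to 10.0.0.2, and a reply from a host that was never pinged matches nothing.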
For both of the above reasons, I am getting positive responses to pings from
hosts that haven't been powered up in months.
I don't see how duplicate sequence numbers can have this result. Not
checking the ICMP type could, but it doesn't have that bug.
Since the address is really the sequence number, this happens.
It seems that the proper fix is to (1) protect the sequence number
increment against process switching and (2) make the receiver use
redirect data to update route tables (optional) but NIL the ping response.
It does (2); here's the section of (FLAVOR:METHOD :RECEIVE-IP-PACKET
ICMP-PROTOCOL) that does it:
    (5 (send network :icmp-redirect source
             (neti:get-sub-packet icmp 'sys:art-8b icmp-size)
             (load-internet-address icmp 4)
             (case (icmp-code icmp)
               ((0 2) nil)
               ((1 3) t))))
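For readers without the ICMP layout in their heads: in RFC 792 a redirect is message type 5, the new gateway's address sits in bytes 4 through 7, and the code is 0 or 2 for network redirects and 1 or 3 for host redirects. That matches the offset 4 and the (0 2) / (1 3) split in the clause above, so the NIL/T flag is presumably "host-specific". A small Python sketch of the same decoding follows; the function name parse_redirect and its return shape are hypothetical.

```python
import socket
import struct

HOST_CODES = {1, 3}     # redirect for host, or for type of service and host
NETWORK_CODES = {0, 2}  # redirect for network, or for type of service and network


def parse_redirect(icmp):
    """Decode an ICMP redirect given as bytes starting at the ICMP header.

    Returns (gateway_address, host_specific, original_datagram_prefix).
    The checksum is assumed to have been validated by the caller.
    """
    icmp_type, code, _checksum = struct.unpack("!BBH", icmp[:4])
    if icmp_type != 5:
        raise ValueError("not an ICMP redirect")
    if code not in HOST_CODES | NETWORK_CODES:
        raise ValueError("unknown redirect code")
    gateway = socket.inet_ntoa(icmp[4:8])  # gateway internet address
    # The rest is the IP header plus the first 64 bits of the datagram
    # that triggered the redirect.
    return gateway, code in HOST_CODES, icmp[8:]
```

Nothing in this path touches the list of echo replies, which is consistent with the point that a redirect is not mistaken for an answer to a ping.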
A. Does anybody know whether Symbolics has dealt (or will deal) with these problems?
Only Symbolics can answer that.
B. Has anybody else fixed this stuff? (If so, will you share?)
C. If answers to A and B are negative, I will try to fix this myself.
Does anyone know the code well enough to guess that the only code that
needs to be fixed is in the file SYS:IP-TCP;ICMP.LISP.2624?
I just implemented the necessary fixes, and they seem to work (but I
wasn't able to get the original code to fail -- I guess my 3650 doesn't
process switch often enough to lose). In (FLAVOR:METHOD :ICMP-SEND-ECHO
I'm seeing the problem on a 3650 with 11 megawords. There are approximately 30
processes, each pinging once a second as long as the ping is answered. If
the ping times out, there is additional delay.
(if (< *icmp-echo-sequence* 65535)
(setq *icmp-echo-sequence* 0))
(if (< x 65535)
and replace "push" with "process:atomic-push".
And in (FLAVOR:METHOD :GET-ECHO-REPLY ICMP-PROTOCOL), replace:
(setq echoes-outstanding (delete echo echoes-outstanding))
#'(lambda (x) (delete echo x)))
Since these changes guarantee that sequence numbers are unique (unless
you send 64K pings, so that the sequence number wraps around, before
looking for any replies), the problem of not checking the source address
I understand that these are the kinds of changes that are necessary. The question
is whether there is stuff in other files besides SYS:IP-TCP;ICMP.LISP.2624 that
needs to be changed.
If you'd like, I also have changes to the ICMP echo code that return
the response time in microseconds (the original code records the
reception time in 60ths of a second, but doesn't return that information
to the caller). I also have a Ping CP command that is similar to the
Unix ping command.
Thanks for the help to date.