
Re: ICMP problems



    
        Date: Wed, 19 Feb 1992 18:13 EST
        From: jbarnett@charming.nrtc.northrop.com
    
        The following concerns some problems with the Symbolics' implementation of
        the ICMP portion of IP-TCP.  I'm looking for anyone who might have had (and
        solved) similar problems.
    
        The code with problems is in the file SYS:IP-TCP;ICMP.LISP.2624 and is the
        implementation of such things as ICMP-SEND-ECHO (the host pinger) and
        SEND-MISLEADING-REDIRECT (an answer to a ping that says try another place
        instead).  
    
    :SEND-MISLEADING-REDIRECT is completely unused.  I think it's some kind
    of debugging tool, to allow you to manually send an ICMP Redirect
    message.  What did you base your description of it on?

The redirect message can be generated by a host that KNOWS your routing is wrong and
wants to tell you what it thinks is right.
    
        ICMP messages are sent with a sequence number that comes back
        in the reply.  The Symbolics implementation increments this number without
        inhibiting process switching.  Ergo, it is possible for two processes on
        the same host to originate pings with the same sequence number.
    
    Yes, that would be a problem if you were pinging simultaneously from two
    processes.  Why would you do that?

I am doing that.  I have a background process for each host of interest.  Each
process keeps track of state data for its assigned host.  The code for doing it
this way is trivial compared to putting it all in one place.
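
To make the failure mode concrete, here is a rough sketch of the unprotected
increment, written in Python rather than Symbolics Lisp.  It is a toy model, not
the ICMP.LISP code: the Pinger class, the shared dictionary and the starting
value 41 are all hypothetical.  It splits the increment into a read step and a
write step so that a process switch between them can be replayed on demand.

```python
SEQ_MAX = 65535  # largest value of the 16-bit ICMP sequence field


class Pinger:
    """Hypothetical model of one pinging process.

    The unprotected increment is split into a read and a write so the
    effect of a process switch between the two can be shown.
    """

    def __init__(self, shared):
        self.shared = shared
        self._seen = None

    def read(self):
        self._seen = self.shared["seq"]

    def write(self):
        new = self._seen + 1 if self._seen < SEQ_MAX else 0
        self.shared["seq"] = new
        return new


def interleaved():
    """Both processes read before either one writes."""
    shared = {"seq": 41}
    a, b = Pinger(shared), Pinger(shared)
    a.read()
    b.read()  # process switch happens here
    return a.write(), b.write()


def serialized():
    """Each process completes its increment before the other starts."""
    shared = {"seq": 41}
    a, b = Pinger(shared), Pinger(shared)
    a.read()
    first = a.write()
    b.read()
    second = b.write()
    return first, second
```

With the interleaved schedule both pingers end up holding sequence number 42;
with the serialized schedule they hold 42 and 43.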
    
    Actually, a somewhat more serious synchronization problem is in the
    :ECHO-REPLY and :GET-ECHO-REPLY methods.  It's possible for *both*
    pingers to see the first reply and both of them would then try to remove
    it from the list of replies, possibly screwing up royally due to the use
    of the destructive DELETE function.  In fact, this problem exists even
    without the common sequence numbers; it could happen any time two ping
    responses are received close together.

I believe that the ping receiver actually looks for the return of its supposedly
unique sequence number.  The variable ADDRESS in :echo-reply is actually a
sequence number.  You are correct that both could call DELETE.
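
The double-removal hazard can be sketched as follows, again in Python and with
hypothetical names (ReplyList, add, claim).  It is only an illustration of the
idea that checking for a reply and removing it should be one indivisible step,
so that at most one pinger ever claims a given reply.

```python
import threading


class ReplyList:
    """Hypothetical list of received echo replies, keyed by sequence number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._replies = []

    def add(self, seq):
        with self._lock:
            self._replies.append(seq)

    def claim(self, seq):
        """Remove seq if present.  Returns True for exactly one caller."""
        with self._lock:
            if seq in self._replies:
                self._replies.remove(seq)
                return True
            return False
```

If two pingers both call claim(7) after a single reply with sequence number 7
has arrived, one of them gets True and the other gets False, and the list is
never modified by both at once.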
    
        Further, the receipt of a ping only looks to see if the same sequence number
        is in a return packet.  This causes a problem since the answer to a ping
        might be a SEND-MISLEADING-REDIRECT from a host other than the one that
        was pinged.  In other words, the Symbolics receiver ignores message type.
    
    That's completely wrong.  Look at (FLAVOR:METHOD :RECEIVE-IP-PACKET
    ICMP-PROTOCOL).  After validating the checksum, the first thing it does
    is dispatch on the message type.

You're correct.

    What's true is that it ignores the source address, so if you're pinging
    two hosts simultaneously, responses from one host might be received by
    the pinger of the other host.

Only if the sequence numbers are the same. (See above comment.)
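
A small sketch of the difference between the two matching rules, in Python with
hypothetical field names (seq, dest, source) and documentation-range addresses.
It does not reproduce the Symbolics receiver; it only shows why a duplicated
sequence number lets a reply from a live host satisfy a ping sent to a dead one,
and how comparing the source address as well would rule that out.

```python
def matches_by_seq(echo, reply):
    """Accept the reply if only the sequence numbers agree."""
    return echo["seq"] == reply["seq"]


def matches_by_seq_and_source(echo, reply):
    """Also require the reply to come from the host that was pinged."""
    return echo["seq"] == reply["seq"] and echo["dest"] == reply["source"]


# A ping sent to a host that is powered down ...
echo_to_dead_host = {"seq": 42, "dest": "192.0.2.9"}
# ... and a reply from a different, live host that happens to carry
# the same sequence number.
reply_from_live_host = {"seq": 42, "source": "192.0.2.5"}

seq_only = matches_by_seq(echo_to_dead_host, reply_from_live_host)
seq_and_source = matches_by_seq_and_source(echo_to_dead_host, reply_from_live_host)
```

Here seq_only is True (the dead host appears to answer) while seq_and_source is
False.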
    
        For both of the above reasons, I am getting positive responses to pings from
        hosts that haven't been powered up in months.
    
    I don't see how duplicate sequence numbers can have this result.  Not
    checking the ICMP type could, but it doesn't have that bug.

Since the address is really the sequence number, this happens.
    
        It seems that the proper fix is to (1) protect the sequence number
        increment against process switching and (2) make the receiver use
        redirect data to update route tables (optional) but NIL the ping response.
    
    It does (2); here's the section of (FLAVOR:METHOD :RECEIVE-IP-PACKET
    ICMP-PROTOCOL) that does it:
    
        ;; Redirect
        (5 (send network :icmp-redirect source
                 (neti:get-sub-packet icmp 'sys:art-8b icmp-size)
                 (load-internet-address icmp 4)
                 (case (icmp-code icmp)
                   ((0 2) nil)
                   ((1 3) t))))
    
Right.

        A. Does anybody know if Symbolics has dealt (or will deal) with these problems?
    
    Only Symbolics can answer that.
    
        B. Has anybody else fixed this stuff?  (If so, will you share?)
        C. If answers to A and B are negative, I will try to fix this myself.
           Does anyone know the code well enough to guess that the only code that
           needs to be fixed is in the file SYS:IP-TCP;ICMP.LISP.2624?
    
    I just implemented the necessary fixes, and they seem to work (but I
    wasn't able to get the original code to fail -- I guess my 3650 doesn't
    process switch often enough to lose).  In (FLAVOR:METHOD :ICMP-SEND-ECHO
    ICMP-PROTOCOL), replace:

I'm seeing the problem on a 3650 with 11 megawords.  There are approximately 30
processes, each pinging once a second as long as the ping is answered.  If
the ping times out, there is additional delay.
    
    (if (< *icmp-echo-sequence* 65535)
        (incf *icmp-echo-sequence*)
        (setq *icmp-echo-sequence* 0))
    
    with
    
    (process:atomic-updatef *icmp-echo-sequence*
                            #'(lambda (x)
                                (if (< x 65535)
                                    (1+ x)
                                    0)))
    
    and replace "push" with "process:atomic-push".
    
    And in (FLAVOR:METHOD :GET-ECHO-REPLY ICMP-PROTOCOL), replace:
    
    (setq echoes-outstanding (delete echo echoes-outstanding))
    
    with
    
    (process:atomic-updatef echoes-outstanding
                            #'(lambda (x) (delete echo x)))
    
    Since these changes guarantee that sequence numbers are unique (unless
    you send 64K pings, so that the sequence number wraps around, before
    looking for any replies), the problem of not checking the source address
    is moot.

I understand that these are the kinds of changes that are necessary.  The question
is whether there is stuff in other files besides SYS:IP-TCP;ICMP.LISP.2624 that
needs to be changed.
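
For reference, the intent of the atomic-update fix quoted above can be sketched
in Python.  This is an analogy under stated assumptions, not a translation of
process:atomic-updatef: the EchoSequence class and the allocate helper are
hypothetical, and an ordinary lock stands in for whatever mechanism Genera uses.
The 30 workers mirror the roughly 30 pinging processes described earlier.

```python
import threading

SEQ_MAX = 65535  # 16-bit sequence field wraps to 0 after this value


class EchoSequence:
    """Hypothetical shared sequence counter with an indivisible update."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        # Read, compute and store under one lock, so no two callers
        # can observe the same old value.
        with self._lock:
            self._value = self._value + 1 if self._value < SEQ_MAX else 0
            return self._value


def allocate(n_threads, per_thread):
    """Have n_threads workers each draw per_thread sequence numbers."""
    seq = EchoSequence()
    results = [[] for _ in range(n_threads)]

    def worker(out):
        for _ in range(per_thread):
            out.append(seq.next())

    threads = [threading.Thread(target=worker, args=(r,)) for r in results]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [n for r in results for n in r]
```

Calling allocate(30, 100) yields 3000 numbers with no duplicates, since that is
well short of the 64K wraparound.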
    
    If you'd like, I also have changes to the ICMP echo code that return
    the response time in microseconds (the original code records the
    reception time in 60ths of a second, but doesn't return that information
    to the caller).  I also have a Ping CP command that is similar to the
    Unix ping command.
    
                                                    barmar
   
Thanks for the help to date.

Jeff