  1. ONOS
  2. ONOS-8006

Increased switch discovery latency




      After the BM test station/Mininet server was recently upgraded to Ubuntu 16.04, the switch discovery test started showing increased latency between the last TCP handshake message and OVS sending the `OFPT_HELLO` message to ONOS.

      Most of the time OVS takes over 50ms to send `OFPT_HELLO`:

       235 8.252942566 → TCP 74 55820 → 6653 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=84287833 TSecr=0 WS=512
       8.253121904 → TCP 74 6653 → 55820 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1214190822 TSecr=84287833 WS=128
       8.253141636 → TCP 66 55820 → 6653 [ACK] Seq=1 Ack=1 Win=29696 Len=0 TSval=84287833 TSecr=1214190822
       8.310037829 → OpenFlow 82 Type: OFPT_HELLO
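To put numbers on the trace, here is a minimal sketch that subtracts the relative capture timestamps shown above. The variable names are only for illustration; the values are copied from the capture.

```python
from decimal import Decimal

# Relative capture timestamps in seconds, copied from the trace above.
syn_time = Decimal("8.252942566")    # first SYN of the TCP handshake
ack_time = Decimal("8.253141636")    # final ACK of the TCP handshake
hello_time = Decimal("8.310037829")  # OFPT_HELLO sent by OVS

# Decimal avoids float rounding noise at nanosecond resolution.
handshake_ms = (ack_time - syn_time) * 1000
hello_wait_ms = (hello_time - ack_time) * 1000

print(f"TCP handshake: {handshake_ms:.3f} ms")
print(f"wait for OFPT_HELLO: {hello_wait_ms:.3f} ms")
```

For this capture the handshake takes about 0.2 ms, while the wait for `OFPT_HELLO` is about 56.9 ms.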

      The OVS logs also confirm the delay:

      2019-06-27T22:14:48.531Z|80342|connmgr|INFO|s3: added primary controller "tcp:"
      2019-06-27T22:14:48.531Z|80343|rconn|INFO|s3<->tcp: connecting...
      2019-06-27T22:14:48.531Z|80344|rconn|DBG|s3<->tcp: entering CONNECTING
      ... some other logs ...
      2019-06-27T22:14:48.586Z|80348|vconn|DBG|tcp: sent (Success): OFPT_HELLO (OF1.3) (xid=0xfe6):
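The same gap can be read off the OVS log timestamps. A minimal sketch, assuming the log lines keep the `timestamp|seq|module|level|message` layout shown above; `ovs_log_time` is a hypothetical helper, not part of OVS:

```python
from datetime import datetime, timedelta

# Timestamp layout used at the start of each log line above.
OVS_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def ovs_log_time(line: str) -> datetime:
    """Parse the leading timestamp (text before the first '|') of a log line."""
    return datetime.strptime(line.strip().split("|", 1)[0], OVS_TS_FORMAT)


# Two lines copied from the log excerpt above.
connecting = "2019-06-27T22:14:48.531Z|80343|rconn|INFO|s3<->tcp: connecting..."
hello_sent = ("2019-06-27T22:14:48.586Z|80348|vconn|DBG|tcp: sent (Success): "
              "OFPT_HELLO (OF1.3) (xid=0xfe6):")

# Dividing two timedeltas yields the gap as a float number of milliseconds.
delay_ms = (ovs_log_time(hello_sent) - ovs_log_time(connecting)) / timedelta(milliseconds=1)
print(f"connecting -> OFPT_HELLO sent: {delay_ms:.0f} ms")
```

This gives 55 ms between `connecting...` and the `OFPT_HELLO` being sent, in line with the packet capture.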

      It happens on all ONOS branches, which suggests that it comes from the infrastructure. It looks like the Ubuntu upgrade slowed OVS down a little further. But even before the upgrade, OVS on this server was much slower than OVS on VMs.

      The current test flow is `ovs-vsctl set-controller` for switch-up and then `ovs-vsctl del-controller` for switch-down, repeated for each iteration. The large delay before `OFPT_HELLO` is sent starts from the 2nd iteration. If I change the flow to `switch s3 start` -> `ovs-vsctl set-controller` -> `switch s3 stop`, the delay goes away and OVS sends HELLO in `1ms` every time.
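The two flows can be written down side by side. This is only a sketch that builds the command strings and does not run anything; the function names and the controller target are hypothetical placeholders (the real controller address is not shown in this ticket). Note that `switch s3 start` / `switch s3 stop` are Mininet CLI commands, while the `ovs-vsctl` calls are shell commands.

```python
from typing import List


def reconnect_only_flow(bridge: str, controller: str) -> List[str]:
    """Current flow: attach and detach the controller, switch keeps running."""
    return [
        f"ovs-vsctl set-controller {bridge} {controller}",  # switch-up
        f"ovs-vsctl del-controller {bridge}",               # switch-down
    ]


def restart_flow(bridge: str, controller: str) -> List[str]:
    """Alternative flow: restart the switch around every connection."""
    return [
        f"switch {bridge} start",                           # Mininet CLI
        f"ovs-vsctl set-controller {bridge} {controller}",  # switch-up
        f"switch {bridge} stop",                            # Mininet CLI, switch-down
    ]


# Placeholder target: substitute the real ONOS address of the test setup.
CONTROLLER = "tcp:ONOS_IP:6653"

for name, flow in (("reconnect only", reconnect_only_flow),
                   ("with restart", restart_flow)):
    print(name)
    for command in flow("s3", CONTROLLER):
        print("  " + command)
```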

      However, it's unclear why using `ovs-vsctl del-controller` without restarting the switch slows down OVS the next time it connects to ONOS, or why this is not the case on VMs.

      Since this delay is not really controlled by ONOS, possible solutions are:

      • Ignore the regression as it’s not really an ONOS issue, or
      • Change how we test the latency by either 1) changing `tcp_to_feature_reply` to `hello_to_feature_reply`, which starts from OVS sending the `HELLO` and so excludes both the TCP handshake (which basically happens within 1ms) and the wait time before OVS sends `HELLO`, or 2) adding `switch start` and `switch stop` to the test flow, which also gets rid of the unexplained OVS waiting time.
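A rough sketch of how the two metrics in option 1 differ. It assumes `tcp_to_feature_reply` starts at the first SYN, and the feature reply timestamp below is a made-up value for illustration only (it is not in the capture above); the class and field names are hypothetical too.

```python
from dataclasses import dataclass


@dataclass
class DiscoveryEvents:
    """Capture timestamps in seconds for one switch-up."""
    tcp_syn: float        # first SYN of the TCP handshake
    hello_sent: float     # OVS sends OFPT_HELLO
    feature_reply: float  # feature reply seen (hypothetical value below)


def tcp_to_feature_reply(ev: DiscoveryEvents) -> float:
    """Current metric in ms: includes the handshake and the wait for HELLO."""
    return (ev.feature_reply - ev.tcp_syn) * 1000


def hello_to_feature_reply(ev: DiscoveryEvents) -> float:
    """Proposed metric in ms: starts when OVS sends OFPT_HELLO."""
    return (ev.feature_reply - ev.hello_sent) * 1000


# SYN and HELLO times are from the capture; feature_reply is invented.
events = DiscoveryEvents(tcp_syn=8.252942566,
                         hello_sent=8.310037829,
                         feature_reply=8.315)

excluded_ms = tcp_to_feature_reply(events) - hello_to_feature_reply(events)
print(f"time excluded by the new metric: {excluded_ms:.1f} ms")
```

Whatever the real feature reply time is, the difference between the two metrics is the handshake plus the OVS wait, roughly 57 ms for the capture above.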





            You You Wang (Inactive)