- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Labels:
After the BM test station/Mininet server was recently upgraded to Ubuntu 16.04, we started to see increased latency in the switch discovery test, between the last TCP handshake message and OVS sending the `OFPT_HELLO` message to ONOS.
Most of the time it takes over 50ms for OVS to send `OFPT_HELLO`:
235 8.252942566 10.192.19.69 → 10.192.19.68 TCP 74 55820 → 6653 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=84287833 TSecr=0 WS=512
8.253121904 10.192.19.68 → 10.192.19.69 TCP 74 6653 → 55820 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1214190822 TSecr=84287833 WS=128
8.253141636 10.192.19.69 → 10.192.19.68 TCP 66 55820 → 6653 [ACK] Seq=1 Ack=1 Win=29696 Len=0 TSval=84287833 TSecr=1214190822
8.310037829 10.192.19.69 → 10.192.19.68 OpenFlow 82 Type: OFPT_HELLO
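To make the gap explicit, the capture timestamps can be subtracted directly. The following is a minimal sketch, not part of the test suite; the variable names are illustrative, and only the four timestamps are taken from the capture.

```python
from decimal import Decimal

# Relative capture timestamps in seconds, copied from the trace above.
syn = Decimal("8.252942566")
syn_ack = Decimal("8.253121904")
ack = Decimal("8.253141636")
hello = Decimal("8.310037829")

# Duration of the TCP three-way handshake (SYN to final ACK).
handshake_ms = (ack - syn) * 1000

# Time OVS waits after the handshake before sending OFPT_HELLO.
hello_wait_ms = (hello - ack) * 1000

print(f"handshake: {handshake_ms:.3f} ms, wait before HELLO: {hello_wait_ms:.3f} ms")
```

Decimal is used instead of float so the nanosecond-resolution timestamps subtract without rounding error.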
The OVS logs also confirm the delay:
2019-06-27T22:14:48.531Z|80342|connmgr|INFO|s3: added primary controller "tcp:10.192.19.68:6653"
2019-06-27T22:14:48.531Z|80343|rconn|INFO|s3<->tcp:10.192.19.68:6653: connecting...
2019-06-27T22:14:48.531Z|80344|rconn|DBG|s3<->tcp:10.192.19.68:6653: entering CONNECTING
... some other logs ...
2019-06-27T22:14:48.586Z|80348|vconn|DBG|tcp:10.192.19.68:6653: sent (Success): OFPT_HELLO (OF1.3) (xid=0xfe6):
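The same delay can be read off the log timestamps. Below is a rough sketch that parses the leading timestamp of two of the log lines above; the helper name `ovs_log_time` is made up for illustration, and it assumes the `|`-separated layout shown in these lines.

```python
from datetime import datetime, timedelta

def ovs_log_time(line):
    """Parse the leading UTC timestamp of an OVS log line (assumed '|'-separated)."""
    stamp = line.split("|", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")

# Two lines copied from the log excerpt above.
connecting = '2019-06-27T22:14:48.531Z|80343|rconn|INFO|s3<->tcp:10.192.19.68:6653: connecting...'
hello_sent = '2019-06-27T22:14:48.586Z|80348|vconn|DBG|tcp:10.192.19.68:6653: sent (Success): OFPT_HELLO (OF1.3) (xid=0xfe6):'

# Dividing by a one-millisecond timedelta gives the gap in milliseconds.
delay_ms = (ovs_log_time(hello_sent) - ovs_log_time(connecting)) / timedelta(milliseconds=1)
print(f"connecting -> OFPT_HELLO sent: {delay_ms} ms")
```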
It happens on all ONOS branches, which suggests that it comes from the infrastructure. The Ubuntu upgrade appears to have slowed OVS down a little further, but even before the upgrade it was much slower than OVS on VMs.
The current test flow is `ovs-vsctl set-controller` for switch-up, then `ovs-vsctl del-controller` for switch-down, repeated. The large delay before `OFPT_HELLO` is sent starts from the 2nd iteration. If I change the flow to `switch s3 start` -> `ovs-vsctl set-controller` -> `switch s3 stop`, the delay goes away and OVS sends HELLO within `1ms` every time.
However, it's unclear why using `ovs-vsctl del-controller` without restarting the switch slows down OVS the next time it connects to ONOS, or why this is not the case on VMs.
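The two test flows described above can be laid out side by side as command sequences. This is only an illustrative sketch: the function name `switch_cycle_commands` is hypothetical, the bridge name `s3` and controller address are taken from the logs, and the commands are listed rather than executed (the `ovs-vsctl` lines are shell commands, the `switch` lines are Mininet CLI commands).

```python
def switch_cycle_commands(bridge, controller, restart_switch=False):
    """Return the commands for one switch-up/switch-down iteration.

    restart_switch=False: current flow (set-controller, then del-controller).
    restart_switch=True:  alternative flow that starts and stops the switch.
    """
    set_ctrl = f"ovs-vsctl set-controller {bridge} {controller}"
    if restart_switch:
        return [f"switch {bridge} start", set_ctrl, f"switch {bridge} stop"]
    return [set_ctrl, f"ovs-vsctl del-controller {bridge}"]

current_flow = switch_cycle_commands("s3", "tcp:10.192.19.68:6653")
restart_flow = switch_cycle_commands("s3", "tcp:10.192.19.68:6653", restart_switch=True)

for name, flow in (("current", current_flow), ("with restart", restart_flow)):
    print(name, "->", " ; ".join(flow))
```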
Since this delay is not controlled by ONOS, possible solutions are:
- Ignore the regression, as it's not really an ONOS issue, or
- Change how we measure the latency, by either 1) replacing `tcp_to_feature_reply` with `hello_to_feature_reply`, which starts when OVS sends `HELLO` and therefore excludes both the TCP handshake (which completes within 1ms) and the time OVS waits before sending `HELLO`, or 2) adding `switch start` and `switch stop` to the test flow, which also gets rid of the unexplained OVS wait.
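The difference between the two metrics in option 1 can be sketched as follows. This is not the actual test code: the `latency_ms` helper is hypothetical, `tcp_to_feature_reply` is assumed to start at the TCP SYN, and the feature-reply timestamp is an invented placeholder because the capture above does not include that packet.

```python
from decimal import Decimal

def latency_ms(events, start, end="feature_reply"):
    """Milliseconds between two named events (timestamps in seconds)."""
    return (events[end] - events[start]) * 1000

events = {
    "tcp_syn": Decimal("8.252942566"),        # from the capture above
    "hello": Decimal("8.310037829"),          # from the capture above
    "feature_reply": Decimal("8.320000000"),  # HYPOTHETICAL value, not measured
}

# Assumption: the existing metric starts at the TCP SYN.
tcp_to_feature_reply = latency_ms(events, "tcp_syn")
# Proposed metric: starts when OVS sends OFPT_HELLO.
hello_to_feature_reply = latency_ms(events, "hello")

# The difference is the handshake plus the OVS wait, whatever the reply time is.
excluded_ms = tcp_to_feature_reply - hello_to_feature_reply
print(f"excluded by hello_to_feature_reply: {excluded_ms:.3f} ms")
```

Whatever the real feature-reply time is, the part removed by the new metric is the SYN-to-HELLO interval, which is the portion ONOS does not control.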
| # | Subject | Branch | Project | Status | CR | V |
|---|---|---|---|---|---|---|
| 22884,1 | Expect larger switch up latency due to ONOS-8006 | master | OnosSystemTest | MERGED | +2 | +1 |
| 22885,1 | Expect larger switch up latency due to ONOS-8006 | onos-2.2 | OnosSystemTest | MERGED | +2 | +1 |
| 22886,1 | Expect larger switch up latency due to ONOS-8006 | onos-1.15 | OnosSystemTest | MERGED | +2 | +1 |