Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.9.0
Fix Version/s: None
Component/s: None
Labels:
Environment: 3 VMs, 8GB RAM, 4 CPU cores per VM, ONOS 1.9.0, VPLS app, 3 HW switches, 3 connect points, 3x150 hosts
Epic Link:

Hello
I am testing the VPLS application in a 3-node ONOS cluster. Each node is a VM with 8GB RAM and 4 cores, and I have 3 connect points on 3 physical switches. While flooding traffic for 450 hosts (3x150), the cluster logs a lot of errors like the following:
2017-07-01 03:40:16,507 | WARN | 9876-partition-1 | DistributedFlowRuleStore | 129 - org.onosproject.onos-core-dist - 1.9.0 | Failed to backup devices: [of:5e3e7072cfc7bce2]. Reason: java.util.concurrent.TimeoutException: Timedout waiting for reply, Node: 192.168.122.51
2017-07-01 03:40:16,517 | WARN | 9876-partition-1 | DistributedFlowRuleStore | 129 - org.onosproject.onos-core-dist - 1.9.0 | Failed to backup devices: [of:5e3e7072cfc7bce2]. Reason: java.util.concurrent.TimeoutException: Timedout waiting for reply, Node: 192.168.122.51
2017-07-01 03:40:16,523 | WARN | ycat-client-io-1 | DistributedFlowRuleStore | 129 - org.onosproject.onos-core-dist - 1.9.0 | Failed to backup devices: [of:5e3e7072cfc7bce2]. Reason: java.util.concurrent.TimeoutException: Timedout waiting for reply, Node: 192.168.122.51
2017-07-01 03:40:16,541 | INFO | nos-topo-build-7 | TopologyManager | 127 - org.onosproject.onos-core-net - 1.9.0 | Topology DefaultTopology changed
Besides this, many intents cannot be installed:
2017-07-01 00:58:30,947 | WARN | -event-barrier-1 | IntentInstaller | 127 - org.onosproject.onos-core-net - 1.9.0 | Failed installation operation for: VPLS1-uni-of:5e3e7072cfc7bce2-4-00:00:00:05:00:4D MultiPointToSinglePointIntent{id=0x331, key=VPLS1-uni-of:5e3e7072cfc7bce2-4-00:00:00:05:00:4D, appId=DefaultApplicationId{id=187, name=net.gateflow.vpls}, priority=1000, resources=[], selector=DefaultTrafficSelector{criteria=[ETH_DST:00:00:00:05:00:4D]}, treatment=DefaultTrafficTreatment{immediate=[NOACTION], deferred=[], transition=None, meter=None, cleared=false, metadata=null}, ingress=[of:5e3e7072cfc7bdea/4, of:5e3e7072cfc7bd66/4], egress=of:5e3e7072cfc7bce2/4, filteredIngressCPs=[FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bdea/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}, FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bd66/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}], filteredEgressCP=FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bce2/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}, constraints=[PartialFailureConstraint], resourceGroup=null} due to org.onosproject.net.intent.impl.IntentInstaller$FlowRuleOperationContext$1@3e039e59
2017-07-01 00:58:30,947 | WARN | -event-barrier-1 | IntentInstaller | 127 - org.onosproject.onos-core-net - 1.9.0 | Failed withdrawal operation for: VPLS1-uni-of:5e3e7072cfc7bce2-4-00:00:00:05:00:4D MultiPointToSinglePointIntent{id=0x32d, key=VPLS1-uni-of:5e3e7072cfc7bce2-4-00:00:00:05:00:4D, appId=DefaultApplicationId{id=187, name=net.gateflow.vpls}, priority=1000, resources=[], selector=DefaultTrafficSelector{criteria=[ETH_DST:00:00:00:05:00:4D]}, treatment=DefaultTrafficTreatment{immediate=[NOACTION], deferred=[], transition=None, meter=None, cleared=false, metadata=null}, ingress=[of:5e3e7072cfc7bdea/4, of:5e3e7072cfc7bd66/4], egress=of:5e3e7072cfc7bce2/4, filteredIngressCPs=[FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bdea/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}, FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bd66/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}], filteredEgressCP=FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bce2/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}, constraints=[PartialFailureConstraint], resourceGroup=null} due to org.onosproject.net.intent.impl.IntentInstaller$FlowRuleOperationContext$1@3e039e59
But the log gives no further details about the root cause. I guess there is a problem with store synchronization in cluster mode, but I do not understand how to troubleshoot it. Could anyone help and point me in the right direction? Thank you!
PS: In single-node mode everything is OK on the same HW resources.
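
In case it helps with triage, here is a minimal sketch (not part of ONOS) of how the "Failed to backup devices" warnings quoted above could be tallied per device and target node, to see whether the timeouts concentrate on a single node. The function name `tally_backup_failures` and the regular expression are my own assumptions, derived only from the log format shown in this report.

```python
import re
from collections import Counter

# Assumed pattern, based solely on the DistributedFlowRuleStore warning
# format quoted in this report.
BACKUP_FAILURE = re.compile(
    r"Failed to backup devices: \[(?P<devices>[^\]]*)\]\. "
    r"Reason: (?P<reason>.*?), Node: (?P<node>\S+)"
)


def tally_backup_failures(lines):
    """Count backup-failure warnings per (device, node) pair."""
    counts = Counter()
    for line in lines:
        match = BACKUP_FAILURE.search(line)
        if match is None:
            continue  # not a backup-failure warning
        # The bracketed list may name several devices, separated by commas.
        for device in match.group("devices").split(","):
            counts[(device.strip(), match.group("node"))] += 1
    return counts


# Sample input: one warning line copied from the log excerpt in this report.
sample = [
    "2017-07-01 03:40:16,507 | WARN | 9876-partition-1 | DistributedFlowRuleStore "
    "| 129 - org.onosproject.onos-core-dist - 1.9.0 | Failed to backup devices: "
    "[of:5e3e7072cfc7bce2]. Reason: java.util.concurrent.TimeoutException: "
    "Timedout waiting for reply, Node: 192.168.122.51",
]

for (device, node), count in tally_backup_failures(sample).most_common():
    print(f"{device} -> {node}: {count}")
```

Feeding it the full log of each cluster node (for example by passing an open file object instead of `sample`) should show whether every failure points at 192.168.122.51 or whether all three nodes time out against each other.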