ONOS-6780

VPLS in 3 node cluster

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Environment:

      3 VMs, 8GB RAM, 4 CPU cores per VM, ONOS 1.9.0, VPLS app, 3 HW switches, 3 connect points, 3x150 hosts

      Description

      Hello

      I am testing the VPLS application in a 3-node ONOS cluster. Each node is a VM with 8 GB RAM and 4 cores, and I have 3 connect points on 3 physical switches. While flooding traffic for 450 hosts (3x150), the cluster logs a lot of errors like the following:

      2017-07-01 03:40:16,507 | WARN | 9876-partition-1 | DistributedFlowRuleStore | 129 - org.onosproject.onos-core-dist - 1.9.0 | Failed to backup devices: [of:5e3e7072cfc7bce2]. Reason: java.util.concurrent.TimeoutException: Timedout waiting for reply, Node: 192.168.122.51
      2017-07-01 03:40:16,517 | WARN | 9876-partition-1 | DistributedFlowRuleStore | 129 - org.onosproject.onos-core-dist - 1.9.0 | Failed to backup devices: [of:5e3e7072cfc7bce2]. Reason: java.util.concurrent.TimeoutException: Timedout waiting for reply, Node: 192.168.122.51
      2017-07-01 03:40:16,523 | WARN | ycat-client-io-1 | DistributedFlowRuleStore | 129 - org.onosproject.onos-core-dist - 1.9.0 | Failed to backup devices: [of:5e3e7072cfc7bce2]. Reason: java.util.concurrent.TimeoutException: Timedout waiting for reply, Node: 192.168.122.51
      2017-07-01 03:40:16,541 | INFO | nos-topo-build-7 | TopologyManager | 127 - org.onosproject.onos-core-net - 1.9.0 | Topology DefaultTopology{time=212810172311, creationTime=1498869616518, computeCost=404732, clusters=4, devices=5, links=3} changed
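
      To see whether these backup timeouts concentrate on a single device or a single peer node, the warnings can be tallied per (device, node) pair. The following is only a minimal sketch: it assumes the message layout shown in the excerpt above, and the function name `summarize_backup_failures` is hypothetical, not part of ONOS.

```python
import re
from collections import Counter

# Pattern for the DistributedFlowRuleStore warning shown in the excerpt.
# Assumption: the message text always follows this exact layout.
BACKUP_FAILURE = re.compile(
    r"Failed to backup devices: \[(?P<devices>[^\]]*)\]\. "
    r"Reason: (?P<reason>.*?), Node: (?P<node>\S+)"
)


def summarize_backup_failures(lines):
    """Count 'Failed to backup devices' warnings per (device, target node)."""
    counts = Counter()
    for line in lines:
        match = BACKUP_FAILURE.search(line)
        if match is None:
            continue  # not a backup-failure line
        # The bracketed list may name several devices, separated by commas.
        for device in match.group("devices").split(","):
            counts[(device.strip(), match.group("node"))] += 1
    return counts
```

      For example, `summarize_backup_failures(open("node1.log"))` returns a counter whose largest entries point at the device and peer node most often involved; comparing the result for each of the three attached logs may help narrow down which node is slow to reply.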

      Besides this, many intents cannot be installed:

      2017-07-01 00:58:30,947 | WARN | -event-barrier-1 | IntentInstaller | 127 - org.onosproject.onos-core-net - 1.9.0 | Failed installation operation for: VPLS1-uni-of:5e3e7072cfc7bce2-4-00:00:00:05:00:4D MultiPointToSinglePointIntent{id=0x331, key=VPLS1-uni-of:5e3e7072cfc7bce2-4-00:00:00:05:00:4D, appId=DefaultApplicationId{id=187, name=net.gateflow.vpls}, priority=1000, resources=[], selector=DefaultTrafficSelector{criteria=[ETH_DST:00:00:00:05:00:4D]}, treatment=DefaultTrafficTreatment{immediate=[NOACTION], deferred=[], transition=None, meter=None, cleared=false, metadata=null}, ingress=[of:5e3e7072cfc7bdea/4, of:5e3e7072cfc7bd66/4], egress=of:5e3e7072cfc7bce2/4, filteredIngressCPs=[FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bdea/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}, FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bd66/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}], filteredEgressCP=FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bce2/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}, constraints=[PartialFailureConstraint], resourceGroup=null} due to org.onosproject.net.intent.impl.IntentInstaller$FlowRuleOperationContext$1@3e039e59

      2017-07-01 00:58:30,947 | WARN | -event-barrier-1 | IntentInstaller | 127 - org.onosproject.onos-core-net - 1.9.0 | Failed withdrawal operation for: VPLS1-uni-of:5e3e7072cfc7bce2-4-00:00:00:05:00:4D MultiPointToSinglePointIntent{id=0x32d, key=VPLS1-uni-of:5e3e7072cfc7bce2-4-00:00:00:05:00:4D, appId=DefaultApplicationId{id=187, name=net.gateflow.vpls}, priority=1000, resources=[], selector=DefaultTrafficSelector{criteria=[ETH_DST:00:00:00:05:00:4D]}, treatment=DefaultTrafficTreatment{immediate=[NOACTION], deferred=[], transition=None, meter=None, cleared=false, metadata=null}, ingress=[of:5e3e7072cfc7bdea/4, of:5e3e7072cfc7bd66/4], egress=of:5e3e7072cfc7bce2/4, filteredIngressCPs=[FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bdea/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}, FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bd66/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}], filteredEgressCP=FilteredConnectPoint{connectPoint=of:5e3e7072cfc7bce2/4, trafficSelector=DefaultTrafficSelector{criteria=[VLAN_VID:300]}}, constraints=[PartialFailureConstraint], resourceGroup=null} due to org.onosproject.net.intent.impl.IntentInstaller$FlowRuleOperationContext$1@3e039e59
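
      Since the IntentInstaller messages are long, it can help to reduce them to the failed operation, the intent key, and the intent id before looking for patterns. The sketch below assumes each warning sits on a single line in the raw log file and follows the wording shown above; `collect_intent_failures` is a hypothetical helper name, not an ONOS API.

```python
import re
from collections import defaultdict

# Pattern for the IntentInstaller warnings shown in the excerpt.
# Assumption: the intent key contains no spaces and ids are hexadecimal.
INTENT_FAILURE = re.compile(
    r"Failed (?P<op>installation|withdrawal) operation for: "
    r"(?P<key>\S+) (?P<type>\w+)\{id=(?P<id>0x[0-9a-fA-F]+)"
)


def collect_intent_failures(lines):
    """Map each intent key to the intent ids that failed, grouped by operation."""
    failures = defaultdict(lambda: defaultdict(list))
    for line in lines:
        match = INTENT_FAILURE.search(line)
        if match is None:
            continue  # not an intent failure line
        failures[match.group("key")][match.group("op")].append(match.group("id"))
    return failures
```

      Applied to the two lines above, this would list id 0x331 under "installation" and id 0x32d under "withdrawal" for the same key, which makes it easier to see how many distinct intent keys are affected across the three node logs.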

      But there are no more details about what the root cause is. I guess that there is a problem with storage sync in cluster mode, but I cannot work out how to troubleshoot it. Could anyone help me and point me in the right direction? Thank you!

      PS: In single-node mode everything works fine on the same hardware resources.

        Attachments

        1. node1.log
          369 kB
        2. node2.log
          289 kB
        3. node3.log
          290 kB

            People

            Assignee:
            Luca Prete (luca)
            Reporter:
            Anton Makarov (antmak)
            Votes:
            0
            Watchers:
            4
