Type: Bug
Status: Closed
Priority: Critical
Resolution: Done
Affects Version/s: 1.2.0, 1.2.1
Component/s: None
Environment:
- Single instance with Mininet (OVS)
- Three instances with FSFW on Mininet (OVS)
Sprint: Drake Sprint 2 (7/27-8/14) 2

Start a topology with loops and install some intents.
Disconnect all switches from the controller (e.g. quit Mininet). The intents go to FAILED.
Reconnect all switches (e.g. start the same Mininet topology). All intents go to INSTALLED, but some flows are left permanently in the PENDING_ADD state.
Digging deeper: whenever a flow is in PENDING_ADD in ONOS, the switch holds a flow with the same match but a different treatment (e.g. a different output port). I suspect that flows installed when the intent was first compiled are left over on the switch, and that the intent is recalculated when the topology comes back. If the recalculation selects a different path from the first one, the same switch ends up with two flows that have the same match but different treatments.
I think the fundamental cause is that we use two different notions of equality to match flows: FlowRule.equals() and FlowId.equals().
FlowId.equals() depends on the flow's treatment, because the treatment is one of the inputs used to generate the FlowId. FlowRule.equals() does not depend on the treatment; it uses only the match and a few other fields to determine equality.
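The mismatch between the two notions of equality can be illustrated with a minimal sketch. It assumes a hypothetical, simplified rule class (not the ONOS API) whose equals() compares only the device, match, and priority, while its id() is derived from a hash that also includes the treatment. Matches and treatments are modeled as plain strings for illustration.

```java
import java.util.Objects;

public class FlowEqualitySketch {

    // Hypothetical, simplified stand-in for a flow rule (not the real ONOS class).
    static final class SimpleFlowRule {
        final String deviceId;
        final String match;      // e.g. "ETH_DST=00:00:00:00:00:02"
        final String treatment;  // e.g. "OUTPUT=2"
        final int priority;

        SimpleFlowRule(String deviceId, String match, String treatment, int priority) {
            this.deviceId = deviceId;
            this.match = match;
            this.treatment = treatment;
            this.priority = priority;
        }

        // The id includes the treatment, so a different output port yields a different id.
        long id() {
            return Objects.hash(deviceId, match, treatment, priority);
        }

        // Equality ignores the treatment, like the FlowRule.equals() behavior described above.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SimpleFlowRule)) return false;
            SimpleFlowRule other = (SimpleFlowRule) o;
            return priority == other.priority
                    && deviceId.equals(other.deviceId)
                    && match.equals(other.match);
        }

        @Override
        public int hashCode() {
            return Objects.hash(deviceId, match, priority);
        }
    }

    public static void main(String[] args) {
        SimpleFlowRule oldPath = new SimpleFlowRule("of:01", "ETH_DST=00:00:00:00:00:02", "OUTPUT=2", 100);
        SimpleFlowRule newPath = new SimpleFlowRule("of:01", "ETH_DST=00:00:00:00:00:02", "OUTPUT=3", 100);
        System.out.println("equals(): " + oldPath.equals(newPath));       // true
        System.out.println("same id:  " + (oldPath.id() == newPath.id())); // false
    }
}
```

Two rules for the old and the recalculated path therefore compare as equal while carrying different ids, which is exactly the situation in which one notion says "same flow" and the other says "different flow".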
The flow synchronization mechanism cannot fix this problem because it mostly uses FlowRule.equals() when matching the flow_stats_reply against the FlowStore's state. It cannot tell that two flows with the same match are actually different flows, so it simply updates statistics on the old flow entry and never installs the new flow entry that is stuck in PENDING_ADD (note that no statistics are updated for these flows either).
(See FlowRuleManager#pushFlowMetrics: it gets the set of all stored rules, then iterates over the flow stats and removes each reported flow from that set. No stored rules are left over, so no flow_add is pushed to the switch.)
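The reconciliation step described above can be sketched as follows. This is a hypothetical simplification, not the actual FlowRuleManager#pushFlowMetrics code: a rule is reduced to a match and a treatment, equality ignores the treatment, and any stored rule that the switch does not report is assumed to be re-sent as a flow_add.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class ReconcileSketch {

    // Hypothetical simplified rule: equals()/hashCode() ignore the treatment.
    static final class SimpleFlowRule {
        final String match;
        final String treatment;

        SimpleFlowRule(String match, String treatment) {
            this.match = match;
            this.treatment = treatment;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof SimpleFlowRule && match.equals(((SimpleFlowRule) o).match);
        }

        @Override
        public int hashCode() {
            return Objects.hash(match);
        }

        @Override
        public String toString() {
            return match + " -> " + treatment;
        }
    }

    // Returns the stored rules the switch did not report, i.e. the ones a
    // reconciliation pass would push again as flow_add.
    static List<SimpleFlowRule> rulesToReinstall(List<SimpleFlowRule> storedRules,
                                                 List<SimpleFlowRule> reportedBySwitch) {
        Set<SimpleFlowRule> unreported = new HashSet<>(storedRules);
        for (SimpleFlowRule reported : reportedBySwitch) {
            // remove() relies on equals(), so the old flow on the switch
            // "accounts for" the new rule that is still pending.
            unreported.remove(reported);
        }
        return new ArrayList<>(unreported);
    }

    public static void main(String[] args) {
        SimpleFlowRule oldFlow = new SimpleFlowRule("ETH_DST=00:00:00:00:00:02", "OUTPUT=2");
        SimpleFlowRule newFlow = new SimpleFlowRule("ETH_DST=00:00:00:00:00:02", "OUTPUT=3"); // stuck in PENDING_ADD

        System.out.println("to reinstall: " + rulesToReinstall(List.of(newFlow), List.of(oldFlow))); // []
    }
}
```

If the switch reported no flow for that match at all, the pending rule would be returned and reinstalled; it is only the stale flow with the same match that hides it.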
Also, in this case the flows reported by the 'flows' CLI command may not be accurate. When the CLI command fetches the flows, it puts all flows from the same switch into a set; because flows with the same match compare as equal, the set keeps only one of them, and the other is absent from the output.
So I believe there are two flow rules in the flow store, even though only one of them shows up in the output of the 'flows' command. I believe this because, when debugging the code that pushes flow_stats info into the flow store, I never see it hit the case where it cannot find a matching flow (based on FlowId) in the flow store.
(See NewDistributedFlowRuleStore#addOrUpdateFlowRuleInternal: there is an if (stored != null) check, and I never see it evaluate to false.)
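The difference between what the store holds and what the CLI shows can be sketched as below. This assumes, hypothetically, that the store keys entries by an id that includes the treatment, while the CLI collects entries into a hash set that deduplicates by an equals() that ignores the treatment. The class and method names are illustrative only and do not correspond to ONOS code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class StoreViewSketch {

    // Hypothetical simplified rule: id() includes the treatment,
    // while equals()/hashCode() look only at the match.
    static final class SimpleFlowRule {
        final String match;
        final String treatment;

        SimpleFlowRule(String match, String treatment) {
            this.match = match;
            this.treatment = treatment;
        }

        long id() {
            return Objects.hash(match, treatment);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof SimpleFlowRule && match.equals(((SimpleFlowRule) o).match);
        }

        @Override
        public int hashCode() {
            return match.hashCode();
        }
    }

    // Store side: entries keyed by id, so flows differing only in treatment coexist
    // and a lookup by id always finds a stored entry.
    static Map<Long, SimpleFlowRule> storeById(SimpleFlowRule... rules) {
        Map<Long, SimpleFlowRule> store = new HashMap<>();
        for (SimpleFlowRule rule : rules) {
            store.put(rule.id(), rule);
        }
        return store;
    }

    // CLI-style view: collecting the entries into a set deduplicates them by equals().
    static Set<SimpleFlowRule> cliView(Map<Long, SimpleFlowRule> store) {
        return new HashSet<>(store.values());
    }

    public static void main(String[] args) {
        SimpleFlowRule oldFlow = new SimpleFlowRule("ETH_DST=00:00:00:00:00:02", "OUTPUT=2");
        SimpleFlowRule newFlow = new SimpleFlowRule("ETH_DST=00:00:00:00:00:02", "OUTPUT=3");

        Map<Long, SimpleFlowRule> store = storeById(oldFlow, newFlow);
        System.out.println("entries in store: " + store.size());            // 2
        System.out.println("entries in CLI view: " + cliView(store).size()); // 1
    }
}
```

Under these assumptions both flows live in the store and each id lookup succeeds, yet any set-based view built on equals() reports only one of them.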