The ONOS cluster (3 nodes) throws a large number of errors and exceptions after running the CHO test for hours.
We also observe hundreds of warning/error messages per second. When this happened, we also saw a burst of memory usage (~4G of Old space allocated within 1 min).
These are the most frequent error messages we saw:
2018-03-26 12:26:43,351 | ERROR | f-event-stats-21 | OpenFlowControllerImpl | 165 | Uncaught exception on onos-of-event-stats-21
org.onosproject.store.service.DocumentException$Timeout: onos-flow-table
    at org.onosproject.store.primitives.DefaultDocumentTree.complete(DefaultDocumentTree.java:120)[125:org.onosproject.onos-api:1.11.2.SNAPSHOT]
    at org.onosproject.store.primitives.DefaultDocumentTree.getChildren(DefaultDocumentTree.java:60)[125:org.onosproject.onos-api:1.11.2.SNAPSHOT]
    at org.onosproject.store.flow.impl.DistributedFlowRuleStore.getFlowEntries(DistributedFlowRuleStore.java:369)[129:org.onosproject.onos-core-dist:1.11.2.SNAPSHOT]
    at org.onosproject.store.flow.impl.DistributedFlowRuleStore.getFlowEntries(DistributedFlowRuleStore.java:361)[129:org.onosproject.onos-core-dist:1.11.2.SNAPSHOT]
    at org.onosproject.net.flow.impl.FlowRuleManager$InternalFlowRuleProviderService.pushFlowMetricsInternal(FlowRuleManager.java:530)[127:org.onosproject.onos-core-net:1.11.2.SNAPSHOT]
    at org.onosproject.net.flow.impl.FlowRuleManager$InternalFlowRuleProviderService.pushFlowMetrics(FlowRuleManager.java:519)[127:org.onosproject.onos-core-net:1.11.2.SNAPSHOT]
    at org.onosproject.provider.of.flow.impl.OpenFlowRuleProvider$InternalFlowProvider.pushFlowMetrics(OpenFlowRuleProvider.java:642)[168:org.onosproject.onos-providers-openflow-flow:1.11.2.SNAPSHOT]
    at org.onosproject.provider.of.flow.impl.OpenFlowRuleProvider$InternalFlowProvider.handleMessage(OpenFlowRuleProvider.java:441)[168:org.onosproject.onos-providers-openflow-flow:1.11.2.SNAPSHOT]
    at org.onosproject.openflow.controller.impl.OpenFlowControllerImpl$OFMessageHandler.run(OpenFlowControllerImpl.java:800)[165:org.onosproject.onos-protocols-openflow-ctl:1.11.2.SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_72]
    at java.lang.Thread.run(Thread.java:745)[:1.8.0_72]
2018-03-26 21:48:04,790 | ERROR | ent-current-fg-1 | EventuallyConsistentMapImpl | 130 | Uncaught exception on onos-ecm-intent-current-fg-1
org.onosproject.store.service.StorageException$Timeout
    at org.onosproject.store.primitives.DefaultLeaderElector.complete(DefaultLeaderElector.java:115)
    at org.onosproject.store.primitives.DefaultLeaderElector.getLeadership(DefaultLeaderElector.java:75)
    at org.onosproject.store.cluster.impl.DistributedLeadershipStore.getLeadership(DistributedLeadershipStore.java:166)
    at org.onosproject.cluster.impl.LeadershipManager.getLeadership(LeadershipManager.java:83)
    at org.onosproject.store.intent.impl.WorkPartitionManager.getLeader(WorkPartitionManager.java:135)
    at org.onosproject.store.intent.impl.WorkPartitionManager.isMine(WorkPartitionManager.java:128)
    at org.onosproject.store.intent.impl.GossipIntentStore.isMaster(GossipIntentStore.java:428)
    at org.onosproject.store.intent.impl.GossipIntentStore$InternalCurrentListener.event(GossipIntentStore.java:467)
    at org.onosproject.store.primitives.impl.EventuallyConsistentMapImpl.lambda$notifyListeners$11(EventuallyConsistentMapImpl.java:565)
    at java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:890)[:1.8.0_72]
    at java.util.concurrent.CopyOnWriteArraySet.forEach(CopyOnWriteArraySet.java:404)[:1.8.0_72]
    at org.onosproject.store.primitives.impl.EventuallyConsistentMapImpl.notifyListeners(EventuallyConsistentMapImpl.java:565)
    at org.onosproject.store.primitives.impl.EventuallyConsistentMapImpl.lambda$processUpdates$25(EventuallyConsistentMapImpl.java:756)
    at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:397)
    at org.onosproject.store.primitives.impl.EventuallyConsistentMapImpl.processUpdates(EventuallyConsistentMapImpl.java:747)
    at org.onosproject.store.cluster.messaging.impl.ClusterCommunicationManager$InternalMessageConsumer.accept(ClusterCommunicationManager.java:338)[129:org.onosproject.onos-core-dist:1.11.2.SNAPSHOT]
    at org.onosproject.store.cluster.messaging.impl.ClusterCommunicationManager$InternalMessageConsumer.accept(ClusterCommunicationManager.java:327)[129:org.onosproject.onos-core-dist:1.11.2.SNAPSHOT]
    at org.onosproject.store.cluster.messaging.impl.NettyMessagingManager.lambda$null$16(NettyMessagingManager.java:456)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_72]
    at java.lang.Thread.run(Thread.java:745)[:1.8.0_72]
2018-03-26 21:57:07,349 | ERROR | -event-stats-300 | OpenFlowControllerImpl | 165 | Uncaught exception on onos-of-event-stats-300
org.onosproject.store.service.ConsistentMapException$Timeout: onos-meter-store
    at org.onosproject.store.primitives.DefaultConsistentMap.complete(DefaultConsistentMap.java:233)[125:org.onosproject.onos-api:1.11.2.SNAPSHOT]
    at org.onosproject.store.primitives.DefaultConsistentMap.values(DefaultConsistentMap.java:148)[125:org.onosproject.onos-api:1.11.2.SNAPSHOT]
    at org.onosproject.store.primitives.ConsistentMapBackedJavaMap.values(ConsistentMapBackedJavaMap.java:141)[125:org.onosproject.onos-api:1.11.2.SNAPSHOT]
    at org.onosproject.incubator.store.meter.impl.DistributedMeterStore.getAllMeters(DistributedMeterStore.java:287)[134:org.onosproject.onos-incubator-store:1.11.2.SNAPSHOT]
    at org.onosproject.incubator.net.meter.impl.MeterManager$InternalMeterProviderService.pushMeterMetrics(MeterManager.java:251)[133:org.onosproject.onos-incubator-net:1.11.2.SNAPSHOT]
    at org.onosproject.provider.of.meter.impl.OpenFlowMeterProvider.pushMeterStats(OpenFlowMeterProvider.java:293)[170:org.onosproject.onos-providers-openflow-meter:1.11.2.SNAPSHOT]
    at org.onosproject.provider.of.meter.impl.OpenFlowMeterProvider.access$100(OpenFlowMeterProvider.java:91)[170:org.onosproject.onos-providers-openflow-meter:1.11.2.SNAPSHOT]
    at org.onosproject.provider.of.meter.impl.OpenFlowMeterProvider$InternalMeterListener.handleMessage(OpenFlowMeterProvider.java:422)[170:org.onosproject.onos-providers-openflow-meter:1.11.2.SNAPSHOT]
    at org.onosproject.openflow.controller.impl.OpenFlowControllerImpl$OFMessageHandler.run(OpenFlowControllerImpl.java:800)[165:org.onosproject.onos-protocols-openflow-ctl:1.11.2.SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_72]
    at java.lang.Thread.run(Thread.java:745)[:1.8.0_72]
See the attached onos-diag for more details.