Status: Closed
Affects Version/s: 1.2.2
Fix Version/s: 1.2.2
3 VMs with IPs all beginning with 198.162.2.
All VMs have the following OS and allocations.
Ubuntu 14.04 LTS
17364 MB RAM (16.6 GB)
104.5 GB Disk
All VMs are behind a proxy, but have ssh authentication set up for each other. The proxy is configured so that access through it is possible.
When setting up a 3 node cluster with the VMs whose specifications are given above, I have a few problems.
ONOS is built with Maven from the source code, which was pulled from GitHub and then checked out at 1.2.2. I am able to build it successfully with all tests enabled. I customized one test, which expected a certain time display value, so that it expects the correct format for my system; aside from that, all tests passed without modification.
Once the build finished, I set up the necessary folders and files to use the onos-form-cluster tool, which is not included by default in that distribution. Once set up, I am able to use the tool to form a three node cluster in which each node is aware of the others. If I then attempt to connect mininet to this cluster, no devices appear as connected in the web UI. I have used the same mininet script with a different controller and was able to connect successfully, so there is no issue with the script. (For that test I changed the IPs to those of other VMs, which had the characteristics listed above but ran an older version of Ubuntu.)
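For reference, a mininet script of the kind described above can be sketched as follows. This is a minimal illustration, not the exact script used in this report: the one-switch, two-host topology, the function names `controller_specs` and `run`, and the OpenFlow port 6633 are all assumptions, and the controller IPs are passed in rather than hard-coded.

```python
"""Sketch: attach one mininet switch to several remote controllers."""


def controller_specs(ips, port=6633):
    """Return (name, ip, port) tuples, one per controller IP.

    Port 6633 is an assumed OpenFlow port; change it if the controllers
    listen elsewhere.
    """
    return [("c%d" % i, ip, port) for i, ip in enumerate(ips)]


def run(ips):
    """Build a tiny topology, point it at every controller, and ping."""
    # Imported lazily so controller_specs() works without mininet installed.
    from mininet.net import Mininet
    from mininet.node import RemoteController

    net = Mininet(controller=None, build=False)
    for name, ip, port in controller_specs(ips):
        net.addController(name, controller=RemoteController, ip=ip, port=port)

    # Hypothetical minimal topology: two hosts on one switch.
    s1 = net.addSwitch("s1")
    h1 = net.addHost("h1")
    h2 = net.addHost("h2")
    net.addLink(h1, s1)
    net.addLink(h2, s1)

    net.start()    # each switch is started with all controllers added above
    net.pingAll()  # pings only succeed if a controller installs flows
    net.stop()
```

Calling `run([...])` as root with the three node addresses would exercise the same path as the failing case: the switch should then show up as a device in the web UI of any cluster node.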
Furthermore, if I bring down any one of the three nodes and restart it with the "ok" alias, the node does not automatically rejoin the cluster but instead starts its own local cluster and takes control of the network on its own. It successfully connects to the scripted mininet topology, the topology becomes visible in the Web UI, and pings begin to succeed. This behavior definitely eliminates the mininet script as the source of the problem. Checking the "nodes" output on the restarted node shows the data for the previous cluster, but with all "state" fields as "null" and all "updated" fields as "Never", plus a fourth entry above the previous three showing that the local node at 127.0.0.1 is active.
If I just use "onos" to avoid setting up the local cluster configuration, the command displays "Logging in as karaf" and hangs for a couple of minutes before dropping back to the shell. I never get the ONOS terminal, and the other nodes still report the node that was terminated as "INACTIVE" under "state" in the "nodes" command output.
If I try to follow the steps on deploying ONOS and setting up a cluster from the wiki page, with the video walkthrough at https://www.youtube.com/watch?v=hk1cPmp46n8 , I can follow the instructions up until the presenter runs the "onos" command. When he does it, the ONOS terminal appears in short order and commands can be issued as normal. When I run it, however, I see the same behavior as with the built version: the command prints "Logging in as karaf" and then hangs for a couple of minutes before dropping back to the shell. Because this happens with both the locally Maven-built version and the officially pre-built package, it is less likely that something in my build process triggers the issue. Might it be a problem with handling proxies when clustering is engaged?
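One way to narrow down the proxy question is to check, from each VM, whether the relevant TCP ports on the other nodes are reachable at all. The sketch below is a generic reachability check, not an ONOS tool. The port numbers (8101 for the Karaf SSH console that the "onos" command logs in to, 9876 for cluster communication, 6633 for OpenFlow) are assumed defaults and should be adjusted to the actual deployment; the names `port_open` and `check_nodes` are hypothetical.

```python
import socket

# Assumed ONOS default ports; adjust if the deployment differs.
DEFAULT_PORTS = {
    "karaf-ssh": 8101,  # console the "onos" command logs in to
    "cluster": 9876,    # node-to-node cluster communication
    "openflow": 6633,   # switch connections
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:  # also covers timeouts and refused connections
        return False
    sock.close()
    return True


def check_nodes(hosts, ports=DEFAULT_PORTS, timeout=3.0):
    """Map each (host, service name) pair to its reachability."""
    results = {}
    for host in hosts:
        for name, port in ports.items():
            results[(host, name)] = port_open(host, port, timeout)
    return results
```

Running `check_nodes` on each VM with the addresses of the other two would show whether the hang at "Logging in as karaf" coincides with an unreachable port. A `False` for the cluster port between two nodes would point toward the network or proxy setup rather than the build.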
I have been performing a lot of controller research and testing, and this is the last piece of my research that I cannot perform due to the bugs. If it's a problem with 1.2.2 and the answer is that I need to upgrade to 1.5.0, then an official recommendation and acknowledgment of this being a bug would be sufficient justification for the move for the sake of completing the research. Otherwise, I would really appreciate either guidance on how to solve this or any pointers if I am doing something incorrectly.
I appreciate your efforts and look forward to a response.