  1. ONOS
  2. ONOS-6396

Separate per-partition per-primitive Copycat sessions

    Details

    • Story Points:
      5
    • Sprint:
      K Sprint #3 - Platform

      Description

      After looking at the logs for a diverse set of HA issues over the past few weeks, I've come to realize there's a fundamental flaw in the Copycat client that exacerbates these types of issues. Often, when we see StorageService failures like timeouts, they seem to cascade across the entire ONOS process. When a ConsistentMap.put call in one application fails, we often see seemingly random timeouts elsewhere in the cluster. This happens because all primitives share a single Copycat session for each partition, and that session performs sequencing for every primitive that interacts with the partition. As a result, a failure in one primitive can cascade to other primitives.

      Until now, all primitives shared a Copycat client because it provided ordering guarantees across all primitives. But since we relaxed the primitive thread model in ONOS-6267, those guarantees only matter within a primitive, not across primitives. So, the Copycat client should be refactored to support a separate logical session for each partition of each primitive. This should be a fairly straightforward task. Doing so will ensure that sequencing for one primitive occurs independently of sequencing for all other primitives, which should reduce the likelihood of cascading timeouts and significantly increase concurrency all the way down to Netty.
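
To make the cascading effect concrete, here is a minimal toy model of the idea, not the Copycat client itself. It assumes a session releases responses strictly in submission order, so one stalled operation holds back everything submitted after it on the same session. The names `Session`, `SessionRegistry`, `sessionFor`, and the primitive names `map-a` and `map-b` are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SessionSketch {

    /** Toy session: responses are released strictly in submission order. */
    static final class Session {
        private long nextSeq = 0;        // next sequence number to hand out
        private long nextToRelease = 0;  // lowest sequence number not yet released
        private final Set<Long> responded = new HashSet<>();
        private final Set<Long> released = new HashSet<>();

        long submit() {
            return nextSeq++;
        }

        void respond(long seq) {
            responded.add(seq);
            // Release only a contiguous prefix; a gap blocks everything behind it.
            while (responded.contains(nextToRelease)) {
                released.add(nextToRelease++);
            }
        }

        boolean isReleased(long seq) {
            return released.contains(seq);
        }
    }

    /** Hypothetical registry: one session per partition, or one per (partition, primitive). */
    static final class SessionRegistry {
        private final boolean perPrimitive;
        private final Map<String, Session> sessions = new HashMap<>();

        SessionRegistry(boolean perPrimitive) {
            this.perPrimitive = perPrimitive;
        }

        Session sessionFor(int partition, String primitive) {
            String key = perPrimitive ? partition + "/" + primitive : String.valueOf(partition);
            return sessions.computeIfAbsent(key, k -> new Session());
        }
    }

    /** Returns whether map-b's operation completes while map-a's operation is stalled. */
    public static boolean secondPrimitiveCompletes(boolean perPrimitive) {
        SessionRegistry registry = new SessionRegistry(perPrimitive);
        Session a = registry.sessionFor(1, "map-a");
        Session b = registry.sessionFor(1, "map-b");

        a.submit();                 // never answered: models a put that times out
        long healthy = b.submit();  // unrelated operation on another primitive
        b.respond(healthy);

        return b.isReleased(healthy);
    }

    public static void main(String[] args) {
        System.out.println("shared session: " + secondPrimitiveCompletes(false));
        System.out.println("per-primitive sessions: " + secondPrimitiveCompletes(true));
    }
}
```

In this sketch the only difference between the two modes is the key used to look up the session. Keying by partition alone reproduces the cross-primitive stall, while keying by partition and primitive lets the healthy primitive make progress.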

              People

              Assignee:
              kuujo Jordan Halterman
              Reporter:
              kuujo Jordan Halterman
