[controller-dev] Model Driven SAL References
Kyle Forster
Hi Ed - Thanks much for this. A question came up that I wanted to raise over on the list: which component of the model-driven SAL is intended to do reconciliation in the case where multiple running instances of a <provider,broker,application> temporarily get out of sync? I'm assuming the gist is to replay notifications, but is the design intent to reconcile out-of-order notifications at the application layer, or to handle this further down? (Feel free to say "errr - call Colin and Jan for a walk-through of the basics" if I'm totally off track, or if this should be handled through some mechanism other than the SAL.)
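To make the question concrete, the pattern I'm picturing at the application layer (if the answer really is "replay notifications") is roughly the sketch below. This is purely hypothetical Java on my part -- the class and the sequenced-envelope type are mine, not anything in the MD-SAL APIs:

    import java.util.Map;
    import java.util.TreeMap;
    import java.util.function.Consumer;

    // Hypothetical sketch: application-layer reconciliation of out-of-order
    // notifications using per-source sequence numbers. Early arrivals are
    // buffered and replayed once the gap is filled.
    public class NotificationReorderer<T> {

        // Minimal envelope; assumes the provider assigns a monotonic sequence number.
        public static final class Sequenced<P> {
            final long sequence;
            final P payload;
            public Sequenced(long sequence, P payload) {
                this.sequence = sequence;
                this.payload = payload;
            }
        }

        private final Consumer<T> application;               // the in-order consumer
        private final Map<Long, T> pending = new TreeMap<>(); // buffered out-of-order notifications
        private long nextExpected = 0;

        public NotificationReorderer(Consumer<T> application) {
            this.application = application;
        }

        // Called in whatever order notifications actually arrive.
        public synchronized void onNotification(Sequenced<T> n) {
            if (n.sequence < nextExpected) {
                return; // duplicate or already-replayed notification; drop it
            }
            pending.put(n.sequence, n.payload);
            // Deliver every contiguous notification we now have, in order.
            while (pending.containsKey(nextExpected)) {
                application.accept(pending.remove(nextExpected));
                nextExpected++;
            }
        }
    }

The design point I'm after is just where that buffer lives: if the SAL replays in order for me, the application never needs it; if not, every application grows something like this.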
(For context - several of the net-virt-platform components where we're scoping the porting effort follow a design pattern that makes specific assumptions about the way that the storage service reconciles under load and split-brain scenarios. Think of physical placement of a controller per pod hanging off a DC spine, where each controller is master for one pod and the slave for another.)
-K
On Thu, May 16, 2013 at 12:12 PM, Ed Warnicke (eaw) <eaw@...> wrote:
---------------------------------------------------------
R. Kyle Forster
+1 (415) 990-2670 (m)
kyle.forster@...
Kyle Forster
Following up on Rob's presentation yesterday, let me give a bit more context for the question so we can all get unblocked. Across about six generations of open source controllers (NOX, SNAC, Beacon, Ryu, Floodlight, net-virt-platform) and good info about the closed versions, you can find evolving variants of Host/DeviceManager, TopologyManager, Forwarding and Storage implementations. Target scale, topology diversity and device diversity (vSwitch vs pSwitch, due to perf characteristics) lead to a lot of the differences, as do your assumptions about rainy day scenarios (e.g. a rack of 1k VMs all coming online at once and ARPing, a switch in the middle of a fat tree going down, losing a control link between controller instances, etc.).
Splitting these out turns out to be quite difficult. I believe that Colin started to see this in https://wiki.opendaylight.org/view/D-E_Proposal:Host_Tracker_Plan -- #5 is hard, and as Rob mentioned yesterday, stubbing out #6 breaks the (popular) overlay topologies as every host is reachable via both an OpenFlow tunnel path and a cross-Island path.
So... where does this leave us? We're trying to figure out either: a) Port all of these over in one big wallop, trying to maintain the target scale, topologies, devices and rainy day characteristics of net-virt-platform over to controller. We've been looking at this for the last two weeks, but have been skinning our knees on the different directions the two projects are headed on the storage assumptions (think CAP theorem trade-offs; there's a small sketch of what I mean after (b) below). Keep in mind that a practical deployment (at least in our world) is not a single controller, but rather a cluster of controllers for both scale and resiliency, and having controllers that disagree can (often) result in a forwarding loop.
b) Port Host/DeviceManager, TopologyManager, Forwarding over bit by bit on top of the SAL's storage direction, and do our best to rebuild the target scale, topologies, devices and rainy day characteristics over time. I think we're coming around to this being the vastly more practical approach, though it is a longer road back to the targets in net-virt-platform, with how long that takes tied pretty closely to the differences in deployment / storage assumptions between the original net-virt-platform and the controller design.
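To make the storage-assumption point concrete, here's a deliberately over-simplified Java sketch of the kind of deterministic reconciliation rule I mean when I talk about those assumptions: two controllers disagree about a host's attachment point and have to converge on the same answer. All names and fields here are illustrative only, not actual net-virt-platform or controller code:

    // Hypothetical sketch: each controller tags its view of a host attachment point
    // with an epoch, and conflicting views are resolved deterministically so every
    // controller converges on the same answer (otherwise two controllers can install
    // inconsistent forwarding state and create a loop).
    public final class AttachmentPointView {
        final String hostMac;
        final String switchPort;   // where this controller believes the host is attached
        final long epoch;          // monotonically increasing per update
        final String controllerId; // tie-breaker so the resolution is deterministic

        public AttachmentPointView(String hostMac, String switchPort, long epoch, String controllerId) {
            this.hostMac = hostMac;
            this.switchPort = switchPort;
            this.epoch = epoch;
            this.controllerId = controllerId;
        }

        // Deterministic merge: prefer the higher epoch, break ties on controller id.
        // A last-writer-wins style choice -- availability over strict consistency.
        public static AttachmentPointView reconcile(AttachmentPointView a, AttachmentPointView b) {
            if (a.epoch != b.epoch) {
                return a.epoch > b.epoch ? a : b;
            }
            return a.controllerId.compareTo(b.controllerId) <= 0 ? a : b;
        }
    }

The last-writer-wins choice in reconcile() is where the CAP trade-off actually lands: it keeps controllers available and convergent at the cost of sometimes picking a stale view, which only works if the surrounding components are written with that assumption in mind.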
It is actually in the spirit of (b) that I was asking the question about event re-ordering. (This is basically a where-do-the-CAP-trade-offs-really-get-made-in-the-SAL question.) Thanks!
-K
On Sun, May 19, 2013 at 11:30 PM, Kyle Forster <kyle.forster@...> wrote:
---------------------------------------------------------
R. Kyle Forster
+1 (415) 990-2670 (m)
kyle.forster@...