Should network-ODL be rewritten as a monolithic plugin or continue as an ML2 driver


Ryan Moats
 

I'm going to use this to reply to the question below as well as update this thread from last week's meeting.

Some points that this thread needs to consider from last week's meeting...

(1) The current design blocks the neutron server until ODL responds. Because of the current use of I*Aware, ODL can take a while to respond. I'm not sure that this remains an issue with the new MD-SAL design post-Boron.

(2) Some of the problems we are encountering are already known to the ML2 OpenStack team, so we should be aware of what solutions they are coming up with. Alternatively, if we go with the monolithic plugin, we take on the responsibility of solving these problems, but we can solve them in the way that works best for ODL.

Isaku Yamahata <yamahata@...> wrote on 08/17/2015 02:20:58 PM:

> From: Isaku Yamahata <yamahata@...>

> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: Isaku Yamahata <isaku.yamahata@...>,
> yamahata@..., neutron-dev@...

> Date: 08/18/2015 02:38 PM
> Subject: Re: [neutron-dev] Should network-ODL be rewritten as a
> monolithic plugin or continue as an ML2 driver

>
> On Sat, Aug 15, 2015 at 08:29:32PM -0500,
> Ryan Moats <rmoats@...> wrote:
>
> > I will comment in line as far as I can.  Others may have different
> > opinions...
>
> Yeah, I'd like to see other's opinions too...
> And also expected use cases.
>
>
> > > Now long details follows.
> > >
> > > * ML2 vs monolithic
> > > - migration from other technology
> > >   ML2 is better for migration from other ML2 driver.
> > > - development cost
> > >   monolithic requires much engineering cost.
> > >   Ryan seems to think ML2 design imposes too much restrictions.
> > >   Can you please elaborate?
> >
> > Since ODL wants to know about L3 items and other extensions, an ML2 driver
> > is not going to be
> > the end of the code - there will also need to be code that allows L3 and
> > extension information
> > to be passed to ODL.  IIRC (and I'm willing to be corrected), L3 is still a
> > monolith setup, so
> > at best I see networking-ODL as being a hybrid: a monolithic L3/extension
> > code base on top an ML2 driver.
>
> You're right and I agree with you. Right now ML3 is still under debate.
>
>
> > However, the key question in my mind is does networking-ODL have to coexist
> > with other drivers.
> > If the answer is yes, then ML2 is the way to go.  If the answer is no, then
> > we can entertain
> > a monolithic plugin.
>
> Agree. User requirements are needed.
>
>
>
> > > - reasonable performance: requests/second
> > >
> > > * observed problems/solutions.
> > > I think the issues are independent from 'ML2 vs monolithic'
> > > and ML2 team is already aware of those issues.
> > > Those issues needs to be solved anyway.
> >
> > One point to remember is that the ML2 team may not solve them in the way
> > that an SDN controller
> > would need the problems to be solved.  A monolithic plugin allows them to
> > be solved precisely
> > the way ODL wants them to be solved.
>
> Maybe or maybe not. So let's figure out what's the issues and how we'd
> like to solve them first.
>
>
> > >   C. Accept out-of-sync of two state. fix it up by background
> > >      synchronization
> > >      Nuage plugin takes this approach.
> > >      Run background thread to monitor the de-synchronization and fix it
> > >      if found.
> > >      Potentially we can introduce sequential number to track the diff
> > >      between neutron and odl.
> > >      Also by sequential number, ODL can detect reordered requests
> > >      from neutron.
> > >   D. any other idea?
> >
> > Part of the sync problem is that the communication between OS and ODL today
> > is only one phase (post-commit events).  I believe we should look at using
> > both the pre-commit and post-commit phases so that in the
> > pre-commit phase we can veto a change.  Once we reach the post-commit
> > phase, then it's up to the ML2 agent
> > to ensure that the change gets made at ODL.
>
> Interesting approach. Changing wire protocol should not be excluded.
> Can you please elaborate on interaction between neutron and ODL?
> What are precommit/postcommit expected to do on neutron side/odl side?


Like I said above - today ODL is only told about the event post-commit.  Now, we *should* be able to veto the change and have OS roll it back, but that's not as clean as subscribing to the pre-commit event, using that to check sanity, and then accepting the change on the post-commit event.  Note: this is only the case if we decide that we *have* to check sanity in some cases.  While I started in that camp when I did the original Hydrogen commits, I'm no longer convinced that is a good idea (in fact, I'm now leaning towards it being a mistake).

> > > - neutron HA support
> > >   Currently full synchronization can be done by any neutron servers.
> > >   It's disastrous. When a neutron server is processing synchronization,
> > >   other neutron server should be prohibited.
> > >   Ideally each neutron server takes ownership of a subset of
> > >   networks/subnets/ports... for scalability.
> >
> > I'm going to be a little thick here and say I don't understand why this is
> > an issue for networking-ODL
> > and not for neutron in general....
>
> Right, this is a common problem among controller-based ML2 drivers
> (or monolithic plugins). On the other hand, agent-based drivers don't
> have this issue. So it's partly a general neutron problem, partly an ML2
> driver (or plugin) dependent problem.
> Anyway it's worthwhile to discuss on this in order to figure out what's
> needed for ODL.
> --
> Isaku Yamahata <isaku.yamahata@...>
>


Isaku Yamahata <yamahata@...>
 

On Sat, Aug 15, 2015 at 08:29:32PM -0500,
Ryan Moats <rmoats@...> wrote:

> I will comment in line as far as I can.  Others may have different
> opinions...

Yeah, I'd like to see others' opinions too...
And also expected use cases.


> > Now long details follows.
> >
> > * ML2 vs monolithic
> > - migration from other technology
> >   ML2 is better for migration from other ML2 driver.
> > - development cost
> >   monolithic requires much engineering cost.
> >   Ryan seems to think ML2 design imposes too much restrictions.
> >   Can you please elaborate?
>
> Since ODL wants to know about L3 items and other extensions, an ML2 driver
> is not going to be the end of the code - there will also need to be code
> that allows L3 and extension information to be passed to ODL.  IIRC (and
> I'm willing to be corrected), L3 is still a monolith setup, so at best I
> see networking-ODL as being a hybrid: a monolithic L3/extension code base
> on top of an ML2 driver.

You're right and I agree with you. Right now ML3 is still under debate.


> However, the key question in my mind is does networking-ODL have to coexist
> with other drivers.  If the answer is yes, then ML2 is the way to go.  If
> the answer is no, then we can entertain a monolithic plugin.

Agree. User requirements are needed.



> > - reasonable performance: requests/second
> >
> > * observed problems/solutions.
> > I think the issues are independent from 'ML2 vs monolithic'
> > and ML2 team is already aware of those issues.
> > Those issues needs to be solved anyway.
>
> One point to remember is that the ML2 team may not solve them in the way
> that an SDN controller would need the problems to be solved.  A monolithic
> plugin allows them to be solved precisely the way ODL wants them to be
> solved.

Maybe or maybe not. So let's figure out what the issues are and how we'd
like to solve them first.


> >   C. Accept out-of-sync of two state. fix it up by background
> >      synchronization
> >      Nuage plugin takes this approach.
> >      Run background thread to monitor the de-synchronization and fix it
> >      if found.
> >      Potentially we can introduce sequential number to track the diff
> >      between neutron and odl.
> >      Also by sequential number, ODL can detect reordered requests
> >      from neutron.
> >   D. any other idea?
>
> Part of the sync problem is that the communication between OS and ODL today
> is only one phase (post-commit events).  I believe we should look at using
> both the pre-commit and post-commit phases so that in the pre-commit phase
> we can veto a change.  Once we reach the post-commit phase, then it's up to
> the ML2 agent to ensure that the change gets made at ODL.

Interesting approach. Changing wire protocol should not be excluded.
Can you please elaborate on interaction between neutron and ODL?
What are precommit/postcommit expected to do on neutron side/odl side?


> > - neutron HA support
> >   Currently full synchronization can be done by any neutron servers.
> >   It's disastrous. When a neutron server is processing synchronization,
> >   other neutron server should be prohibited.
> >   Ideally each neutron server takes ownership of a subset of
> >   networks/subnets/ports... for scalability.
>
> I'm going to be a little thick here and say I don't understand why this is
> an issue for networking-ODL and not for neutron in general....

Right, this is a common problem among controller-based ML2 drivers
(or monolithic plugins). On the other hand, agent-based drivers don't
have this issue. So it's partly a general neutron problem, partly an ML2
driver (or plugin) dependent problem.
Anyway it's worthwhile to discuss this in order to figure out what's
needed for ODL.
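
To make the HA point concrete, here is a minimal sketch of the two ideas
raised above: only one neutron server at a time may run a full
synchronization, and each resource has exactly one owning server. All names
(FullSyncLease, owner_of, the server names) are hypothetical and not taken
from networking-odl. The in-process threading.Lock only stands in for
whatever lock all neutron servers would really share (for example a
database row), which is an assumption.

```python
import threading
import zlib


class FullSyncLease:
    """Stand-in for a lock shared by all neutron servers (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()
        self.holder = None

    def try_acquire(self, server):
        # Succeeds only if no other server is currently doing a full sync.
        with self._guard:
            if self.holder is None:
                self.holder = server
                return True
            return False

    def release(self, server):
        # Only the current holder may give the lease back.
        with self._guard:
            if self.holder == server:
                self.holder = None


def owner_of(resource_id, servers):
    # Deterministic hash, so every server computes the same owner
    # for a given network/subnet/port ID.
    return servers[zlib.crc32(resource_id.encode()) % len(servers)]


servers = ["neutron-a", "neutron-b", "neutron-c"]
lease = FullSyncLease()
first = lease.try_acquire("neutron-a")    # lease is free, so this succeeds
second = lease.try_acquire("neutron-b")   # refused while neutron-a holds it
lease.release("neutron-a")
third = lease.try_acquire("neutron-b")    # succeeds after the release
owner = owner_of("port-1234", servers)
print(first, second, third, owner in servers)
```

A fixed modulo over the server list reshuffles ownership whenever a server
joins or leaves; whether that matters depends on the requirements still to
be collected.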
--
Isaku Yamahata <isaku.yamahata@...>


Ryan Moats
 

I will comment in line as far as I can. Others may have different opinions...

Ryan

Isaku Yamahata <isaku.yamahata@...> wrote on 08/15/2015 07:37:29 PM:

> From: Isaku Yamahata <isaku.yamahata@...>

> To: Ryan Moats/Omaha/IBM@IBMUS
> Cc: neutron-dev@...,
> yamahata@..., isaku.yamahata@...

> Date: 08/15/2015 07:37 PM
> Subject: Re: [neutron-dev] Should network-ODL be rewritten as a
> monolithic plugin or continue as an ML2 driver

>
> Hi Ryan. Thank you for starting this thread.
>
> Ryan, can you please elaborate your motivation to go for monolithic
> plugin? why monolithic plugin makes problems easier than ML2 driver?
> At least the following should be discussed, I think.
>
> * pros/cons for ml2 vs monolithic
> * requirement and problem to be solved.
>   how different the two approaches affects solutions.
>   Without this, we can't understand which way is better.
>
>
> Now long details follows.
>
> * ML2 vs monolithic
> - migration from other technology
>   ML2 is better for migration from other ML2 driver.
> - development cost
>   monolithic requires much engineering cost.
>   Ryan seems to think ML2 design imposes too much restrictions.
>   Can you please elaborate?


Since ODL wants to know about L3 items and other extensions, an ML2 driver is not going to be
the end of the code - there will also need to be code that allows L3 and extension information
to be passed to ODL.  IIRC (and I'm willing to be corrected), L3 is still a monolith setup, so
at best I see networking-ODL as being a hybrid: a monolithic L3/extension code base on top an ML2 driver.

However, the key question in my mind is does networking-ODL have to coexist with other drivers.
If the answer is yes, then ML2 is the way to go.  If the answer is no, then we can entertain
a monolithic plugin.

> * requirements:
> - neutron server ha

> - scalability
>   large number of neutron networks/subnets/ports... (e.g. 10k ports)
>   should be able to be handled.


I don't understand why scalability makes a difference in this discussion, but if it does,
you need to up that number by an order of magnitude or two.

> - reasonable performance: requests/second
>
> * observed problems/solutions.
> I think the issues are independent from 'ML2 vs monolithic'
> and ML2 team is already aware of those issues.
> Those issues needs to be solved anyway.


One point to remember is that the ML2 team may not solve them in the way that an SDN controller
would need the problems to be solved.  A monolithic plugin allows them to be solved precisely
the way ODL wants them to be solved.

> - requests from neutron to odl are serialized
>   The communication can be made asynchronous.
>   There is PoC code for ML2.


There is a difference in my mind between serialization of requests and asynchronous communication, so
let's be careful about commingling these two items.
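
To illustrate the distinction, here is a minimal sketch, assuming a
hypothetical helper class (OrderedAsyncSender is not part of
networking-odl): the caller hands a request off and returns immediately
(asynchronous), while a single worker still delivers requests one at a time
in submission order (serialized). The two properties are independent of
each other.

```python
import queue
import threading


class OrderedAsyncSender:
    """Hypothetical helper: callers never block, yet requests stay in order."""

    def __init__(self, send):
        self._send = send             # callable doing the (slow) controller call
        self._queue = queue.Queue()   # FIFO, so submission order is preserved
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, request):
        # Returns immediately: the caller is not blocked on the controller.
        self._queue.put(request)

    def _drain(self):
        # A single worker sends one request at a time: serialized delivery.
        while True:
            request = self._queue.get()
            try:
                self._send(request)
            finally:
                self._queue.task_done()

    def wait_idle(self):
        # Block until everything submitted so far has been handed to send().
        self._queue.join()


delivered = []
sender = OrderedAsyncSender(delivered.append)
for name in ["create net1", "create subnet1", "create port1"]:
    sender.submit(name)
sender.wait_idle()
print(delivered)  # ['create net1', 'create subnet1', 'create port1']
```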

> - race condition among request from neutron to odl
>   This stems from maintaining two states in neutron and odl.
>   There are several approaches.
>   A. Make neutron stateless
>      i.e. Not use neutron db.
>      All the user requests to neutron are passed through to ODL.
>      This implies most functionality needs to be re-implemented in ODL.
>      OpenContrail plugin takes this approach


This implies that NN replicates all of the db/checking functionality already in Neutron, which makes
very little sense to me, so I'm not interested in this at all.

>   B. Make ODL neutron northbound stateless
>      This is reverse of A.
>      When that information is needed in ODL, ODL asks Neutron for it.
>      This is not feasible because MD-SAL is core concept of ODL


This is a bit of a misnomer as in my mind NN itself *is* already stateless.  I would characterize this
solution differently as "don't store neutron information in ODL and ask for it when needed".  This
(IMHO) doesn't scale.

>   C. Accept out-of-sync of two state. fix it up by background synchronization
>      Nuage plugin takes this approach.
>      Run background thread to monitor the de-synchronization and fix it
>      if found.
>      Potentially we can introduce sequential number to track the diff
>      between neutron and odl.
>      Also by sequential number, ODL can detect reordered requests
>      from neutron.
>   D. any other idea?


Part of the sync problem is that the communication between OS and ODL today is only one phase (post-commit events).  I believe we should look at using both the pre-commit and post-commit phases so that in the
pre-commit phase we can veto a change.  Once we reach the post-commit phase, then it's up to the ML2 agent
to ensure that the change gets made at ODL.
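
A minimal sketch of that two-phase flow, under stated assumptions: the
class and function names below (TwoPhaseDriver, VetoError, run_transaction)
are hypothetical and only mirror the pre-commit/post-commit idea; this is
not the actual ML2 driver interface, and a plain list stands in for the
neutron database.

```python
class VetoError(Exception):
    """Raised in the pre-commit phase to reject a change."""


class TwoPhaseDriver:
    # Hypothetical driver; controller_check and controller_apply are
    # stand-ins for calls to the controller.
    def __init__(self, controller_check, controller_apply):
        self._check = controller_check
        self._apply = controller_apply
        self.pending = []

    def precommit(self, change):
        # Runs before the change is committed: raising here aborts it.
        if not self._check(change):
            raise VetoError(f"rejected: {change}")

    def postcommit(self, change):
        # The change is already committed and can no longer be vetoed,
        # so a failed delivery is queued for retry instead of raised.
        try:
            self._apply(change)
        except ConnectionError:
            self.pending.append(change)


def run_transaction(db, driver, change):
    driver.precommit(change)    # may raise VetoError; db stays untouched
    db.append(change)           # the "commit"
    driver.postcommit(change)   # delivery is now the driver's responsibility


applied = []
driver = TwoPhaseDriver(lambda c: bool(c.get("name")), applied.append)
db = []
run_transaction(db, driver, {"name": "net1"})
try:
    run_transaction(db, driver, {"name": ""})
    vetoed = False
except VetoError:
    vetoed = True
print(db, applied, vetoed)  # [{'name': 'net1'}] [{'name': 'net1'}] True
```

Whether a pre-commit sanity check is worth having at all is a separate
question; the sketch only shows where a veto would sit if one is wanted.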

> - neutron HA support
>   Currently full synchronization can be done by any neutron servers.
>   It's disastrous. When a neutron server is processing synchronization,
>   other neutron server should be prohibited.
>   Ideally each neutron server takes ownership of a subset of
>   networks/subnets/ports... for scalability.


I'm going to be a little thick here and say I don't understand why this is an issue for networking-ODL
and not for neutron in general....

>
> thanks,
>
> On Thu, Aug 13, 2015 at 04:59:10PM -0500,
> Ryan Moats <rmoats@...> wrote:
>
> >
> > Folks, I took the action item to kick off threads from the extensive
> > discussion we had last week [1].  I've been behind the 8 ball, but am
> > remembering this belatedly...
> >
> > This thread is to discuss whether networking-ODL should continue as an ML2
> > driver or a monolithic plugin...
> >
> > Let the comments begin,
> > Ryan
>
> > _______________________________________________
> > neutron-dev mailing list
> > neutron-dev@...
> > https://lists.opendaylight.org/mailman/listinfo/neutron-dev
>
>
> --
> Isaku Yamahata <isaku.yamahata@...>
>


Isaku Yamahata
 

Hi Ryan. Thank you for starting this thread.

Ryan, can you please elaborate your motivation to go for monolithic
plugin? why monolithic plugin makes problems easier than ML2 driver?
At least the following should be discussed, I think.

* pros/cons for ml2 vs monolithic
* requirement and problem to be solved.
  how different the two approaches affects solutions.
  Without this, we can't understand which way is better.


Now long details follows.

* ML2 vs monolithic
- migration from other technology
  ML2 is better for migration from other ML2 driver.
- development cost
  monolithic requires much engineering cost.
  Ryan seems to think ML2 design imposes too much restrictions.
  Can you please elaborate?

* requirements:
- neutron server ha
- scalability
  large number of neutron networks/subnets/ports... (e.g. 10k ports)
  should be able to be handled.
- reasonable performance: requests/second

* observed problems/solutions.
I think the issues are independent from 'ML2 vs monolithic'
and ML2 team is already aware of those issues.
Those issues need to be solved anyway.

- requests from neutron to odl are serialized
  The communication can be made asynchronous.
  There is PoC code for ML2.

- race condition among request from neutron to odl
  This stems from maintaining two states in neutron and odl.
  There are several approaches.
  A. Make neutron stateless
     i.e. Not use neutron db.
     All the user requests to neutron are passed through to ODL.
     This implies most functionality needs to be re-implemented in ODL.
     OpenContrail plugin takes this approach
  B. Make ODL neutron northbound stateless
     This is reverse of A.
     When that information is needed in ODL, ODL asks Neutron for it.
     This is not feasible because MD-SAL is core concept of ODL
  C. Accept out-of-sync of two state. fix it up by background synchronization
     Nuage plugin takes this approach.
     Run background thread to monitor the de-synchronization and fix it
     if found.
     Potentially we can introduce sequential number to track the diff
     between neutron and odl.
     Also by sequential number, ODL can detect reordered requests from neutron.
  D. any other idea?

- neutron HA support
  Currently full synchronization can be done by any neutron servers.
  It's disastrous. When a neutron server is processing synchronization,
  other neutron server should be prohibited.
  Ideally each neutron server takes ownership of a subset of
  networks/subnets/ports... for scalability.
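
The sequence-number idea in approach C could look roughly like the sketch
below. It assumes hypothetical sender and receiver classes (not existing
neutron or ODL code) and an in-memory message format invented for
illustration: the neutron side stamps each request, and the receiving side
applies requests strictly in order, holds back early arrivals, and can
report gaps that a background synchronization would need to fill.

```python
import itertools


class SequencedSender:
    """Neutron side (hypothetical): stamp requests with an increasing number."""

    def __init__(self):
        self._counter = itertools.count(1)

    def stamp(self, payload):
        return {"seq": next(self._counter), "payload": payload}


class SequencedReceiver:
    """Controller side (hypothetical): apply in order, hold back early arrivals."""

    def __init__(self):
        self.next_seq = 1
        self.held = {}
        self.applied = []

    def receive(self, message):
        self.held[message["seq"]] = message["payload"]
        # Apply every request that is now contiguous with what was applied.
        while self.next_seq in self.held:
            self.applied.append(self.held.pop(self.next_seq))
            self.next_seq += 1

    def missing(self):
        # Gaps below the highest held number: candidates for background resync.
        if not self.held:
            return []
        return [s for s in range(self.next_seq, max(self.held))
                if s not in self.held]


sender = SequencedSender()
m1, m2, m3 = (sender.stamp(p) for p in ["create net", "create subnet", "create port"])

receiver = SequencedReceiver()
receiver.receive(m1)
receiver.receive(m3)                  # arrives before m2: reordering detected
gap = receiver.missing()              # [2]
receiver.receive(m2)                  # gap filled, m2 and m3 applied in order
print(gap, receiver.applied)
```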

thanks,

On Thu, Aug 13, 2015 at 04:59:10PM -0500,
Ryan Moats <rmoats@...> wrote:

> Folks, I took the action item to kick off threads from the extensive
> discussion we had last week [1].  I've been behind the 8 ball, but am
> remembering this belatedly...
>
> This thread is to discuss whether networking-ODL should continue as an ML2
> driver or a monolithic plugin...
>
> Let the comments begin,
> Ryan

--
Isaku Yamahata <isaku.yamahata@...>


Ryan Moats
 

Folks, I took the action item to kick off threads from the extensive discussion we had last week [1]. I've been behind the 8 ball, but am remembering this belatedly...

This thread is to discuss whether networking-ODL should continue as an ML2 driver or a monolithic plugin...

Let the comments begin,
Ryan