ODL modernization


Luis Gomez
 

Hi TSC ex-colleagues,

As promised, here is a list of topics for ODL modernization. This is all that comes to mind right now and there could be more; the goal is really to trigger some brainstorming and discussion.

1) Use cases:

Our YANG-based platform made it easy for ODL to become the SDN controller of choice for HW devices supporting NETCONF and other control protocols like BGP-LS, PCEP and OpenFlow, where multi-vendor support is important. However, it made it hard to play a role in the cloud, where YANG and protocols like NETCONF, BGP-LS and PCEP are almost non-existent and multi-vendor support is not that important. Given the amount of money and effort currently going into the cloud, it would make sense for ODL to at least participate in some cloud use case. For example, I believe ODL can still play a role in the hybrid cloud use case, where HW devices need to talk to cloud devices: ODL could take care of the HW devices while another open source controller takes care of the cloud devices.

2) SW Platform:

The OSGi/Karaf platform was state-of-the-art 10 years ago, when there were no real micro-services platforms like K8s, but now it is simply obsolete. In the new micro-services paradigm, applications are loosely coupled and mostly self-contained (e.g. they run their own processes within a container), although they can share some common resources such as a database, a message broker, an API gateway, etc.

For ODL to fit in the new paradigm, we would need to:

- Replace the ODL distribution with ODL applications that can each run in their own container: NETCONF, BGP, PCEP, TPCE, etc.
- Consolidate kernel repos and jars: existing ODL applications share common code called the kernel. Today we have ~6 repositories and a bunch of jars for the kernel, with a single person/organization maintaining the code.
- Replace the Karaf/OSGi framework with something more current to plumb the Java code and jars together (e.g. Spring).

3) Infrastructure:

Things here are kind of obsolete too. Some ideas to renew the build and test infrastructure:

Build pipeline:
- Move from JJB (not maintained anymore) to Jenkins pipelines or similar (work is ongoing).
- Move to a continuous release process where a merge in master produces a new release in the artifact repository (staging). This simplifies the release process a lot: just move artifacts from the staging repository to the release repository.
- Every ODL application (NETCONF, BGP, PCEP, TPCE, etc.) should generate a container automatically after a merge in master. We should be testing this vs. the ODL distribution.

System test:
- Robot was the best open source system test framework 10 years ago, when organizations kept developers and system engineers separate. Things have changed a lot since then, and in many agile organizations developers now do system integration and write system test code in addition to product features. Robot Framework is good for system integrators with basic or no coding skills but bad for developers, who have to ramp up on a new language that they soon find very limiting. This is why I think it would be good to switch to something like pytest.
- Leverage K8s to do multi-application and scale testing.
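
To make the pytest suggestion a bit more concrete, here is a minimal sketch of how a system test helper and its checks might look as plain Python functions. The helper name `restconf_node_url`, the port 8181, the topology path and the device names are all assumptions made for this example (the path is modelled on the RFC 8040 style of URL); none of this is taken from an existing ODL test suite.

```python
from urllib.parse import quote

# Hypothetical defaults, chosen only for this example.
BASE_URL = "http://localhost:8181/rests/data"
TOPOLOGY_PATH = "network-topology:network-topology/topology=topology-netconf"


def restconf_node_url(node_id: str, base_url: str = BASE_URL) -> str:
    """Build the RESTCONF URL of one NETCONF-mounted node (illustrative layout)."""
    if not node_id:
        raise ValueError("node_id must not be empty")
    # Percent-encode the key so names with '/' or spaces stay one path segment.
    return f"{base_url}/{TOPOLOGY_PATH}/node={quote(node_id, safe='')}"


# pytest collects any function named test_*; a bare assert is the whole check.
def test_plain_node_name():
    url = restconf_node_url("device-1")
    assert url.endswith("/topology=topology-netconf/node=device-1")


def test_node_name_is_percent_encoded():
    assert restconf_node_url("rack 1/sw2").endswith("node=rack%201%2Fsw2")


def test_empty_node_name_is_rejected():
    try:
        restconf_node_url("")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty node id")
```

Saved as a `test_*.py` file, these functions would be picked up by running `pytest` in the same directory; nothing beyond ordinary Python is needed to read or extend them, which is the point made above about developer ramp-up.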

BR/Luis


Guillaume Lambert
 

Hello Luis


Thanks for this feedback. I mostly share your concerns, at least about 2 and 3.

(About 1: I am not very involved in cloud use cases, and the situation is a bit different in core networks.)


About 3)

In TPCE we have some experience using a combination of Python tox + unittest + requests to develop black-box test suites for our REST APIs, mostly thanks to Cédric's guidance.
It might make sense for the infra tests too IMO.
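
For readers who have not seen this style, a minimal sketch of a black-box REST test with unittest follows. It is not the TPCE suite: the `/status` endpoint and the `StubController` class are made up for the example, a small local HTTP server stands in for the controller so the test runs anywhere, and the standard-library `urllib` is used in place of requests to keep the example free of external dependencies.

```python
import json
import threading
import unittest
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Ignore any proxy configured in the environment; the stub server is local.
OPENER = build_opener(ProxyHandler({}))


class StubController(BaseHTTPRequestHandler):
    """Stand-in for the system under test; a real suite would target a running controller."""

    def do_GET(self):
        if self.path == "/status":  # hypothetical endpoint
            body = json.dumps({"state": "up"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep test output quiet


class BlackBoxRestTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Port 0 lets the OS pick a free port for the stub.
        cls.server = HTTPServer(("127.0.0.1", 0), StubController)
        cls.base_url = f"http://127.0.0.1:{cls.server.server_port}"
        threading.Thread(target=cls.server.serve_forever, daemon=True).start()

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()

    def test_status_reports_up(self):
        # Only the public HTTP interface is exercised, never internal classes.
        with OPENER.open(f"{self.base_url}/status", timeout=5) as response:
            self.assertEqual(response.status, 200)
            self.assertEqual(json.load(response), {"state": "up"})
```

A tox environment would then only need to invoke `python -m unittest` on the module; pointing `base_url` at a real deployment instead of the stub is the only change needed to turn this into an actual black-box run.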


My 2 cents


BR

Guillaume




From: TSC@... <TSC@...> on behalf of Luis Gomez <ecelgp@...>
Sent: Saturday, 9 April 2022 05:47
To: tsc
Subject: [OpenDaylight TSC] ODL modernization
 






Robert Varga
 

On 09/04/2022 05:47, Luis Gomez wrote:
> Hi TSC ex-colleagues,
> As promised, here is a list of topics for ODL modernization. This is all that comes to mind right now and there could be more; the goal is really to trigger some brainstorming and discussion.
> 1) Use cases:
> Our YANG-based platform made it easy for ODL to become the SDN controller of choice for HW devices supporting NETCONF and other control protocols like BGP-LS, PCEP and OpenFlow, where multi-vendor support is important. However, it made it hard to play a role in the cloud, where YANG and protocols like NETCONF, BGP-LS and PCEP are almost non-existent and multi-vendor support is not that important. Given the amount of money and effort currently going into the cloud, it would make sense for ODL to at least participate in some cloud use case. For example, I believe ODL can still play a role in the hybrid cloud use case, where HW devices need to talk to cloud devices: ODL could take care of the HW devices while another open source controller takes care of the cloud devices.

Strictly speaking, we already do, but through external integrations, and those bring very little in terms of engagement/contributions :(

> 2) SW Platform:
> The OSGi/Karaf platform was state-of-the-art 10 years ago, when there were no real micro-services platforms like K8s, but now it is simply obsolete. In the new micro-services paradigm, applications are loosely coupled and mostly self-contained (e.g. they run their own processes within a container), although they can share some common resources such as a database, a message broker, an API gateway, etc.

Weeeeeeeeeell. I could *very easily* sell you OSGi as:
- the original Java micro-service platform
- the current Java nano-service platform

Also JPMS re-creates a (miserable) subset of OSGi. But that really is semantics, so let's not get distracted here.

OSGi is just another runtime. It always has been, as far as ODL architectural principles are concerned. Unfortunately some projects require it as a core dependency -- just because they did not know better at the time that code was merged. There has been some very solid work done over the past ~2 years to remove those assumptions.

What remains, though, is that OSGi+Karaf is the only thing that is seriously integrated *and tested*. Your next comments and my responses need to be considered with regard to this simple truth.

> For ODL to fit in the new paradigm, we would need to:
> - Replace the ODL distribution with ODL applications that can each run in their own container: NETCONF, BGP, PCEP, TPCE, etc.

Agreed. That is a packaging exercise. From the get-go, we can (mostly, with the notable exception of PCEP right now) do whatever you'd like... more below.

> - Consolidate kernel repos and jars: existing ODL applications share common code called the kernel. Today we have ~6 repositories and a bunch of jars for the kernel, with a single person/organization maintaining the code.

Yeah, no. As the person referenced in that sentence, I have to say that each repo (and project) has a rather well-defined scope -- as per our governance. It also keeps a reasonable straitjacket on what is done and how.

Merging the repos would throw us back ~9 years, to when controller.git contained all of yangtools, mdsal, netconf, adsal, the works. It would also lower the guards we have against layering/design violations.

One example I can quote here is https://git.opendaylight.org/gerrit/c/openflowplugin/+/91313 -- that patch should never have been approved, and it was an upgraded SpotBugs that eventually found it. It took three days to correct -- not something we want to do if long-term maintainability is our goal.
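
To illustrate what a layering guard amounts to, here is a small sketch of a dependency-direction check: each project may only depend on projects in a lower layer. The layer order and the sample dependencies are assumptions picked for the example, not an official ODL definition, and separate repositories enforce this implicitly through build order rather than through a script like this.

```python
# Illustrative layer order, lowest layer first; not an official ODL definition.
LAYERS = ["yangtools", "mdsal", "controller", "netconf"]


def layering_violations(dependencies, layers=LAYERS):
    """Return (project, dependency) pairs where a project depends on the same or a higher layer."""
    rank = {name: index for index, name in enumerate(layers)}
    violations = []
    for project, deps in dependencies.items():
        for dep in deps:
            # A dependency is only legal if it sits strictly below the project.
            if rank[dep] >= rank[project]:
                violations.append((project, dep))
    return sorted(violations)


# Hypothetical dependency data: the last entry points the wrong way.
sample = {
    "mdsal": ["yangtools"],
    "controller": ["mdsal", "yangtools"],
    "yangtools": ["netconf"],
}
print(layering_violations(sample))  # [('yangtools', 'netconf')]
```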

> - Replace the Karaf/OSGi framework with something more current to plumb the Java code and jars together (e.g. Spring).

Right, and odlmicro just dropped the ball here in more than one way :(

The long-term plan here is to have OSGi DS annotations and a reasonable DI framework for static deployments. In this regard Blueprint is one of the worst decisions we have ever made.

In terms of "reasonable framework":

1. we have static Karaf fully supported, but per-use-case packaging has not been appearing. This is easy pickings most of the time: create a static Karaf, put it into a Docker container (or whatever) and you are done. It is a no-frills solution; boot time is ~30% of what it is with dynamic Karaf.

2. odlmicro is moribund. Contributors are welcome, but if those do not step forward, https://github.com/PANTHEONtech/lighty is very much an alternative, which is not readily integratable into individual projects :(

3. what we *really* want to do is Dagger. That's pretty much the gold standard in Android apps, completely compile-time, all that jazz. While we have the basics ready in some places, no one has prototyped anything reasonable with it.

At this point, I think the idea behind odlmicro should be supplanted by a single goal: make everything wireable via Dagger and provide a Dagger-based equivalent of netconf.git/static/pom.xml to get that use case up and running.

> 3) Infrastructure:
> Things here are kind of obsolete too. Some ideas to renew the build and test infrastructure:
> Build pipeline:
> - Move from JJB (not maintained anymore) to Jenkins pipelines or similar (work is ongoing).

Yeah, that's the idea, but I am not aware of anyone actively working on this.

> - Move to a continuous release process where a merge in master produces a new release in the artifact repository (staging). This simplifies the release process a lot: just move artifacts from the staging repository to the release repository.

This ties in with your comment about consolidating repos. Rather than that, I wish we just had a reasonable infra to automatically release on patch merge -- including version bumps, CSIT validation, proper git history (which we do not have), all that jazz.

There is just no way I can over-sell this -- this is the core piece of automation we are missing. Requires some real DevOps folks, which we seem to be in short supply of these days.
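
As a small illustration of the "version bumps" part of such automation, the sketch below derives the release version and the next development version from a Maven-style `X.Y.Z-SNAPSHOT` string. The function name and the choice to always bump the patch number are assumptions for the example; a real job would also have to rewrite the POM files and handle other version schemes.

```python
import re

# Matches plain three-part Maven snapshot versions such as 3.0.1-SNAPSHOT.
SNAPSHOT = re.compile(r"^(\d+)\.(\d+)\.(\d+)-SNAPSHOT$")


def release_versions(snapshot_version: str) -> tuple[str, str]:
    """Map 'X.Y.Z-SNAPSHOT' to (release version, next development version)."""
    match = SNAPSHOT.match(snapshot_version)
    if match is None:
        raise ValueError(f"not a SNAPSHOT version: {snapshot_version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    release = f"{major}.{minor}.{patch}"
    # Assumption: the next development cycle bumps only the patch number.
    next_dev = f"{major}.{minor}.{patch + 1}-SNAPSHOT"
    return release, next_dev


print(release_versions("3.0.1-SNAPSHOT"))  # ('3.0.1', '3.0.2-SNAPSHOT')
```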

> - Every ODL application (NETCONF, BGP, PCEP, TPCE, etc.) should generate a container automatically after a merge in master. We should be testing this vs. the ODL distribution.

Right, and we are building towards this. I think NETCONF is *almost* there, but I am not sure. This is a prerequisite to having maven-stage jobs executing CSIT prior to allowing release, which ties in to the previous point.

System test:
- Robot was the best open source system test framework 10 years back where organizations had developers and system engineers separated. Things have changed a lot since and many agile organizations nowadays have developers doing system integration and system test code apart from writing product features. Robot framework is good for system integrators with basic or non coding skills but bad for developers that have to ramp up in a new language that soon find very limiting. This is why I think at this moment it would be good to switch to something like pytest for example.
Yes.

At the end of the day, CSIT must be completely owned by the project it is testing and it must be reasonably maintainable. Neither int/test organization (we still carry SXP tests?!) nor RF fulfill (please point me to a reasonable IDE, can you?) that criteria :)

> - Leverage K8s to do multi-application and scale testing.

I think int/packaging needs some overhaul here. At the end of the day, the first use case we need to have is a Docker (or whatever, I don't care) image based on netconf.git/static being packaged as part of the netconf-maven-merge job. I have no idea what we need to make that happen, though.

Regards,
Robert


Robert Varga
 

On 15/04/2022 00:20, Robert Varga wrote:

>> - Leverage K8s to do multi-application and scale testing.
> I think int/packaging needs some overhaul here. At the end of the day, the first use case we need to have is a Docker (or whatever, I don't care) image based on netconf.git/static being packaged as part of the netconf-maven-merge job. I have no idea what we need to make that happen, though.

Just to qualify this end-to-end in terms of the NETCONF/RESTCONF pass-through case:

netconf-maven-stage-master should produce a container containing the use case, using whatever runtime (dynamic Karaf, static Karaf, Guice, Dagger, it does not matter).

netconf.git-hosted CSIT should combine that with a Helm chart on the LFIT-K8S-implementation-du-jour. Run netconf.git-hosted CSIT on that. ~80% of int/test infra is not in the picture.

If it passes, netconf-maven-release-merge is good to go and should run automatically (and lock the branch, bump versions, tag the release, all that jazz). No committer intervention necessary.
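
The gate described so far can be sketched in a few lines. Everything here is hypothetical: `StageResult` and `plan_release` are names invented for the example, and the step labels simply mirror the wording above rather than real job names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageResult:
    """Outcome of a hypothetical staging job for one project."""
    project: str
    container_built: bool
    csit_passed: bool


# Step labels taken from the prose above; they are not real job names.
RELEASE_STEPS = ("lock-branch", "bump-versions", "tag-release")


def plan_release(result: StageResult) -> list[str]:
    """Return the release steps to run, or an empty list when the gate stays closed."""
    # No committer decision is modelled: the staging outcome alone opens the gate.
    if not (result.container_built and result.csit_passed):
        return []
    return [f"{result.project}:{step}" for step in RELEASE_STEPS]


print(plan_release(StageResult("netconf", container_built=True, csit_passed=True)))
print(plan_release(StageResult("netconf", container_built=True, csit_passed=False)))  # []
```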

Once that completes, downstream projects should realize "hey, there is an upgraded upstream, perhaps I need to release with what I have", go through exactly the same process, rinse & repeat until you hit an autorelease project.
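
The "rinse & repeat" cascade boils down to walking the project dependency graph in topological order. Below is a rough sketch using the standard-library `graphlib` (Python 3.9+); the `UPSTREAMS` map is a simplified, assumed subset of projects chosen for illustration and does not claim to be the real dependency graph.

```python
from graphlib import TopologicalSorter

# project -> upstream projects it consumes (illustrative subset, assumed for the example)
UPSTREAMS = {
    "yangtools": [],
    "mdsal": ["yangtools"],
    "controller": ["mdsal"],
    "netconf": ["controller"],
    "bgpcep": ["netconf"],
}


def release_cascade(changed: str, upstreams=UPSTREAMS) -> list[str]:
    """List the projects to release, in order, starting with `changed`."""
    # Collect everything that depends on `changed`, directly or transitively.
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for project, deps in upstreams.items():
            if project not in affected and affected.intersection(deps):
                affected.add(project)
                grew = True
    # Order the affected projects so that upstreams always come first.
    graph = {p: [d for d in upstreams[p] if d in affected] for p in affected}
    return list(TopologicalSorter(graph).static_order())


print(release_cascade("mdsal"))  # ['mdsal', 'controller', 'netconf', 'bgpcep']
```

Each project in the returned list would go through the same stage, CSIT and release gate before the next one starts.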

And then the question arises: why do we have autorelease and cannot just publish int/dist release based on this pipeline to Maven Central? :)

So ... who is up for getting their hands dirty?

Regards,
Robert


Luis Gomez
 

On Apr 14, 2022, at 3:20 PM, Robert Varga <nite@...> wrote:



> On 09/04/2022 05:47, Luis Gomez wrote:
>> Hi TSC ex-colleagues,
>> As promised, here is a list of topics for ODL modernization. This is all that comes to mind right now and there could be more; the goal is really to trigger some brainstorming and discussion.
>> 1) Use cases:
>> Our YANG-based platform made it easy for ODL to become the SDN controller of choice for HW devices supporting NETCONF and other control protocols like BGP-LS, PCEP and OpenFlow, where multi-vendor support is important. However, it made it hard to play a role in the cloud, where YANG and protocols like NETCONF, BGP-LS and PCEP are almost non-existent and multi-vendor support is not that important. Given the amount of money and effort currently going into the cloud, it would make sense for ODL to at least participate in some cloud use case. For example, I believe ODL can still play a role in the hybrid cloud use case, where HW devices need to talk to cloud devices: ODL could take care of the HW devices while another open source controller takes care of the cloud devices.
> Strictly speaking, we already do, but through external integrations, and those bring very little in terms of engagement/contributions :(

OK, then maybe some marketing and/or a white paper would help, if anyone knows about these use cases in more detail :)


>> 2) SW Platform:
>> The OSGi/Karaf platform was state-of-the-art 10 years ago, when there were no real micro-services platforms like K8s, but now it is simply obsolete. In the new micro-services paradigm, applications are loosely coupled and mostly self-contained (e.g. they run their own processes within a container), although they can share some common resources such as a database, a message broker, an API gateway, etc.
> Weeeeeeeeeell. I could *very easily* sell you OSGi as:
> - the original Java micro-service platform
> - the current Java nano-service platform

Fair enough, I did not explain myself well. Java nano-services are still required, but nowadays you can balance between Java nano-services and container micro-services, so with both at hand you can do pretty much anything you want. I think what we miss the most is infrastructure and DevOps work for container micro-services.


>> - Consolidate kernel repos and jars: existing ODL applications share common code called the kernel. Today we have ~6 repositories and a bunch of jars for the kernel, with a single person/organization maintaining the code.
> Yeah, no. As the person referenced in that sentence, I have to say that each repo (and project) has a rather well-defined scope -- as per our governance. It also keeps a reasonable straitjacket on what is done and how.
>
> Merging the repos would throw us back ~9 years, to when controller.git contained all of yangtools, mdsal, netconf, adsal, the works. It would also lower the guards we have against layering/design violations.
>
> One example I can quote here is https://git.opendaylight.org/gerrit/c/openflowplugin/+/91313 -- that patch should never have been approved, and it was an upgraded SpotBugs that eventually found it. It took three days to correct -- not something we want to do if long-term maintainability is our goal.

Well, this consolidation idea was to ease the development and release work of the kernel components, but if maintainers of the code prefer the way it is today, I am totally cool with that :)


>> - Replace the Karaf/OSGi framework with something more current to plumb the Java code and jars together (e.g. Spring).
> Right, and odlmicro just dropped the ball here in more than one way :(
>
> The long-term plan here is to have OSGi DS annotations and a reasonable DI framework for static deployments. In this regard Blueprint is one of the worst decisions we have ever made.
>
> In terms of "reasonable framework":
>
> 1. we have static Karaf fully supported, but per-use-case packaging has not been appearing. This is easy pickings most of the time: create a static Karaf, put it into a Docker container (or whatever) and you are done. It is a no-frills solution; boot time is ~30% of what it is with dynamic Karaf.
>
> 2. odlmicro is moribund. Contributors are welcome, but if those do not step forward, https://github.com/PANTHEONtech/lighty is very much an alternative, which is not readily integratable into individual projects :(
>
> 3. what we *really* want to do is Dagger. That's pretty much the gold standard in Android apps, completely compile-time, all that jazz. While we have the basics ready in some places, no one has prototyped anything reasonable with it.
>
> At this point, I think the idea behind odlmicro should be supplanted by a single goal: make everything wireable via Dagger and provide a Dagger-based equivalent of netconf.git/static/pom.xml to get that use case up and running.

I may be wrong, but I do not think anybody is after odlmicro nowadays. Anyway, to me the problem with Karaf/OSGi is not so much the boot time (option 1) but the usability and adoption aspect: developers like to write code on modern, well-adopted platforms.


> At the end of the day, CSIT must be completely owned by the project it is testing and it must be reasonably maintainable. Neither the int/test organization (we still carry SXP tests?!) nor RF (please point me to a reasonable IDE, can you?) fulfills those criteria :)

IMO int/test should simply be 1) testing non-functional aspects like scale, security, etc., 2) testing multiple applications together (not sure if this still makes sense), and 3) hosting common test libraries.


>> - Leverage K8s to do multi-application and scale testing.
> I think int/packaging needs some overhaul here. At the end of the day, the first use case we need to have is a Docker (or whatever, I don't care) image based on netconf.git/static being packaged as part of the netconf-maven-merge job. I have no idea what we need to make that happen, though.

Work is ongoing but very slow due to lack of contributor bandwidth. So far we have managed to create the infrastructure for Docker images and some partial support for Helm charts, but there is still work to do to get containers down to some projects (not all need Docker) and to finalize the Helm infrastructure work.

