
Re: Git workflows

Andrew Grimberg
 

On 8/12/22 13:10, Robert Varga wrote:
On 12/08/2022 19:30, Andrew Grimberg wrote:
Greetings,
Hello,

Coming into this a little late.
and welcome :)
:) Thanks!

On 8/9/22 13:58, Robert Varga wrote:

--<snip>--

1. Support for multiple branches
================================
[snip]

That having been said, I do believe this is fixable by automation, e.g. having a bot assign Change-IDs for a PR and squashing each PR into a single patch -- which could then be projected to Gerrit, allowing for migration. I am not aware of such a bot existing, so I track this as something that would have to be contributed.
This is still accurate AFAIK. Change-ID is very much a Gerrit concept, which Gerrit leverages heavily to understand the status of any given change across all branches.

That all being said, there is the ability to pull GitHub PRs into Gerrit. The problems with this are as follows:

1. It's a manual process by way of a plugin in Gerrit that has to be initiated by someone who is expecting a PR that needs to come in.

2. Change-ID isn't something that is enforced by GitHub. It could be semi-enforced by a GitHub Action, but all that would do is mark the PR as not passing, so it's not true enforcement. GitLab, on the other hand, _could_ enforce this because it's possible to set up a regex filter that commit messages must pass to even raise an MR, but there are other downsides to that (including LFRE not having any job integration currently with GitLab and Jenkins, though it's possible).

Doing the Change-ID work by a GitHub Action or bot may be a solution for reflecting changes back into Gerrit, but you're still going to run into some weird edge cases.
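As a rough illustration of what such a bot might do, here is a sketch of squashing a PR branch into one commit and stamping a Gerrit-style Change-Id trailer on it. This is hypothetical: the branch names, the trailer derivation, and the overall flow are assumptions for illustration, not an existing tool.

```shell
set -eu
# Build a throwaway repo standing in for a GitHub PR.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name bot
git config user.email bot@example.org
echo base > file.txt
git add file.txt
git commit -q -m "base"
# The "PR branch" with two commits, as a contributor might push it.
git checkout -q -b pr-branch
echo one >> file.txt
git commit -qam "first half of the change"
echo two >> file.txt
git commit -qam "second half of the change"
# What the bot would do: squash the PR onto main as a single commit...
git checkout -q main
git merge --squash -q pr-branch
# ...and derive a Change-Id ("I" + 40 hex chars, here just hashed from
# the branch name for the sake of the example).
change_id="I$(printf '%s' pr-branch | git hash-object --stdin)"
git commit -q -m "Squashed PR

Change-Id: $change_id"
git log -1 --format=%B
```

The resulting single commit carries a well-formed Change-Id trailer, which is what would let it be projected into Gerrit as one reviewable change.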
I think getting PRs into Gerrit would be a very useful first step. If there is genuine interest, we should see the facility get used for contributions -- and we can hash things out as we use it.
What is the paperwork we need to do to get this rolling? :)
As always, open a support ticket at https://support.linuxfoundation.org

Please be aware that this may take a bit to get working, as we haven't set it up ourselves. My experience with it was testing out the workflow on GerritHub [0], since the folks that run GerritHub (GerritForge [1]) are the creators and maintainers of the plugin that allows this workflow to exist at present.

You can test out the workflow with some personal GitHub repos by logging into GerritHub.

[0] https://gerrithub.io
[1] https://gerritforge.com

--
Andrew J Grimberg
Manager Release Engineering
The Linux Foundation

NOTICE: The Linux Foundation supports their employees with flexible work
hours. If you receive mail from me outside of standard business hours
please be aware that I do not expect a response until the next standard
business day.


Re: Git workflows

Robert Varga
 

On 12/08/2022 19:30, Andrew Grimberg wrote:
Greetings,
Hello,

Coming into this a little late.
and welcome :)

On 8/9/22 13:58, Robert Varga wrote:
--<snip>--

1. Support for multiple branches
================================
[snip]

That having been said, I do believe this is fixable by automation, e.g. having a bot assign Change-IDs for a PR and squashing each PR into a single patch -- which could then be projected to Gerrit, allowing for migration. I am not aware of such a bot existing, so I track this as something that would have to be contributed.
This is still accurate AFAIK. Change-ID is very much a Gerrit concept, which Gerrit leverages heavily to understand the status of any given change across all branches.
That all being said, there is the ability to pull GitHub PRs into Gerrit. The problems with this are as follows:
1. It's a manual process by way of a plugin in Gerrit that has to be initiated by someone who is expecting a PR that needs to come in.
2. Change-ID isn't something that is enforced by GitHub. It could be semi-enforced by a GitHub Action, but all that would do is mark the PR as not passing, so it's not true enforcement. GitLab, on the other hand, _could_ enforce this because it's possible to set up a regex filter that commit messages must pass to even raise an MR, but there are other downsides to that (including LFRE not having any job integration currently with GitLab and Jenkins, though it's possible).
Doing the Change-ID work by a GitHub Action or bot may be a solution for reflecting changes back into Gerrit, but you're still going to run into some weird edge cases.
I think getting PRs into Gerrit would be a very useful first step. If there is genuine interest, we should see the facility get used for contributions -- and we can hash things out as we use it.

What is the paperwork we need to do to get this rolling? :)

Regards,
Robert


Re: Git workflows

Andrew Grimberg
 

Greetings,

Coming into this a little late.

On 8/9/22 13:58, Robert Varga wrote:

--<snip>--

1. Support for multiple branches
================================
It is OpenDaylight policy to support up to 3 branches at any given time for any MSI project. For MRI projects, that number gets to 4 for periods lasting 2-5 months -- as is the case for YANG tools right now, we have:
- yangtools-7.0.x for 2022.03 Phosphorus security support
- yangtools-8.0.x for 2022.06 Sulfur
- yangtools-9.0.x for 2022.09 Chlorine
- yangtools-master for 2023.03 Argon
As far as I know, GitHub does not provide the equivalent of Gerrit cherry-picks out of the box. That certainly was the case ~5 years ago when I investigated this more deeply.
The crux of the issue seems to be Change-ID and its tie-in with GH PRs. I was told by Andy Grimberg this is nigh impossible to reconcile. Change-ID is critical for cross-referencing commits, because equivalent patches can look very different on each supported branch.
That having been said, I do believe this is fixable by automation, e.g. having a bot assign Change-IDs for a PR and squashing each PR into a single patch -- which could then be projected to Gerrit, allowing for migration. I am not aware of such a bot existing, so I track this as something that would have to be contributed.
This is still accurate AFAIK. Change-ID is very much a Gerrit concept, which Gerrit leverages heavily to understand the status of any given change across all branches.

That all being said, there is the ability to pull GitHub PRs into Gerrit. The problems with this are as follows:

1. It's a manual process by way of a plugin in Gerrit that has to be initiated by someone who is expecting a PR that needs to come in.

2. Change-ID isn't something that is enforced by GitHub. It could be semi-enforced by a GitHub Action, but all that would do is mark the PR as not passing, so it's not true enforcement. GitLab, on the other hand, _could_ enforce this because it's possible to set up a regex filter that commit messages must pass to even raise an MR, but there are other downsides to that (including LFRE not having any job integration currently with GitLab and Jenkins, though it's possible).

Doing the Change-ID work by a GitHub Action or bot may be a solution for reflecting changes back into Gerrit, but you're still going to run into some weird edge cases.
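For illustration, the kind of advisory check a GitHub Action step could run looks something like the sketch below (hypothetical, not an existing workflow; and per point 2 above, a failing check only marks the PR red, it does not truly block the merge):

```shell
# Hypothetical advisory check a CI step could run against a PR's head
# commit message: succeed iff it carries a well-formed Gerrit Change-Id
# trailer ("I" followed by 40 hex characters).
check_change_id() {
    printf '%s\n' "$1" | grep -Eq '^Change-Id: I[0-9a-f]{40}$'
}

# Demonstration on two sample messages:
if check_change_id "Fix bug

Change-Id: I0123456789abcdef0123456789abcdef01234567"; then
    echo "trailer present"
fi
check_change_id "Fix bug with no trailer" || echo "trailer missing"
```

An Action would exit non-zero on the missing-trailer case, which marks the check as failed on the PR but, as noted, cannot prevent a committer from merging anyway.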

2. Permissions
==============
Github is a system external to LF. As such, I do not think there is infrastructure present to project each project's INFO.yaml into Github permissions. AFAICT the only existing thing is the 'OpenDaylight project', which is an all-or-nothing thing. That is something LF IT has to tackle before we consider migrating.
Technically, we have something comparable on the GitHub side using INFO.yaml. However, absolutely no project that we support on GitHub has elected to utilize it, as it ended up being harder to work with than the INFO files as they currently exist in-repo -- mostly because the implementation pulled them out of the repos themselves and stuck them into a side repo inside the Org. The reasons for this are varied, but mostly come down to how easy it was to detect changes to the remote INFO files that then needed to be shadowed into the LF's releng/info-master repository.

3. Verification
===============
Our current infrastructure is tied to Jenkins. A switch to GH requires that a PR triggers the appropriate jobs in Jenkins. Unless we are talking a straight-up move to GH Actions, we need point 1. to be solved and drive verification projected from Gerrit back to GH. If GH Actions are in the picture, at least maven-verify needs to be migrated. Again, this needs a community contribution.
LF-managed Jenkins already supports GitHub as a source SCM triggering into Jenkins. In point of fact, all of the global-jjb core jobs that ODL utilizes have two variants, a Gerrit variant and a GitHub variant. The primary issue is that you can't have both variants active for a given repository at the same time because of namespace collisions.

There was an idea floated internally not long ago about whether it would be possible to go the other way (Gerrit -> GitHub), with work still primarily happening in Gerrit (changes raised, etc.), but having Gerrit changes raise PRs against the GitHub mirror to trigger GitHub Actions, with the results somehow shuttled back to the Gerrit change.

I believe this would be doable, but nobody has had the time to sit down and evaluate how to actually make it work.

The best scenario would be that changes could be raised on either side (Gerrit or GitHub) and that review itself would just continue to happen in Gerrit along with the final merges. Getting bi-directionality would be a major project though.

--<snip>--

-Andy-

--
Andrew J Grimberg
Manager Release Engineering
The Linux Foundation

NOTICE: The Linux Foundation supports their employees with flexible work
hours. If you receive mail from me outside of standard business hours
please be aware that I do not expect a response until the next standard
business day.


Re: Git workflows

Guillaume Lambert
 

Hi all

I am quite in line with what Robert wrote.
It is worth noting that ONAP already took a shot at migrating its CI to GitLab,
and this was a substantial effort. You can find more details at these URLs.
https://wiki.onap.org/display/DW/ONAP+CD+on+Gitlab
https://wiki.onap.org/display/DW/Daily+Deployments+and+gating

This is a bit different but it can give an idea of what dealing with such a migration implies.

Hope this helps


Best Regards
Guillaume




De : TSC@... <TSC@...> de la part de Robert Varga <nite@...>
Envoyé : mardi 9 août 2022 22:58
À : tsc@...
Objet : [OpenDaylight TSC] Git workflows
 
Hello,

I am slowly catching up on things from last month. One item is the
subject of Github workflows.

There are a number of unresolved issues, some of which may be my
ignorance of the outside world (in which case I would *love* to be
proven wrong). Here is the list:

1. Support for multiple branches
================================
It is OpenDaylight policy to support up to 3 branches at any given time
for any MSI project. For MRI projects, that number gets to 4 for periods
lasting 2-5 months -- as is the case for YANG tools right now, we have:
- yangtools-7.0.x for 2022.03 Phosphorus security support
- yangtools-8.0.x for 2022.06 Sulfur
- yangtools-9.0.x for 2022.09 Chlorine
- yangtools-master for 2023.03 Argon

As far as I know, GitHub does not provide the equivalent of Gerrit
cherry-picks out of the box. That certainly was the case ~5 years ago
when I investigated this more deeply.

The crux of the issue seems to be Change-ID and its tie-in with GH PRs.
I was told by Andy Grimberg this is nigh impossible to reconcile.
Change-ID is critical for cross-referencing commits, because equivalent
patches can look very different on each supported branch.
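To make the cross-referencing concrete, here is a small sketch (hypothetical repo, branch names, and Change-Id) showing that a plain `git cherry-pick` carries the Change-Id trailer onto another branch, which is exactly the correlation Gerrit relies on even when the resulting diffs differ:

```shell
set -eu
# Throwaway repo with a "release" branch, standing in for e.g. stable-7.0.x.
repo=$(mktemp -d)
cd "$repo"
git init -q -b master
git config user.name dev
git config user.email dev@example.org
echo base > f.txt
git add f.txt
git commit -q -m "base"
git branch -q stable-7.0.x          # pretend release branch
# A fix on master, carrying a Change-Id trailer as Gerrit's hook would add.
echo fix >> f.txt
git commit -qam "Fix a bug

Change-Id: Ideadbeefdeadbeefdeadbeefdeadbeefdeadbeef"
fix=$(git rev-parse HEAD)
# Backport: the commit message, Change-Id included, is reused verbatim.
git checkout -q stable-7.0.x
git cherry-pick "$fix"
git log -1 --format=%B | grep '^Change-Id:'
```

Both branches now have distinct commits (different SHAs, potentially different diffs after conflict resolution) that share one Change-Id, letting Gerrit present them as the same logical change.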

That having been said, I do believe this is fixable by automation, e.g.
having a bot assign Change-IDs for a PR and squashing each PR into a
single patch -- which could then be projected to Gerrit, allowing for
migration. I am not aware of such a bot existing, so I track this as
something that would have to be contributed.

2. Permissions
==============
Github is a system external to LF. As such, I do not think there is
infrastructure present to project each project's INFO.yaml into Github
permissions. AFAICT the only existing thing is the 'OpenDaylight
project', which is an all-or-nothing thing. That is something LF IT has
to tackle before we consider migrating.

3. Verification
===============
Our current infrastructure is tied to Jenkins. A switch to GH requires
that a PR triggers the appropriate jobs in Jenkins. Unless we are
talking a straight-up move to GH Actions, we need point 1. to be solved
and drive verification projected from Gerrit back to GH. If GH Actions
are in the picture, at least maven-verify needs to be migrated. Again,
this needs a community contribution.


Now, I am not a nay-sayer, but we have a sh*tload of things going on, and
migrating it requires some serious man-power. For those uninitiated, I
like to quote Thanh: "Getting Eclipse build where it is was a twelve
month effort for three dedicated people" (IIRC). We no longer have Thanh
and all his dedication and experience.

I have used this quote when someone proposed moving to Gradle. Moving to
GH workflows is harder.

If we are to tackle this, we need to solve the above problems in order: 2,
1, 3. I will lend my support to anyone seriously committed to this
undertaking.

At the end of the day, this is not impossible. The OpenJDK community has
executed a full transition from custom Mercurial workflows (webrev et
al.) to GitHub PRs -- but that transition includes a metric ton of
automation which had to be written from scratch. We as a community are
struggling to get to Jenkins Pipelines, which is dead simple in comparison.

So, Venkat, as the one proposing this change, are you in a position and
willing to drive it to completion?

Regards,
Robert





_________________________________________________________________________________________________________________________


This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.


Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rohini Ambika
 

Hi Rangan,

 

Thanks. Looking forward to your response.

 

Hi @John Mangan, could you please confirm whether our use-case would require HA in the NBI or not?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Venkatrangan Govindarajan <gvrangan@...>
Sent: Wednesday, August 10, 2022 12:14 PM
To: Rohini Ambika <rohini.ambika@...>
Cc: Rahul Sharma <rahul.iitr@...>; Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [OpenDaylight TSC] [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi Rohini,

 

 I think we connected already, I can take a look at the issue and provide a response next week.

 As we discussed, please check if your use-case would require the HA in NBI or not. 

 We can look at the logged jira ticket and get back to you.

 

Regards,

Rangan

 

On Wed, 10 Aug 2022 at 12:11 pm, Rohini Ambika via lists.opendaylight.org <rohini.ambika=infosys.com@...> wrote:

Hi Rahul,

 

Thanks for your response.

 

We can confirm that the issue persists without K8s when we deploy ODL as a cluster.

 

Could you help us connect with the ODL clustering team to proceed further?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Tuesday, August 9, 2022 2:07 AM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi Rohini,

 

Sorry, got pulled into other things.

For this issue, we were wondering if it's related to ODL deployed using Helm charts, considering that the problem is also reproducible when ODL is running as a cluster (without K8s). Perhaps the ODL-Clustering team can provide better inputs since the problem looks to be at the application level.

Let me know what you think.

 

Regards,

Rahul

 

 

On Thu, Aug 4, 2022 at 5:55 AM Rohini Ambika <rohini.ambika@...> wrote:

Hello,

 

Did you get a chance to look into the configurations shared?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rohini Ambika
Sent: Friday, July 29, 2022 11:35 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: RE: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

Hello Rahul,

 

Please find the answers below:

 

  1. Official Helm chart @ ODL Helm Chart. Attaching the values.yml for reference
  2. The fix was to restart the Owner Supervisor on failure. Check-in @ https://git.opendaylight.org/gerrit/c/controller/+/100357

 

We observed the same problem when tested without a K8s setup, by following the instructions @ https://docs.opendaylight.org/en/stable-phosphorus/getting-started-guide/clustering.html. Instead of installing the odl-mdsal-distributed-datastore feature, we enabled the features given in the values.yml.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Thursday, July 28, 2022 9:32 PM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hello Rohini,

 

Thank you for the answers.

  1. For the 1st one: when you say you tried with the official Helm charts -- which Helm charts are you referring to? Can you send more details on how you deployed these charts (the parameters in values.yaml that you used)?
  2. What was the temporary fix that reduced the occurrence of the issue? Can you point to the check-in made or the change in configuration parameters? It would be helpful for diagnosing a proper fix.

Regards,
Rahul

 

On Thu, Jul 28, 2022 at 2:21 AM Rohini Ambika <rohini.ambika@...> wrote:

Hi Anil,

 

Thanks for the response.

 

Please find the details below:

 

1.            Is the Test deployment using our Helm charts (ODL Helm Chart)? – We have created our own Helm chart for the ODL deployment. We have also tried the use case with the official Helm chart.

2.            I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version? – This was a temporary fix from our end, and the failure rate has reduced due to the fix; however, we are still facing the issue when we do multiple restarts of the master node.

 

ODL version used is Phosphorus SR2

All the configurations are provided and attached in the initial mail.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Anil Shashikumar Belur <abelur@...>
Sent: Thursday, July 28, 2022 5:05 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Hsia, Andrew <andrew.hsia@...>; Rohini Ambika <rohini.ambika@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi, 

 

I believe they are using ODL Helm charts and K8s for the cluster setup; that said, I have requested the version of ODL being used.

Rohini: Can you provide more details on the ODL version and configuration that Rahul/Andrew requested?

 

On Thu, Jul 28, 2022 at 8:08 AM Rahul Sharma <rahul.iitr@...> wrote:

Hi Anil,

 

Thank you for bringing this up.

 

Couple of questions:

  1. Is the Test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version?

 

Thanks,
Rahul

 

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:

Hi Andrew and Rahul:

 

I remember we have discussed these topics in the ODL containers and helm charts meetings. 

Do we know if the expected configuration would work with the ODL on K8s clusters setup or requires some configuration changes?

 

Cheers,

Anil 

 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>

 

A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability

Hi All,

As presented/discussed in the ODL TSC meeting held on Friday the 22nd at 10:30 AM IST, posting this email to highlight the issues with ODL clustering use cases encountered during performance testing.

Details and configurations as follows:


* Requirement : ODL clustering for high availability (HA) on data distribution
* Env Configuration:

* 3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node
* CPU : 8 Cores
* RAM : 20GB
* Java Heap size : Min - 512MB Max - 16GB
* JDK version : 11
* Kubernetes version : 1.19.1
* Docker version : 20.10.7

* ODL features installed to enable clustering:

* odl-netconf-clustered-topology
* odl-restconf-all

* Device configured : Netconf devices, all having the same schema (tested with 250 devices)
* Use Case:

* Fail Over/High Availability:

* Expected : In case any of the ODL instances goes down or is restarted due to network splits or an internal error, the other instances in the cluster should remain available and functional. If the affected instance holds a master mount, the instance elected as the new master should be able to re-register the devices and resume operations. Once the affected instance comes back up, it should be able to rejoin the cluster as a member node and register the slave mounts.
* Observation : When the ODL instance holding the master mount restarts, an election happens among the other nodes in the cluster and a new leader is elected. The new leader then tries to re-register the master mount, but fails at a point due to the termination of the Akka Cluster Singleton actor. Hence the cluster goes into an idle state and fails to assign an owner for the device DOM entity. In this case, configuration of already-mounted devices and new mounts will fail.

* JIRA reference : https://jira.opendaylight.org/browse/CONTROLLER-2035
* Akka configuration of all the nodes attached. (Increased the gossip-interval time to 5s in akka.conf file to avoid Akka AskTimedOut issue while mounting multiple devices at a time.)
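For reference, the tweak described above would look something like this in akka.conf (a sketch based on the description here; the exact enclosing sections and surrounding settings in the ODL-shipped file may differ):

```
akka {
  cluster {
    # Default is 1s; raised to 5s as noted above, to avoid Akka
    # AskTimedOut issues while mounting multiple devices at a time.
    gossip-interval = 5s
  }
}
```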


Requesting your support to identify whether there are any misconfigurations, or any known solution for the issue.
Please let us know if any further information is required.

Note : We have tested a single ODL instance without enabling cluster features in the K8s cluster. In case of a K8s node failure, the ODL instance is re-scheduled on another available K8s node and operations resume.


Thanks & Regards,
Rohini

A complete copy of this message has been attached for your convenience.

To approve this using email, reply to this message. You do not need to attach the original message, just reply and send.

Reject this message and notify the sender.

Delete this message and do not notify the sender.

NOTE: The pending message will expire after 14 days. If you do not take action within that time, the pending message will be automatically rejected.


Change your notification settings




---------- Forwarded message ----------
From: rohini.ambika@...
To: "dev@..." <dev@...>
Cc: 
Bcc: 
Date: Wed, 27 Jul 2022 11:03:22 +0000
Subject: FW: ODL Clustering issue - High Availability

Hi All,

 

As presented/discussed in the ODL TSC meeting held on Friday the 22nd at 10:30 AM IST, posting this email to highlight the issues with ODL clustering use cases encountered during performance testing.

 

Details and configurations as follows:

 

  • Requirement : ODL clustering for high availability (HA) on data distribution
  • Env Configuration:
    • 3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node
    • CPU :  8 Cores
    • RAM : 20GB
    • Java Heap size : Min – 512MB Max – 16GB
    • JDK version : 11
    • Kubernetes version : 1.19.1
    • Docker version : 20.10.7
  • ODL features installed to enable clustering:
    • odl-netconf-clustered-topology
    • odl-restconf-all
  • Device configured : Netconf devices, all having the same schema (tested with 250 devices)
  • Use Case:
    • Fail Over/High Availability:
      • Expected : In case any of the ODL instances goes down or is restarted due to network splits or an internal error, the other instances in the cluster should remain available and functional. If the affected instance holds a master mount, the instance elected as the new master should be able to re-register the devices and resume operations. Once the affected instance comes back up, it should be able to rejoin the cluster as a member node and register the slave mounts.
      • Observation : When the ODL instance holding the master mount restarts, an election happens among the other nodes in the cluster and a new leader is elected. The new leader then tries to re-register the master mount, but fails at a point due to the termination of the Akka Cluster Singleton actor. Hence the cluster goes into an idle state and fails to assign an owner for the device DOM entity. In this case, configuration of already-mounted devices and new mounts will fail.

 

 

Requesting your support to identify whether there are any misconfigurations, or any known solution for the issue.

Please let us know if any further information is required.

 

Note : We have tested a single ODL instance without enabling cluster features in the K8s cluster. In case of a K8s node failure, the ODL instance is re-scheduled on another available K8s node and operations resume.

             

 

Thanks & Regards,

Rohini

 


 

--

- Rahul Sharma


 

--

- Rahul Sharma


 

--

- Rahul Sharma


 

--

Venkatrangan Govindarajan
( When there is no wind...Row )


Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Venkatrangan Govindarajan
 

Hi Rohini,

 I think we connected already, I can take a look at the issue and provide a response next week.
 As we discussed, please check if your use-case would require the HA in NBI or not. 
 We can look at the logged jira ticket and get back to you.

Regards,
Rangan

On Wed, 10 Aug 2022 at 12:11 pm, Rohini Ambika via lists.opendaylight.org <rohini.ambika=infosys.com@...> wrote:

Hi Rahul,

 

Thanks for your response.

 

We can confirm that the issue persists without K8s when we deploy ODL as a cluster.

 

Could you help us connect with the ODL clustering team to proceed further?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Tuesday, August 9, 2022 2:07 AM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi Rohini,

 

Sorry, got pulled into other things.

For this issue, we were wondering if it's related to ODL deployed using Helm charts, considering that the problem is also reproducible when ODL is running as a cluster (without K8s). Perhaps the ODL-Clustering team can provide better inputs since the problem looks to be at the application level.

Let me know what you think.

 

Regards,

Rahul

 

 

On Thu, Aug 4, 2022 at 5:55 AM Rohini Ambika <rohini.ambika@...> wrote:

Hello,

 

Did you get a chance to look into the configurations shared?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rohini Ambika
Sent: Friday, July 29, 2022 11:35 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: RE: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

Hello Rahul,

 

Please find the answers below:

 

  1. Official Helm chart @ ODL Helm Chart. Attaching the values.yml for reference
  2. The fix was to restart the Owner Supervisor on failure. Check-in @ https://git.opendaylight.org/gerrit/c/controller/+/100357

 

We observed the same problem when tested without a K8s setup, by following the instructions @ https://docs.opendaylight.org/en/stable-phosphorus/getting-started-guide/clustering.html. Instead of installing the odl-mdsal-distributed-datastore feature, we enabled the features given in the values.yml.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Thursday, July 28, 2022 9:32 PM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hello Rohini,

 

Thank you for the answers.

  1. For the 1st one: when you say you tried with the official Helm charts - which Helm charts are you referring to? Can you send more details on how you deployed these charts (the parameters in values.yaml that you used)?
  2. What was the temporary fix that reduced the occurrence of the issue? Can you point to the check-in made or the change in configuration parameters? That would be helpful in diagnosing a proper fix.

Regards,
Rahul

 

On Thu, Jul 28, 2022 at 2:21 AM Rohini Ambika <rohini.ambika@...> wrote:

Hi Anil,

 

Thanks for the response.

 

Please find the details below:

 

1.            Is the Test deployment using our Helm charts (ODL Helm Chart)? – We have created our own Helm chart for the ODL deployment. We have also tried the use case with the official Helm chart.

2.            I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version? – This was a temporary fix from our end, and the failure rate has reduced due to the fix; however, we are still facing the issue when we restart the master node multiple times.

 

The ODL version used is Phosphorus SR2.

All the configurations are provided and attached in the initial mail.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Anil Shashikumar Belur <abelur@...>
Sent: Thursday, July 28, 2022 5:05 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Hsia, Andrew <andrew.hsia@...>; Rohini Ambika <rohini.ambika@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi, 

 

I believe they are using ODL Helm charts and K8s for the cluster setup; that said, I have requested the version of ODL being used.

Rohini: Can you provide more details on the ODL version and configuration that Rahul/Andrew requested?

 

On Thu, Jul 28, 2022 at 8:08 AM Rahul Sharma <rahul.iitr@...> wrote:

Hi Anil,

 

Thank you for bringing this up.

 

Couple of questions:

  1. Is the Test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version?

 

Thanks,
Rahul

 

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:

Hi Andrew and Rahul:

 

I remember we have discussed these topics in the ODL containers and helm charts meetings. 

Do we know if the expected configuration would work with the ODL-on-K8s cluster setup, or whether it requires some configuration changes?

 

Cheers,

Anil 

 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>

 

A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability

Hi All,

As presented/discussed in the ODL TSC meeting held on Friday the 22nd at 10:30 AM IST, I am posting this email to highlight the issues with ODL clustering use cases encountered during performance testing.

Details and configurations as follows:


* Requirement : ODL clustering for high availability (HA) on data distribution
* Env Configuration:

* 3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node
* CPU : 8 Cores
* RAM : 20GB
* Java Heap size : Min - 512MB Max - 16GB
* JDK version : 11
* Kubernetes version : 1.19.1
* Docker version : 20.10.7

* ODL features installed to enable clustering:

* odl-netconf-clustered-topology
* odl-restconf-all

* Devices configured : Netconf devices, all devices having the same schema (tested with 250 devices)
* Use Case:

* Fail Over/High Availability:

* Expected : In case any of the ODL instances goes down or restarts due to a network split or internal error, the other instances in the cluster should remain available and functional. If the affected instance holds the master mount, the instance elected as the new master should be able to re-register the devices and resume operations. Once the affected instance comes back up, it should be able to rejoin the cluster as a member node and register the slave mounts.
* Observation : When the ODL instance holding the master mount restarts, an election happens among the other nodes in the cluster and a new leader is elected. The new leader then tries to re-register the master mount but fails at a point due to the termination of the Akka Cluster Singleton actor. Hence the cluster goes into an idle state and fails to assign an owner for the device DOM entity. In this case, configuration of already-mounted devices and new mounts will fail.

* JIRA reference : https://jira.opendaylight.org/browse/CONTROLLER-2035
* Akka configuration of all the nodes is attached. (We increased the gossip-interval to 5s in the akka.conf file to avoid the Akka AskTimeout issue while mounting multiple devices at a time.)
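
That gossip-interval tweak would sit in akka.conf roughly as follows (a minimal sketch; the odl-cluster-data root and the akka.cluster.gossip-interval key follow ODL's and Akka's stock configuration layout, with all surrounding settings elided):

```hocon
odl-cluster-data {
  akka {
    cluster {
      # Akka's default is 1s; raising it reduces gossip churn, which helped
      # avoid AskTimeout failures when mounting many devices at once.
      gossip-interval = 5s
    }
  }
}
```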


Requesting your support to identify whether there is any misconfiguration or any known solution for the issue.
Please let us know if any further information is required.

Note : We have tested a single ODL instance without enabling cluster features in the K8s cluster. In case of a K8s node failure, the ODL instance is re-scheduled on another available K8s node and operations resume.


Thanks & Regards,
Rohini

A complete copy of this message has been attached for your convenience.

To approve this using email, reply to this message. You do not need to attach the original message, just reply and send.

Reject this message and notify the sender.

Delete this message and do not notify the sender.

NOTE: The pending message will expire after 14 days. If you do not take action within that time, the pending message will be automatically rejected.


Change your notification settings





 


 

--

- Rahul Sharma


 

--

- Rahul Sharma


 

--

- Rahul Sharma



--
Venkatrangan Govindarajan
( When there is no wind...Row )


Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rohini Ambika
 

Hi Rahul,

 

Thanks for your response.

 

We can confirm that the issue persists without K8s when we deploy ODL as a cluster.

 

Could you help us connect with the ODL clustering team to proceed further?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Tuesday, August 9, 2022 2:07 AM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi Rohini,

 

Sorry, got pulled into other things.

For this issue, we were wondering if it's related to ODL deployed using Helm charts, considering that the problem is also reproducible when ODL is running as a cluster (without K8s). Perhaps the ODL-Clustering team can provide better inputs since the problem looks to be at the application level.

Let me know what you think.

 

Regards,

Rahul

 

 


Re: Git workflows

Robert Varga
 

On 09/08/2022 22:58, Robert Varga wrote:
Hello,
[snip]

I have used this quote when someone proposed moving to Gradle. Moving to GH workflows is harder.
Regarding Gradle: it was first proposed in 2015 (if memory serves). We are now at a place where MD-SAL Binding codegen is completely independent of Maven, and the interface (yangtools' codegen-api) gives complete control over the file lifecycle to the driver (e.g. a build-system plugin).

Gradle/SBT/whatever plugins are now simple, if you just write (and contribute!) the equivalent of yang-maven-plugin (sans yang-maven-plugin-spi, which is unused and going away).

Yeah, there is no abstract support to make that easy, but those bits should become clear once there are 2 plugins.

Regards,
Robert


Re: Git workflows

Robert Varga
 

On 09/08/2022 22:58, Robert Varga wrote:
Hello,
Sorry, a clarification:

Now, I am not a nay-sayer, but we have a sh*tload of things going, and migrating it requires some serious man-power. For the uninitiated, I like to quote Thanh: "Getting the Eclipse build where it is was a twelve month effort for three dedicated people" (IIRC). We no longer have Thanh and all his dedication and experience.
This cost Eclipse three man-years (3 MY) to execute. The lesson Thanh offered here is this (paraphrasing):

"The problem is you are not done until all workflows execute perfectly. Making an up-front analysis effectively means a full migration, because you have to see what each and every project does. Projects do weird things."

You will not believe that last sentence until you have experienced it. When you do, it will fundamentally change your perspective. Trust me: been there, done that, multiple times. I would never have agreed before; I cannot over-emphasize it now.

Based on my recent (past 24 months) experience, the breakdown goes like this:
1. migrate first project: 1 month
2. migrate next 4 projects: 3 months
3. migrate next 5 projects: 1 month
4. migrate all projects: 3 months

There is a distinct 1/5/10/all separation in ODL projects. The reality is you do not know until you have OpenFlowPlugin integrated, but you can come prepared if you've done bgpcep (and we do, because it is MRI).

Summed up: the risk is "you do not know what you do not know". The only way to know more is to migrate more projects.

Bye,
Robert

P.S.: I will freely admit the most severe single hit this community has taken over all of those years is losing Thanh. Two reasons: dedication and experience. I hope we will find someone to replace him, the sooner the better.

P.P.S.: It took 4 months to develop a major YANG Tools change. It took another 15 man-months to fully integrate that change. Executed as two steps. Both were a major pain. We are past that now. The mop-up is mostly peanuts.


Re: Git workflows

Robert Varga
 

On 09/08/2022 22:58, Robert Varga wrote:
Hello,
Sorry to self-reply.

[snip]

At the end of the day, this is not impossible. The OpenJDK community has executed a full transition from custom Mercurial workflows (webrev et al.) to GitHub PRs -- but that transition includes a metric ton of automation which had to be written from scratch.
A typical Java PR looks like this: https://github.com/openjdk/jdk/pull/9812

Note that the actual management is done by the "openjdk" bot, including
responding to commands from authorized users:

https://github.com/openjdk/jdk/pull/9766#issuecomment-1206152322 has the following workflow results:

1. https://github.com/openjdk/jdk/pull/9766#issuecomment-1206153918
2. @openjdk openjdk bot added the integrated label 5 days ago
3. @openjdk openjdk bot closed this 5 days ago
4. @openjdk openjdk bot removed ready rfr labels 5 days ago
5. https://github.com/openjdk/jdk/pull/9766#issuecomment-1206154128

I am not saying we need all of this (although that would be awesome), but we absolutely require 50% of this.

Note that inter-release OpenJDK requires explicit backports tracked in JIRA, like here: https://bugs.openjdk.org/browse/JDK-8274349. That side-steps a workflow across multiple Git branches. I do not believe our community has the resources to support that amount of overhead, so we need something better here (which in turn implies Change-ID).

Regards,
Robert


Re: Relevancy of branch locks during the release process

Robert Varga
 

On 09/08/2022 15:44, Guillaume Lambert via lists.opendaylight.org wrote:
Hello
As discussed during the TSC meeting of 7th July <https://wiki.opendaylight.org/display/ODL/2022-07-07+TSC+Minutes>,
I'd like to challenge the relevancy of branch locks during the release process.
In my opinion they have more cons than pros today.
I agree they used to be meaningful in the past to avoid potential overlaps and incoherence.
But that was a time when many active projects and committers were taking part in the release.
I am not convinced the size of the community still justifies such a process today.
And since most active projects and their committers are quite experienced,
I am quite convinced that branch locks brake the release more than they help.
I think especially of repeated situations such as when downstream projects face
bugs in their upstream dependencies and have to wait for the branch to be unlocked
to update their poms and trigger stage-releases jobs.
But I might have missed some aspects.
So I would like to have other community members' opinions on the topic.
I tend to agree. The branch lock is relevant only to MSI projects at this point. Those projects do not even have what I would call active committers (with the notable exception of OFP).

Meanwhile, the way branch-lock works is quite wrong, as it should only affect MSI projects (e.g. those in autorelease.git), but it affects everyone (who happens to match the branch naming).

Ditch it for all I care.

Regards,
Robert


Git workflows

Robert Varga
 

Hello,

I am slowly catching up on things from last month. One item is the subject of Github workflows.

There are a number of unresolved issues, some of which may be due to my ignorance of the outside world (in which case I would *love* to be proven wrong). Here is the list:

1. Support for multiple branches
================================
It is OpenDaylight policy to support up to 3 branches at any given time for any MSI project. For MRI projects, that number gets to 4 for periods lasting 2-5 months -- as is the case for YANG Tools right now, where we have:
- yangtools-7.0.x for 2022.03 Phosphorus security support
- yangtools-8.0.x for 2022.06 Sulfur
- yangtools-9.0.x for 2022.09 Chlorine
- yangtools-master for 2023.03 Argon

As far as I know, Github does not provide the equivalent of Gerrit cherry-picks out of the box. That certainly was the case ~5 years ago, when I investigated this more deeply.

The crux of the issue seems to be Change-ID and its tie-in with GH PRs. I was told by Andy Grimberg this is nigh impossible to reconcile. Change-ID is critical for cross-referencing commits, because equivalent patches can look very different on each supported branch.

That having been said, I do believe this is fixable by automation, e.g. having a bot assign Change-IDs for a PR and squashing each PR into a single patch -- which can then be projected to Gerrit, allowing for migration. I am not aware of such a bot existing, so I track this as something that would have to be contributed.
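
A sketch of the Change-ID half of such a bot (hypothetical helper names; the only hard requirement is Gerrit's trailer shape, "I" followed by 40 hex digits): deriving the ID deterministically from the PR's identity means every force-push to the same PR projects onto the same Gerrit change.

```python
import hashlib


def make_change_id(repo: str, pr_number: int) -> str:
    """Derive a stable Gerrit-style Change-Id ("I" + 40 hex chars) from a PR.

    Hypothetical scheme: Gerrit only checks the trailer's shape, so any
    deterministic hash keyed on the PR's identity works.
    """
    digest = hashlib.sha1(f"{repo}#{pr_number}".encode()).hexdigest()
    return f"I{digest}"


def append_change_id(commit_msg: str, repo: str, pr_number: int) -> str:
    """Append the Change-Id trailer to a squashed commit message, if missing."""
    if "Change-Id:" in commit_msg:
        return commit_msg
    trailer = f"Change-Id: {make_change_id(repo, pr_number)}"
    return commit_msg.rstrip("\n") + "\n\n" + trailer + "\n"


if __name__ == "__main__":
    print(append_change_id("Fix leader re-registration", "opendaylight/controller", 123))
```

A real bot would run this while squashing the PR and amend the squashed commit's message before pushing it to Gerrit for review.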

2. Permissions
==============
Github is a system external to LF. As such, I do not think there is infrastructure present to project each project's INFO.yaml into Github permissions. AFAICT the only existing thing is the 'OpenDaylight project', which is an all-or-nothing thing. That is something LF IT has to tackle before we consider migrating.

3. Verification
===============
Our current infrastructure is tied to Jenkins. A switch to GH requires that a PR triggers the appropriate jobs in Jenkins. Unless we are talking about a straight-up move to GH Actions, we need point 1 to be solved so we can drive verification, projected from Gerrit, back to GH. If GH Actions are in the picture, at least maven-verify needs to be migrated. Again, this needs a community contribution.
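
To sketch what such a PR-to-Jenkins trigger could look like (hypothetical job name and secrets; Jenkins' remote build trigger via an authentication token is a stock feature), a PR-triggered workflow might be as small as:

```yaml
# .github/workflows/verify.yml -- hypothetical sketch
name: verify

on:
  pull_request:

jobs:
  trigger-jenkins:
    runs-on: ubuntu-latest
    steps:
      # JENKINS_URL and JENKINS_TOKEN are assumed repository secrets;
      # "maven-verify" stands in for the project's actual verify job.
      - name: Trigger maven-verify in Jenkins
        run: |
          curl -fsS -X POST \
            "${JENKINS_URL}/job/maven-verify/buildWithParameters?token=${JENKINS_TOKEN}&PR=${PR_NUMBER}"
        env:
          JENKINS_URL: ${{ secrets.JENKINS_URL }}
          JENKINS_TOKEN: ${{ secrets.JENKINS_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
```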


Now, I am not a nay-sayer, but we have a sh*tload of things going, and migrating it requires some serious man-power. For the uninitiated, I like to quote Thanh: "Getting the Eclipse build where it is was a twelve month effort for three dedicated people" (IIRC). We no longer have Thanh and all his dedication and experience.

I have used this quote when someone proposed moving to Gradle. Moving to GH workflows is harder.

If we are to tackle this, we need to solve above problems in order: 2, 1, 3. I will lend my support to anyone seriously committed to this undertaking.

At the end of the day, this is not impossible. The OpenJDK community has executed a full transition from custom Mercurial workflows (webrev et al.) to GitHub PRs -- but that transition includes a metric ton of automation which had to be written from scratch. We as a community are struggling to get to Jenkins Pipelines, which is dead simple in comparison.

So, Venkat, as the one proposing this change, are you in a position and willing to drive it to completion?

Regards,
Robert


TSC Meeting for August 11, 2022 at 9 am Pacific

Guillaume Lambert
 

Hello OpenDaylight Community,
 
The next TSC meeting is August 11, 2022 at 9 am Pacific Time.
As usual, the agenda proposal and the connection details for this meeting are available in the wiki
at the following URL:
 
 
If you need to add anything, please let me know or add it there.
The meeting minutes will be at the same location after the meeting is over.
 
Best Regards
Guillaume



Relevancy of branch locks during the release process

Guillaume Lambert
 

Hello


As discussed during the TSC meeting of 7th July,
I'd like to challenge the relevancy of branch locks during the release process.
In my opinion, they have more cons than pros today.

I agree they used to be meaningful in the past to avoid potential overlaps and incoherence.

But that was at a time when many active projects and committers were taking part in the release.

I am not convinced the size of the community still justifies such a process today.
And since most active projects and their committers are quite experienced,
I am quite convinced that branch locks slow the release down more than they help.
I think especially of recurring situations where downstream projects face
bugs in their upstream dependencies and have to wait for the branch to be unlocked
before they can update their poms and trigger stage-release jobs.

But I might have missed some aspects,
so I would like to hear other community members' opinions on the topic.

Best Regards

Guillaume



From: LAMBERT Guillaume INNOV/NET
Sent: Monday 8 August 2022 17:45
To: Daniel de la Rosa; TSC
Subject: RE: Code freeze for Sulfur SR2
 

Hi Daniel


You are right.
For the moment, I assume that we can still proceed as before once again,
because we didn't close the debate on this point.
I still have to send an email with more details to trigger it.
Most people, including myself, were on vacation just after this meeting.
I can do it soon now that everyone is back.


Best Regards

Guillaume




From: Daniel de la Rosa <ddelarosa0707@...>
Sent: Monday 8 August 2022 17:00:00
To: TSC; LAMBERT Guillaume INNOV/NET
Subject: Code freeze for Sulfur SR2
 
Hello Guillaume and all,  I just remembered that we wanted to shorten the code freeze for new releases, as it is documented in our TSC meeting minutes from July 7th. So based on this, when do we want to announce and/or start the code freeze for Sulfur SR2? 

Please let me know here or directly in the checklist

--
Daniel de la Rosa
ODL Release Manager



Re: 2022.09 Chlorine MRI status

Robert Varga
 

On 08/08/2022 17:30, Robert Varga wrote:
I have filed patches for all three of these, once they are merged we should see how CSIT goes.
Alright, so now we are in business, we have https://jenkins.opendaylight.org/releng/job/integration-distribution-test-chlorine/90/ going.

It seems all tests are currently failing because they are being run with Java 11; I am not sure where those definitions are...

Regards,
Robert


Re: Code freeze for Sulfur SR2

Daniel de la Rosa
 

Hello Guillaume and all, 

Ok, I have already announced the code freeze for Sulfur SR2, but we can review the process at the next TSC meeting. BTW, I'm on vacation this week, so I won't be able to attend.

Thanks 

On Mon, Aug 8, 2022 at 8:45 AM <guillaume.lambert@...> wrote:

Hi Daniel


You are right.
For the moment, I assume that we can still proceed as before once again,
because we didn't close the debate on this point.
I still have to send an email with more details to trigger it.
Most people, including myself, were on vacation just after this meeting.
I can do it soon now that everyone is back.


Best Regards

Guillaume




From: Daniel de la Rosa <ddelarosa0707@...>
Sent: Monday 8 August 2022 17:00:00
To: TSC; LAMBERT Guillaume INNOV/NET
Subject: Code freeze for Sulfur SR2
 
Hello Guillaume and all,  I just remembered that we wanted to shorten the code freeze for new releases, as it is documented in our TSC meeting minutes from July 7th. So based on this, when do we want to announce and/or start the code freeze for Sulfur SR2? 

Please let me know here or directly in the checklist

--
Daniel de la Rosa
ODL Release Manager



Sulfur code freeze for SR2

Daniel de la Rosa
 

Hello TSC and all

We are going to code freeze Sulfur for all Managed Projects (cut and lock release branches) on Monday, August 15th, 2022 at 10 am UTC.

Please remember that we only allow blocker bug fixes in the release branch after the code freeze.
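For readers less familiar with the release mechanics, here is a rough sketch of what the "cut" part means in plain git terms. The branch name is illustrative, and the "lock" itself is enforced through Gerrit permissions rather than by git:

```shell
# Hypothetical sketch of cutting a release branch; the branch name is
# illustrative, and "locking" is done via Gerrit permissions, not git.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=release -c user.email=release@example.org \
    commit -q --allow-empty -m "tip of the branch at freeze time"
# Cut: create the stable release branch from the current tip.
git branch stable/sulfur
# From this point on, only blocker bug fixes are merged to the new branch.
git branch --list 'stable/*'
```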


Thanks

ps. Release schedule and checklist for your reference

--
Daniel de la Rosa
ODL Release Manager


Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rahul Sharma <rahul.iitr@...>
 

Hi Rohini,

Sorry, got pulled into other things.
For this issue, we were wondering whether it is really related to ODL being deployed using Helm charts, considering that the problem is also reproducible when ODL is running as a cluster (without K8s). Perhaps the ODL clustering team can provide better input, since the problem looks to be at the application level.
Let me know what you think.

Regards,
Rahul


On Thu, Aug 4, 2022 at 5:55 AM Rohini Ambika <rohini.ambika@...> wrote:

Hello,

 

Did you get a chance to look into the configurations shared?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rohini Ambika
Sent: Friday, July 29, 2022 11:35 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: RE: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

Hello Rahul,

 

Please find the answers below:

 

  1. Official Helm chart @ ODL Helm Chart. Attaching the values.yml for reference.
  2. The fix was to restart the Owner Supervisor on failure. Check-in @ https://git.opendaylight.org/gerrit/c/controller/+/100357

 

We observed the same problem when testing without a K8s setup, following the instructions @ https://docs.opendaylight.org/en/stable-phosphorus/getting-started-guide/clustering.html. Instead of installing the odl-mdsal-distributed-datastore feature, we enabled the features given in the values.yml.
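For reference, a minimal sketch of what the clustering-related portion of such a values.yml might look like, based only on the details in this thread; the key names are illustrative and will differ between charts:

```yaml
# Hypothetical values.yml fragment for an ODL Helm chart.
# Key names are illustrative; only the feature list and sizing
# figures come from this thread.
replicaCount: 3              # one ODL instance per worker node
javaOptions:
  minHeapSize: "512m"
  maxHeapSize: "16g"
features:                    # Karaf features enabled at boot
  - odl-netconf-clustered-topology
  - odl-restconf-all
```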

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Thursday, July 28, 2022 9:32 PM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hello Rohini,

 

Thank you for the answers.

  1. For the 1st one: when you say you tried with the official Helm charts - which Helm charts are you referring to? Can you send more details on how you deployed these charts (the parameters in values.yaml that you used)?
  2. What was the temporary fix that reduced the occurrence of the issue? Can you point to the check-in made or the change in configuration parameters? That would be helpful for diagnosing a proper fix.

Regards,
Rahul

 

On Thu, Jul 28, 2022 at 2:21 AM Rohini Ambika <rohini.ambika@...> wrote:

Hi Anil,

 

Thanks for the response.

 

Please find the details below:

 

1.            Is the Test deployment using our Helm charts (ODL Helm Chart)? – We have created our own Helm chart for the ODL deployment. We have also tried the use case with the official Helm chart.

2.            I see that the JIRA mentioned in the email below ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version? – This was a temporary fix from our end, and the failure rate has been reduced by it; however, we are still facing the issue when we do multiple restarts of the master node.

 

The ODL version used is Phosphorus SR2.

All the configurations are provided and attached in the initial mail.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Anil Shashikumar Belur <abelur@...>
Sent: Thursday, July 28, 2022 5:05 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Hsia, Andrew <andrew.hsia@...>; Rohini Ambika <rohini.ambika@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi, 

 

I believe they are using ODL Helm charts and K8s for the cluster setup; that said, I have requested the version of ODL being used.

Rohini: Can you provide more details on the ODL version and configuration that Rahul/Andrew requested?

 

On Thu, Jul 28, 2022 at 8:08 AM Rahul Sharma <rahul.iitr@...> wrote:

Hi Anil,

 

Thank you for bringing this up.

 

Couple of questions:

  1. Is the Test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the email below ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version?

 

Thanks,
Rahul

 

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:

Hi Andrew and Rahul:

 

I remember we discussed these topics in the ODL containers and Helm charts meetings.

Do we know whether the expected configuration would work with the ODL-on-K8s cluster setup, or whether it requires some configuration changes?

 

Cheers,

Anil 

 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>

 

A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability

Hi All,

As presented/discussed in the ODL TSC meeting held on 22nd Friday 10.30 AM IST, posting this email to highlight the issues on ODL clustering use cases encountered during the performance testing.

Details and configurations as follows:


* Requirement : ODL clustering for high availability (HA) on data distribution
* Env Configuration:

* 3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node
* CPU : 8 Cores
* RAM : 20GB
* Java Heap size : Min - 512MB Max - 16GB
* JDK version : 11
* Kubernetes version : 1.19.1
* Docker version : 20.10.7

* ODL features installed to enable clustering:

* odl-netconf-clustered-topology
* odl-restconf-all

* Devices configured : NETCONF devices, all having the same schema (tested with 250 devices)
* Use Case:

* Fail Over/High Availability:

* Expected : In case any of the ODL instances goes down or is restarted due to network splits or an internal error, the other instances in the cluster should remain available and functional. If the affected instance holds the master mount, the instance elected as the new master should be able to re-register the devices and resume operations. Once the affected instance comes back up, it should be able to rejoin the cluster as a member node and register the slave mounts.
* Observation : When the ODL instance holding the master mount restarts, an election happens among the other nodes in the cluster and a new leader is elected. The new leader then tries to re-register the master mount, but fails at some point due to the termination of the Akka Cluster Singleton actor. The cluster therefore goes into an idle state and fails to assign an owner for the device DOM entity. In this case, configuration of already-mounted devices and new mounts will fail.

* JIRA reference : https://jira.opendaylight.org/browse/CONTROLLER-2035
* Akka configuration of all the nodes attached. (Increased the gossip-interval time to 5s in akka.conf file to avoid Akka AskTimedOut issue while mounting multiple devices at a time.)
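The gossip-interval tuning mentioned above would look roughly like this in akka.conf. This is a sketch; only the 5s value comes from this thread (Akka's default is 1s), and the surrounding structure keeps its defaults:

```hocon
# Sketch of the akka.conf tuning described above. Only gossip-interval
# is changed; everything else keeps its defaults.
akka {
  cluster {
    # Raised from the 1s default to avoid AskTimedOut failures when
    # mounting many NETCONF devices at once.
    gossip-interval = 5s
  }
}
```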


Requesting your support to identify whether there is any misconfiguration, or any known solution for the issue.
Please let us know if any further information is required.

Note : We have tested a single ODL instance without enabling cluster features in the K8s cluster. In case of a K8s node failure, the ODL instance is re-scheduled on another available K8s node and operations are resumed.


Thanks & Regards,
Rohini





 


 




--
- Rahul Sharma


Re: OpenDaylight Survey - Draft

PMX
 

Great initiative, Filip! It will certainly provide some initial indication of how the community is using ODL.

We usually use SurveyMonkey to conduct such surveys.

Cheers!

Pano

On Mon, Aug 8, 2022 at 3:40 AM Filip Čúzy <filip.cuzy@...> wrote:

Hi TSC, (+ Pano on CC)


as discussed at the last meeting, I would like to propose a survey for OpenDaylight users, in order to get a better angle on our community and the users of OpenDaylight.


Feel free to think about any additional questions that might be suitable for the survey. Do keep in mind that it should be short, in order to encourage users to answer.


In order to visualize it better, I created a mockup in a tool called Tally: https://tally.so/r/3xXyDd (Most questions are marked as required, except for one optional field)


The ideal outcome would be 50+ responses and a sort of 1-pager, discussing what the results indicate for the project.


@Pano: Does LFN have some kind of tool where we could create the survey and validate it? I will gladly take part in creating the output of this survey with you.


Looking forward to any feedback,


Filip Čúzy

Marketing Specialist

 

PANTHEON .tech

Mlynské Nivy 56, 821 05 Bratislava

Slovakia

Tel / +421 220 665 111

 

MAIL / filip.cuzy@...

WEB / https://pantheon.tech


Re: Code freeze for Sulfur SR2

Guillaume Lambert
 

Hi Daniel


You are right.
For the moment, I assume that we can still proceed as before once again,
because we didn't close the debate on this point.
I still have to send an email with more details to trigger it.
Most people, including myself, were on vacation just after this meeting.
I can do it soon now that everyone is back.


Best Regards

Guillaume




From: Daniel de la Rosa <ddelarosa0707@...>
Sent: Monday 8 August 2022 17:00:00
To: TSC; LAMBERT Guillaume INNOV/NET
Subject: Code freeze for Sulfur SR2
 
Hello Guillaume and all,  I just remembered that we wanted to shorten the code freeze for new releases, as it is documented in our TSC meeting minutes from July 7th. So based on this, when do we want to announce and/or start the code freeze for Sulfur SR2? 

Please let me know here or directly in the checklist

--
Daniel de la Rosa
ODL Release Manager

