
Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rohini Ambika <rohini.ambika@...>
 

Hi Rahul,

 

Thanks for your response.

 

We can confirm that the issue persists without K8s when we deploy ODL as a cluster.

 

Could you help us connect with the ODL clustering team so that we can proceed further?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Tuesday, August 9, 2022 2:07 AM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi Rohini,

 

Sorry, got pulled into other things.

For this issue, we were wondering whether it is really related to ODL being deployed using Helm charts, given that the problem is also reproducible when ODL is running as a cluster without K8s. Perhaps the ODL-Clustering team can provide better input, since the problem looks to be at the application level.

Let me know what you think.

 

Regards,

Rahul

 

 

On Thu, Aug 4, 2022 at 5:55 AM Rohini Ambika <rohini.ambika@...> wrote:

Hello,

 

Did you get a chance to look into the configurations we shared?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rohini Ambika
Sent: Friday, July 29, 2022 11:35 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: RE: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

Hello Rahul,

 

Please find the answers below:

 

  1. Official Helm chart: ODL Helm Chart. Attaching the values.yml for reference.
  2. The fix was to restart the Owner Supervisor on failure. Check-in: https://git.opendaylight.org/gerrit/c/controller/+/100357

 

We observed the same problem when we tested without K8s, with the cluster set up by following the instructions at https://docs.opendaylight.org/en/stable-phosphorus/getting-started-guide/clustering.html. Instead of installing the odl-mdsal-distributed-datastore feature, we enabled the features given in the values.yml.
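
To illustrate the general idea behind the fix in item 2 (restart a supervised component when it fails), here is a minimal conceptual sketch in Python. It is not the Akka supervision API and not the code in the Gerrit check-in; `supervise`, `make_worker` and `max_restarts` are hypothetical names chosen for this illustration only.

```python
def supervise(make_worker, max_restarts=3):
    """Run a worker; if it raises, build a fresh worker and try again.

    Hypothetical illustration of "restart on failure". Returns the
    worker's result together with the number of restarts that were needed.
    """
    restarts = 0
    while True:
        worker = make_worker()  # a restart means starting from a fresh instance
        try:
            return worker(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and let the failure propagate
            restarts += 1


if __name__ == "__main__":
    attempts = {"count": 0}

    def make_worker():
        def worker():
            attempts["count"] += 1
            if attempts["count"] < 3:
                raise RuntimeError("simulated supervisor failure")
            return "owner assigned"
        return worker

    print(supervise(make_worker))
```

A bounded restart count is only one possible policy; it keeps a persistently failing component from looping forever, at the cost of eventually giving up.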

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Thursday, July 28, 2022 9:32 PM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hello Rohini,

 

Thank you for the answers.

  1. For the 1st one: when you say you tried with the official Helm charts, which Helm charts are you referring to? Can you send more details on how you deployed these charts (the parameters you used in values.yaml)?
  2. What was the temporary fix that reduced the occurrence of the issue? Can you point to the check-in made or the change in configuration parameters? It would be helpful in diagnosing a proper fix.

Regards,
Rahul

 

On Thu, Jul 28, 2022 at 2:21 AM Rohini Ambika <rohini.ambika@...> wrote:

Hi Anil,

 

Thanks for the response.

 

Please find the details below:

 

1. Is the Test deployment using our Helm charts (ODL Helm Chart)? – We have created our own Helm chart for the ODL deployment. We have also tried the use case with the official Helm chart.

2. I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version? – This was a temporary fix from our end, and the failure rate has reduced due to the fix; however, we still face the issue when we restart the master node multiple times.

 

The ODL version used is Phosphorus SR2.

All the configurations are provided and attached in the initial mail.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Anil Shashikumar Belur <abelur@...>
Sent: Thursday, July 28, 2022 5:05 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Hsia, Andrew <andrew.hsia@...>; Rohini Ambika <rohini.ambika@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi, 

 

I believe they are using ODL Helm charts and K8s for the cluster setup. That said, I have requested the version of ODL being used.

Rohini: Can you provide more details on the ODL version and configuration that Rahul/Andrew requested?

 

On Thu, Jul 28, 2022 at 8:08 AM Rahul Sharma <rahul.iitr@...> wrote:

Hi Anil,

 

Thank you for bringing this up.

 

Couple of questions:

  1. Is the Test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version?

 

Thanks,
Rahul

 

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:

Hi Andrew and Rahul:

 

I remember we have discussed these topics in the ODL containers and helm charts meetings. 

Do we know whether the expected configuration would work with the ODL on K8s cluster setup, or whether it requires some configuration changes?

 

Cheers,

Anil 

 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>

 

A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.


Subject: FW: ODL Clustering issue - High Availability

Hi All,

As presented and discussed in the ODL TSC meeting held on Friday the 22nd at 10:30 AM IST, I am posting this email to highlight the issues we encountered with ODL clustering use cases during performance testing.

Details and configurations are as follows:


* Requirement: ODL clustering for high availability (HA) on data distribution
* Env Configuration:
  * 3 node k8s Cluster (1 master & 3 worker nodes) with 3 ODL instances running on each node
  * CPU: 8 Cores
  * RAM: 20GB
  * Java Heap size: Min - 512MB, Max - 16GB
  * JDK version: 11
  * Kubernetes version: 1.19.1
  * Docker version: 20.10.7
* ODL features installed to enable clustering:
  * odl-netconf-clustered-topology
  * odl-restconf-all
* Device configured: Netconf devices, all devices having the same schema (tested with 250 devices)
* Use Case:
  * Fail Over/High Availability:
    * Expected: If any ODL instance goes down or is restarted due to network splits or an internal error, the other instances in the cluster should remain available and functional. If the affected instance holds the master mount, the instance elected as master by re-election should be able to re-register the devices and resume operations. Once the affected instance comes back up, it should be able to join the cluster as a member node and register the slave mounts.
    * Observation: When the ODL instance holding the master mount restarts, an election takes place among the other nodes in the cluster and a new leader is elected. The new leader then tries to re-register the master mount but fails at some point due to the termination of the Akka Cluster Singleton Actor. As a result, the cluster goes into an idle state and fails to assign an owner for the device DOM entity. In this state, configuration of already mounted devices and new mounts will fail.
* JIRA reference: https://jira.opendaylight.org/browse/CONTROLLER-2035
* Akka configuration of all the nodes attached. (We increased the gossip-interval to 5s in the akka.conf file to avoid the Akka AskTimedOut issue while mounting multiple devices at a time.)
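
The expected failover behaviour described in the use case can be pictured with a small toy model in Python. This is only a sketch under stated assumptions: it does not reproduce ODL's or Akka's actual election logic; the class `ToyCluster`, the member names, the "smallest name wins" rule and the majority requirement are all assumptions made for illustration.

```python
class ToyCluster:
    """Toy model of master-mount failover; not ODL's actual election logic."""

    def __init__(self, members):
        self.members = list(members)
        self.up = set(members)
        self.master = self.members[0]  # assumption: first member starts as master

    def has_majority(self):
        # Assumption: a new master may only be chosen while a majority is up.
        return len(self.up) > len(self.members) // 2

    def _elect(self):
        # Assumption: the smallest reachable member name wins the election.
        self.master = min(self.up) if self.has_majority() else None

    def fail(self, member):
        self.up.discard(member)
        if member == self.master or not self.has_majority():
            self._elect()

    def rejoin(self, member):
        self.up.add(member)
        if self.master is None:
            self._elect()

    def role(self, member):
        if member not in self.up:
            return "down"
        return "master" if member == self.master else "slave"


if __name__ == "__main__":
    cluster = ToyCluster(["odl-0", "odl-1", "odl-2"])
    cluster.fail("odl-0")    # the instance holding the master mount restarts
    print(cluster.master)    # another instance takes over
    cluster.rejoin("odl-0")  # the restarted instance comes back
    print(cluster.role("odl-0"))
```

In this model the restarted instance rejoins as a slave and the newly elected master keeps its role, which matches the "Expected" description; the "Observation" corresponds to the election step never completing.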


We request your support in identifying whether there are any misconfigurations or any known solution for the issue.
Please let us know if any further information is required.

Note: We have tested a single ODL instance without enabling cluster features in the K8s cluster. In case of a K8s node failure, the ODL instance is rescheduled on another available K8s node and operations resume.


Thanks & Regards,
Rohini




---------- Forwarded message ----------
From: rohini.ambika@...
To: "dev@..." <dev@...>
Cc: 
Bcc: 
Date: Wed, 27 Jul 2022 11:03:22 +0000
Subject: FW: ODL Clustering issue - High Availability

[snip]

 


 

--

- Rahul Sharma


 



Re: Git workflows

Robert Varga
 

On 09/08/2022 22:58, Robert Varga wrote:
Hello,
[snip]

I have used this quote when someone proposed moving to Gradle. Moving to GH workflows is harder.

Regarding Gradle: it was proposed first in 2015 (if memory serves). We are now at a place where MD-SAL Binding codegen is completely independent of Maven and the interface (yangtools' codegen-api) gives complete control over file lifecycle to the driver (e.g. build system plugin).

Gradle/SBT/whatever plugins are now simple, if you just do (and contribute!) the equivalent of yang-maven-plugin (sans yang-maven-plugin-spi, which is unused and going away).

Yeah, there is no abstract support to make that easy, but those bits should be clear when there are 2 plugins.

Regards,
Robert


Re: Git workflows

Robert Varga
 

On 09/08/2022 22:58, Robert Varga wrote:
Hello,
Sorry, a clarification:

Now, I am not a nay-sayer, but we have a sh*tload of things going on, and migrating them requires some serious manpower. For the uninitiated, I like to quote Thanh: "Getting Eclipse build where it is was a twelve month effort for three dedicated people" (IIRC). We no longer have Thanh and all his dedication and experience.

This cost Eclipse 3 man-years to execute. The lesson Thanh offered here is this (paraphrasing):

"The problem is you are not done until all workflows execute perfectly. Making an up-front analysis effectively means a full migration, because you have to see what each and every project does. Projects do weird things."

You will not believe that last sentence until you have experienced it. When you do, it will fundamentally change your perspective. Trust me: been there, done that, multiple times. I would never have agreed before; I cannot over-emphasize it now.

Based on my recent (past 24 months) experience, the breakdown goes like this:
1. migrate first project: 1 month
2. migrate next 4 projects: 3 months
3. migrate next 5 projects: 1 month
4. migrate all projects: 3 months

There is a distinct 1/5/10/all separation in ODL projects. The reality is you do not know until you have OpenFlowPlugin integrated, but you can come prepared if you've done bgpcep (and we do because it is MRI).

Summed up: the risk is "you do not know what you do not know". The only way to know more is migrate more projects.
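
For a sense of scale, the per-phase estimates listed above can simply be added up. The short Python sketch below does that arithmetic; it assumes the four phases run strictly one after another, which the estimate itself does not state, and the names `PHASES` and `cumulative_months` are made up for this illustration.

```python
# Phase estimates quoted above, as (description, months).
PHASES = [
    ("migrate first project", 1),
    ("migrate next 4 projects", 3),
    ("migrate next 5 projects", 1),
    ("migrate all remaining projects", 3),
]


def cumulative_months(phases):
    """Return (description, months elapsed at the end of that phase) pairs."""
    elapsed = 0
    timeline = []
    for description, months in phases:
        elapsed += months
        timeline.append((description, elapsed))
    return timeline


# Assumption: phases are sequential, so the total is a plain sum.
TOTAL_MONTHS = sum(months for _, months in PHASES)

if __name__ == "__main__":
    for description, elapsed in cumulative_months(PHASES):
        print(f"month {elapsed:>2}: finished '{description}'")
    print(f"total: {TOTAL_MONTHS} months")
```

Under that assumption the first ten projects are covered after five months and the whole migration takes eight.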

Bye,
Robert

P.S.: I will freely admit the most severe single hit this community has taken over all of those years is losing Thanh. Two reasons: dedication and experience. I hope we will find someone to replace him, the sooner the better.

P.P.S.: It took 4 months to develop a major YANG Tools change. It took another 15 man-months to fully integrate that change. Executed as two steps. Both were a major pain. We are past that now. The mop-up is mostly peanuts.


Re: Git workflows

Robert Varga
 

On 09/08/2022 22:58, Robert Varga wrote:
Hello,
Sorry to self-reply.

[snip]

At the end of the day, this is not impossible. The OpenJDK community has executed a full transition from custom Mercurial workflows (webrev et al.) to GitHub PRs -- but that transition includes a metric ton of automation which had to be written from scratch.

A typical Java PR looks like this: https://github.com/openjdk/jdk/pull/9812

Note that the actual management is done by the "openjdk" bot, including responding to commands by authorized users:

https://github.com/openjdk/jdk/pull/9766#issuecomment-1206152322 has the following workflow results:

1. https://github.com/openjdk/jdk/pull/9766#issuecomment-1206153918
2. @openjdk openjdk bot added the integrated label 5 days ago
3. @openjdk openjdk bot closed this 5 days ago
4. @openjdk openjdk bot removed ready rfr labels 5 days ago
5. https://github.com/openjdk/jdk/pull/9766#issuecomment-1206154128

I am not saying we need all of this (although that would be awesome), but we absolutely require 50% of this.

Note that inter-release OpenJDK requires explicit backports tracked in JIRA like here: https://bugs.openjdk.org/browse/JDK-8274349. That side-steps a workflow across multiple Git branches. I do not believe our community has the resources to support that amount of overhead, so we need something better here (which in turn implies Change-ID).

Regards,
Robert


Re: Relevancy of branch locks during the release process

Robert Varga
 

On 09/08/2022 15:44, Guillaume Lambert via lists.opendaylight.org wrote:
Hello
As discussed during the TSC meeting of 7th July <https://wiki.opendaylight.org/display/ODL/2022-07-07+TSC+Minutes>,
I'd like to challenge the relevancy of branch locks during the release process.
In my opinion they have more cons than pros today.
I agree they used to be meaningful in the past to avoid potential overlaps and incoherence.
But that was a time when many active projects and committers were taking part in the release.
I am not convinced the size of the community still justifies such a process today.
And since most active projects and their committers are quite experienced,
I am quite convinced that branch locks hinder the release more than they help.
I am thinking especially of recurring situations where downstream projects face
bugs in their upstream dependencies and have to wait for the branch to be unlocked
to update their poms and trigger stage-releases jobs.
But I might have missed some aspects.
So I would like to have other community members' opinions on the topic.

I tend to agree. The branch lock is relevant only to MSI projects at this point. Those projects do not even have what I would call active committers (with the notable exception of OFP).

Meanwhile the way branch-lock works is quite wrong, as it should only be affecting MSI projects (e.g. those in autorelease.git), but affects everyone (who happens to match their branch naming).

Ditch it for all I care.

Regards,
Robert


Git workflows

Robert Varga
 

Hello,

I am slowly catching up on things from last month. One item is the subject of Github workflows.

There are a number of unresolved issues, some of which may be my ignorance of the outside world (in which case I would *love* to be proven wrong). Here is the list:

1. Support for multiple branches
================================
It is OpenDaylight policy to support up to 3 branches at any given time for any MSI project. For MRI projects, that number gets to 4 for periods lasting 2-5 months -- as is the case for YANG Tools right now, where we have:
- yangtools-7.0.x for 2021.09 Phosphorus security support
- yangtools-8.0.x for 2022.03 Sulfur
- yangtools-9.0.x for 2022.09 Chlorine
- yangtools-master for 2023.03 Argon

As far as I know, Github does not provide the equivalent of Gerrit cherry-picks out of the box. That certainly was the case ~5 years ago, when I investigated this more deeply.

The crux of the issue seems to be Change-ID and its tie-in with GH PRs. I was told by Andy Grimberg this is nigh impossible to reconcile. Change-ID is critical for cross-referencing commits, because equivalent patches can look very different on each supported branch.

That having been said, I do believe this is fixable by automation, e.g. having a bot assign Change-IDs for a PR and squashing each PR into a single patch -- which then can be projected to Gerrit, allowing for migration. I am not aware of such a bot existing, so I track this as something that would have to be contributed.
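
As a rough idea of what the Change-ID part of such a bot might do, here is a minimal Python sketch. It is not an existing tool: `ensure_change_id` and the `pr_identity` string (for example "example-org/example-repo#42") are hypothetical, and deriving the ID from a hash of the PR identity is an assumption made here so that re-squashing the same PR yields the same ID. Gerrit's own commit-msg hook derives the ID from other inputs; the only property relied on is the `Change-Id: I` plus 40 hex digits trailer format.

```python
import hashlib
import re

# Trailer format used by Gerrit: "Change-Id: I" followed by 40 hex digits.
CHANGE_ID_RE = re.compile(r"^Change-Id: I[0-9a-f]{40}$", re.MULTILINE)
# Rough pattern for an existing trailer line such as "Signed-off-by: ...".
TRAILER_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*: \S")


def ensure_change_id(message, pr_identity):
    """Return the squashed commit message with a Change-Id trailer.

    Hypothetical helper: a message that already carries a Change-Id is
    returned unchanged; otherwise the ID is derived from pr_identity.
    """
    if CHANGE_ID_RE.search(message):
        return message
    digest = hashlib.sha1(pr_identity.encode("utf-8")).hexdigest()
    body = message.rstrip("\n")
    lines = body.split("\n")
    # Join an existing trailer block instead of starting a new paragraph.
    ends_with_trailer = len(lines) > 2 and TRAILER_RE.match(lines[-1]) is not None
    separator = "\n" if ends_with_trailer else "\n\n"
    return f"{body}{separator}Change-Id: I{digest}\n"


if __name__ == "__main__":
    squashed = "Fix widget handling\n\nSquashed from a pull request.\n"
    print(ensure_change_id(squashed, "example-org/example-repo#42"))
```

Tying the ID to the PR rather than to the commit contents is one possible design choice: the same Change-Id could then be reused when the equivalent patch is cherry-picked to another supported branch.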

2. Permissions
==============
Github is a system external to LF. As such, I do not think there is infrastructure present to project each project's INFO.yaml into Github permissions. AFAICT the only existing thing is the 'OpenDaylight project', which is an all-or-nothing thing. That is something LF IT has to tackle before we consider migrating.

3. Verification
===============
Our current infrastructure is tied to Jenkins. A switch to GH requires that a PR triggers the appropriate jobs in Jenkins. Unless we are talking about a straight-up move to GH Actions, we need point 1. to be solved and drive verification projected from Gerrit back to GH. If GH Actions are in the picture, at least maven-verify needs to be migrated. Again, this needs a community contribution.


Now, I am not a nay-sayer, but we have a sh*tload of things going on, and migrating them requires some serious manpower. For the uninitiated, I like to quote Thanh: "Getting Eclipse build where it is was a twelve month effort for three dedicated people" (IIRC). We no longer have Thanh and all his dedication and experience.

I have used this quote when someone proposed moving to Gradle. Moving to GH workflows is harder.

If we are to tackle this, we need to solve the above problems in this order: 2, 1, 3. I will lend my support to anyone seriously committed to this undertaking.

At the end of the day, this is not impossible. The OpenJDK community has executed a full transition from custom Mercurial workflows (webrev et al.) to GitHub PRs -- but that transition includes a metric ton of automation which had to be written from scratch. We as a community are struggling to get to Jenkins Pipelines, which is dead simple in comparison.

So, Venkat, as the one proposing this change, are you in a position and willing to drive it to completion?

Regards,
Robert


TSC Meeting for August 11, 2022 at 9 am Pacific

Guillaume Lambert
 

Hello OpenDaylight Community,
 
The next TSC meeting is August 11, 2022 at 9 am Pacific Time.
As usual, the agenda proposal and the connection details for this meeting are available in the wiki
at the following URL:
 
 
If you need to add anything, please let me know or add it there.
The meeting minutes will be at the same location after the meeting is over.
 
Best Regards
Guillaume

_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.


Relevancy of branch locks during the release process

Guillaume Lambert
 

Hello


As discussed during the TSC meeting of 7th July,
I'd like to challenge the relevancy of branch locks during the release process.
In my opinion they have more cons than pros today.

I agree they used to be meaningful in the past to avoid potential overlaps and incoherence.

But that was a time when many active projects and committers were taking part in the release.

I am not convinced the size of the community still justifies such a process today.
And since most active projects and their committers are quite experienced,
I am quite convinced that branch locks hinder the release more than they help.
I am thinking especially of recurring situations where downstream projects face
bugs in their upstream dependencies and have to wait for the branch to be unlocked
to update their poms and trigger stage-releases jobs.

But I might have missed some aspects.
So I would like to have other community members' opinions on the topic.

Best Regards

Guillaume



From: LAMBERT Guillaume INNOV/NET
Sent: Monday, August 8, 2022 17:45
To: Daniel de la Rosa; TSC
Subject: RE: Code freeze for Sulfur SR2
 

Hi Daniel


You are right.
For the moment, I assume that we can still proceed as before once more,
because we haven't closed the debate on this point.
I still have to send an email with more details to trigger it.
Most people, including myself, were on vacation just after this meeting.
I can do it soon now that everyone is back.


Best Regards

Guillaume




From: Daniel de la Rosa <ddelarosa0707@...>
Sent: Monday, August 8, 2022 17:00:00
To: TSC; LAMBERT Guillaume INNOV/NET
Subject: Code freeze for Sulfur SR2
 
Hello Guillaume and all,  I just remembered that we wanted to shorten the code freeze for new releases, as it is documented in our TSC meeting minutes from July 7th. So based on this, when do we want to announce and/or start the code freeze for Sulfur SR2? 

Please let me know here or directly in the checklist

--
Daniel de la Rosa
ODL Release Manager



Re: 2022.09 Chlorine MRI status

Robert Varga
 

On 08/08/2022 17:30, Robert Varga wrote:
I have filed patches for all three of these, once they are merged we should see how CSIT goes.

Alright, so now we are in business, we have https://jenkins.opendaylight.org/releng/job/integration-distribution-test-chlorine/90/ going.

It seems all tests are currently failing because they are being run with Java 11; I am not sure where those definitions are...

Regards,
Robert


Re: Code freeze for Sulfur SR2

Daniel de la Rosa
 

Hello Guillaume and all, 

Ok I have already announced the code freeze for Sulfur SR2 but we can review the process at the next TSC meeting. BTW, I'm on vacation this week so I won't be able to attend this week.

Thanks 

On Mon, Aug 8, 2022 at 8:45 AM <guillaume.lambert@...> wrote:

Hi Daniel


You are right.
For the moment, I assume that we can still proceed as before once more,
because we haven't closed the debate on this point.
I still have to send an email with more details to trigger it.
Most people, including myself, were on vacation just after this meeting.
I can do it soon now that everyone is back.


Best Regards

Guillaume






Sulfur code freeze for SR2

Daniel de la Rosa
 

Hello TSC and all

We are going to code freeze Sulfur for all Managed Projects (cut and lock release branches) on Monday, August 15th 2022, at 10 am UTC.

Please remember that we only allow blocker bug fixes in the release branch after code freeze.

Daniel de la Rosa
ODL Release Manager

Thanks

ps. Release schedule and checklist for your reference

--
Daniel de la Rosa
ODL Release Manager


Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rahul Sharma <rahul.iitr@...>
 

Hi Rohini,

Sorry, got pulled into other things.
For this issue, we were wondering whether it is really related to ODL being deployed using Helm charts, given that the problem is also reproducible when ODL is running as a cluster without K8s. Perhaps the ODL-Clustering team can provide better input, since the problem looks to be at the application level.
Let me know what you think.

Regards,
Rahul


On Thu, Aug 4, 2022 at 5:55 AM Rohini Ambika <rohini.ambika@...> wrote:

Hello,

 

Did you get a chance to look in to the configurations shared.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rohini Ambika
Sent: Friday, July 29, 2022 11:35 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: RE: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

Hello Rahul,

 

Please find the answers below:

 

  1. Official Helm chart @ ODL Helm Chart . Attaching the values.yml for reference
  2. Fix was to restart the Owner Supervisor on failure . Check-in @ https://git.opendaylight.org/gerrit/c/controller/+/100357

 

We observed the same problem when tested without K8s set up by following the instructions @ https://docs.opendaylight.org/en/stable-phosphorus/getting-started-guide/clustering.html. Instead of installing odl-mdsal-distributed-datastore feature, we have enabled the features given in the values.yml.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Thursday, July 28, 2022 9:32 PM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hello Rohini,

 

Thank you for the answers.

  1. For the 1st one: when you say you tried with the official Helm charts - which helm charts are you referring to? Can you send more details on how (parameters in values.yaml that you used) when you deployed these charts.
  2. What was the Temporary fix that reduced the occurrence of the issue. Can you point to the check-in made or change in configuration parameters? Would be helpful to diagnose a proper fix.

Regards,
Rahul

 

On Thu, Jul 28, 2022 at 2:21 AM Rohini Ambika <rohini.ambika@...> wrote:

Hi Anil,

 

Thanks for the response.

 

Please find the details below:

 

1. Is the Test deployment using our Helm charts (ODL Helm Chart)?
   We have created our own Helm chart for the ODL deployment. We have also tried the use case with the official Helm chart.

2. I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version?
   This was a temporary fix from our end. The failure rate has dropped with the fix, but we still face the issue when we restart the master node multiple times.

ODL version used is Phosphorus SR2.

All the configurations are provided and attached in the initial mail.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Anil Shashikumar Belur <abelur@...>
Sent: Thursday, July 28, 2022 5:05 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Hsia, Andrew <andrew.hsia@...>; Rohini Ambika <rohini.ambika@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi, 

 

I believe they are using ODL Helm charts and K8s for the cluster setup. That said, I have requested the version of ODL being used.

Rohini: Can you provide more details on the ODL version and configuration that Rahul/Andrew requested?

 

On Thu, Jul 28, 2022 at 8:08 AM Rahul Sharma <rahul.iitr@...> wrote:

Hi Anil,

 

Thank you for bringing this up.

 

Couple of questions:

  1. Is the Test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version?

 

Thanks,
Rahul

 

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:

Hi Andrew and Rahul:

 

I remember we have discussed these topics in the ODL containers and helm charts meetings. 

Do we know if the expected configuration would work with the ODL on K8s clusters setup or requires some configuration changes?

 

Cheers,

Anil 

 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>

 

A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.


Subject: FW: ODL Clustering issue - High Availability

Hi All,

As presented and discussed in the ODL TSC meeting held on Friday the 22nd at 10:30 AM IST, we are posting this email to highlight the issues with ODL clustering use cases that we encountered during performance testing.

Details and configurations are as follows:


* Requirement: ODL clustering for high availability (HA) on data distribution
* Env configuration:
  * 3-node K8s cluster (1 master & 3 worker nodes) with 3 ODL instances running on each node
  * CPU: 8 cores
  * RAM: 20 GB
  * Java heap size: min 512 MB, max 16 GB
  * JDK version: 11
  * Kubernetes version: 1.19.1
  * Docker version: 20.10.7
* ODL features installed to enable clustering:
  * odl-netconf-clustered-topology
  * odl-restconf-all
* Devices configured: NETCONF devices, all with the same schema (tested with 250 devices)
* Use case:
  * Failover/high availability:
    * Expected: If any ODL instance goes down or restarts due to a network split or internal error, the other instances in the cluster should remain available and functional. If the affected instance holds the master mount, the instance elected as master by re-election should be able to re-register the devices and resume operations. Once the affected instance comes back up, it should be able to rejoin the cluster as a member node and register the slave mounts.
    * Observation: When the ODL instance holding the master mount restarts, an election takes place among the other nodes in the cluster and a new leader is elected. The new leader then tries to re-register the master mount but fails at some point because the Akka Cluster Singleton Actor has terminated. As a result, the cluster goes idle and fails to assign an owner for the device DOM entity. In this state, configuration of already mounted devices and new mounts fails.
* JIRA reference: https://jira.opendaylight.org/browse/CONTROLLER-2035
* Akka configuration of all the nodes is attached. (We increased gossip-interval to 5s in the akka.conf file to avoid the Akka AskTimedOut issue when mounting multiple devices at a time.)
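
For readers without the attachment, the gossip-interval change mentioned above might look roughly like the fragment below. This is only a sketch and not the attached file: it assumes the stock ODL akka.conf layout, where Akka settings sit under an `odl-cluster-data` wrapper, and it omits all other settings (seed nodes, roles, remoting).

```
odl-cluster-data {
  akka {
    cluster {
      # Assumed placement; Akka's default gossip-interval is 1s.
      # The value below is the one reported in this mail.
      gossip-interval = 5s
    }
  }
}
```

A longer gossip interval slows the spread of membership state between nodes, so it may also affect how quickly the cluster converges after a restart.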


We request your support in identifying any misconfiguration or known solution for this issue.
Please let us know if any further information is required.

Note: We have tested a single ODL instance in the K8s cluster without enabling the cluster features. In case of a K8s node failure, the ODL instance is rescheduled on another available K8s node and operations resume.


Thanks & Regards,
Rohini






 


 

--

- Rahul Sharma


 



Re: OpenDaylight Survey - Draft

PMX
 

Great initiative, Filip! It will certainly provide some initial indication of how the community is using ODL.

We usually use SurveyMonkey to conduct such surveys.

Cheers!

Pano

On Mon, Aug 8, 2022 at 3:40 AM Filip Čúzy <filip.cuzy@...> wrote:

Hi TSC, (+ Pano on CC)


As discussed at the last meeting, I would like to propose a survey for OpenDaylight users, in order to get a better picture of our community and the users of OpenDaylight.

Feel free to suggest any additional questions that might be suitable for the survey. Do keep in mind that it should be short, in order to encourage users to answer.

To visualize it better, I created a mockup in a tool called Tally: https://tally.so/r/3xXyDd (most questions are marked as required, except for one optional field).

The ideal outcome would be 50+ responses and a sort of 1-pager discussing what the results indicate for the project.

@Pano: Does LFN have some kind of tool where we could create the survey and validate them? I will gladly take part in creating the output of this survey with you.


Looking forward to any feedback,


Filip Čúzy

Marketing Specialist

 

PANTHEON .tech

Mlynské Nivy 56, 821 05 Bratislava

Slovakia

Tel / +421 220 665 111

 

MAIL / filip.cuzy@...

WEB / https://pantheon.tech


Re: Code freeze for Sulfur SR2

Guillaume Lambert
 

Hi Daniel


You are right.
For the moment, I assume that we can still proceed as before once more,
because we haven't closed the debate on this point.
I still have to send an email with more details to trigger it.
Most people, including myself, were on vacation just after this meeting.
I can do it soon now that everyone is back.


Best Regards

Guillaume




From: Daniel de la Rosa <ddelarosa0707@...>
Sent: Monday, August 8, 2022 17:00:00
To: TSC; LAMBERT Guillaume INNOV/NET
Subject: Code freeze for Sulfur SR2
 
Hello Guillaume and all,  I just remembered that we wanted to shorten the code freeze for new releases, as it is documented in our TSC meeting minutes from July 7th. So based on this, when do we want to announce and/or start the code freeze for Sulfur SR2? 

Please let me know here or directly in the checklist

--
Daniel de la Rosa
ODL Release Manager

_________________________________________________________________________________________________________________________


This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.


Re: 2022.09 Chlorine MRI status

Robert Varga
 

On 07/08/2022 18:36, Robert Varga wrote:
On 07/07/2022 01:35, Robert Varga wrote:
Hello everyone,

Since we are well in the 2022.09 Simultaneous Release (Chlorine), here is a quick summary of where we are at:

- MRI projects up to and including AAA have released
- MSI projects have preliminary patches staged at https://git.opendaylight.org/gerrit/q/topic:chlorine-mri
- NETCONF is awaiting a bug scrub and the corresponding release. There are quite a few issues to scrub and we also need some amount of code reorg within the repo, which in itself may entail breaking changes. There are quite a few unreviewed patches pending as well. Given the raging summer in the northern hemisphere, I expect netconf-4.0.0 release to happen in about 2-3 weeks' time (i.e. last week of July 2022)
- BGPCEP has a few deliverables yet to be finished and the corresponding 0.18.0 release being dependent on NETCONF, my working assumption is having the release available mid-August 2022
All this has been completed and the MSI projects have been updated. Unfortunately there are three more things blocking autorelease:
- int/dist's master needs to be built with Java 17
Done, managed distribution is out there and it shrunk by ~8MiB, which is good news.

- we need a centos8-4c-16g builder-
Done, we have AR working here: https://jenkins.opendaylight.org/releng/job/autorelease-release-chlorine-mvn38-openjdk17/buildTimeTrend

- we need to remove Java 11-based autorelease-release-chlorine
We still need https://git.opendaylight.org/gerrit/c/releng/builder/+/101980 merged, because ...

I have filed patches for all three of these, once they are merged we should see how CSIT goes.
... AR-openjdk17 is not triggering CSIT without it.

Regards,
Robert


Code freeze for Sulfur SR2

Daniel de la Rosa
 

Hello Guillaume and all,  I just remembered that we wanted to shorten the code freeze for new releases, as it is documented in our TSC meeting minutes from July 7th. So based on this, when do we want to announce and/or start the code freeze for Sulfur SR2? 

Please let me know here or directly in the checklist

--
Daniel de la Rosa
ODL Release Manager


OpenDaylight Survey - Draft

Filip Sterling
 

Hi TSC, (+ Pano on CC)


As discussed at the last meeting, I would like to propose a survey for OpenDaylight users, in order to get a better picture of our community and the users of OpenDaylight.

Feel free to suggest any additional questions that might be suitable for the survey. Do keep in mind that it should be short, in order to encourage users to answer.

To visualize it better, I created a mockup in a tool called Tally: https://tally.so/r/3xXyDd (most questions are marked as required, except for one optional field).

The ideal outcome would be 50+ responses and a sort of 1-pager discussing what the results indicate for the project.

@Pano: Does LFN have some kind of tool where we could create the survey and validate them? I will gladly take part in creating the output of this survey with you.


Looking forward to any feedback,


Filip Čúzy

Marketing Specialist

 

PANTHEON .tech

Mlynské Nivy 56, 821 05 Bratislava

Slovakia

Tel / +421 220 665 111

 

MAIL / filip.cuzy@...

WEB / https://pantheon.tech


Re: 2022.09 Chlorine MRI status

Robert Varga
 

On 07/07/2022 01:35, Robert Varga wrote:
Hello everyone,
Since we are well in the 2022.09 Simultaneous Release (Chlorine), here is a quick summary of where we are at:
- MRI projects up to and including AAA have released
- MSI projects have preliminary patches staged at https://git.opendaylight.org/gerrit/q/topic:chlorine-mri
- NETCONF is awaiting a bug scrub and the corresponding release. There are quite a few issues to scrub and we also need some amount of code reorg within the repo, which in itself may entail breaking changes. There are quite a few unreviewed patches pending as well. Given the raging summer in the northern hemisphere, I expect netconf-4.0.0 release to happen in about 2-3 weeks' time (i.e. last week of July 2022)
- BGPCEP has a few deliverables yet to be finished and the corresponding 0.18.0 release being dependent on NETCONF, my working assumption is having the release available mid-August 2022
All this has been completed and the MSI projects have been updated. Unfortunately there are three more things blocking autorelease:
- int/dist's master needs to be built with Java 17
- we need a centos8-4c-16g builder-
- we need to remove Java 11-based autorelease-release-chlorine

I have filed patches for all three of these, once they are merged we should see how CSIT goes.

All the patches related to this effort are staged at https://git.opendaylight.org/gerrit/q/topic:chlorine-mri, as usual.

Regards,
Robert

P.S. The docs patches should be finished in the next few days.


TSC Meeting for August 4, 2022 at 10 pm Pacific

Guillaume Lambert
 

Hello OpenDaylight Community,

 

The next TSC meeting is August 4, 2022 at 10 pm Pacific Time.

As usual, the agenda proposal and the connection details for this meeting are available in the wiki at the following URL:

 

https://wiki.opendaylight.org/x/YwGdAQ

If you need to add anything, please let me know or add it there.

The meeting minutes will be at the same location after the meeting is over.

 

Best Regards

Guillaume

 



Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rohini Ambika <rohini.ambika@...>
 

Hello,

 

Did you get a chance to look into the configurations shared?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 
