
Re: Code freeze for Sulfur SR2

Guillaume Lambert
 

Hi Daniel


You are right.
For the moment, I assume that we can still proceed as before once again.

That is because we didn't close the debate on this point.
I still have to send an email with more details to trigger it.
Most people, including myself, were on vacation just after that meeting.
I can do it soon now that everyone is back.


Best Regards

Guillaume




From: Daniel de la Rosa <ddelarosa0707@...>
Sent: Monday, August 8, 2022 17:00:00
To: TSC; LAMBERT Guillaume INNOV/NET
Subject: Code freeze for Sulfur SR2
 
Hello Guillaume and all,  I just remembered that we wanted to shorten the code freeze for new releases, as it is documented in our TSC meeting minutes from July 7th. So based on this, when do we want to announce and/or start the code freeze for Sulfur SR2? 

Please let me know here or directly in the checklist

--
Daniel de la Rosa
ODL Release Manager

_________________________________________________________________________________________________________________________


This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.


Re: 2022.09 Chlorine MRI status

Robert Varga
 

On 07/08/2022 18:36, Robert Varga wrote:
On 07/07/2022 01:35, Robert Varga wrote:
Hello everyone,

Since we are well into the 2022.09 Simultaneous Release (Chlorine), here is a quick summary of where we are:

- MRI projects up to and including AAA have released
- MSI projects have preliminary patches staged at https://git.opendaylight.org/gerrit/q/topic:chlorine-mri
- NETCONF is awaiting a bug scrub and the corresponding release. There are quite a few issues to scrub and we also need some amount of code reorg within the repo, which in itself may entail breaking changes. There are quite a few unreviewed patches pending as well. Given the raging summer in the northern hemisphere, I expect the netconf-4.0.0 release to happen in about 2-3 weeks' time (i.e. the last week of July 2022)
- BGPCEP has a few deliverables yet to be finished; with the corresponding 0.18.0 release being dependent on NETCONF, my working assumption is that the release will be available mid-August 2022
All this has been completed and the MSI projects have been updated. Unfortunately there are three more things blocking autorelease:
- int/dist's master needs to be built with Java 17
Done; the managed distribution is out there and it shrank by ~8 MiB, which is good news.

- we need a centos8-4c-16g builder
Done, we have AR working here: https://jenkins.opendaylight.org/releng/job/autorelease-release-chlorine-mvn38-openjdk17/buildTimeTrend

- we need to remove Java 11-based autorelease-release-chlorine
We still need https://git.opendaylight.org/gerrit/c/releng/builder/+/101980 merged, because ...

I have filed patches for all three of these; once they are merged, we should see how CSIT goes.
... AR-openjdk17 is not triggering CSIT without it.

Regards,
Robert


Code freeze for Sulfur SR2

Daniel de la Rosa
 

Hello Guillaume and all,  I just remembered that we wanted to shorten the code freeze for new releases, as it is documented in our TSC meeting minutes from July 7th. So based on this, when do we want to announce and/or start the code freeze for Sulfur SR2? 

Please let me know here or directly in the checklist

--
Daniel de la Rosa
ODL Release Manager


OpenDaylight Survey - Draft

Filip Sterling
 

Hi TSC, (+ Pano on CC)


As discussed at the last meeting, I would like to propose a survey for OpenDaylight users, in order to get a better picture of our community and of OpenDaylight users.


Feel free to think about any additional questions that might be suitable for the survey. Do keep in mind that it should stay short, in order to encourage users to answer.


In order to visualize it better, I created a mockup in a tool called Tally: https://tally.so/r/3xXyDd (Most questions are marked as required, except for one optional field)


The ideal outcome would be 50+ responses and a sort of 1-pager, discussing what the results indicate for the project.


@Pano: Does LFN have some kind of tool where we could create the survey and validate it? I will gladly take part in creating the output of this survey with you.


Looking forward to any feedback,


Filip Čúzy

Marketing Specialist

 

PANTHEON .tech

Mlynské Nivy 56, 821 05 Bratislava

Slovakia

Tel / +421 220 665 111

 

MAIL / filip.cuzy@...

WEB / https://pantheon.tech


Re: 2022.09 Chlorine MRI status

Robert Varga
 

On 07/07/2022 01:35, Robert Varga wrote:
Hello everyone,
Since we are well into the 2022.09 Simultaneous Release (Chlorine), here is a quick summary of where we are:
- MRI projects up to and including AAA have released
- MSI projects have preliminary patches staged at https://git.opendaylight.org/gerrit/q/topic:chlorine-mri
- NETCONF is awaiting a bug scrub and the corresponding release. There are quite a few issues to scrub and we also need some amount of code reorg within the repo, which in itself may entail breaking changes. There are quite a few unreviewed patches pending as well. Given the raging summer in the northern hemisphere, I expect the netconf-4.0.0 release to happen in about 2-3 weeks' time (i.e. the last week of July 2022)
- BGPCEP has a few deliverables yet to be finished; with the corresponding 0.18.0 release being dependent on NETCONF, my working assumption is that the release will be available mid-August 2022
All this has been completed and the MSI projects have been updated. Unfortunately there are three more things blocking autorelease:
- int/dist's master needs to be built with Java 17
- we need a centos8-4c-16g builder
- we need to remove Java 11-based autorelease-release-chlorine

I have filed patches for all three of these; once they are merged, we should see how CSIT goes.

All the patches related to this effort are staged at https://git.opendaylight.org/gerrit/q/topic:chlorine-mri, as usual.

Regards,
Robert

P.S. The docs patches should be finished in the next few days.


TSC Meeting for August 4, 2022 at 10 pm Pacific

Guillaume Lambert
 

Hello OpenDaylight Community,

 

The next TSC meeting is August 4, 2022 at 10 pm Pacific Time.

As usual, the agenda proposal and the connection details for this meeting are available in the wiki

at the following URL:

 

https://wiki.opendaylight.org/x/YwGdAQ

If you need to add anything, please let me know or add it there.

The meeting minutes will be at the same location after the meeting is over.

 

Best Regards

Guillaume

 

_________________________________________________________________________________________________________________________


This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.


Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rohini Ambika
 

Hello,

 

Did you get a chance to look into the configurations shared?

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rohini Ambika
Sent: Friday, July 29, 2022 11:35 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: RE: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

Hello Rahul,

 

Please find the answers below:

 

  1. Official Helm chart @ ODL Helm Chart. Attaching the values.yml for reference.
  2. Fix was to restart the Owner Supervisor on failure. Check-in @ https://git.opendaylight.org/gerrit/c/controller/+/100357
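
(For illustration only: this is not the actual controller patch linked above, just a minimal Akka Typed sketch of the restart-on-failure supervision pattern that such a fix relies on; Behaviors.ignore() stands in for the real Owner Supervisor behaviour.)

    // Sketch: wrap a behaviour so that failures restart the actor instead of stopping it.
    // Behaviors.ignore() is only a placeholder for the actual Owner Supervisor behaviour.
    import akka.actor.typed.Behavior;
    import akka.actor.typed.SupervisorStrategy;
    import akka.actor.typed.javadsl.Behaviors;

    final class OwnerSupervisionSketch {
        static Behavior<String> restartingOwnerSupervisor() {
            return Behaviors.supervise(Behaviors.<String>ignore())
                    .onFailure(Exception.class, SupervisorStrategy.restart());
        }
    }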

 

We observed the same problem when testing without a K8s setup, by following the instructions @ https://docs.opendaylight.org/en/stable-phosphorus/getting-started-guide/clustering.html. Instead of installing the odl-mdsal-distributed-datastore feature, we enabled the features given in the values.yml.
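
For anyone reproducing this on a plain (non-K8s) setup, installing the clustering features from the Karaf console would look like the following minimal sketch (feature names as listed in our values.yml):

    feature:install odl-netconf-clustered-topology odl-restconf-all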

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Thursday, July 28, 2022 9:32 PM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hello Rohini,

 

Thank you for the answers.

  1. For the 1st one: when you say you tried with the official Helm charts - which Helm charts are you referring to? Can you send more details (the parameters in values.yaml that you used) on how you deployed these charts?
  2. What was the temporary fix that reduced the occurrence of the issue? Can you point to the check-in made or the change in configuration parameters? That would be helpful in diagnosing a proper fix.

Regards,
Rahul

 

On Thu, Jul 28, 2022 at 2:21 AM Rohini Ambika <rohini.ambika@...> wrote:

Hi Anil,

 

Thanks for the response.

 

Please find the details below:

 

1.            Is the Test deployment using our Helm charts (ODL Helm Chart)? – We have created our own Helm chart for the ODL deployment. We have also tried the use case with the official Helm chart.

2.            I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version? – This was a temporary fix from our end, and the failure rate has reduced due to the fix; however, we are still facing the issue when we do multiple restarts of the master node.

 

The ODL version used is Phosphorus SR2.

All the configurations are provided and attached in the initial mail.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Anil Shashikumar Belur <abelur@...>
Sent: Thursday, July 28, 2022 5:05 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Hsia, Andrew <andrew.hsia@...>; Rohini Ambika <rohini.ambika@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi, 

 

I believe they are using the ODL Helm charts and K8s for the cluster setup; that said, I have requested the version of ODL being used.

Rohini: Can you provide more details on the ODL version and configuration that Rahul/Andrew requested?

 

On Thu, Jul 28, 2022 at 8:08 AM Rahul Sharma <rahul.iitr@...> wrote:

Hi Anil,

 

Thank you for bringing this up.

 

Couple of questions:

  1. Is the Test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035  ) is already marked Resolved. Has somebody fixed it in the latest version.

 

Thanks,
Rahul

 

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:

Hi Andrew and Rahul:

 

I remember we have discussed these topics in the ODL containers and helm charts meetings. 

Do we know if the expected configuration would work with the ODL-on-K8s cluster setup, or whether it requires some configuration changes?

 

Cheers,

Anil 

 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>

 

A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability

Hi All,

As presented/discussed in the ODL TSC meeting held on 22nd Friday 10.30 AM IST, posting this email to highlight the issues on ODL clustering use cases encountered during the performance testing.

Details and configurations as follows:


* Requirement : ODL clustering for high availability (HA) on data distribution
* Env Configuration:

* 3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node
* CPU : 8 Cores
* RAM : 20GB
* Java Heap size : Min - 512MB Max - 16GB
* JDK version : 11
* Kubernetes version : 1.19.1
* Docker version : 20.10.7

* ODL features installed to enable clustering:

* odl-netconf-clustered-topology
* odl-restconf-all

* Device configured : Netconf devices , all devices having same schema(tested with 250 devices)
* Use Case:

* Fail Over/High Availability:

* Expected : In case any of the ODL instance gets down/restarted due to network splits or internal error, other instance in cluster should be available and functional. If the affected instance is having master mount, the other instance who is elected as master by re-election should be able to re-register the devices and resume the operations. Once the affected instance comes up, it should be able to join the cluster as member node and register the slave mounts.
* Observation : When the odl instance which is having the master mount restarts, election happens among the other node in the cluster and elects the new leader. Now the new leader is trying to re-register the master mount but failed at a point due to the termination of the Akka Cluster Singleton Actor. Hence the cluster goes to idle state and failed to assign owner for the device DOM entity. In this case, the configuration of already mounted device/ new mounts will fail.

* JIRA reference : https://jira.opendaylight.org/browse/CONTROLLER-2035<https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fjira.opendaylight.org%2Fbrowse%2FCONTROLLER-2035&data=05%7C01%7Crohini.ambika%40infosys.com%7C12cedda8fd77459df73b08da6fb6802e%7C63ce7d592f3e42cda8ccbe764cff5eb6%7C0%7C0%7C637945126890707334%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=6yZbWAhTgVdwHVbpO7UtUenKW5%2B476j%2BG4ZEodjBUKc%3D&reserved=0>
* Akka configuration of all the nodes attached. (Increased the gossip-interval time to 5s in akka.conf file to avoid Akka AskTimedOut issue while mounting multiple devices at a time.)
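
(For reference, the gossip-interval tweak mentioned above corresponds roughly to the following akka.conf fragment; this is a minimal sketch and the rest of the cluster configuration is omitted.)

    # Sketch: raise the cluster gossip interval from the 1s default to 5s
    akka {
      cluster {
        gossip-interval = 5s
      }
    }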


Requesting your support to identify if there is any mis-configurations or any known solution for the issue .
Please let us know if any further information required.

Note : We have tested the single ODL instance without enabling cluster features in K8s cluster. In case of K8s node failure, ODL instance will be re-scheduled in other available K8s node and operations will be resumed.


Thanks & Regards,
Rohini

A complete copy of this message has been attached for your convenience.

To approve this using email, reply to this message. You do not need to attach the original message, just reply and send.

Reject this message and notify the sender.

Delete this message and do not notify the sender.

NOTE: The pending message will expire after 14 days. If you do not take action within that time, the pending message will be automatically rejected.


Change your notification settings




---------- Forwarded message ----------
From: rohini.ambika@...
To: "dev@..." <dev@...>
Cc: 
Bcc: 
Date: Wed, 27 Jul 2022 11:03:22 +0000
Subject: FW: ODL Clustering issue - High Availability

Hi All,

 

As presented/discussed in the ODL TSC meeting held on 22nd Friday 10.30 AM IST, posting this email to highlight the issues on ODL clustering use cases encountered during the performance testing.

 

Details and configurations as follows:

 

  • Requirement : ODL clustering for high availability (HA) on data distribution
  • Env Configuration:
    • 3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node
    • CPU :  8 Cores
    • RAM : 20GB
    • Java Heap size : Min – 512MB Max – 16GB
    • JDK version : 11
    • Kubernetes version : 1.19.1
    • Docker version : 20.10.7
  • ODL features installed to enable clustering:
    • odl-netconf-clustered-topology
    • odl-restconf-all
  • Device configured : Netconf devices , all devices having same schema(tested with 250 devices)
  • Use Case:
    • Fail Over/High Availability:
      • Expected : In case any of the ODL instance gets down/restarted due to network splits or internal error, other instance in cluster should be available and functional. If the affected instance is having master mount, the other instance who is elected as master by re-election should be able to re-register the devices and resume the operations. Once the affected instance comes up, it should be able to join the cluster as member node and register the slave mounts.
      • Observation : When the odl instance which is having the master mount restarts, election happens among the other node in the cluster and elects the new leader. Now the new leader is trying to re-register the master mount but failed at a point due to the termination of the Akka Cluster Singleton Actor. Hence the cluster goes to idle state and failed to assign owner for the device DOM entity. In this case, the configuration of already mounted device/ new mounts will fail.  

 

 

Requesting your support to identify if there is any mis-configurations or any known solution for the issue .

Please let us know if any further information required.

 

Note : We have tested the single ODL instance without enabling cluster features in K8s cluster. In case of K8s node failure, ODL instance will be re-scheduled in other available K8s node and operations will be resumed.  

             

 

Thanks & Regards,

Rohini

 


 

--

- Rahul Sharma


 

--

- Rahul Sharma


Re: TransportPCE evolution

Anil Belur
 

+1

On Wed, Aug 3, 2022 at 1:43 AM Guillaume Lambert via lists.opendaylight.org <guillaume.lambert=orange.com@...> wrote:

Hello all


Gilles, I understand you need a formal approval of the TSC to proceed with LFIT. 
So I just created a poll at this URL https://wiki.opendaylight.org/x/MAGdAQ
Please TSC members, can you give your feedback about this?

Thanks in advance

Best Regards

Guillaume


From: app-dev@... <app-dev@...> on behalf of Gilles Thouenon via lists.opendaylight.org <gilles.thouenon=orange.com@...>
Sent: Monday, August 1, 2022 15:08:01
To: TSC
Cc: transportpce-dev@...; LAMBERT Guillaume INNOV/NET; OLLIVIER Cédric INNOV/NET
Subject: [app-dev] TransportPCE evolution
 

Dear TSC members,

 

During the TSC meeting of June 30, Guillaume Lambert briefly presented to you our proposal to make TransportPCE project structure evolve. The purpose of this email is to summarize the evolution that we wish to implement from the Chlorine release, for validation by the TSC.

 

Currently, the TransportPCE project implements data models from the OpenROADM community and from ONF (T-API). All these models are systematically compiled at the beginning of the project build step, which is largely unnecessary.

This step already takes a lot of time, and it will increase even more because in the medium term the project is moving towards implementing additional models (OpenConfig).

Moreover, past experience, especially with the migration to Sulfur, shows us that a number of problems/regressions are directly related to YANG models. Having visibility as early as possible on the compilation of all these models when core projects such as yangtools or mdsal evolve would probably help to quickly detect possible bugs during major evolutions.

This is the reason why we would like to move the compilation of these official models to a new dedicated project (transportpce-models) which would have its own gerrit repo (see my request for a new repo, IT-24286).
With each new release (of models and ODL), the models would be compiled once and for all, and the TransportPCE project would use them through a simple Maven dependency.
These models could, if necessary, be reused by other projects (typically the node simulator used for our functional tests, which implements part of these models), and also be used in the CI of yangtools/mdsal/etc.

 

Thank you in advance to the TSC for its agreement to implement this change, and for its recommendations, if there are any.

 

Gilles Thouenon

TransportPCE PTL

_________________________________________________________________________________________________________________________


This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.


OpenDaylight Website - Static transformation progress

Filip Sterling
 

Hi TSC,


The beta version of the website transformation, from WordPress to Jekyll, is ready for a wider demonstration to you.



Please do keep in mind that this is purely to showcase what the website could look like. I still have to create the directory of companies which provide ODL services, and more.

I will definitely talk to all of you about it at the upcoming meeting.

But the groundwork has been laid. I am looking forward to everyone's feedback.


Filip Čúzy

Marketing Specialist

 

PANTHEON .tech

Mlynské Nivy 56, 821 05 Bratislava

Slovakia

Tel / +421 220 665 111

 

MAIL / filip.cuzy@...

WEB / https://pantheon.tech


Re: TransportPCE evolution

Guillaume Lambert
 

Hello all


Gilles, I understand you need a formal approval of the TSC to proceed with LFIT. 
So I just created a poll at this URL https://wiki.opendaylight.org/x/MAGdAQ
Please TSC members, can you give your feedback about this?

Thanks in advance

Best Regards

Guillaume


From: app-dev@... <app-dev@...> on behalf of Gilles Thouenon via lists.opendaylight.org <gilles.thouenon=orange.com@...>
Sent: Monday, August 1, 2022 15:08:01
To: TSC
Cc: transportpce-dev@...; LAMBERT Guillaume INNOV/NET; OLLIVIER Cédric INNOV/NET
Subject: [app-dev] TransportPCE evolution
 

Dear TSC members,

 

During the TSC meeting of June 30, Guillaume Lambert briefly presented to you our proposal to make TransportPCE project structure evolve. The purpose of this email is to summarize the evolution that we wish to implement from the Chlorine release, for validation by the TSC.

 

Currently, the TransportPCE project implements data models from the OpenROADM community and from ONF (T-API). All these models are systematically compiled at the beginning of the project build step, which is largely unnecessary.

This step already takes a lot of time, and it will increase even more because in the medium term the project is moving towards implementing additional models (OpenConfig).

Moreover, past experience, especially with the migration to Sulfur, shows us that a number of problems/regressions are directly related to YANG models. Having visibility as early as possible on the compilation of all these models when core projects such as yangtools or mdsal evolve would probably help to quickly detect possible bugs during major evolutions.

This is the reason why we would like to move the compilation of these official models to a new dedicated project (transportpce-models) which would have its own gerrit repo (see my request for a new repo, IT-24286).
With each new release (of models and ODL), the models would be compiled once and for all, and the TransportPCE project would use them through a simple Maven dependency.
These models could, if necessary, be reused by other projects (typically the node simulator used for our functional tests, which implements part of these models), and also be used in the CI of yangtools/mdsal/etc.

 

Thank you in advance to the TSC for its agreement to implement this change, and for its recommendations, if there are any.

 

Gilles Thouenon

TransportPCE PTL

_________________________________________________________________________________________________________________________


This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.


TransportPCE evolution

Gilles Thouenon
 

Dear TSC members,

 

During the TSC meeting of June 30, Guillaume Lambert briefly presented to you our proposal to make TransportPCE project structure evolve. The purpose of this email is to summarize the evolution that we wish to implement from the Chlorine release, for validation by the TSC.

 

Currently, the TransportPCE project implements data models from the OpenROADM community and from ONF (T-API). All these models are systematically compiled at the beginning of the project build step, which is largely unnecessary.

This step already takes a lot of time, and it will increase even more because in the medium term the project is moving towards implementing additional models (OpenConfig).

Moreover, past experience, especially with the migration to Sulfur, shows us that a number of problems/regressions are directly related to YANG models. Having visibility as early as possible on the compilation of all these models when core projects such as yangtools or mdsal evolve would probably help to quickly detect possible bugs during major evolutions.

This is the reason why we would like to move the compilation of these official models to a new dedicated project (transportpce-models) which would have its own gerrit repo (see my request for a new repo, IT-24286).
With each new release (of models and ODL), the models would be compiled once and for all, and the TransportPCE project would use them through a simple Maven dependency.
These models could, if necessary, be reused by other projects (typically the node simulator used for our functional tests, which implements part of these models), and also be used in the CI of yangtools/mdsal/etc.
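
For illustration, consuming the pre-compiled models would then reduce to an ordinary Maven dependency along the lines of the sketch below; the groupId/artifactId/version shown are hypothetical and would be defined by the new transportpce-models repo:

    <!-- Hypothetical coordinates; the actual GAV would come from transportpce-models -->
    <dependency>
      <groupId>org.opendaylight.transportpce.models</groupId>
      <artifactId>openroadm-device</artifactId>
      <version>${transportpce.models.version}</version>
    </dependency>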

 

Thank you in advance to the TSC for its agreement to implement this change, and for its recommendations, if there are any.

 

Gilles Thouenon

TransportPCE PTL

_________________________________________________________________________________________________________________________


This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.


Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rohini Ambika
 

Hello Rahul,

 

Please find the answers below:

 

  1. Official Helm chart @ ODL Helm Chart. Attaching the values.yml for reference.
  2. Fix was to restart the Owner Supervisor on failure. Check-in @ https://git.opendaylight.org/gerrit/c/controller/+/100357

 

We observed the same problem when testing without a K8s setup, by following the instructions @ https://docs.opendaylight.org/en/stable-phosphorus/getting-started-guide/clustering.html. Instead of installing the odl-mdsal-distributed-datastore feature, we enabled the features given in the values.yml.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Rahul Sharma <rahul.iitr@...>
Sent: Thursday, July 28, 2022 9:32 PM
To: Rohini Ambika <rohini.ambika@...>
Cc: Anil Shashikumar Belur <abelur@...>; Hsia, Andrew <andrew.hsia@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>; John Mangan <John.Mangan@...>; Sathya Manalan <sathya.manalan@...>; Hemalatha Thangavelu <hemalatha.t@...>; Gokul Sakthivel <gokul.sakthivel@...>; Bhaswati_Das <Bhaswati_Das@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hello Rohini,

 

Thank you for the answers.

  1. For the 1st one: when you say you tried with the official Helm charts - which Helm charts are you referring to? Can you send more details (the parameters in values.yaml that you used) on how you deployed these charts?
  2. What was the temporary fix that reduced the occurrence of the issue? Can you point to the check-in made or the change in configuration parameters? That would be helpful in diagnosing a proper fix.

Regards,
Rahul

 

On Thu, Jul 28, 2022 at 2:21 AM Rohini Ambika <rohini.ambika@...> wrote:

Hi Anil,

 

Thanks for the response.

 

Please find the details below:

 

1.            Is the Test deployment using our Helm charts (ODL Helm Chart)? – We have created our own Helm chart for the ODL deployment. We have also tried the use case with the official Helm chart.

2.            I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version? – This was a temporary fix from our end, and the failure rate has reduced due to the fix; however, we are still facing the issue when we do multiple restarts of the master node.

 

The ODL version used is Phosphorus SR2.

All the configurations are provided and attached in the initial mail.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Anil Shashikumar Belur <abelur@...>
Sent: Thursday, July 28, 2022 5:05 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Hsia, Andrew <andrew.hsia@...>; Rohini Ambika <rohini.ambika@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi, 

 

I believe they are using the ODL Helm charts and K8s for the cluster setup; that said, I have requested the version of ODL being used.

Rohini: Can you provide more details on the ODL version and configuration that Rahul/Andrew requested?

 

On Thu, Jul 28, 2022 at 8:08 AM Rahul Sharma <rahul.iitr@...> wrote:

Hi Anil,

 

Thank you for bringing this up.

 

Couple of questions:

  1. Is the Test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035  ) is already marked Resolved. Has somebody fixed it in the latest version.

 

Thanks,
Rahul

 

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:

Hi Andrew and Rahul:

 

I remember we have discussed these topics in the ODL containers and helm charts meetings. 

Do we know if the expected configuration would work with the ODL-on-K8s cluster setup, or whether it requires some configuration changes?

 

Cheers,

Anil 

 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>

 

A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability

Hi All,

As presented/discussed in the ODL TSC meeting held on 22nd Friday 10.30 AM IST, posting this email to highlight the issues on ODL clustering use cases encountered during the performance testing.

Details and configurations as follows:


* Requirement : ODL clustering for high availability (HA) on data distribution
* Env Configuration:

* 3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node
* CPU : 8 Cores
* RAM : 20GB
* Java Heap size : Min - 512MB Max - 16GB
* JDK version : 11
* Kubernetes version : 1.19.1
* Docker version : 20.10.7

* ODL features installed to enable clustering:

* odl-netconf-clustered-topology
* odl-restconf-all

* Device configured : Netconf devices , all devices having same schema(tested with 250 devices)
* Use Case:

* Fail Over/High Availability:

* Expected : In case any of the ODL instance gets down/restarted due to network splits or internal error, other instance in cluster should be available and functional. If the affected instance is having master mount, the other instance who is elected as master by re-election should be able to re-register the devices and resume the operations. Once the affected instance comes up, it should be able to join the cluster as member node and register the slave mounts.
* Observation : When the odl instance which is having the master mount restarts, election happens among the other node in the cluster and elects the new leader. Now the new leader is trying to re-register the master mount but failed at a point due to the termination of the Akka Cluster Singleton Actor. Hence the cluster goes to idle state and failed to assign owner for the device DOM entity. In this case, the configuration of already mounted device/ new mounts will fail.

* JIRA reference : https://jira.opendaylight.org/browse/CONTROLLER-2035<https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fjira.opendaylight.org%2Fbrowse%2FCONTROLLER-2035&data=05%7C01%7Crohini.ambika%40infosys.com%7C12cedda8fd77459df73b08da6fb6802e%7C63ce7d592f3e42cda8ccbe764cff5eb6%7C0%7C0%7C637945126890707334%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=6yZbWAhTgVdwHVbpO7UtUenKW5%2B476j%2BG4ZEodjBUKc%3D&reserved=0>
* Akka configuration of all the nodes attached. (Increased the gossip-interval time to 5s in akka.conf file to avoid Akka AskTimedOut issue while mounting multiple devices at a time.)


Requesting your support to identify if there is any mis-configurations or any known solution for the issue .
Please let us know if any further information required.

Note : We have tested the single ODL instance without enabling cluster features in K8s cluster. In case of K8s node failure, ODL instance will be re-scheduled in other available K8s node and operations will be resumed.


Thanks & Regards,
Rohini

A complete copy of this message has been attached for your convenience.

To approve this using email, reply to this message. You do not need to attach the original message, just reply and send.

Reject this message and notify the sender.

Delete this message and do not notify the sender.

NOTE: The pending message will expire after 14 days. If you do not take action within that time, the pending message will be automatically rejected.


Change your notification settings




---------- Forwarded message ----------
From: rohini.ambika@...
To: "dev@..." <dev@...>
Cc: 
Bcc: 
Date: Wed, 27 Jul 2022 11:03:22 +0000
Subject: FW: ODL Clustering issue - High Availability

Hi All,

 

As presented/discussed in the ODL TSC meeting held on 22nd Friday 10.30 AM IST, posting this email to highlight the issues on ODL clustering use cases encountered during the performance testing.

 

Details and configurations as follows:

 

  • Requirement : ODL clustering for high availability (HA) on data distribution
  • Env Configuration:
    • 3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node
    • CPU :  8 Cores
    • RAM : 20GB
    • Java Heap size : Min – 512MB Max – 16GB
    • JDK version : 11
    • Kubernetes version : 1.19.1
    • Docker version : 20.10.7
  • ODL features installed to enable clustering:
    • odl-netconf-clustered-topology
    • odl-restconf-all
  • Device configured : Netconf devices , all devices having same schema(tested with 250 devices)
  • Use Case:
    • Fail Over/High Availability:
      • Expected : In case any of the ODL instance gets down/restarted due to network splits or internal error, other instance in cluster should be available and functional. If the affected instance is having master mount, the other instance who is elected as master by re-election should be able to re-register the devices and resume the operations. Once the affected instance comes up, it should be able to join the cluster as member node and register the slave mounts.
      • Observation : When the odl instance which is having the master mount restarts, election happens among the other node in the cluster and elects the new leader. Now the new leader is trying to re-register the master mount but failed at a point due to the termination of the Akka Cluster Singleton Actor. Hence the cluster goes to idle state and failed to assign owner for the device DOM entity. In this case, the configuration of already mounted device/ new mounts will fail.  

 

 

Requesting your support to identify if there is any mis-configurations or any known solution for the issue .

Please let us know if any further information required.

 

Note : We have tested the single ODL instance without enabling cluster features in K8s cluster. In case of K8s node failure, ODL instance will be re-scheduled in other available K8s node and operations will be resumed.  

             

 

Thanks & Regards,

Rohini

 


 

--

- Rahul Sharma


 

--

- Rahul Sharma


Re: [integration-dev] [opendaylight-dev] ODL Clustering issue - High Availability

Rohini Ambika
 

Thanks.

 

We have already tested CONTROLLER-2035 with Phosphorus SR2 (we created a patch with the fix) and the issue still persists when we do multiple restarts of the master node (approx. after the 10th restart).

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: TSC@... <TSC@...> On Behalf Of Daniel de la Rosa
Sent: Thursday, July 28, 2022 9:50 PM
To: Rohini Ambika <rohini.ambika@...>; Venkatrangan Govindarajan <gvrangan@...>
Cc: Ivan Hrasko <ivan.hrasko@...>; integration-dev@...; dev@...; kernel-dev@...; TSC <tsc@...>
Subject: Re: [OpenDaylight TSC] [integration-dev] [opendaylight-dev] ODL Clustering issue - High Availability

 

[**EXTERNAL EMAIL**]

Rohini and all

 

Please use Phosphorus SR3 since CONTROLLER-2035 is fixed in that version. In any case, @Venkatrangan Govindarajan  will also get back to you in case he finds anything in the logs you provided

 

thanks

 

On Wed, Jul 27, 2022 at 5:45 AM Rohini Ambika via lists.opendaylight.org <rohini.ambika=infosys.com@...> wrote:

Hi,

 

ODL version – Phosphorus SR2

 

Thanks & Regards,

Rohini

 

From: dev@... <dev@...> On Behalf Of Ivan Hrasko
Sent: Wednesday, July 27, 2022 5:31 PM
To: integration-dev@...; dev@...; kernel-dev@...; kernel-dev@...
Subject: Re: [opendaylight-dev] ODL Clustering issue - High Availability

 

[**EXTERNAL EMAIL**]

Hello,

 

what is the ODL version please?

 

Best,

 

Ivan Hraško

Senior Software Engineer

 

PANTHEON .tech

Mlynské Nivy 56, 821 05 Bratislava

Slovakia

Tel / +421 220 665 111

 

MAIL / ivan.hrasko@...

WEB / https://pantheon.tech

 


From: integration-dev@... <integration-dev@...> on behalf of Rohini Ambika via lists.opendaylight.org <rohini.ambika=infosys.com@...>
Sent: Wednesday, July 27, 2022 13:19
To: integration-dev@...; dev@...; kernel-dev@...; kernel-dev@...
Subject: [integration-dev] ODL Clustering issue - High Availability

 

Hi All,

 

As presented/discussed in the ODL TSC meeting held on 22nd Friday 10.30 AM IST, posting this email to highlight the issues on ODL clustering use cases encountered during the performance testing.

 

Details and configurations as follows:

 

·         Requirement : ODL clustering for high availability (HA) on data distribution

·         Env Configuration:

o    3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node

o    CPU :  8 Cores

o    RAM : 20GB

o    Java Heap size : Min – 512MB Max – 16GB

o    JDK version : 11

o    Kubernetes version : 1.19.1

o    Docker version : 20.10.7

·         ODL features installed to enable clustering:

o    odl-netconf-clustered-topology

o    odl-restconf-all

·         Device configured : Netconf devices , all devices having same schema(tested with 250 devices)

·         Use Case:

o    Fail Over/High Availability:

§  Expected : In case any of the ODL instance gets down/restarted due to network splits or internal error, other instance in cluster should be available and functional. If the affected instance is having master mount, the other instance who is elected as master by re-election should be able to re-register the devices and resume the operations. Once the affected instance comes up, it should be able to join the cluster as member node and register the slave mounts.

§  Observation : When the odl instance which is having the master mount restarts, election happens among the other node in the cluster and elects the new leader. Now the new leader is trying to re-register the master mount but failed at a point due to the termination of the Akka Cluster Singleton Actor. Hence the cluster goes to idle state and failed to assign owner for the device DOM entity. In this case, the configuration of already mounted device/ new mounts will fail.  

·         JIRA reference : https://jira.opendaylight.org/browse/CONTROLLER-2035  

·         Akka configuration of all the nodes attached. (Increased the gossip-interval time to 5s in akka.conf file to avoid Akka AskTimedOut issue while mounting multiple devices at a time.)

 

 

Requesting your support to identify if there is any mis-configurations or any known solution for the issue .

Please let us know if any further information required.

 

Note : We have tested the single ODL instance without enabling cluster features in K8s cluster. In case of K8s node failure, ODL instance will be re-scheduled in other available K8s node and operations will be resumed.  

             

 

Thanks & Regards,

Rohini

 




Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rohini Ambika
 

Hi Anil,

 

Thanks for the response.

 

Please find the details below:

 

1.            Is the Test deployment using our Helm charts (ODL Helm Chart)? – We have created our own Helm chart for the ODL deployment. We have also tried the use case with the official Helm chart.

2.            I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035 ) is already marked Resolved. Has somebody fixed it in the latest version? – This was a temporary fix from our end, and the failure rate has reduced due to the fix; however, we are still facing the issue when we do multiple restarts of the master node.

 

The ODL version used is Phosphorus SR2.

All the configurations are provided and attached in the initial mail.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Anil Shashikumar Belur <abelur@...>
Sent: Thursday, July 28, 2022 5:05 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Hsia, Andrew <andrew.hsia@...>; Rohini Ambika <rohini.ambika@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi, 

 

I believe they are using the ODL Helm charts and K8s for the cluster setup; that said, I have requested the version of ODL being used.

Rohini: Can you provide more details on the ODL version and configuration that Rahul/Andrew requested?

 

On Thu, Jul 28, 2022 at 8:08 AM Rahul Sharma <rahul.iitr@...> wrote:

Hi Anil,

 

Thank you for bringing this up.

 

Couple of questions:

  1. Is the Test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the below email ( https://jira.opendaylight.org/browse/CONTROLLER-2035  ) is already marked Resolved. Has somebody fixed it in the latest version.

 

Thanks,
Rahul

 

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:

Hi Andrew and Rahul:

 

I remember we have discussed these topics in the ODL containers and helm charts meetings. 

Do we know if the expected configuration would work with the ODL-on-K8s cluster setup, or whether it requires some configuration changes?

 

Cheers,

Anil 

 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>

 

A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability

Hi All,

As presented/discussed in the ODL TSC meeting held on 22nd Friday 10.30 AM IST, posting this email to highlight the issues on ODL clustering use cases encountered during the performance testing.

Details and configurations as follows:


* Requirement : ODL clustering for high availability (HA) on data distribution
* Env Configuration:

* 3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node
* CPU : 8 Cores
* RAM : 20GB
* Java Heap size : Min - 512MB Max - 16GB
* JDK version : 11
* Kubernetes version : 1.19.1
* Docker version : 20.10.7

* ODL features installed to enable clustering:

* odl-netconf-clustered-topology
* odl-restconf-all

* Device configured : Netconf devices , all devices having same schema(tested with 250 devices)
* Use Case:

* Fail Over/High Availability:

* Expected : In case any of the ODL instance gets down/restarted due to network splits or internal error, other instance in cluster should be available and functional. If the affected instance is having master mount, the other instance who is elected as master by re-election should be able to re-register the devices and resume the operations. Once the affected instance comes up, it should be able to join the cluster as member node and register the slave mounts.
* Observation : When the odl instance which is having the master mount restarts, election happens among the other node in the cluster and elects the new leader. Now the new leader is trying to re-register the master mount but failed at a point due to the termination of the Akka Cluster Singleton Actor. Hence the cluster goes to idle state and failed to assign owner for the device DOM entity. In this case, the configuration of already mounted device/ new mounts will fail.

* JIRA reference : https://jira.opendaylight.org/browse/CONTROLLER-2035<https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fjira.opendaylight.org%2Fbrowse%2FCONTROLLER-2035&data=05%7C01%7Crohini.ambika%40infosys.com%7C12cedda8fd77459df73b08da6fb6802e%7C63ce7d592f3e42cda8ccbe764cff5eb6%7C0%7C0%7C637945126890707334%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=6yZbWAhTgVdwHVbpO7UtUenKW5%2B476j%2BG4ZEodjBUKc%3D&reserved=0>
* Akka configuration of all the nodes attached. (Increased the gossip-interval time to 5s in akka.conf file to avoid Akka AskTimedOut issue while mounting multiple devices at a time.)


Requesting your support to identify if there is any mis-configurations or any known solution for the issue .
Please let us know if any further information required.

Note : We have tested the single ODL instance without enabling cluster features in K8s cluster. In case of K8s node failure, ODL instance will be re-scheduled in other available K8s node and operations will be resumed.


Thanks & Regards,
Rohini

A complete copy of this message has been attached for your convenience.

To approve this using email, reply to this message. You do not need to attach the original message, just reply and send.

Reject this message and notify the sender.

Delete this message and do not notify the sender.

NOTE: The pending message will expire after 14 days. If you do not take action within that time, the pending message will be automatically rejected.


Change your notification settings




---------- Forwarded message ----------
From: rohini.ambika@...
To: "dev@..." <dev@...>
Cc: 
Bcc: 
Date: Wed, 27 Jul 2022 11:03:22 +0000
Subject: FW: ODL Clustering issue - High Availability

Hi All,

 

As presented/discussed in the ODL TSC meeting held on 22nd Friday 10.30 AM IST, posting this email to highlight the issues on ODL clustering use cases encountered during the performance testing.

 

Details and configurations as follows:

 

  • Requirement : ODL clustering for high availability (HA) on data distribution
  • Env Configuration:
    • 3 node k8s Cluster ( 1 master & 3 worker nodes) with 3 ODL instances running on each node
    • CPU :  8 Cores
    • RAM : 20GB
    • Java Heap size : Min – 512MB Max – 16GB
    • JDK version : 11
    • Kubernetes version : 1.19.1
    • Docker version : 20.10.7
  • ODL features installed to enable clustering:
    • odl-netconf-clustered-topology
    • odl-restconf-all
  • Device configured : Netconf devices , all devices having same schema(tested with 250 devices)
  • Use Case:
    • Fail Over/High Availability:
      • Expected : In case any of the ODL instance gets down/restarted due to network splits or internal error, other instance in cluster should be available and functional. If the affected instance is having master mount, the other instance who is elected as master by re-election should be able to re-register the devices and resume the operations. Once the affected instance comes up, it should be able to join the cluster as member node and register the slave mounts.
      • Observation : When the odl instance which is having the master mount restarts, election happens among the other node in the cluster and elects the new leader. Now the new leader is trying to re-register the master mount but failed at a point due to the termination of the Akka Cluster Singleton Actor. Hence the cluster goes to idle state and failed to assign owner for the device DOM entity. In this case, the configuration of already mounted device/ new mounts will fail.  

 

 

Requesting your support to identify if there is any mis-configurations or any known solution for the issue .

Please let us know if any further information required.

 

Note : We have tested the single ODL instance without enabling cluster features in K8s cluster. In case of K8s node failure, ODL instance will be re-scheduled in other available K8s node and operations will be resumed.  

             

 

Thanks & Regards,

Rohini

 


 

--

- Rahul Sharma


Re: [integration-dev] [opendaylight-dev] ODL Clustering issue - High Availability

Daniel de la Rosa
 

Rohini and all

Please use Phosphorus SR3 since CONTROLLER-2035 is fixed in that version. In any case, @Venkatrangan Govindarajan  will also get back to you in case he finds anything in the logs you provided

thanks

On Wed, Jul 27, 2022 at 5:45 AM Rohini Ambika via lists.opendaylight.org <rohini.ambika=infosys.com@...> wrote:

Hi,

 

ODL version – Phosphorus SR2

 

Thanks & Regards,

Rohini

 

From: dev@... <dev@...> On Behalf Of Ivan Hrasko
Sent: Wednesday, July 27, 2022 5:31 PM
To: integration-dev@...; dev@...; kernel-dev@...; kernel-dev@...
Subject: Re: [opendaylight-dev] ODL Clustering issue - High Availability

 

[**EXTERNAL EMAIL**]

Hello,

 

what is the ODL version please?

 

Best,

 

Ivan Hraško

Senior Software Engineer

 

PANTHEON .tech

Mlynské Nivy 56, 821 05 Bratislava

Slovakia

Tel / +421 220 665 111

 

MAIL / ivan.hrasko@...

WEB / https://pantheon.tech

 


From: integration-dev@... <integration-dev@...> on behalf of Rohini Ambika via lists.opendaylight.org <rohini.ambika=infosys.com@...>
Sent: Wednesday, July 27, 2022 13:19
To: integration-dev@...; dev@...; kernel-dev@...; kernel-dev@...
Subject: [integration-dev] ODL Clustering issue - High Availability

 

Hi All,

 

As presented and discussed in the ODL TSC meeting held on Friday the 22nd at 10.30 AM IST, I am posting this email to highlight the issues with ODL clustering use cases that we encountered during performance testing.

 

Details and configurations as follows:

 

  • Requirement: ODL clustering for high availability (HA) of data distribution
  • Environment configuration:
    • 3-node k8s cluster (1 master and 3 worker nodes) with 3 ODL instances, one running on each node
    • CPU: 8 cores
    • RAM: 20 GB
    • Java heap size: min 512 MB, max 16 GB
    • JDK version: 11
    • Kubernetes version: 1.19.1
    • Docker version: 20.10.7
  • ODL features installed to enable clustering:
    • odl-netconf-clustered-topology
    • odl-restconf-all
  • Devices configured: NETCONF devices, all sharing the same schema (tested with 250 devices)
  • Use case:
    • Failover/High Availability:
      • Expected: If any ODL instance goes down or is restarted due to a network split or an internal error, the other instances in the cluster should remain available and functional. If the affected instance holds the master mount, the instance elected as the new master should be able to re-register the devices and resume operations. Once the affected instance comes back up, it should rejoin the cluster as a member node and register the slave mounts.
      • Observation: When the ODL instance holding the master mount restarts, an election takes place among the remaining nodes and a new leader is elected. The new leader then tries to re-register the master mount but fails partway through because the Akka Cluster Singleton actor is terminated. The cluster therefore goes idle and fails to assign an owner for the device DOM entity. In this state, configuration of already-mounted devices and of new mounts fails.
  • JIRA reference: https://jira.opendaylight.org/browse/CONTROLLER-2035
  • Akka configuration of all the nodes is attached. (We increased the gossip-interval to 5s in akka.conf to avoid the Akka AskTimeoutException when mounting multiple devices at a time; see the fragment sketched below.)
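For anyone who wants to apply the same tuning, the gossip interval lives under the cluster section of configuration/initial/akka.conf; a rough sketch of the relevant fragment (assuming the stock odl-cluster-data layout, all other settings omitted) is:

  odl-cluster-data {
    akka {
      cluster {
        # default is 1s; raised to 5s to avoid AskTimeoutException when mounting many devices at once
        gossip-interval = 5s
      }
    }
  }

Note that this only changes the cluster gossip cadence; it is a workaround for the ask-timeout during bulk mounts rather than for the failover problem described above.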

 

 

Requesting your support to identify whether there is any misconfiguration on our side, or whether there is a known solution for this issue.

Please let us know if any further information is required.

 

Note: We have also tested a single ODL instance, without the clustering features enabled, in the K8s cluster. In case of a K8s node failure, the ODL instance is rescheduled on another available K8s node and operations resume.

             

 

Thanks & Regards,

Rohini

 





Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rahul Sharma <rahul.iitr@...>
 

Hello Rohini,

Thank you for the answers.
  1. On the first one: when you say you tried the official Helm charts, which Helm charts are you referring to? Can you share more detail on how you deployed them (the parameters in values.yaml that you used)? A hypothetical example of the kind of detail that would help is sketched right after this list.
  2. What was the temporary fix that reduced the occurrence of the issue? Can you point to the check-in or the configuration change? That would help us diagnose a proper fix.
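To make question 1 concrete, the chart reference plus the exact overrides is what would help; the invocation below is purely hypothetical (the chart name, repo URL and value keys are placeholders, not the real ODL chart schema):

  # hypothetical example only; substitute the chart and value keys actually used
  helm repo add opendaylight <chart-repo-url>
  helm install odl opendaylight/opendaylight \
    --namespace odl --create-namespace \
    -f values.yaml \
    --set replicaCount=3

together with the values.yaml file itself.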
Regards,
Rahul


On Thu, Jul 28, 2022 at 2:21 AM Rohini Ambika <rohini.ambika@...> wrote:

Hi Anil,

 

Thanks for the response.

 

Please find the details below:

 

1.  Is the test deployment using our Helm charts (ODL Helm Chart)? – We have created our own Helm chart for the ODL deployment. We have also tried the use case with the official Helm chart.

2.  I see that the JIRA mentioned in the email below (https://jira.opendaylight.org/browse/CONTROLLER-2035) is already marked Resolved. Has somebody fixed it in the latest version? – This was a temporary fix from our end, and the failure rate has reduced because of it; however, we still face the issue when we restart the master node multiple times.

 

The ODL version used is Phosphorus SR2.

All the configurations are provided and attached in the initial mail.

 

Thanks & Regards,

Rohini

Cell: +91.9995241298 | VoIP: +91.471.3025332

 

From: Anil Shashikumar Belur <abelur@...>
Sent: Thursday, July 28, 2022 5:05 AM
To: Rahul Sharma <rahul.iitr@...>
Cc: Hsia, Andrew <andrew.hsia@...>; Rohini Ambika <rohini.ambika@...>; Casey Cain <ccain@...>; Luis Gomez <ecelgp@...>; TSC <tsc@...>
Subject: Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...

 

[**EXTERNAL EMAIL**]

Hi, 

 

I believe they are using the ODL Helm charts and K8s for the cluster setup; that said, I have requested the version of ODL being used.

Rohini: Can you provide more details on the ODL version and configuration that Rahul/Andrew requested?

 

On Thu, Jul 28, 2022 at 8:08 AM Rahul Sharma <rahul.iitr@...> wrote:

Hi Anil,

 

Thank you for bringing this up.

 

Couple of questions:

  1. Is the test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the email below (https://jira.opendaylight.org/browse/CONTROLLER-2035) is already marked Resolved. Has somebody fixed it in the latest version?

 

Thanks,
Rahul

 

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:

Hi Andrew and Rahul:

 

I remember we have discussed these topics in the ODL containers and Helm charts meetings.

Do we know whether the expected configuration works with the ODL-on-K8s cluster setup, or whether it requires some configuration changes?

 

Cheers,

Anil 

 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>

 

A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability


 


 

--

- Rahul Sharma



--
- Rahul Sharma


Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Anil Belur
 

Hi, 

I believe they are using the ODL Helm charts and K8s for the cluster setup; that said, I have requested the version of ODL being used.
Rohini: Can you provide more details on the ODL version and configuration that Rahul/Andrew requested?


On Thu, Jul 28, 2022 at 8:08 AM Rahul Sharma <rahul.iitr@...> wrote:
Hi Anil,

Thank you for bringing this up.

Couple of questions:
  1. Is the test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the email below (https://jira.opendaylight.org/browse/CONTROLLER-2035) is already marked Resolved. Has somebody fixed it in the latest version?

Thanks,
Rahul

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:
Hi Andrew and Rahul:

I remember we have discussed these topics in the ODL containers and Helm charts meetings.
Do we know whether the expected configuration works with the ODL-on-K8s cluster setup, or whether it requires some configuration changes?

Cheers,
Anil 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>


A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability


 



--
- Rahul Sharma


Re: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Rahul Sharma <rahul.iitr@...>
 

Hi Anil,

Thank you for bringing this up.

Couple of questions:
  1. Is the test deployment using our Helm charts (ODL Helm Chart)?
  2. I see that the JIRA mentioned in the email below (https://jira.opendaylight.org/browse/CONTROLLER-2035) is already marked Resolved. Has somebody fixed it in the latest version?

Thanks,
Rahul


On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:
Hi Andrew and Rahul:

I remember we have discussed these topics in the ODL containers and Helm charts meetings.
Do we know whether the expected configuration works with the ODL-on-K8s cluster setup, or whether it requires some configuration changes?

Cheers,
Anil 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>


A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability


 



--
- Rahul Sharma


Re: [E] Fwd: [opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Hsia, Andrew
 

Anil,

I tested the Helm chart in a k8s deployment, but in standalone mode.
I recall Rahul made some modifications to deploy it in cluster mode.

On Wed, Jul 27, 2022 at 5:05 PM Anil Shashikumar Belur <abelur@...> wrote:
Hi Andrew and Rahul:

I remember we have discussed these topics in the ODL containers and Helm charts meetings.
Do we know whether the expected configuration works with the ODL-on-K8s cluster setup, or whether it requires some configuration changes?

Cheers,
Anil 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>


A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability


 



--
Thanks

Andrew


[opendaylight-dev] Message Approval Needed - rohini.ambika@infosys.com posted to dev@lists.opendaylight.org

Anil Belur
 

Hi Andrew and Rahul:

I remember we have discussed these topics in the ODL containers and Helm charts meetings.
Do we know whether the expected configuration works with the ODL-on-K8s cluster setup, or whether it requires some configuration changes?

Cheers,
Anil 

---------- Forwarded message ---------
From: Group Notification <noreply@...>
Date: Wed, Jul 27, 2022 at 9:04 PM
Subject: [opendaylight-dev] Message Approval Needed - rohini.ambika@... posted to dev@...
To: <odl-mailman-owner@...>


A message was sent to the group https://lists.opendaylight.org/g/dev from rohini.ambika@... that needs to be approved because the user is new member moderated.

View this message online

Subject: FW: ODL Clustering issue - High Availability


 
