Silicon SR1 CSIT check


Luis Gomez
 

AFAIK, the only remaining issue is the BGP perf test, and this patch can partially take care of it (it makes the test pass):


BR/Luis

On May 25, 2021, at 8:13 AM, Daniel de la Rosa <ddelarosa0707@...> wrote:

Team, do we have any update on this? Just making sure so we can move forward with another RC.

Thanks 

On Tue, May 18, 2021 at 1:20 PM Robert Varga <nite@...> wrote:
On 18/05/2021 21:07, Luis Gomez wrote:
> CSIT cluster jobs use distribution bin/configure_cluster.sh script:
>
> https://git.opendaylight.org/gerrit/gitweb?p=integration/distribution.git;a=blob;f=karaf-scripts/src/main/assembly/bin/configure_cluster.sh;h=d93ac44d2a4af25578acac98b4e0386e8f968fd4;hb=HEAD#l187

Right, and we need this backport:
https://git.opendaylight.org/gerrit/c/integration/distribution/+/96247

The second step is to rehost that script to controller, but I'll need to
figure out how to do that in a packaging-friendly way.

> And this script uses controller project files in ${CONTROLLER_DIR}/system/org/opendaylight/controller/sal-clustering-config, according to the link above.
>
> So maybe we just need to update these files in controller project?

Yes, but only for the downing provider -- for the akka.tcp, we'll need
something like
https://git.opendaylight.org/gerrit/c/integration/test/+/95692 (but
handling both Artery and Classic remoting).

Regards,
Robert
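Handling both Artery and Classic remoting mostly comes down to the protocol prefix in member addresses: Artery addresses use `akka://`, while Classic remoting uses `akka.tcp://`. The following is a minimal, hypothetical Python sketch of a helper that test tooling could use to build seed-node addresses for either mode. The function names, the `remoting` parameter and the defaults (system name `opendaylight-cluster-data`, port 2550, taken from the log lines in this thread) are assumptions for illustration, not the actual integration/test code.

```python
# Hypothetical helper (not part of integration/test): builds Akka cluster
# member addresses whose protocol prefix matches the remoting mode in use.
ARTERY = "artery"
CLASSIC = "classic"

# Artery uses the plain "akka" protocol; Classic (Netty TCP) uses "akka.tcp".
_PROTOCOLS = {ARTERY: "akka", CLASSIC: "akka.tcp"}


def member_address(host, remoting=ARTERY,
                   system="opendaylight-cluster-data", port=2550):
    """Return an address such as 'akka://opendaylight-cluster-data@192.0.2.1:2550'."""
    try:
        protocol = _PROTOCOLS[remoting]
    except KeyError:
        raise ValueError(f"unknown remoting mode: {remoting!r}") from None
    return f"{protocol}://{system}@{host}:{port}"


def seed_nodes(hosts, remoting=ARTERY):
    """Seed-node list for all cluster members in the given remoting mode."""
    return [member_address(host, remoting) for host in hosts]


if __name__ == "__main__":
    # Placeholder documentation addresses, not real CSIT node IPs.
    members = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    print(seed_nodes(members))                    # Artery: akka://...
    print(seed_nodes(members, remoting=CLASSIC))  # Classic: akka.tcp://...
```

Keeping the protocol in one lookup table means scripts such as the ones listed later in this thread would only need the remoting mode as input, instead of hard-coding `akka.tcp`.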



Luis Gomez
 

CSIT cluster jobs use distribution bin/configure_cluster.sh script:

https://git.opendaylight.org/gerrit/gitweb?p=integration/distribution.git;a=blob;f=karaf-scripts/src/main/assembly/bin/configure_cluster.sh;h=d93ac44d2a4af25578acac98b4e0386e8f968fd4;hb=HEAD#l187

And this script uses controller project files in ${CONTROLLER_DIR}/system/org/opendaylight/controller/sal-clustering-config, according to the link above.

So maybe we just need to update these files in controller project?

BR/Luis

On May 17, 2021, at 11:15 PM, Robert Varga <nite@...> wrote:



On 18/05/2021 04:46, Daniel de la Rosa wrote:
> Hello TSC and all
>
> Sorry for the delay. Here is the Silicon SR1 CSIT check for AR#288
>
> https://docs.google.com/spreadsheets/d/1zvD4xgiMlWCgg5ZBvxSRONY_LtLtzzNUeuA06-i7ukc/edit?usp=sharing
>
> Please review at your earliest convenience

I think we have at least two problems.

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/daexim-csit-3node-clustering-basic-only-silicon/321/console.log.gz
is indicating:

> 2021-05-17T03:42:38,062 | ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-14 | ClusterActorRefProvider | 205 - org.opendaylight.controller.repackaged-akka - 3.0.8 | No root guardian at [akka.tcp://opendaylight-cluster-data@...:2550]
> java.lang.IllegalArgumentException: Wrong protocol of [akka.tcp://opendaylight-cluster-data@...:2550/], expected [akka]
>       at akka.remote.RemoteActorRef.<init>(RemoteActorRefProvider.scala:671) ~[bundleFile:?]
>       at akka.remote.RemoteActorRefProvider.rootGuardianAt(RemoteActorRefProvider.scala:476) ~[bundleFile:?]

This seems to be a problem in int/test, where these:
./csit/libraries/ConfGen.py
./csit/variables/clustering/member_down.json
./tools/clustering/cluster-deployer/deploy.py

are assuming peer addresses are 'akka.tcp' -- and they are not with Artery.


We also have this:

> 2021-05-17T03:42:38,030 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-14 | Cluster | 205 - org.opendaylight.controller.repackaged-akka - 3.0.8 | Cluster Node [akka://opendaylight-cluster-data@...:2550] - No downing-provider-class configured, manual cluster downing required, see https://doc.akka.io/docs/akka/current/typed/cluster.html#downing

which I still need to track down -- it is either int/dist, int/test or
controller's fault.

Regards,
Robert
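The two symptoms Robert describes can be spotted mechanically in a Karaf log. Below is a minimal sketch, assuming the log lines keep the format quoted in this thread; the function name `find_cluster_config_issues` and the issue labels it returns are hypothetical, not an existing CSIT keyword or library.

```python
import re
import sys

# Patterns taken from the log excerpts quoted in this thread.
# 1) A peer address still uses Classic "akka.tcp" where Artery expects "akka".
_WRONG_PROTOCOL = re.compile(
    r"Wrong protocol of \[(?P<actual>[\w.]+)://[^\]]*\], "
    r"expected \[(?P<expected>[\w.]+)\]"
)
# 2) The cluster started without a downing provider.
_NO_DOWNING = re.compile(r"No downing-provider-class configured")


def find_cluster_config_issues(lines):
    """Scan an iterable of log lines; return (line_number, issue) tuples, 1-based."""
    issues = []
    for number, line in enumerate(lines, start=1):
        match = _WRONG_PROTOCOL.search(line)
        if match:
            issues.append((number, f"protocol mismatch: got {match['actual']}, "
                                   f"expected {match['expected']}"))
        elif _NO_DOWNING.search(line):
            issues.append((number, "no downing provider configured"))
    return issues


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_cluster_log.py path/to/karaf.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for number, issue in find_cluster_config_issues(log):
            print(f"line {number}: {issue}")
```

A check like this only flags the symptoms; the actual fixes are the int/test address changes and the downing-provider configuration discussed in this thread.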


Daniel de la Rosa
 

Thanks Robert. Sounds like we have two showstoppers for Silicon SR1 coming from integration, then. Let's see if @Luis Gomez has time to check it out later this week.



Daniel de la Rosa
 

Hello TSC and all

Sorry for the delay. Here is the Silicon SR1 CSIT check for AR#288


Please review at your earliest convenience


Thanks


On Thu, May 13, 2021 at 6:21 AM Robert Varga <nite@...> wrote:
Okay, this is now all merged up. The next autorelease should have everything
we need for Silicon SR1.

Sorry about the delays, there were all sorts of shenanigans :(

Regards,
Robert



On 11/05/2021 03:40, Anil Belur wrote:
> I'm ok with postponing the code freeze. It's better to postpone a
> release than do additional releases. ;P
> The permissions have been reverted now. Cheers 
>
> On Tue, May 11, 2021 at 6:28 AM Daniel de la Rosa <ddelarosa0707@...> wrote:
>
>     Fine with me. So, Anil Belur, can you postpone code freeze for two days?
>
>     Thanks
>
>     On Mon, May 10, 2021 at 12:42 PM Robert Varga <nite@...> wrote:
>
>         On 03/05/2021 04:57, Daniel de la Rosa wrote:
>         > Hello TSC and all
>         >
>         > Correction. Code freeze will be on May 10th at 10 am PST
>
>         Hey everyone,
>
>         unfortunately I missed this deadline, sorry.
>
>         Overall I have all MRI projects and BGPCEP done, with the notable
>         exception of NETCONF.
>
>         This would mean either not picking up anything of the ~20 patches
>         pending review in NETCONF -- and I would prefer not to do that, as
>         they are really things that need to be backported to Aluminium SR4
>         as well.
>
>         Any objections to a one- or two-day extension to finish this up and
>         roll out the MRI updates in time for Wednesday night's autorelease?
>
>         Thanks,
>         Robert
>
>         >
>         > Thanks
>         >
>         > On Thu, Apr 29, 2021 at 10:30 PM Daniel de la Rosa
>         > <ddelarosa0707@...> wrote:
>         >
>         >     Hello TSC and all
>         >
>         >     We are going to code freeze Silicon for all Managed Projects (cut
>         >     and lock release branches) on Thursday May 6th
>         >
>         >     Please remember that we only allow blocker bug fixes in the release
>         >     branch after code freeze
>         >
>         >     Daniel de la Rosa
>         >     ODL Release Manager
>         >
>         >     Thanks
>         >
>         >     P.S. Release schedule and checklist for your reference
>         >
>         >     https://wiki.opendaylight.org/display/ODL/Silicon+SR1+Release+Checklist
>         >
>         >     https://docs.opendaylight.org/en/latest/release-process/release-schedule.html