MRI bump for Neon SR3
Robert Varga
Hello everyone,
It's been almost four months since we last had a drop of MRI fixes into
stable/neon. Not that there weren't any; there have been intermediate
releases, but there was no time to integrate them.
Given where we are in other release trains (Sodium: frozen for SR1,
Magnesium: stabilizing CSIT), now is a good time to refresh things.
Overall, this bump brings in significant fixes, including
security-related bumps of upstream artifacts.
There are two breaking changes here:
1) mina-sshd is bumped to 2.3.0, which requires a minor update to netconf
2) YANGTOOLS-857 means yangtools now detects previously-glossed-over
errors in YANG usage. This affects OFP and BGPCEP. OFP has the
corresponding patch here:
https://git.opendaylight.org/gerrit/c/openflowplugin/+/85272. BGPCEP has
two patches, both of which are already merged.
The changes in this bump will be mirrored to stable/sodium (once SR1 is
out) and magnesium (once we have reasonably-stable CSIT).
I have prepared patches here:
https://git.opendaylight.org/gerrit/q/topic:mri-neon-sr3
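(If you want to try the stack with these changes locally, checking out one of the changes from that topic is the usual Gerrit dance; a rough sketch below uses the OFP change mentioned above, with the patchset number as a placeholder:)
# clone the project and fetch a specific Gerrit change (patchset "1" is illustrative,
# pick the latest patchset shown in the Gerrit UI)
git clone https://git.opendaylight.org/gerrit/openflowplugin
cd openflowplugin
git fetch https://git.opendaylight.org/gerrit/openflowplugin refs/changes/72/85272/1
git checkout FETCH_HEAD
mvn clean install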
Successful multipatch build is here:
https://jenkins.opendaylight.org/releng/view/integration/job/integration-multipatch-test-neon/195/
CSIT multipatch build is (still running) here:
https://jenkins.opendaylight.org/releng/view/integration/job/integration-multipatch-test-neon/197/
Unless there are objections (or unexpected CSIT failures), I would like
to merge these changes on Friday.
Regards,
Robert
Daniel De La Rosa
Hello Robert,
I see that all changes have been merged, so is anything else needed before we release Neon SR3?
Thanks
On Wed, Oct 23, 2019 at 12:48 PM Robert Varga <nite@...> wrote:
Robert Varga
On 14/11/2019 19:04, Daniel De La Rosa wrote:
> Hello Robert,
> I see that all changes have been merged, so is anything else needed
> before we release Neon SR3?
Hey Daniel,
I am tracking the following:
- need to integrate MRI projects again due to fixes in yangtools (at the
very least). I am currently nailing down the long-standing Karaf upgrade
issue, which should be done around Tuesday
- controller/netconf need to be audited for potential backports
And that's it from my side -- i.e. we should be ready in about a week.
Regards,
Robert
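(For the backport audit itself, one rough way to eyeball candidates, assuming the usual layout where master is upstream and stable/neon is the release branch, is a git cherry pass like the following; the branch names are just the conventional ones:)
# inside a controller or netconf checkout: list commits on master that have no
# patch-equivalent on stable/neon yet (lines starting with "+" are candidates)
git fetch origin
git cherry -v origin/stable/neon origin/master
# alternative view, with merge commits filtered out:
git log --oneline --no-merges --cherry-pick --right-only origin/stable/neon...origin/master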
Robert Varga
On 16/11/2019 07:36, Robert Varga wrote:
> Hey Daniel,
> I am tracking the following:
> - need to integrate MRI projects again due to fixes in yangtools (at the
> very least). I am currently nailing down the long-standing Karaf upgrade
> issue, which should be done around Tuesday
https://git.opendaylight.org/gerrit/q/topic:mri-neon-sr3 has been
updated with the new version bumps.
> - controller/netconf need to be audited for potential backports
I think this is done.
Regards,
Robert
Robert Varga
On 21/11/2019 11:24, Robert Varga wrote:
> https://git.opendaylight.org/gerrit/q/topic:mri-neon-sr3 has been
> updated with the new version bumps.
This is being rolled out now.
Regards,
Robert
Robert Varga
On 21/11/2019 19:56, Robert Varga wrote:
> This is being rolled out now.
All done, the dust should settle in about 45 minutes.
Jamo, Luis: AR #312 should contain all code as of now.
Regards,
Robert
JamO Luhrsen
On 11/21/19 2:50 PM, Robert Varga wrote:
> All done, the dust should settle in about 45 minutes.
> Jamo, Luis: AR #312 should contain all code as of now.
and this is the dist-test job that ran with #312.
https://jenkins.opendaylight.org/releng/job/integration-distribution-test-neon/435/
it doesn't look alarming from the top level. lots of 100% passing jobs, but we'll
need to dig into the yellow jobs with failures to make sure we don't have any
regressions.
Probably makes sense to make sure all projects are done with their final bits, lock
the branch and take the next autorelease and start vetting the CSIT jobs.
JamO
Daniel De La Rosa
OK, let me add the email lists from all the managed projects as a reminder that Neon SR3 and its corresponding patches require your attention.
Thanks
On Fri, Nov 22, 2019 at 10:05 AM Jamo Luhrsen <jluhrsen@...> wrote:
On 11/21/19 2:50 PM, Robert Varga wrote:
> On 21/11/2019 19:56, Robert Varga wrote:
>> On 21/11/2019 11:24, Robert Varga wrote:
>>> On 16/11/2019 07:36, Robert Varga wrote:
>>>> On 14/11/2019 19:04, Daniel De La Rosa wrote:
>>>>> Hello Robert,
>>>>>
>>>>> I see that all changes have been merged so anything else is needed
>>>>> before we release Neon SR3?
>>>>>
>>>> Hey Daniel,
>>>>
>>>> I am tracking the following:
>>>> - need to integrate MRI projects again due to fixes in yangtools (at the
>>>> very least). I am currently nailing down the long-standing Karaf upgrade
>>>> issue, which should be done around Tuesday
>>> https://git.opendaylight.org/gerrit/q/topic:mri-neon-sr3 has been
>>> updated with the new version bumps.
>> This is being rolled out now.
> All done, the dust should settle in about 45 minutes.
>
> Jamo, Luis: AR #312 should contain all code as of now.
and this is the dist-test job that ran with #312.
https://jenkins.opendaylight.org/releng/job/integration-distribution-test-neon/435/
it doesn't look alarming from the top level. lots of 100% passing jobs, but we'll
need to dig into the yellow jobs with failures to make sure we don't have any
regressions.
Probably makes sense to make sure all projects are done with their final bits, lock
the branch and take the next autorelease and start vetting the csit jobs.
JamO
> Regards,
> Robert
>
--
Daniel de la Rosa
Customer Support Manager ( ODL Release manager )
Lumina Networks Inc.
e: ddelarosa@...
m: +1 408 7728120
Luis Gomez
I am not sure if anyone realized, but it seems there is some issue stopping nodes in the cluster after the last MRI update:
https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-ask-all-neon/
BR/Luis
On Nov 22, 2019, at 10:05 AM, Jamo Luhrsen <jluhrsen@...> wrote:
Robert Varga
On 04/12/2019 18:41, Luis Gomez wrote:
> I am not sure if anyone realized, but it seems there is some issue
> stopping nodes in the cluster after the last MRI update:
> https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-ask-all-neon/
Looks like stopping Karaf is failing:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-ask-all-neon/415/robot-plugin/log.html.gz#s1-s13-t1-k1-k7-k1-k1-k1-k2
Is that ${member_ip} = 10.30.170.81?
Thanks,
Robert
JamO Luhrsen
On 12/4/19 10:11 AM, Robert Varga wrote:
> Looks like stopping Karaf is failing:
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-ask-all-neon/415/robot-plugin/log.html.gz#s1-s13-t1-k1-k7-k1-k1-k1-k2
> Is that ${member_ip} = 10.30.170.81?
I think it took like 30m to finally stop. here's the karaf.log:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-ask-all-neon/415/odl_1/odl1_karaf.log.gz
right around 2019-12-04T16:20:59,010 is when that test case tries to stop it,
then a ton of ERRORs in the log and finally at this timestamp is when we see
karaf starting back.
Dec 04, 2019 4:50:26 PM org.apache.karaf.main.Main launch
JamO
Luis Gomez
On Dec 4, 2019, at 10:19 AM, Jamo Luhrsen <jluhrsen@...> wrote:
> I think it took like 30m to finally stop. here's the karaf.log:
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-ask-all-neon/415/odl_1/odl1_karaf.log.gz
> right around 2019-12-04T16:20:59,010 is when that test case tries to stop it,
> then a ton of ERRORs in the log and finally at this timestamp is when we see
> karaf starting back.
> Dec 04, 2019 4:50:26 PM org.apache.karaf.main.Main launch
The stop started before:
2019-12-04T16:15:57,404 | INFO | Karaf Shutdown Socket Thread | ShutdownSocketThread | - - | Karaf shutdown socket: received shutdown command. Stopping framework...
And it seems like the controller was not fully booted up when stop was requested. See previous message:
2019-12-04T16:15:57,345 | INFO | awaitility[checkBundleDiagInfos] | KarafSystemReady | 326 - org.opendaylight.infrautils.ready-impl - 1.5.3 | checkBundleDiagInfos: Elapsed time 201s, remaining time 98s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=0, Waiting=1, Starting=0, Active=515, Stopping=0, Failure=0}
It would be nice to know which bundle Karaf was waiting for… (this job installs all compatible features).
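(To line those timestamps up quickly, grepping the downloaded karaf.log for the boot and shutdown markers quoted above is enough; the file name is the one from the link:)
# when the JVM was (re)launched vs. when a shutdown was requested
zgrep -E 'org\.apache\.karaf\.main\.Main launch|received shutdown command' odl1_karaf.log.gz
# and the last few readiness probes before the stop
zgrep 'checkBundleDiagInfos' odl1_karaf.log.gz | tail -n 5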
JamO Luhrsen
On 12/4/19 10:49 AM, Luis Gomez wrote:
> The stop started before:
> 2019-12-04T16:15:57,404 | INFO | Karaf Shutdown Socket Thread | ShutdownSocketThread | - - | Karaf shutdown socket: received shutdown command. Stopping framework...
yeah, you're right. I was looking at the timestamp when it gave up checking if it was stopped. here's
the call to bin/stop:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-ask-all-neon/415/robot-plugin/log.html.gz#s1-s13-t1-k1-k3-k2-k1-k1-k2-k2-k1-k1-k1-k6
> And it seems like the controller was not fully booted up when stop was requested. See previous message:
> 2019-12-04T16:15:57,345 | INFO | awaitility[checkBundleDiagInfos] | KarafSystemReady | 326 - org.opendaylight.infrautils.ready-impl - 1.5.3 | checkBundleDiagInfos: Elapsed time 201s, remaining time 98s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=0, Waiting=1, Starting=0, Active=515, Stopping=0, Failure=0}
> It would be nice to know which bundle Karaf was waiting for… (this job installs all compatible features).
hmmm, this could be the right tree to bark up.
JamO
Robert Varga
On 04/12/2019 19:49, Luis Gomez wrote:
> 2019-12-04T16:15:57,345 | INFO | awaitility[checkBundleDiagInfos] | KarafSystemReady | 326 - org.opendaylight.infrautils.ready-impl - 1.5.3 | checkBundleDiagInfos: Elapsed time 201s, remaining time 98s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=0, Waiting=1, Starting=0, Active=515, Stopping=0, Failure=0}
> It would be nice to know which bundle Karaf was waiting for… (this job installs all compatible features).
Actually, we do, but let's first take a look, this instance was started here:

Dec 04, 2019 4:11:04 PM org.apache.karaf.main.Main launch
INFO: Installing and starting initial bundles

so it was given less than five minutes to boot up before it was
terminated. It is a pity, too, because the system just became operational:

2019-12-04T16:15:41,253 | INFO | awaitility[checkBundleDiagInfos] | KarafSystemReady | 326 - org.opendaylight.infrautils.ready-impl - 1.5.3 | checkBundleDiagInfos: Elapsed time 185s, remaining time 114s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=0, Waiting=1, Starting=0, Active=515, Stopping=0, Failure=0}
[...]
2019-12-04T16:15:41,318 | INFO | opendaylight-cluster-data-shard-dispatcher-31 | ShardManager | 307 - org.opendaylight.controller.sal-distributed-datastore - 1.9.3 | shard-manager-operational: All Shards are ready - data store operational is ready, available count is 0
[...]
2019-12-04T16:15:42,259 | INFO | awaitility[checkBundleDiagInfos] | KarafSystemReady | 326 - org.opendaylight.infrautils.ready-impl - 1.5.3 | checkBundleDiagInfos: Elapsed time 186s, remaining time 113s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=0, Waiting=1, Starting=0, Active=515, Stopping=0, Failure=0}

This node is still booting from the previous test suite:

2019-12-04T16:10:15,069 | INFO | pipe-log:log "ROBOT MESSAGE: Starting suite /w/workspace/controller-csit-3node-clustering-ask-all-neon/test/csit/suites/controller/dom_data_broker/restart_odl_with_tell_based_false.robot" | core | 128 - org.apache.karaf.log.core - 4.2.6 | ROBOT MESSAGE: Starting suite /w/workspace/controller-csit-3node-clustering-ask-all-neon/test/csit/suites/controller/dom_data_broker/restart_odl_with_tell_based_false.robot
[...]
2019-12-04T16:10:15,286 | INFO | awaitility[checkBundleDiagInfos] | KarafSystemReady | 326 - org.opendaylight.infrautils.ready-impl - 1.5.3 | checkBundleDiagInfos: Elapsed time 116s, remaining time 183s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=0, Waiting=1, Starting=0, Active=515, Stopping=0, Failure=0}
2019-12-04T16:10:16,292 | INFO | awaitility[checkBundleDiagInfos] | KarafSystemReady | 326 - org.opendaylight.infrautils.ready-impl - 1.5.3 | checkBundleDiagInfos: Elapsed time 117s, remaining time 182s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=0, Waiting=1, Starting=0, Active=515, Stopping=0, Failure=0}
2019-12-04T16:10:16,970 | INFO | Karaf Shutdown Socket Thread | ShutdownSocketThread | - - | Karaf shutdown socket: received shutdown command. Stopping framework...
[...]
2019-12-04T16:13:06,218 | INFO | awaitility[checkBundleDiagInfos] | KarafSystemReady | 326 - org.opendaylight.infrautils.ready-impl - 1.5.3 | checkBundleDiagInfos: Elapsed time 30s, remaining time 269s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=0, Waiting=1, Starting=0, Active=515, Stopping=0, Failure=0}
2019-12-04T16:13:06,254 | INFO | pipe-log:log "ROBOT MESSAGE: Starting suite /w/workspace/controller-csit-3node-clustering-ask-all-neon/test/csit/suites/controller/cluster_singleton/partition_and_heal.robot" | core | 128 - org.apache.karaf.log.core - 4.2.6 | ROBOT MESSAGE: Starting suite /w/workspace/controller-csit-3node-clustering-ask-all-neon/test/csit/suites/controller/cluster_singleton/partition_and_heal.robot
2019-12-04T16:13:07,109 | INFO | pipe-log:log "ROBOT MESSAGE: Starting suite /w/workspace/controller-csit-3node-clustering-ask-all-neon/test/csit/suites/controller/cluster_singleton/partition_and_heal.robot" | core | 128 - org.apache.karaf.log.core - 4.2.6 | ROBOT MESSAGE: Starting test controller-clustering-ask.txt.Partition And Heal.Register_Singleton_Constant_On_Each_Node

and the printout about bundle diag follows then:

2019-12-04T16:15:58,499 | ERROR | SystemReadyService-0 | KarafSystemReady | 326 - org.opendaylight.infrautils.ready-impl - 1.5.3 | Failed, some bundles did not start (SystemReadyListeners are not called)
org.opendaylight.odlparent.bundlestest.lib.SystemStateFailureException: diag failed; some bundles failed to start
diag: Stopping {Installed=0, Resolved=12, Unknown=0, GracePeriod=0, Waiting=1, Starting=0, Active=508, Stopping=2, Failure=0}
1. NOK org.eclipse.osgi:3.12.100.v20180210-1608: OSGi state = Stopping, Karaf bundleState = Stopping, due to: Declarative Services
2. NOK org.apache.karaf.decanter.collector.jmx:1.1.0: OSGi state = Active, Karaf bundleState = Waiting, due to: Declarative Services
org.apache.karaf.decanter.collector.jmx (5)
missing references: EventAdmin
org.apache.karaf.decanter.collector.jmx (6)
missing references: EventAdmin
3. NOK org.opendaylight.ovsdb.southbound-impl:1.8.3: OSGi state = Active, Karaf bundleState = Stopping, due to: Declarative Services
at org.opendaylight.odlparent.bundlestest.lib.TestBundleDiag.checkBundleDiagInfos(TestBundleDiag.java:82) ~[326:org.opendaylight.infrautils.ready-impl:1.5.3]
at org.opendaylight.infrautils.ready.karaf.internal.KarafSystemReady.run(KarafSystemReady.java:83) [326:org.opendaylight.infrautils.ready-impl:1.5.3]
at java.lang.Thread.run(Thread.java:748) [?:?]

So it is related to OVSDB. Specifically, the last message from
ovsdb.southbound-impl was:

2019-12-04T16:15:41,321 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-18 | SouthboundProvider | 439 - org.opendaylight.ovsdb.southbound-impl - 1.8.3 | *This* instance of OVSDB southbound provider is set as a SLAVE instance

but I do not see the equivalent of the following normal boot:

2019-12-04T15:50:59,596 | INFO | Blueprint Extender: 3 | OvsdbDataTreeChangeListener | 439 - org.opendaylight.ovsdb.southbound-impl - 1.8.3 | OVSDB topology listener has been registered.
2019-12-04T15:50:59,597 | INFO | opendaylight-cluster-data-akka.actor.default-dispatcher-38 | SouthboundProvider | 439 - org.opendaylight.ovsdb.southbound-impl - 1.8.3 | *This* instance of OVSDB southbound provider is set as a SLAVE instance
2019-12-04T15:50:59,603 | INFO | Blueprint Extender: 3 | BlueprintContainerImpl | 83 - org.apache.aries.blueprint.core - 1.10.2 | Blueprint bundle org.opendaylight.ovsdb.southbound-impl/1.8.3 has been started
2019-12-04T15:50:59,607 | INFO | opendaylight-cluster-data-notification-dispatcher-61 | SouthboundProvider | 439 - org.opendaylight.ovsdb.southbound-impl - 1.8.3 | Starting the ovsdb port
2019-12-04T15:50:59,608 | INFO | opendaylight-cluster-data-notification-dispatcher-61 | OvsdbConnectionService | 435 - org.opendaylight.ovsdb.library - 1.8.3 | registerConnectionListener: registering OvsdbConnectionManager
2019-12-04T15:50:59,608 | INFO | opendaylight-cluster-data-notification-dispatcher-61 | SouthboundProvider | 439 - org.opendaylight.ovsdb.southbound-impl - 1.8.3 | Registering deferred system ready listener to start OVSDB Manager later

Also, in "Restart Odl With Tell Based False" or the previous test case,
can we actually wait for all nodes to report diag before starting
shooting them down?

Regards,
Robert
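(On that last point, a crude guard the suites could run on each member before calling bin/stop might look like the sketch below; it assumes bundle:diag only prints output for bundles that are not cleanly started, and the install path and retry counts are only illustrative:)
# poll the Karaf shell until no bundle reports a transient/failed state, then stop
cd /opt/opendaylight   # illustrative install path
for i in $(seq 1 60); do
    if [ -z "$(./bin/client bundle:diag 2>/dev/null)" ]; then
        echo "no unhappy bundles left, issuing stop"
        ./bin/stop
        exit 0
    fi
    sleep 5
done
echo "bundles did not settle in time, stopping anyway" >&2
./bin/stop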
Luis Gomez
That is what I tried with this patch:
https://git.opendaylight.org/gerrit/#/c/integration/test/+/86190/
And this run:
https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-clustering-ask-all-neon/417/console
You can see in the console the problem is in OVSDB.
BR/Luis
On Dec 4, 2019, at 11:40 AM, Robert Varga <nite@...> wrote:
Robert Varga
On 04/12/2019 21:10, Luis Gomez wrote:
> That is what I tried with this patch:
> https://git.opendaylight.org/gerrit/#/c/integration/test/+/86190/
> And this run:
> https://jenkins.opendaylight.org/releng/view/controller/job/controller-csit-3node-clustering-ask-all-neon/417/console
> You can see in the console the problem is in OVSDB.
Can you try running it with org.opendaylight.ovsdb=DEBUG to see if anything crops up?
Thanks,
Robert
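(For the record, two quick ways to get that level set, assuming a stock Karaf 4.2 distribution; the logger name is the one above, the rest is plain Karaf configuration:)
# one-off, from the Karaf console (or via ./bin/client):
log:set DEBUG org.opendaylight.ovsdb
# or persistently, before starting the controller, in etc/org.ops4j.pax.logging.cfg:
echo 'log4j2.logger.ovsdb.name = org.opendaylight.ovsdb' >> etc/org.ops4j.pax.logging.cfg
echo 'log4j2.logger.ovsdb.level = DEBUG' >> etc/org.ops4j.pax.logging.cfg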