[OpenDaylight TSC] Magnesium RC


Robert Varga
 

On 14/03/2020 20:05, JamO Luhrsen wrote:

> tracking sheet for projects to update:
> https://docs.google.com/spreadsheets/d/1WgBX2c5DjQRfphM0sc3IKt7igW3EMxnUgANTcOlYD_U/edit#gid=1864238939
>
> There is already one issue that I marked as a blocker until I can look more
> closely. It's the same problem I reported to Robert on his mdsal multipatch
> job, which is totally unrelated to the Magnesium release. I'm worried that
> it's a new regression we have.

If it's the OVSDB issue, I have zero clue as to what is going on, except
there is a ton of NPEs:

2020-03-14T07:03:59,899 | WARN | transaction-invoker-impl-0 | OvsdbOperationalCommandAggregator | 301 - org.opendaylight.ovsdb.southbound-impl - 1.10.0 | Exception trying to execute org.opendaylight.ovsdb.southbound.transactions.md.OvsdbPortUpdateCommand@e285fe0
java.lang.NullPointerException: null
at org.opendaylight.ovsdb.southbound.transactions.md.OvsdbPortUpdateCommand.getTerminationPointBridge(OvsdbPortUpdateCommand.java:257) ~[?:?]
at org.opendaylight.ovsdb.southbound.transactions.md.OvsdbPortUpdateCommand.updateTerminationPoints(OvsdbPortUpdateCommand.java:171) ~[?:?]
at org.opendaylight.ovsdb.southbound.transactions.md.OvsdbPortUpdateCommand.execute(OvsdbPortUpdateCommand.java:126) ~[?:?]
at org.opendaylight.ovsdb.southbound.transactions.md.OvsdbOperationalCommandAggregator.execute(OvsdbOperationalCommandAggregator.java:72) ~[?:?]
at org.opendaylight.ovsdb.southbound.transactions.md.TransactionInvokerImpl.executeCommand(TransactionInvokerImpl.java:129) ~[?:?]
at java.util.ArrayList.forEach(ArrayList.java:1540) [?:?]
at org.opendaylight.ovsdb.southbound.transactions.md.TransactionInvokerImpl.run(TransactionInvokerImpl.java:119) [301:org.opendaylight.ovsdb.southbound-impl:1.10.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
and it seems that verification is getting 401s -- probably because of these:

2020-03-14T07:04:57,040 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-4 | EndpointReader | 47 - com.typesafe.akka.slf4j - 2.5.26 | Discarding inbound message to [Actor[akka://opendaylight-cluster-data/]] in read-only association to [akka.tcp://opendaylight-cluster-data@...:2550]. If this happens often you may consider using akka.remote.use-passive-connections=off or use Artery TCP.
I have no remembrance of why we are using passive connections :(

Regards,
Robert


Robert Varga
 

On 16/03/2020 09:44, Robert Varga wrote:
> 2020-03-14T07:04:57,040 | WARN | opendaylight-cluster-data-akka.actor.default-dispatcher-4 | EndpointReader | 47 - com.typesafe.akka.slf4j - 2.5.26 | Discarding inbound message to [Actor[akka://opendaylight-cluster-data/]] in read-only association to [akka.tcp://opendaylight-cluster-data@...:2550]. If this happens often you may consider using akka.remote.use-passive-connections=off or use Artery TCP.
> I have no remembrance of why we are using passive connections :(

https://git.opendaylight.org/gerrit/c/controller/+/88437 switches them off.

We cannot go to Artery TCP, as it has a nasty mem leak before 2.5.29 --
upgrade to that version is staged in the mri-sodium-sr3, though.
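
For readers who have not touched this setting, here is a minimal sketch of what switching passive connections off looks like in Akka's HOCON configuration. The key name comes from the warning text quoted above. Which file carries it in an OpenDaylight install, and what the Gerrit change actually touches, is not stated in this thread, so treat the placement as an assumption.

```
# Sketch only: classic (non-Artery) Akka remoting setting named in the warning above.
# The file this belongs in for a given OpenDaylight deployment is an assumption.
akka {
  remote {
    use-passive-connections = off
  }
}
```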

Bye,
Robert


Chetan Arakere Gowdru <chetan.arakere@...>
 

Hi Robert,

We are working on the fix to address this NPE.

@Jamo,

Was this NPE also seen in earlier releases? (I believe so, since the code
throwing the NPE is the same as in earlier releases.) Also, can you point us
to the CSIT job that is hitting these NPEs?

Thanks,
Chetan
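
The thread does not say which reference is null at OvsdbPortUpdateCommand.java:257, and the actual fix is not shown here. Purely as an illustration of the usual defensive pattern for this kind of lookup, the sketch below returns an Optional instead of dereferencing a possibly missing value. Every name in it (TerminationPointLookupSketch, PORT_TO_BRIDGE, findBridge, describe) is hypothetical and is not taken from the OVSDB code base.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these names come from the OVSDB southbound plugin.
class TerminationPointLookupSketch {
    // Stand-in for whatever operational data the real command consults.
    static final Map<String, String> PORT_TO_BRIDGE = new HashMap<>();

    // Returns the bridge for a port, or Optional.empty() when the port id
    // or the mapping is missing, instead of letting a null escape.
    static Optional<String> findBridge(String portUuid) {
        if (portUuid == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(PORT_TO_BRIDGE.get(portUuid));
    }

    // The caller decides what a missing bridge means; here it is reported and skipped.
    static String describe(String portUuid) {
        return findBridge(portUuid)
            .map(bridge -> "port " + portUuid + " -> bridge " + bridge)
            .orElse("port " + portUuid + ": bridge not found, skipping update");
    }

    public static void main(String[] args) {
        PORT_TO_BRIDGE.put("port-1", "br-int");
        System.out.println(describe("port-1"));
        System.out.println(describe("port-2"));
    }
}
```

Whether skipping the update is the right reaction in the real command is a separate question; the point of the sketch is only that the missing case is handled explicitly rather than surfacing as an NPE in the transaction invoker.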



JamO Luhrsen
 

FYI, the problem I've been digging into is now filed here:
  https://jira.opendaylight.org/browse/AAA-195

It's not OVSDB related, but more low-level, and may not even be in AAA. The
high-level issue is that bundles are not starting up as expected, so basic
functionality (like authenticating a REST call) fails. More info is in the Jira.

Although it does seem to be a new issue, since it seems relatively
infrequent (only saw it once in more than 125 tries in the sandbox)
I think it does not need to block the formal Magnesium release. I did
mark it as a blocker for Magnesium SR1. It would be a pretty ugly state
for an end user to hit, if they happened to be unlucky.
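
As a rough aid for anyone trying to reproduce this, the sketch below estimates how many sandbox runs are needed to see an intermittent failure at least once. It assumes the runs are independent and that the true failure rate is about 1 in 125, which is only a guess from the single observation above. The class and method names are illustrative.

```java
// Hypothetical helper: estimates reproduction effort for an intermittent failure.
class ReproductionRunsSketch {
    // Smallest n such that (1 - p)^n <= 1 - confidence, i.e. at least one
    // failure is expected to show up with the requested confidence.
    static int runsNeeded(double failureRate, double confidence) {
        if (failureRate <= 0.0 || failureRate >= 1.0
                || confidence <= 0.0 || confidence >= 1.0) {
            throw new IllegalArgumentException("failureRate and confidence must be in (0, 1)");
        }
        return (int) Math.ceil(Math.log(1.0 - confidence) / Math.log(1.0 - failureRate));
    }

    // Probability of seeing at least one failure in the given number of independent runs.
    static double chanceOfAtLeastOneFailure(double failureRate, int runs) {
        return 1.0 - Math.pow(1.0 - failureRate, runs);
    }

    public static void main(String[] args) {
        double assumedRate = 1.0 / 125.0; // assumption, not a measured rate
        int runs = runsNeeded(assumedRate, 0.95);
        System.out.println("runs for 95% confidence: " + runs);
        System.out.println("chance of a hit in 125 runs: "
            + chanceOfAtLeastOneFailure(assumedRate, 125));
    }
}
```

Under those assumptions, 125 runs give only about a 63% chance of hitting the failure even once, and it takes 373 runs to reach 95%, so a clean batch of a hundred or so runs says little about whether a candidate fix worked.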

Thanks,
JamO



Daniel de la Rosa
 

Thanks, JamO. I’ll start an email thread on the Magnesium release now that all issues have been reviewed.
