[kernel-dev] Regression in Mg SR1
JamO Luhrsen
On 5/1/20 1:01 PM, Luis Gomez wrote:
> FYI this change merged in April 27th:
> https://git.opendaylight.org/gerrit/c/controller/+/87586

here's the revert:
https://git.opendaylight.org/gerrit/c/controller/+/89554

> Produced regression in all these many suites:
> https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-tell-all-magnesium/
> https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-ask-all-magnesium/
> https://jenkins.opendaylight.org/releng/job/controller-csit-1node-akka1-all-magnesium/

This one seems easiest to try to debug. The gist of the problem is this:
- bring up older controller version and do some configs
- copy snapshots/ and *journal/ folders off to new controller version
- start new controller version
- notice that the data/config is not there (404 on cars:cars)
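Roughly, that backup/restore step amounts to the following (a sketch only: the real suite drives this over SSH with Robot keywords, and the paths and helper name below are placeholders, not the actual CSIT code):

    # Illustrative sketch of the restore step; paths and names are placeholders.
    import glob
    import shutil
    from pathlib import Path

    OLD_ODL = Path("/tmp/controller-old")   # install that wrote the data
    NEW_ODL = Path("/tmp/controller-new")   # install that should restore it

    def copy_persistence(old_root: Path, new_root: Path) -> None:
        """Copy snapshots/ and any *journal* directory from the old install to the new one."""
        names = {"snapshots"}
        names.update(Path(p).name for p in glob.glob(str(old_root / "*journal*")))
        for name in names:
            src = old_root / name
            if src.is_dir():
                shutil.copytree(src, new_root / name, dirs_exist_ok=True)

    copy_persistence(OLD_ODL, NEW_ODL)
    # The new controller is then started and the suite expects the config written
    # earlier to still be readable (e.g. cars:cars via RESTCONF), not a 404.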
That's all I have though, by looking at the robot logs. Looking at the karaf log, it's
weirdly silent after the new controller boots up like normal. All that's there are
the two log statements we write to it from robot:
2020-05-01T01:47:57,330 | INFO | pipe-log:log "ROBOT MESSAGE: Starting test controller-akka1.txt.Verify_Data_Is_Restored" | core | 123 - org.apache.karaf.log.core - 4.2.6 | ROBOT MESSAGE: Starting test controller-akka1.txt.Verify_Data_Is_Restored
2020-05-01T01:51:01,859 | INFO | pipe-log:log "ROBOT MESSAGE: Starting test controller-akka1.txt.Archive_Older_Karaf_Log" | core | 123 - org.apache.karaf.log.core - 4.2.6 | ROBOT MESSAGE: Starting test controller-akka1.txt.Archive_Older_Karaf_Log
The bgp jobs seem to be even more broken though. More ERRORs,
etc. Not sure
if we need to look at those separately or not.
> These suites are in some degree dealing with the snapshot folder that might have changed after the mentioned patch.

Did snapshot change? I know journal did, but we addressed that here:
https://git.opendaylight.org/gerrit/c/integration/test/+/88658
> I am not sure at this moment we should investigate the issues + repair the test (it can take a while) or just revert and try next SR.
> I would guess some folks might have also reservations in introducing this change in an SR.
>
> BR/Luis

Once the revert I created gives me a distribution, I'll run it through these four
jobs in the sandbox. If those all pass like expected, and we don't get any quick
fix on the CSIT side, it might make sense to merge the revert and get moving
on a new release candidate.

Thanks,
JamO
Robert Varga
On 01/05/2020 23:36, JamO Luhrsen wrote:
> On 5/1/20 1:01 PM, Luis Gomez wrote:
>> FYI this change merged in April 27th:
>> https://git.opendaylight.org/gerrit/c/controller/+/87586
>
> here's the revert:
> https://git.opendaylight.org/gerrit/c/controller/+/89554

Merged.
>> Produced regression in all these many suites:
>> https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-tell-all-magnesium/
>> https://jenkins.opendaylight.org/releng/job/controller-csit-3node-clustering-ask-all-magnesium/
>> https://jenkins.opendaylight.org/releng/job/controller-csit-1node-akka1-all-magnesium/
>
> This one seems easiest to try to debug. The gist of the problem is this:
> - bring up older controller version and do some configs
> - copy snapshots/ and *journal/ folders off to new controller version
> - start new controller version
> - notice that the data/config is not there (404 on cars:cars)
>
> That's all I have though, by looking at the robot logs. Looking at the
> karaf log, it's weirdly silent after the new controller boots up like
> normal. All that's there are the two log statements we write to it from robot:
>
> 2020-05-01T01:47:57,330 | INFO | pipe-log:log "ROBOT MESSAGE: Starting test controller-akka1.txt.Verify_Data_Is_Restored" | core | 123 - org.apache.karaf.log.core - 4.2.6 | ROBOT MESSAGE: Starting test controller-akka1.txt.Verify_Data_Is_Restored
> 2020-05-01T01:51:01,859 | INFO | pipe-log:log "ROBOT MESSAGE: Starting test controller-akka1.txt.Archive_Older_Karaf_Log" | core | 123 - org.apache.karaf.log.core - 4.2.6 | ROBOT MESSAGE: Starting test controller-akka1.txt.Archive_Older_Karaf_Log

I looked at the BGPCEP job and this part is quite weird:
2020-05-04T16:10:03,320 | ERROR | opendaylight-cluster-data-shard-dispatcher-39 | Shard | 298 - org.opendaylight.controller.sal-clustering-commons - 1.11.1 | member-1-shard-default-config: Log entry not found for index 0
2020-05-04T16:10:03,383 | ERROR | opendaylight-cluster-data-shard-dispatcher-39 | Shard | 298 - org.opendaylight.controller.sal-clustering-commons - 1.11.1 | member-1-shard-default-config: failed to apply payload org.opendaylight.controller.cluster.datastore.persisted.CommitTransactionPayload$Simple@3d3565da
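If it helps to picture that failure mode, here is a generic Raft-flavoured sketch (not the controller's actual code, and not necessarily the exact spot those lines come from; the function name and dict-based log are illustrative): the shard walks its log from the last applied index up to the commit index, and a recovered journal that is missing entry 0 leaves nothing to apply there.

    # Generic sketch of a Raft-style "apply committed entries" loop.
    def apply_committed(entries_by_index, last_applied, commit_index, apply_payload):
        for index in range(last_applied + 1, commit_index + 1):
            entry = entries_by_index.get(index)
            if entry is None:
                # the recovered journal has a hole at this index
                print(f"Log entry not found for index {index}")
                continue
            try:
                apply_payload(entry)
            except Exception:
                print(f"failed to apply payload {entry!r}")

    # A shard whose recovered journal is missing its first entry:
    apply_committed({}, last_applied=-1, commit_index=0, apply_payload=lambda e: None)
    # prints: Log entry not found for index 0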
also, similar things are going down in the ask-all job:

2020-05-04T15:29:10,295 | INFO | opendaylight-cluster-data-shard-dispatcher-40 | Shard | 297 - org.opendaylight.controller.sal-clustering-commons - 1.11.1 | member-1-shard-prefix-configuration-shard-config (Follower): The log is not empty but the prevLogIndex 22 was not found in it - lastIndex: 21, snapshotIndex: -1, snapshotTerm: -1
2020-05-04T15:29:10,296 | INFO | opendaylight-cluster-data-shard-dispatcher-40 | Shard | 297 - org.opendaylight.controller.sal-clustering-commons - 1.11.1 | member-1-shard-prefix-configuration-shard-config (Follower): Follower is out-of-sync so sending negative reply: AppendEntriesReply [term=54, success=false, followerId=member-1-shard-prefix-configuration-shard-config, logLastIndex=21, logLastTerm=5, forceInstallSnapshot=false, needsLeaderAddress=false, payloadVersion=11, raftVersion=4, recipientRaftVersion=3]

so something is definitely off with journal :-/
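For anyone not steeped in Raft, the check behind that out-of-sync reply is roughly the following (a generic sketch, not the controller's implementation; the function name and term_at helper are illustrative): the leader claims the entry preceding what it is sending sits at index 22, but this follower's log ends at index 21 and no snapshot covers it, so it has to refuse the append.

    # Generic sketch of the follower-side AppendEntries consistency check.
    def append_entries_consistent(prev_log_index, prev_log_term,
                                  last_index, snapshot_index, term_at):
        if prev_log_index == -1:
            return True                                   # leader sent from the very start
        if prev_log_index <= snapshot_index:
            return True                                   # covered by an installed snapshot
        if prev_log_index > last_index:
            # "The log is not empty but the prevLogIndex 22 was not found in it"
            return False
        return term_at(prev_log_index) == prev_log_term   # entry exists, terms must match

    # Values from the ask-all log: lastIndex=21, snapshotIndex=-1, prevLogIndex=22.
    # (prevLogTerm is not shown in the excerpt, so the value here does not matter.)
    ok = append_entries_consistent(prev_log_index=22, prev_log_term=0,
                                   last_index=21, snapshot_index=-1,
                                   term_at=lambda i: 0)
    # ok is False, so the follower replies AppendEntriesReply [success=false, ...]
    # and the leader has to walk nextIndex back or eventually install a snapshot.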
> The bgp jobs seem to be even more broken though. More ERRORs, etc. Not sure
> if we need to look at those separately or not.

Probably yes, as they seem to indicate an inconsistency in cluster
singleton -- but that may be related (although I am not sure how).
Regards,
Robert