Re: [controller-dev] many logs about log index out-of-sync


Tom Pantelis <tompantelis@...>
 



On Tue, Jan 8, 2019 at 7:02 PM Jamo Luhrsen <jluhrsen@...> wrote:


On 1/8/19 3:52 PM, Tom Pantelis wrote:
>
>
> On Tue, Jan 8, 2019 at 5:40 PM Jamo Luhrsen <jluhrsen@... <mailto:jluhrsen@...>> wrote:
>
>     I am debugging a 3node (cluster) netvirt csit job and one of my controllers
>     has a bunch of the messages like this:
>
>     2019-01-03T03:04:38,335 | INFO  | opendaylight-cluster-data-shard-dispatcher-21 | Shard                            |
>     229
>     - org.opendaylight.controller.sal-clustering-commons - 1.8.2 | member-3-shard-default-config (Follower): The log is not
>     empty but the prevLogIndex 19047 was not found in it - lastIndex: 17875, snapshotIndex: -1
>
>     2019-01-03T03:04:38,335 | INFO  | opendaylight-cluster-data-shard-dispatcher-21 | Shard                            |
>     229
>     - org.opendaylight.controller.sal-clustering-commons - 1.8.2 | member-3-shard-default-config (Follower): Follower is
>     out-of-sync so sending negative reply: AppendEntriesReply [term=23, success=false,
>     followerId=member-3-shard-default-config, logLastIndex=17875, logLastTerm=4, forceInstallSnapshot=false,
>     payloadVersion=9, raftVersion=3]
>
>     log:
>     https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-3node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-fluorine/168/odl_3/odl3_karaf.log.gz
>
>     The job has lots of csit failures, so I know something is broken somewhere.
>     I don't know if the above has anything to do with it or not. Maybe it's
>     even expected in this scenario.
>
>
> Those messages can happen if a follower gets behind the leader, especially if it gets isolated and AppendEntries
> messages from the leader get backed up and then the follower gets a bunch of messages quickly when it reconnects to the
> leader. Looks like it eventually caught up to the leader although, from the output, the distro you're testing with is
> missing https://git.opendaylight.org/gerrit/#/c/78929/ which should speed up the sync process in that case and alleviate
> most of those messages.

Thanks Tom, makes sense then as this is on one controller that came up last after
bouncing them all.

yeah it eventually synced but it went down an inefficient path that resulted in 1176 messages from the leader to sync it. Actually that patch I mentioned above was for a different case and wouldn't help in this case. I have an idea where we can optimize it to eliminate those extra messages and make it faster. 
 

JamO

>
>     Thanks,
>     JamO
>     _______________________________________________
>     controller-dev mailing list
>     controller-dev@... <mailto:controller-dev@...>
>     https://lists.opendaylight.org/mailman/listinfo/controller-dev
>

Join {integration-dev@lists.opendaylight.org to automatically receive all group messages.