Re: [controller-dev] OFP cluster test with "tell-based"


Luis Gomez
 



On Feb 19, 2019, at 6:16 AM, Robert Varga <nite@...> wrote:

On 19/02/2019 02:11, Luis Gomez wrote:


On Feb 13, 2019, at 2:22 AM, Robert Varga <nite@...> wrote:

On 12/02/2019 19:44, Luis Gomez wrote:
Hi everybody,

FYI I have just tried OFP cluster test with "tell-based" protocol:

https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-only-neon/180/robot-plugin/log.html.gz
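
For reference, the tell-based protocol is normally enabled through the datastore configuration file. The file path and property below are the commonly documented ones; that this test job set it the same way is an assumption, not something stated in this thread.

```
# etc/org.opendaylight.controller.cluster.datastore.cfg
# Assumption: standard location and property name for the clustered datastore.
use-tell-based-protocol=true
```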

My observations:

1) Node/port down events do not clear links in the topology, which is why all topology check tests fail.

I think this is related to the transactions not committing within 5 seconds,
hence masters are not created.

Any workaround for this?

Not sure... if we have messed up accounting (below), we may end up
reporting things out of whack.



2) some WARNs are flooding the log:

2019-02-12T00:26:30,055 | WARN  | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder    | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-1-datastore-operational-fe-0-txn-30-2, ignoring

2019-02-12T00:26:30,056 | WARN  | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder    | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-2-datastore-operational-fe-0-txn-19-1, ignoring

2019-02-12T00:26:30,056 | WARN  | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder    | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-3-datastore-operational-fe-0-txn-7-1, ignoring

This is interesting, as it starts happening for the same transaction on
all shard members and these are standalone transactions, for which the
history should always be there.

Can you re-run the test with debug on
org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder,
please?
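
For reference, on a running controller this kind of debug logging can usually be switched on from the Karaf console. The line below is a sketch that assumes the standard Karaf `log:set` command is available in the distribution under test; the logger name is the class named above.

```
log:set DEBUG org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder
```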

Here it is: https://jenkins.opendaylight.org/sandbox/job/openflowplugin-csit-3node-clustering-only-neon/1

Thanks, this actually provides a lead: everything works with normal
transaction chains, yet breaks down with single transactions.

Since we have module-based shards in play and multi-shard commits, the
cookie inside LocalHistoryIdentifier becomes significant in lookup --
and the single history is hard-wired to not have a cookie.

https://git.opendaylight.org/gerrit/80392 fixes that.
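
To illustrate the cookie mismatch described above, here is a minimal Java sketch. It is not the controller code: `CookieLookupSketch`, `HistoryKey` and the helper methods are hypothetical names, and the key is reduced to a history ID plus a cookie. It only shows why a history registered without a cookie is not found when the lookup key carries one, which is the kind of miss that would surface as "Unknown history".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class CookieLookupSketch {
    /** Hypothetical stand-in for a history identifier: history ID plus cookie. */
    static final class HistoryKey {
        final long historyId;
        final long cookie;

        HistoryKey(long historyId, long cookie) {
            this.historyId = historyId;
            this.cookie = cookie;
        }

        // The cookie takes part in equality, so it is significant in map lookups.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof HistoryKey)) {
                return false;
            }
            HistoryKey other = (HistoryKey) o;
            return historyId == other.historyId && cookie == other.cookie;
        }

        @Override
        public int hashCode() {
            return Objects.hash(historyId, cookie);
        }
    }

    /** Builds a registry holding only the single history, registered with cookie 0. */
    static Map<HistoryKey, String> singleHistoryWithoutCookie() {
        Map<HistoryKey, String> histories = new HashMap<>();
        histories.put(new HistoryKey(0L, 0L), "single-history");
        return histories;
    }

    /** Returns true only if a history is registered under exactly this ID and cookie. */
    static boolean isKnown(Map<HistoryKey, String> histories, long historyId, long cookie) {
        return histories.containsKey(new HistoryKey(historyId, cookie));
    }

    public static void main(String[] args) {
        Map<HistoryKey, String> histories = singleHistoryWithoutCookie();
        System.out.println(isKnown(histories, 0L, 0L)); // true: same ID, same cookie
        System.out.println(isKnown(histories, 0L, 1L)); // false: same ID, different cookie
    }
}
```

Whether the Gerrit change resolves this by adjusting the registered key, the lookup key, or something else is not stated in this thread.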


It looks like the WARNs are addressed, and the only issue remaining is the topology update when node/links go down:


Regards,
Robert
