On 12/02/2019 19:44, Luis Gomez wrote:
Hi everybody,
FYI I have just tried OFP cluster test with "tell-based" protocol:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-only-neon/180/robot-plugin/log.html.gz
My observations:
1) node/port down events do not clear links in topology; this is why all topology check tests fail. I think this is related to the transactions not committing in 5 seconds, hence masters are not created.
2) some WARNs are flooding the log:
2019-02-12T00:26:30,055 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-1-datastore-operational-fe-0-txn-30-2, ignoring
2019-02-12T00:26:30,056 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-2-datastore-operational-fe-0-txn-19-1, ignoring
2019-02-12T00:26:30,056 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-3-datastore-operational-fe-0-txn-7-1, ignoring
This is interesting, as it starts happening for the same transaction on all shard members and these are standalone transactions, for which the history should always be there.
Can you re-run the test with debug on org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder, please?
3) The cluster perf test does not pass: https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-perf-bulkomatic-only-neon/180/robot-plugin/log.html.gz
I do not know if we are still pursuing the switch of cluster protocols; at least after this test it does not seem a straightforward change.
I'd like to be able to ditch the old one, but it seems we need to shake some bugs out :(
Thanks, Robert
|
|
On Feb 13, 2019, at 2:22 AM, Robert Varga <nite@...> wrote:
On 12/02/2019 19:44, Luis Gomez wrote:
Hi everybody,
FYI I have just tried OFP cluster test with "tell-based" protocol:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-only-neon/180/robot-plugin/log.html.gz
My observations:
1) node/port down events do not clear links in topology; this is why all topology check tests fail. I think this is related to the transactions not committing in 5 seconds, hence masters are not created.
Any workaround for this?
2) some WARNs are flooding the log:
2019-02-12T00:26:30,055 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-1-datastore-operational-fe-0-txn-30-2, ignoring
2019-02-12T00:26:30,056 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-2-datastore-operational-fe-0-txn-19-1, ignoring
2019-02-12T00:26:30,056 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-3-datastore-operational-fe-0-txn-7-1, ignoring
This is interesting, as it starts happening for the same transaction on all shard members and these are standalone transactions, for which the history should always be there.
Can you re-run the test with debug on org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder, please?
Here it is: https://jenkins.opendaylight.org/sandbox/job/openflowplugin-csit-3node-clustering-only-neon/1
3) The cluster perf test does not pass: https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-perf-bulkomatic-only-neon/180/robot-plugin/log.html.gz
I do not know if we are still pursuing the switch of cluster protocols; at least after this test it does not seem a straightforward change.
I'd like to be able to ditch the old one, but it seems we need to shake some bugs out :(
Thanks, Robert
|
|
On 19/02/2019 02:11, Luis Gomez wrote:
On Feb 13, 2019, at 2:22 AM, Robert Varga <nite@...> wrote:
On 12/02/2019 19:44, Luis Gomez wrote:
Hi everybody,
FYI I have just tried OFP cluster test with "tell-based" protocol:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-only-neon/180/robot-plugin/log.html.gz
My observations:
1) node/port down events do not clear links in topology; this is why all topology check tests fail. I think this is related to the transactions not committing in 5 seconds, hence masters are not created.
Any workaround for this?
Not sure... if we have messed up accounting (below), we may end up reporting things out of whack.
2) some WARNs are flooding the log:
2019-02-12T00:26:30,055 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-1-datastore-operational-fe-0-txn-30-2, ignoring
2019-02-12T00:26:30,056 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-2-datastore-operational-fe-0-txn-19-1, ignoring
2019-02-12T00:26:30,056 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-3-datastore-operational-fe-0-txn-7-1, ignoring
This is interesting, as it starts happening for the same transaction on all shard members and these are standalone transactions, for which the history should always be there.
Can you re-run the test with debug on org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder, please?
Here it is: https://jenkins.opendaylight.org/sandbox/job/openflowplugin-csit-3node-clustering-only-neon/1
Thanks, this actually provides a lead: everything works with normal transaction chains, yet breaks down with single transactions.
Since we have module-based shards in play and multi-shard commits, the cookie inside LocalHistoryIdentifier becomes significant in lookup -- and the single history is hard-wired to not have a cookie.
https://git.opendaylight.org/gerrit/80392 does that.
Regards, Robert
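[Editorial illustration] The lookup problem described above can be pictured with a small, purely hypothetical sketch. It does not reproduce the controller code: HistoryKey and ShardHistories are invented stand-ins, and the only assumption taken from the thread is that the identifier used for lookup carries a cookie while the single history is registered without one. Under that assumption, a request that arrives with a non-zero cookie misses the registered entry, which would be reported as an unknown history.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class HistoryLookupSketch {
    // Hypothetical stand-in for LocalHistoryIdentifier: just a history id and a cookie.
    // Record equality compares both components, so the cookie takes part in map lookups.
    record HistoryKey(long historyId, long cookie) { }

    // Hypothetical per-shard bookkeeping of the histories a shard knows about.
    static final class ShardHistories {
        private final Map<HistoryKey, String> known = new HashMap<>();

        void register(HistoryKey key, String description) {
            known.put(key, description);
        }

        Optional<String> lookup(HistoryKey key) {
            return Optional.ofNullable(known.get(key));
        }
    }

    public static void main(String[] args) {
        ShardHistories shard = new ShardHistories();

        // Assumption: the single (standalone) history is registered with no cookie, modelled as 0.
        shard.register(new HistoryKey(0, 0), "single history");

        // A request whose identifier carries cookie 2 does not match the registered key.
        System.out.println(shard.lookup(new HistoryKey(0, 2)).isPresent()); // false

        // The same history id with the cookie left at 0 is found.
        System.out.println(shard.lookup(new HistoryKey(0, 0)).isPresent()); // true
    }
}
```

The sketch only shows why a cookie that is part of the key matters once more than one shard is involved; how the referenced Gerrit change resolves it is not shown here.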
|
|
On Feb 19, 2019, at 6:16 AM, Robert Varga <nite@...> wrote:
On 19/02/2019 02:11, Luis Gomez wrote:
On Feb 13, 2019, at 2:22 AM, Robert Varga <nite@...> wrote:
On 12/02/2019 19:44, Luis Gomez wrote:
Hi everybody,
FYI I have just tried OFP cluster test with "tell-based" protocol:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-only-neon/180/robot-plugin/log.html.gz
My observations:
1) node/port down events do not clear links in topology; this is why all topology check tests fail.
I think this is related to the transactions not committing in 5 seconds, hence masters are not created.
Any workaround for this?
Not sure... if we have messed up accounting (below), we may end up reporting things out of whack.
2) some WARNs are flooding the log:
2019-02-12T00:26:30,055 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-1-datastore-operational-fe-0-txn-30-2, ignoring
2019-02-12T00:26:30,056 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-2-datastore-operational-fe-0-txn-19-1, ignoring
2019-02-12T00:26:30,056 | WARN | opendaylight-cluster-data-shard-dispatcher-33 | FrontendClientMetadataBuilder | 223 - org.opendaylight.controller.sal-distributed-datastore - 1.9.0.SNAPSHOT | member-1-shard-inventory-operational: Unknown history for aborted transaction member-3-datastore-operational-fe-0-txn-7-1, ignoring
This is interesting, as it starts happening for the same transaction on all shard members and these are standalone transactions, for which the history should always be there.
Can you re-run the test with debug on org.opendaylight.controller.cluster.datastore.FrontendClientMetadataBuilder, please?
Here it is: https://jenkins.opendaylight.org/sandbox/job/openflowplugin-csit-3node-clustering-only-neon/1
Thanks, this actually provides a lead: everything works with normal transaction chains, yet breaks down with single transactions. Since we have module-based shards in play and multi-shard commits, the cookie inside LocalHistoryIdentifier becomes significant in lookup -- and the single history is hard-wired to not have a cookie. https://git.opendaylight.org/gerrit/80392 does that.
It looks like the WARNs are addressed, and the only issue remaining is the topology update when node/links go down:
Regards, Robert
|
|
On 19/02/2019 19:50, Luis Gomez wrote:
On Feb 19, 2019, at 6:16 AM, Robert Varga <nite@...> wrote:
On 19/02/2019 02:11, Luis Gomez wrote:
On Feb 13, 2019, at 2:22 AM, Robert Varga <nite@...> wrote:
On 12/02/2019 19:44, Luis Gomez wrote:
Hi everybody,
FYI I have just tried OFP cluster test with "tell-based" protocol:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-only-neon/180/robot-plugin/log.html.gz
My observations:
1) node/port down events do not clear links in topology; this is why all topology check tests fail. I think this is related to the transactions not committing in 5 seconds, hence masters are not created.
Any workaround for this?
Not sure... if we have messed up accounting (below), we may end up reporting things out of whack.
Alright, we can ditch the builder debugs, as everything there works as it is supposed to.
The reason for non-removal of the links is captured in https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/openflowplugin-csit-3node-clustering-only-sodium/1/odl_3/odl3_karaf.log.gz, I think, and it is a VerifyException.
Filed to https://jira.opendaylight.org/browse/CONTROLLER-1885.
Regards, Robert
|
|
On Feb 20, 2019, at 2:35 AM, Robert Varga <nite@...> wrote:
On 19/02/2019 19:50, Luis Gomez wrote:
On Feb 19, 2019, at 6:16 AM, Robert Varga <nite@...> wrote:
On 19/02/2019 02:11, Luis Gomez wrote:
On Feb 13, 2019, at 2:22 AM, Robert Varga <nite@...> wrote:
On 12/02/2019 19:44, Luis Gomez wrote:
Hi everybody,
FYI I have just tried OFP cluster test with "tell-based" protocol:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/openflowplugin-csit-3node-clustering-only-neon/180/robot-plugin/log.html.gz
My observations:
1) node/port down events do not clear links in topology; this is why all topology check tests fail. I think this is related to the transactions not committing in 5 seconds, hence masters are not created.
Any workaround for this?
Not sure... if we have messed up accounting (below), we may end up reporting things out of whack.
Alright, we can ditch the builder debugs, as everything there works as it is supposed to.
The reason for non-removal of the links is captured in https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/openflowplugin-csit-3node-clustering-only-sodium/1/odl_3/odl3_karaf.log.gz, I think, and it is a VerifyException.
Filed to https://jira.opendaylight.org/browse/CONTROLLER-1885.
Regards, Robert
|
|
|
|
While waiting for OFP, I have performed other tests with tell-based:
OVSDB seems to be fine with it:
With Netconf, I do not see any significant change (the cluster test is not as stable as in OFP or OVSDB):
BR/Luis
On Feb 26, 2019, at 1:01 PM, Robert Varga <nite@...> wrote:
|
|