Genius CSIT intermittent 3 node failures due to OVSDB reconnect and connect issue


Faseela K
 

Hi Vishal,

 

    As we have already discussed, the Genius 3-node CSIT is failing randomly because the bridge does not show up in the topology/operational DS after the bridge is deleted and re-created.

    You indicated that the clustered Genius CSIT will need some enhancements (add HAPROXY?) so that this issue does not occur.

    Could you please give pointers to Sathwik, so that he can start looking into it?

    Also, even if we don’t use HAPROXY, when we delete and re-create a bridge, why does the OVSDB plugin fail to detect it?

    https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/genius-csit-verify-3node-upstream/220/robot-plugin/log.html.gz
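
    A minimal sketch of how the symptom could be checked on each cluster member, assuming the operational OVSDB topology is read over RESTCONF and that bridge nodes carry the `ovsdb:bridge-name` leaf in the JSON response. The host names, port 8181, the admin/admin credentials and the helper names are illustrative assumptions, not taken from the CSIT suite.

```python
import base64
import json
import urllib.request

# Operational OVSDB topology path in ODL's (pre-RFC 8040) RESTCONF.
TOPOLOGY_PATH = "/restconf/operational/network-topology:network-topology/topology/ovsdb:1"


def fetch_topology(host, port=8181, user="admin", password="admin"):
    """GET the operational OVSDB topology from one controller.

    Port and credentials are assumptions; adjust them to the test bed.
    """
    request = urllib.request.Request(f"http://{host}:{port}{TOPOLOGY_PATH}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def bridge_names(topology_json):
    """Return the set of bridge names present in one topology response."""
    names = set()
    for topology in topology_json.get("topology", []):
        for node in topology.get("node", []):
            name = node.get("ovsdb:bridge-name")
            if name is not None:
                names.add(name)
    return names


def controllers_missing_bridge(responses, bridge):
    """Given {controller: topology_json}, list controllers that do not show the bridge."""
    return sorted(c for c, body in responses.items() if bridge not in bridge_names(body))
```

    For example, `controllers_missing_bridge({h: fetch_topology(h) for h in hosts}, "br-int")` would list the cluster members whose operational DS lacks the bridge after the delete and re-create, where `hosts` and the bridge name are placeholders.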

 

Thanks,

Faseela


Vishal Thapar <vishal.thapar@...>
 

Hi Faseela,

 

I didn’t say that the issue will not occur if we enhance the Genius 3-node CSIT, only that the Genius 3-node CSIT isn’t configured like an actual cluster deployment.

 

Yes, there is an issue in OVSDB with disconnect/connect in rapid succession [1], and that is the issue we’re hitting in the Genius 3-node CSIT. The issue is not with create/delete of the bridge but with connect/disconnect on the OVSDB channel. Suneelu had fixed it for HWVTEP, but there were some open questions for OVSDB and the discussion on the fix was still open. It would be good to revive this discussion at the DDF, where Jamo and Anil will both be present. If we have an OVSDB 3-node CSIT where we can reproduce this reliably, we can try two or three options and test which one works.

 

The reason it is intermittent is that the issue depends on EOS (Entity Ownership Service). If the switch connects to a node that isn’t the leader for the OVSDB instance, you run into this issue. There are also these exceptions; I’m not sure whether they are a cause or an effect of the issue.

 

Caused by: java.lang.IllegalArgumentException: Metadata not available for modification NodeModification [identifier=(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)manager-entry, modificationType=TOUCH, childModification={(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)manager-entry[{(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)target=tcp:10.30.170.65:6640}]=NodeModification [identifier=(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)manager-entry[{(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)target=tcp:10.30.170.65:6640}], modificationType=DELETE, childModification={}]}]
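
To make the EOS dependency concrete, here is a toy model of the condition described above, not the OVSDB plugin's logic. It assumes the switch's connection lands on any of the three controllers with equal likelihood, which is an illustrative assumption, and the node names and function names are hypothetical.

```python
from fractions import Fraction


def hits_issue(connected_node, owner_node):
    """Per the description above: trouble arises when the switch connects
    to a node that is not the owner (leader) for the OVSDB instance."""
    return connected_node != owner_node


def failure_fraction(nodes, owner_node):
    """Fraction of connection targets that hit the issue, assuming each
    node is an equally likely target (an assumption of this sketch)."""
    affected = [n for n in nodes if hits_issue(n, owner_node)]
    return Fraction(len(affected), len(nodes))


if __name__ == "__main__":
    cluster = ["odl1", "odl2", "odl3"]  # hypothetical node names
    print(failure_fraction(cluster, owner_node="odl2"))
```

Under that assumption two of the three possible targets are non-owners, which would be consistent with failures that appear on some runs and not on others.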

 

 

Regards,

Vishal.

 

[1] https://lists.opendaylight.org/pipermail/ovsdb-dev/2018-February/004567.html

 

From: Faseela K
Sent: 22 March 2018 15:19
To: Vishal Thapar <vishal.thapar@...>
Cc: integration-dev@...; ovsdb-dev@...; B Sathwik <b.sathwik@...>; genius-dev@...
Subject: Genius CSIT intermittent 3 node failures due to OVSDB reconnect and connect issue

 



B Sathwik <b.sathwik@...>
 

Vishal,

   Suneelu was asking for the OVSDB TRACE logs from the 3-node CSIT runs.

   I need to know how to enable them when running 3-node CSIT jobs in the sandbox.

 

Any pointers?

 

Regards

Sathwik

From: Vishal Thapar
Sent: Thursday, March 22, 2018 5:39 PM
To: Faseela K <faseela.k@...>
Cc: integration-dev@...; ovsdb-dev@...; B Sathwik <b.sathwik@...>; genius-dev@...
Subject: RE: Genius CSIT intermittent 3 node failures due to OVSDB reconnect and connect issue

 
