Vishal Thapar <vthapar@...>
On Sat, Apr 28, 2018 at 2:19 AM, Jamo Luhrsen <jluhrsen@...> wrote:
re-sending with new email for Vishal
On 4/27/18 1:46 PM, Jamo Luhrsen wrote:
On 4/27/18 11:39 AM, Faseela K wrote:
Sam was mentioning in last genius weekly meeting that there is a JIRA already for this, and Suneelu is working on it.
@Vishal : Could you please share the JIRA?
We are hitting the issue intermittently, and when we try to debug with ovsdb TRACE logs, it never happens.
Was this tested multiple times with TRACE logs enabled, and the issue was never hit? If so, that leads me to believe a race condition is timed so precisely that the small slowdown from the extra logging is enough to avoid it. Fun :)
JamO
Thanks,
Faseela
*From:* Vishal Thapar
*Sent:* Monday, March 26, 2018 2:22 PM
*To:* B Sathwik <b.sathwik@...>; Tomáš Markovič <tomas.markovic@...>
*Cc:* Sam Hague <shague@...>; genius-dev@...; Faseela K <faseela.k@...>; ovsdb-dev@...; integration-dev@...; K.V Suneelu Verma <k.v.suneelu.verma@...>
*Subject:* RE: [integration-dev] Genius CSIT intermittent 3 node failures due to OVSDB reconnect and connect issue
Thanks Tomas, I missed the testplan part because I was facing the exact same issue in my patch test job and wrongly assumed the cause was the same. After Sathwik’s change, it is indeed the same infra issue.
https://jenkins.opendaylight.org/releng/job/genius-csit-1node-gate-all-fluorine/19/console
Regards,
Vishal.
*From:* B Sathwik
*Sent:* 26 March 2018 14:19
*To:* Tomáš Markovič <tomas.markovic@...>
*Cc:* Sam Hague <shague@...>; genius-dev@...; Vishal Thapar <vishal.thapar@...>; Faseela K <faseela.k@...>; ovsdb-dev@...; integration-dev@...; K.V Suneelu Verma <k.v.suneelu.verma@...>
*Subject:* RE: [integration-dev] Genius CSIT intermittent 3 node failures due to OVSDB reconnect and connect issue
Hi,
I changed the test plan accordingly and rebuilt it.
I am now facing the following error. It’s an infra issue.
2: Waiting for 15 minutes to create sandbox-genius-csit-3node-sathwikgate-all-fluorine-3.
1: CREATE_FAILED
ERROR: Failed to initialize infrastructure. Reason: Resource CREATE failed: OverLimit: resources.vm_1_group.resources[1].resources.volume: VolumeSizeExceedsAvailableQuota: Requested volume or snapshot exceeds allowed gigabytes quota. Requested 40G, quota is 8192G and 8160G has been consumed. (HTTP 413) (Request-ID: req-ebe75897-6320-49ea-b052-6d139ff869d1)
Regards
Sathwik
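For reference, the figures in the OverLimit error above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the log; the variable names are illustrative and not part of any OpenStack API.

```python
# Figures copied from the OverLimit error above, in gigabytes.
quota_gb = 8192
consumed_gb = 8160
requested_gb = 40

# Space still available under the quota, and how far the request overshoots it.
remaining_gb = quota_gb - consumed_gb
shortfall_gb = requested_gb - remaining_gb

print(f"remaining: {remaining_gb}G, requested: {requested_gb}G")
if requested_gb <= remaining_gb:
    print("request fits within quota")
else:
    print(f"request exceeds quota by {shortfall_gb}G")
```

Only 32G of the quota was left, so the 40G volume request could not be satisfied until other volumes were released.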
*From:* Tomáš Markovič <tomas.markovic@...>
*Sent:* Monday, March 26, 2018 1:45 PM
*To:* B Sathwik <b.sathwik@...>
*Cc:* Sam Hague <shague@...>; genius-dev@...; Vishal Thapar <vishal.thapar@...>; Faseela K <faseela.k@...>; ovsdb-dev@...; integration-dev@...; K.V Suneelu Verma <k.v.suneelu.verma@...>
*Subject:* Re: [integration-dev] Genius CSIT intermittent 3 node failures due to OVSDB reconnect and connect issue
Also, from
*07:48:19* [ ERROR ] Expected at least 1 argument, got 0.
you can see that you are using the wrong testplan:
genius-sathwikgate-fluorine.txt / genius-sathwikgate.txt
These files do not exist, so change them to the ones you want.
Regards,
Tomas Markovic
On Mon, Mar 26, 2018 at 9:20 AM, B Sathwik <b.sathwik@...> wrote:
Hi,
Started sandbox job with ovsdb TRACE logs for 3node genius CSIT.
https://jenkins.opendaylight.org/sandbox/job/genius-csit-3node-sathwikgate-all-fluorine/
Regards
Sathwik
*From:* Sam Hague <shague@...>
*Sent:* Friday, March 23, 2018 7:41 PM
*To:* B Sathwik <b.sathwik@...>
*Cc:* Vishal Thapar <vishal.thapar@...>; Faseela K <faseela.k@...>; integration-dev@...; ovsdb-dev@...; genius-dev@...; K.V Suneelu Verma <k.v.suneelu.verma@...>
*Subject:* Re: [integration-dev] Genius CSIT intermittent 3 node failures due to OVSDB reconnect and connect issue
On Mar 23, 2018 12:20 AM, "B Sathwik" <b.sathwik@...> wrote:
Vishal,
Suneelu was asking for the ovsdb TRACE logs for the 3 node CSIT runs.
I need to know how to enable the same while running 3 node CSIT jobs in sandbox.
In sandbox, simply add your custom trace settings to the CONTROLLERDEBUGMAP param, like ovsdb:TRACE. Read the comment on that parameter. You can add multiple log settings.
Any pointers?
Regards
Sathwik
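As a rough illustration of the CONTROLLERDEBUGMAP setting mentioned above, the sketch below splits such a value into logger/level pairs. The thread only confirms the `ovsdb:TRACE` form and that multiple settings are allowed; the whitespace separator, the `parse_debug_map` helper, and the `genius:DEBUG` entry are assumptions made for the example, so the comment on the Jenkins parameter remains the authoritative description of the format.

```python
def parse_debug_map(value: str) -> dict[str, str]:
    """Split a debug-map string into {logger: level}.

    Hypothetical helper: assumes whitespace-separated logger:LEVEL entries.
    """
    settings = {}
    for entry in value.split():
        logger, sep, level = entry.partition(":")
        if not sep or not logger or not level:
            raise ValueError(f"expected logger:LEVEL, got {entry!r}")
        settings[logger] = level.upper()
    return settings


# "ovsdb:TRACE" is the form given in the thread; "genius:DEBUG" is illustrative.
debug_map = parse_debug_map("ovsdb:TRACE genius:DEBUG")
for logger, level in debug_map.items():
    print(f"{logger} -> {level}")
```

Validating each entry up front makes a malformed value fail early, before a long sandbox job is started with the wrong log settings.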
*From:* Vishal Thapar
*Sent:* Thursday, March 22, 2018 5:39 PM
*To:* Faseela K <faseela.k@...>
*Cc:* integration-dev@...; ovsdb-dev@...; B Sathwik <b.sathwik@...>; genius-dev@...
*Subject:* RE: Genius CSIT intermittent 3 node failures due to OVSDB reconnect and connect issue
Hi Faseela,
I didn’t say that the issue will not occur if we enhance Genius 3 node CSIT, only that Genius 3 node CSIT isn’t configured like an actual cluster deployment.
Yes, there is an issue in OVSDB with disconnect/connect in rapid succession [1], and that is the issue we’re hitting in Genius 3 node CSIT. The issue is not with create/delete of the bridge but with connect/disconnect on the OVSDB channel. Suneelu had fixed it for HWVTEP, but there were some open questions for OVSDB and the discussion on the fix was still open. It would be good to revive this discussion at DDF, where Jamo and Anil will both be present. If we have an OVSDB 3 node CSIT where we can reproduce this reliably, we can try 2-3 options and test which one works.
The reason it is intermittent is that the issue depends on EOS (Entity Ownership Service). If the switch connects to a node that isn’t the leader for the OVSDB instance, you run into this issue. Also, there are these exceptions; I am not sure whether they are a cause or an effect of the issue.
Caused by: java.lang.IllegalArgumentException: Metadata not available for modification NodeModification
[identifier=(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)manager-entry, modificationType=TOUCH,
childModification={(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)manager-entry[{(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)target=tcp:10.30.170.65:6640}]=NodeModification
[identifier=(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)manager-entry[{(urn:opendaylight:params:xml:ns:yang:ovsdb?revision=2015-01-05)target=tcp:10.30.170.65:6640}], modificationType=DELETE, childModification={}]}]
Regards,
Vishal.
[1] https://lists.opendaylight.org/pipermail/ovsdb-dev/2018-February/004567.html
*From:* Faseela K
*Sent:* 22 March 2018 15:19
*To:* Vishal Thapar <vishal.thapar@...>
*Cc:* integration-dev@...; ovsdb-dev@...; B Sathwik <b.sathwik@...>; genius-dev@...
*Subject:* Genius CSIT intermittent 3 node failures due to OVSDB reconnect and connect issue
Hi Vishal,
As we have already discussed, genius 3 node CSIT is failing randomly because the bridge does not show up in the topology/operational DS on delete and create of the bridge.
You were indicating that the clustered CSIT of genius will need some enhancements (add HAPROXY?) so that this issue will not occur.
Could you please give pointers to Sathwik, so that he can start looking into it?
Also, even if we don’t use HAPROXY and we delete and create a bridge, why does the ovsdb plugin have an issue detecting this?
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/genius-csit-verify-3node-upstream/220/robot-plugin/log.html.gz
Thanks,
Faseela
_______________________________________________ integration-dev mailing list integration-dev@... https://lists.opendaylight.org/mailman/listinfo/integration-dev
_______________________________________________ ovsdb-dev mailing list ovsdb-dev@... https://lists.opendaylight.org/mailman/listinfo/ovsdb-dev
_______________________________________________ genius-dev mailing list genius-dev@... https://lists.opendaylight.org/mailman/listinfo/genius-dev
Vishal Thapar <vthapar@...>
On Sat, Apr 28, 2018 at 2:41 AM, Vishal Thapar <vthapar@...> wrote:
Jira: https://jira.opendaylight.org/browse/OVSDB-438
Link to fix in HWVTEP code is in there too.
Regards, Vishal.