[CSIT] NETVIRT-1599 - Upgrade Failures:Connectivity check fails after upgrade and ovsdb egress flows missing
Srinivas <srinivas.rachakonda@...>
Hi Deena/Soma,
After discussion, we found that there is a bundle issue when reconnecting OVS to ODL.
I have updated the JIRA accordingly and assigned it to Deena.
Steps to reproduce:
Error messages:
2020-01-20T13:32:26.913Z|00101|connmgr|INFO|br-int: removed primary controller "tcp:192.168.56.105:6653"
2020-01-20T13:38:44.387Z|00102|connmgr|INFO|br-int: added primary controller "tcp:192.168.56.105:6653"
2020-01-20T13:38:44.387Z|00103|rconn|INFO|br-int<->tcp:192.168.56.105:6653: connecting...
2020-01-20T13:38:44.480Z|00104|rconn|INFO|br-int<->tcp:192.168.56.105:6653: connected
2020-01-20T13:38:44.757Z|00105|connmgr|INFO|br-int<->tcp:192.168.56.105:6653: sending OFPBFC_BAD_ID error reply to ONFT_BUNDLE_CONTROL message
2020-01-20T13:38:57.418Z|00106|connmgr|INFO|br-int<->tcp:192.168.56.105:6653: sending OFPBFC_TIMEOUT error reply to ONFT_BUNDLE_CONTROL message
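A small helper can pull the bundle-control error replies out of an OVS log like the one above, so a CSIT script can fail fast with a clear reason instead of only noticing missing flows later. This is a sketch assuming the OVS log line layout seen here (`timestamp|sequence|module|level|message`); `bundle_errors` is a hypothetical name, not part of any ODL or OVS tooling.

```python
import re
from typing import List, Tuple

# Assumed OVS log layout: TIMESTAMP|SEQ|MODULE|LEVEL|MESSAGE
LOG_RE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<seq>\d+)\|(?P<module>[^|]+)\|(?P<level>[^|]+)\|(?P<msg>.*)$"
)
# Matches e.g. "sending OFPBFC_TIMEOUT error reply to ONFT_BUNDLE_CONTROL"
ERR_RE = re.compile(r"sending (OFPBFC_\w+) error reply to ONFT_BUNDLE_CONTROL")


def bundle_errors(log_text: str) -> List[Tuple[str, str]]:
    """Return (timestamp, error_code) for each bundle-control error reply."""
    found = []
    for line in log_text.splitlines():
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the OVS log layout
        err = ERR_RE.search(m.group("msg"))
        if err:
            found.append((m.group("ts"), err.group(1)))
    return found


# Lines copied from the log above.
SAMPLE = "\n".join([
    "2020-01-20T13:38:44.480Z|00104|rconn|INFO|br-int<->tcp:192.168.56.105:6653: connected",
    "2020-01-20T13:38:44.757Z|00105|connmgr|INFO|br-int<->tcp:192.168.56.105:6653: "
    "sending OFPBFC_BAD_ID error reply to ONFT_BUNDLE_CONTROL message",
    "2020-01-20T13:38:57.418Z|00106|connmgr|INFO|br-int<->tcp:192.168.56.105:6653: "
    "sending OFPBFC_TIMEOUT error reply to ONFT_BUNDLE_CONTROL message",
])

if __name__ == "__main__":
    for ts, code in bundle_errors(SAMPLE):
        print(ts, code)
```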
Thanks, Srinivas +91-9243478719
From: Dayavanti Gopal Kamath <dayavanti.gopal.kamath@...>
Sent: 10 January 2020 20:52 To: SOMASHEKHAR MANOHARA JAVALAGI <somashekhar.manohara.javalagi@...>; Srinivas Rachakonda <srinivas.rachakonda@...>; Abhinav Gupta <abhinav.gupta@...>; Guruvayur A Ramanathan <guruvayur.a.ramanathan@...>; Karthikeyan Krishnan <karthikeyan.k@...> Cc: Abhishek Nagori <abhishek.nagori@...>; Prakash Padmanabhan <prakash.padmanabhan@...>; R Srinivasan E <r.e.srinivasan@...>; Naveen Manyam Subramanyam <naveen.manyam.subramanyam@...>; D Arunprakash <d.arunprakash@...>; Gobinath . <gobinath@...>; Chetan Arakere Gowdru <chetan.arakere@...>; R P Karthika . <r.p.karthika@...> Subject: RE: [CSIT] NETVIRT-1599 - Upgrade Failures:Connectivity check fails after upgrade and ovsdb egress flows missing
Hi all, Please push the technical analysis from these discussions into the JIRA or onto the community mailing lists so that others can also stay updated.
Thanks, daya
From: SOMASHEKHAR MANOHARA JAVALAGI <somashekhar.manohara.javalagi@...>
Hi Srinivas,
The test case started at 20200108 12:42:18 (2020-01-08T07:12:18). The switch control_1 connected to the controller at 2020-01-08T07:12:25. However, the bundle commit was attempted at 20200108 12:42:19.422 (2020-01-08T07:12:19), before the switch was connected. Because the switch was not yet connected, the RPC that serves the bundle commit had not been registered, which resulted in the error below.
{"errors":{"error":[{"error-type":"application","error-tag":"operation-not-supported","error-message":"No implementation of RPC AbsoluteSchemaPath{path=[(urn:opendaylight:params:xml:ns:yang:openflowplugin:app:arbitrator-reconcile:service?revision=2018-02-27)commit-active-bundle]} available"}]}}
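The error above is what RESTCONF returns while the `commit-active-bundle` RPC is still unregistered. A script could recognise this specific case and retry later rather than failing outright. Minimal sketch; `rpc_not_registered` is a hypothetical helper, and it matches only on the `error-tag` and message text shown in this thread.

```python
import json


def rpc_not_registered(body: str) -> bool:
    """True if a RESTCONF error body says the RPC has no implementation yet."""
    try:
        errors = json.loads(body)["errors"]["error"]
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not a RESTCONF error document.
        return False
    return any(
        e.get("error-tag") == "operation-not-supported"
        and "No implementation of RPC" in e.get("error-message", "")
        for e in errors
    )


# Body copied from the error above.
OBSERVED = '{"errors":{"error":[{"error-type":"application","error-tag":"operation-not-supported","error-message":"No implementation of RPC AbsoluteSchemaPath{path=[(urn:opendaylight:params:xml:ns:yang:openflowplugin:app:arbitrator-reconcile:service?revision=2018-02-27)commit-active-bundle]} available"}]}}'

if __name__ == "__main__":
    print(rpc_not_registered(OBSERVED))
```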
OpenFlow connection time:
2020-01-08T07:12:25,834 | INFO | epollEventLoopGroup-9-1 | ContextChainHolderImpl | 391 - org.opendaylight.openflowplugin.impl - 0.9.2.SNAPSHOT | Device openflow:238161745143452 connected.
Can you please make sure the switch is connected before trying to commit the active bundle?
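One way to follow this in a test script is to poll a readiness check, with an upper bound, before attempting the commit. The sketch below is generic: `probe` is a hypothetical callable you would supply (for example, one that calls `get-active-bundle` and checks for a result), and the clock and sleep functions are injectable so the loop can be exercised without waiting in real time. `wait_until` and `FakeClock` are illustrative names only.

```python
import time
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout_s: float, interval_s: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() until it returns True or timeout_s elapses; report which."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


class FakeClock:
    """Simulated clock: sleep() advances time instantly."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


if __name__ == "__main__":
    # The switch "connects" about 7 s after the test starts, similar to the
    # 07:12:18 -> 07:12:25 gap in the job logs.
    clock = FakeClock()
    ready = wait_until(lambda: clock.now() >= 7, timeout_s=30, interval_s=1.0,
                       clock=clock.now, sleep=clock.sleep)
    print(ready, clock.now())
```

Only after `wait_until` returns `True` would the script go on to commit; a `False` result is a clear, early failure instead of a commit against an unregistered RPC.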
Also, once the switch is connected and before attempting the bundle commit RPC, please invoke the RPC below via a REST POST:
http://<controller-ip>:8181/restconf/operations/arbitrator-reconcile:get-active-bundle
{ "input": { "node": "/opendaylight-inventory:nodes/opendaylight-inventory:node[opendaylight-inventory:id='openflow:<dpn-id>']", "node-id": "<dpn-id>" } }
This should return the bundle ID as the result, confirming that a valid bundle is pending commit. You can then go ahead and commit it.
Valid output (bundle is open): { "output": { "result": 1 } }
Invalid output (no bundle open for commit): { "output": {} }
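The two outputs above can be told apart mechanically before deciding whether to commit. Sketch assuming the response body has exactly the JSON shape shown; `active_bundle_id` and `ready_to_commit` are hypothetical helper names.

```python
import json
from typing import Optional


def active_bundle_id(response_body: str) -> Optional[int]:
    """Return the open bundle ID from a get-active-bundle reply, or None if none is open."""
    output = json.loads(response_body).get("output") or {}
    return output.get("result")


def ready_to_commit(response_body: str) -> bool:
    """Only commit when get-active-bundle reports an open bundle."""
    return active_bundle_id(response_body) is not None


if __name__ == "__main__":
    print(active_bundle_id('{ "output": { "result": 1 } }'))
    print(ready_to_commit('{ "output": {} }'))
```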
Regards, Somashekhar
From: srinivas.rachakonda@... <srinivas.rachakonda@...>
Hi Som/Ramanathan,
I tried the patch for this: https://git.opendaylight.org/gerrit/c/integration/test/+/86802
The JSON commit is failing.
Log:
Can you please let me know if this is correct?
Thanks, Srinivas +91-9243478719
From: srinivas.rachakonda@... <srinivas.rachakonda@...>
Hi Abhinav,
I have been working on this since this morning.
Thanks, Srinivas +91-9243478719
From: Abhinav Gupta <abhinav.gupta@...>
Hi Srinivas/Ram, any update here?
From: srinivas.rachakonda@... <srinivas.rachakonda@...>
Hi Ramanathan,
Can you please let me know where to add this bundle commit?
The script follows the steps below:
The test case below is where we set the upgrade flag:
Set Upgrade Flag
Thanks, Srinivas +91-9243478719
From: SOMASHEKHAR MANOHARA JAVALAGI <somashekhar.manohara.javalagi@...>
Hi Srinivas,
Please invoke the RPC below through a REST call, with the input shown, to commit the OpenFlow bundle.
{ "input": { "node": "/opendaylight-inventory:nodes/opendaylight-inventory:node[opendaylight-inventory:id='openflow:<dpn-id>']", "node-id": "<dpn-id>" } }
For example: { "input": { "node": "/opendaylight-inventory:nodes/opendaylight-inventory:node[opendaylight-inventory:id='openflow:86278166223181']", "node-id": "86278166223181" } }
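The input above depends only on the DPN ID, so a script can build it rather than hand-edit JSON. In this sketch the `get-active-bundle` URL pattern is taken from earlier in the thread; the `commit-active-bundle` URL is an assumption that the commit RPC (named in the RESTCONF error earlier) is exposed under the same `arbitrator-reconcile` prefix, and the `admin`/`admin` credentials are placeholders. All helper names are hypothetical.

```python
import base64
import json
import urllib.request

# get-active-bundle URL pattern from this thread; reusing it for
# commit-active-bundle is an assumption.
RESTCONF_OPS = "http://{host}:8181/restconf/operations/arbitrator-reconcile:{rpc}"


def bundle_rpc_input(dpn_id: str) -> dict:
    """Build the RPC input for one DPN, matching the JSON shown above."""
    node_ref = ("/opendaylight-inventory:nodes/opendaylight-inventory:node"
                "[opendaylight-inventory:id='openflow:" + dpn_id + "']")
    return {"input": {"node": node_ref, "node-id": dpn_id}}


def bundle_rpc_url(host: str, rpc: str) -> str:
    return RESTCONF_OPS.format(host=host, rpc=rpc)


def post_bundle_rpc(host: str, rpc: str, dpn_id: str,
                    user: str = "admin", password: str = "admin",
                    timeout_s: float = 10.0) -> str:
    """POST the RPC and return the raw response body.

    Credentials are placeholders; urlopen raises urllib.error.HTTPError
    for non-2xx replies.
    """
    body = json.dumps(bundle_rpc_input(dpn_id)).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(
        bundle_rpc_url(host, rpc), data=body, method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + token})
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Prints the payload only; no request is sent.
    print(json.dumps(bundle_rpc_input("86278166223181")))
```

A script would call `post_bundle_rpc(host, "get-active-bundle", dpn)` first and send `"commit-active-bundle"` only when that reply reports an open bundle.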
Also, can you please confirm with Ramanathan when this bundle commit should be triggered?
Regards, Somashekhar
From: srinivas.rachakonda@... <srinivas.rachakonda@...>
Hi Som,
Can you please point me to the RPC command for committing the OpenFlow bundle? I will add it to the script.
Thanks, Srinivas +91-9243478719
From: SOMASHEKHAR MANOHARA JAVALAGI <somashekhar.manohara.javalagi@...>
Hi Srinivas,
I have looked into the job https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/srini-netvirt-csit-1node-0cmb-1ctl-2cmp-openstack-queens-upgrade-sodium/2/robot-plugin/log_01_upgrade.html#s1-t1.
ODL was stopped
Delete OVS manager, controller and groups and tun ports
Start controller, wait for it to come "UP" and make sure netvirt is installed
When the controller was started, there were no flow entries for any of the DPNs in the config inventory. The upgrade flag was then set, and the DPNs were later allowed to connect.
Set controller and manager on each OpenStack node and check that egress flows are present
The DPNs connected and, because the upgrade flag was set, arbitrator reconciliation started for all of them. In this mode we open a bundle, add all the flows from the config inventory to it, write the DPN info to the operational inventory, and wait for the upgrade script to commit the bundle.
Because there were no config entries for the DPNs at the moment they connected, no flows were added to the bundle at that point. After the DPN info was written to the operational inventory, applications added flows to the config datastore, and those flows were also added to the OpenFlow bundle. In this mode, however, the upgrade script is expected to call the RPC that commits the OpenFlow bundle. Since that call was never made, no flows were pushed to the DPNs, even though the switch was connected and the flows were present in the config inventory.
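To see why waiting longer before checking for flows cannot help on its own, consider a toy model of the switch-side bundle. It assumes, as a simplification, that the switch discards an open bundle once no bundle message has arrived for `idle_timeout_s` seconds, so a later commit has nothing to apply. The 15-second value is the timeout observed in this thread and 3600 is the setting suggested later; `BundleModel` is hypothetical and is not OVS or openflowplugin code.

```python
from typing import List


class BundleModel:
    """Toy model of a switch-side OpenFlow bundle with an idle timeout.

    Simplifying assumption: the idle timer restarts on every bundle message,
    and an expired bundle is discarded together with its staged flows.
    """

    def __init__(self, idle_timeout_s: float, opened_at: float) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.last_activity = opened_at
        self.staged: List[str] = []
        self.state = "OPEN"

    def _expire_if_idle(self, now: float) -> None:
        if self.state == "OPEN" and now - self.last_activity > self.idle_timeout_s:
            self.state = "TIMED_OUT"
            self.staged.clear()

    def add_flow(self, now: float, flow: str) -> bool:
        """Stage a flow; returns False if the bundle has already expired."""
        self._expire_if_idle(now)
        if self.state != "OPEN":
            return False
        self.staged.append(flow)
        self.last_activity = now
        return True

    def commit(self, now: float) -> List[str]:
        """Return the flows applied to the switch (empty if the bundle expired)."""
        self._expire_if_idle(now)
        if self.state != "OPEN":
            return []
        self.state = "COMMITTED"
        return list(self.staged)


if __name__ == "__main__":
    # Flows staged at t=2 and t=4, commit only after a 6-minute wait:
    # with a 15 s idle timeout the bundle has expired and nothing is applied.
    late = BundleModel(idle_timeout_s=15, opened_at=0)
    late.add_flow(2, "flow-a")
    late.add_flow(4, "flow-b")
    print(late.commit(364), late.state)

    # Same timing with the idle timeout raised to 3600 s: both flows are applied.
    patient = BundleModel(idle_timeout_s=3600, opened_at=0)
    patient.add_flow(2, "flow-a")
    patient.add_flow(4, "flow-b")
    print(patient.commit(364), patient.state)
```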
If you do not have any old config entries to restore to the controller after the upgrade, and you want the DPNs to go through the normal resync (where no OpenFlow bundle commit is needed), you should disable the upgrade flag before connecting any DPN.
Regards, Somashekhar
From: srinivas.rachakonda@... <srinivas.rachakonda@...>
Hi Som,
I added a 6-minute sleep to allow the flows to be programmed after reconnecting OVS to ODL (after the upgrade flag is set via REST).
Even after that, the flows are not present and the script fails.
This needs to be looked at by the design team.
The script is still running; once it completes, I will collect the logs and share them.
Please note that we are setting the upgrade flag through REST, so there is no bundle-commit RPC call.
Thanks, Srinivas +91-9243478719
From: SOMASHEKHAR MANOHARA JAVALAGI <somashekhar.manohara.javalagi@...>
Hi Srinivas R,
Because the upgrade flag is enabled, arbitrator reconciliation is taking place. However, I don’t see anyone calling the bundle commit RPC, so the bundle times out after about 15 seconds and none of the flows are committed to the switch.
DPN id: 189283045813055
2019-11-07T04:05:04.768Z|00966|vconn|DBG|tcp:10.30.170.84:6653: received: ONFT_BUNDLE_CONTROL (OF1.3) (xid=0x3): bundle_id=0 type=OPEN_REQUEST flags=atomic ordered
2019-11-07T04:05:04.768Z|00967|vconn|DBG|tcp:10.30.170.84:6653: sent (Success): ONFT_BUNDLE_CONTROL (OF1.3) (xid=0x3): bundle_id=0 type=OPEN_REPLY flags=0
2019-11-07T04:05:19.673Z|01158|connmgr|INFO|br-int<->tcp:10.30.170.84:6653: sending OFPBFC_TIMEOUT error reply to ONFT_BUNDLE_CONTROL message ONFT_BUNDLE_CONTROL (OF1.3) (xid=0x3):
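The gap between the OPEN_REQUEST and the OFPBFC_TIMEOUT reply can be computed directly from the log timestamps, which is a quick way to check the roughly 15-second figure. Minimal sketch assuming the `YYYY-MM-DDTHH:MM:SS.mmmZ` timestamp format shown in the log; the helper names are hypothetical.

```python
from datetime import datetime


def parse_ovs_ts(ts: str) -> datetime:
    """Parse an OVS log timestamp such as 2019-11-07T04:05:04.768Z (trailing Z = UTC)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def seconds_between(start: str, end: str) -> float:
    return (parse_ovs_ts(end) - parse_ovs_ts(start)).total_seconds()


if __name__ == "__main__":
    # OPEN_REQUEST and OFPBFC_TIMEOUT timestamps for DPN 189283045813055 above.
    elapsed = seconds_between("2019-11-07T04:05:04.768Z", "2019-11-07T04:05:19.673Z")
    print(f"bundle open -> timeout: {elapsed:.3f} s")
```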
Is the upgrade flag expected to be enabled? If so, can you please try increasing bundle-idle-timeout to a higher value and make the REST call for the bundle commit RPC before the bundle times out?
Regards, Somashekhar
From: D Arunprakash <d.arunprakash@...>
+Som
From: Chetan Arakere Gowdru <chetan.arakere@...>
++ ofplugin
As I see it, none of the flows are pushed back to the switch when the DPN reconnects, even though the flows are present in the config inventory-nodes DS.
Thanks, Chetan
From: srinivas.rachakonda@... <srinivas.rachakonda@...>
Hi Karthika,
Any update on this? NetVirt CSIT jobs are still failing because of it.
Thanks, Srinivas +91-9243478719
From: R Srinivasan E <r.e.srinivasan@...>
+ Karthika
From: Prakash Padmanabhan <prakash.padmanabhan@...>
Hi Srini,
This seems to be unrelated to the platform RBU implementation, which does not exist upstream. Copying Srini.
Regards, Prakash
From: srinivas.rachakonda@... <srinivas.rachakonda@...>
Hi Prakash,
Can someone please look into the JIRA?
Thanks, Srinivas +91-9243478719
From: srinivas.rachakonda@... <srinivas.rachakonda@...>
Hi Ramanathan,
The JIRA below reports an upgrade failure.
https://jira.opendaylight.org/browse/NETVIRT-1599
Can you please have a look and let me know whom it should be assigned to?
Please help me in this regard.
Thanks, Srinivas +91-9243478719
Srinivas <srinivas.rachakonda@...>
Hi Deena,
The suite is still failing with the changes made. Below are the steps performed:
JOB:
ROBOT LOGS:
I have updated the JIRA as well.
Thanks, Srinivas +91-9243478719
From: Dheenadayalan B <dheenadayalan.b@...>
Sent: 28 January 2020 17:46 To: srinivas.rachakonda@...; 'Karthikeyan Krishnan' <karthikeyan.k@...>; 'SOMASHEKHAR MANOHARA JAVALAGI' <somashekhar.manohara.javalagi@...> Cc: 'Abhishek Nagori' <abhishek.nagori@...>; 'Prakash Padmanabhan' <prakash.padmanabhan@...>; 'R Srinivasan E' <r.e.srinivasan@...>; 'Naveen Manyam Subramanyam' <naveen.manyam.subramanyam@...>; 'Guruvayur A Ramanathan' <guruvayur.a.ramanathan@...>; 'Dayavanti Gopal Kamath' <dayavanti.gopal.kamath@...>; 'D Arunprakash' <d.arunprakash@...>; 'Gobinath .' <gobinath@...>; 'Chetan Arakere Gowdru' <chetan.arakere@...>; 'R P Karthika .' <r.p.karthika@...> Subject: RE: [CSIT] NETVIRT-1599 - Upgrade Failures:Connectivity check fails after upgrade and ovsdb egress flows missing
Hi Srini,
Please set bundle-idle-timeout to 60 minutes (3600 seconds) on the switch, as shown below, before invoking the controller upgrade:
sudo ovs-vsctl set Open_vSwitch . other_config:bundle-idle-timeout=3600
Regards, Dheena
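If the CSIT suite drives the compute nodes from Python, the suggested setting can be applied and read back as below. This only wraps the `ovs-vsctl` command quoted above plus its matching `get`; the helper names are hypothetical, and running through `sudo` simply mirrors the command in this mail.

```python
import subprocess
from typing import List


def set_bundle_idle_timeout_cmd(seconds: int) -> List[str]:
    """argv for the ovs-vsctl command in this mail (60 minutes = 3600 s)."""
    if seconds < 1:
        raise ValueError("bundle-idle-timeout must be a positive number of seconds")
    return ["sudo", "ovs-vsctl", "set", "Open_vSwitch", ".",
            f"other_config:bundle-idle-timeout={seconds}"]


def get_bundle_idle_timeout_cmd() -> List[str]:
    """argv to read the value back for verification."""
    return ["sudo", "ovs-vsctl", "get", "Open_vSwitch", ".",
            "other_config:bundle-idle-timeout"]


def run(cmd: List[str]) -> str:
    # check=True raises CalledProcessError if ovs-vsctl exits non-zero.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()


if __name__ == "__main__":
    # Prints the command only; call run(...) on a compute node to apply it.
    print(" ".join(set_bundle_idle_timeout_cmd(60 * 60)))
```

Setting this before the upgrade starts, and reading it back with the `get` form, makes it explicit in the job log which timeout the bundle commit had to beat.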
From: Dheenadayalan B <dheenadayalan.b@...>
Hi Srini,
As per the analysis, no code fix is required for this; only a configuration change is required at the switch end. As discussed in a separate thread, please increase bundle-idle-timeout to 60 seconds on the switch, then verify and confirm whether the bundle commit after the upgrade is successful.
Thanks, Dheena
From: Dheenadayalan B <dheenadayalan.b@...>
Hi Srini,
I have already raised a review. I will let you know once the review changes are merged.
Thanks, Dheena
From: srinivas.rachakonda@... <srinivas.rachakonda@...>
Hi Dheena,
Any update on this?
Thanks, Srinivas +91-9243478719
From: Dheenadayalan B <dheenadayalan.b@...>
Hi Srini,
I can replicate this issue. A fix will be provided by Monday EOD.
Thanks, Dheena