missing flow after restarting a compute node
Jamo Luhrsen <jluhrsen@...>
Hi OVSDB,
this is regarding bug 8877 [0], where it was noticed that an openstack instance didn't get its IP address when it was launched shortly after rebooting a compute node. I was looking at the karaf.log attached to it and noticed an ERROR in ovsdb that I haven't come across before:

ERROR | assiveConnServ-7 | StalePassiveConnectionService | 285 - org.opendaylight.ovsdb.library - 1.4.1.Carbon-redhat-1 | Error in checking stale connections

Do you know if it's serious? Could it possibly be related to some flows not getting installed when the instance is spawned?

Thanks,
JamO

[0] https://bugs.opendaylight.org/show_bug.cgi?id=8877
Anil Vishnoi
Hi Jamo,

This error pops up when the ovsdb manager disconnects from the ovsdb plugin improperly, but the plugin still thinks that the connection is present. When the ovsdb manager reconnects, the plugin treats it as a new connection and suspects that the old connection is stale, so it sends an echo message on the old connection and waits for the echo response. It looks like some runtime exception happened while sending the echo message, which caused the echo request to fail. Do you see any other exception before or after this line?

This failure can impact publishing the connection to the ovsdb southbound plugin, which could in turn impact flow installation, but I am not sure about that part; it depends on how netvirt internally acts on it.
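To make that a bit more concrete, here is a rough, self-contained Java sketch of this kind of stale-connection check. The names (ConnectionHandle, StaleConnectionChecker, sendEcho, the 5-second timeout) are hypothetical illustrations, not the actual StalePassiveConnectionService code; the point is just to show where a runtime exception thrown while sending the echo would surface as the error line above.

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a passive OVSDB connection handle; not the real ODL interface.
interface ConnectionHandle {
    String endpoint();                    // remote ip:port of the ovsdb manager/switch
    CompletableFuture<Void> sendEcho();   // probe the peer over this connection
    void disconnect();                    // tear the connection down
}

// Conceptual sketch: when a new connection arrives for an endpoint we already know,
// probe the old connection with an echo and drop it only if the probe fails or
// times out. Any exception raised while sending or waiting for the echo ends up
// in the catch block, i.e. the "Error in checking stale connections" case.
class StaleConnectionChecker {
    private final Map<String, ConnectionHandle> active = new ConcurrentHashMap<>();

    void onNewConnection(ConnectionHandle fresh) {
        ConnectionHandle previous = active.put(fresh.endpoint(), fresh);
        if (previous == null || previous == fresh) {
            return; // first connection from this endpoint, nothing to check
        }
        try {
            previous.sendEcho().get(5, TimeUnit.SECONDS); // is the old peer still alive?
            // Echo succeeded: the old connection is not stale; policy decides which one to keep.
        } catch (Exception e) {
            // Echo failed, timed out, or threw while being sent: treat the old connection as stale.
            System.err.println("Error in checking stale connections: " + e);
            previous.disconnect();
        }
    }
}

Keeping the newest connection and using a 5-second timeout are arbitrary choices for the illustration; the real plugin's policy may differ.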
On Mon, Jul 24, 2017 at 11:44 AM, Jamo Luhrsen <jluhrsen@...> wrote:
--
Thanks,
Anil
Sridhar Gaddam
Hello Anil,

Looks like we hit the same issue in our local testing. With ODL Carbon (+ Pike, OVS 2.7), during one reboot scenario we observed a race condition in ODL/OVSDB. Can you please let us know if this issue is known/addressed in OVSDB?

Steps to reproduce (in a working setup with a controller and two compute nodes):
1. Restart the compute node and wait for it to come up.
2. Launch an instance on the compute node.
3. Observe that the instance initially stays in the "spawning" state and then transitions to the "error" state.
4. Restart openvswitch on the compute node.
5. Launch a new instance; it boots successfully.

Basically, when we issue the reboot on the compute node, ODL identifies that the node is idle and triggers the disconnection chain. While this is still going on and the compute node comes back up, there is a race condition between the cleanup events and the events related to node reconciliation. In this process the compute node ends up being deleted from the operational store [#] even though it is connected to the controller. Since the node info is deleted from the datastore, the side effect is that port-binding fails and we are unable to spawn new VMs until we restart the OVS switch on the compute node. Following [@] is a snap of the karaf logs which shows this sequence.

Additional note: if the compute node comes up with some delay (i.e., after the cleanup is properly done in ODL), this issue (i.e., step 3 above) is not seen.

[#] 2017-08-01 07:48:16,660 | INFO | lt-dispatcher-49 | OvsdbConnectionManager | 289 - org.opendaylight.ovsdb.southbound-impl - 1.4.1.Carbon-redhat-1 | Entity{type='ovsdb', id=/(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)network-topology/topology/topology[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)topology-id=ovsdb:1}]/node/node[{(urn:TBD:params:xml:ns:yang:network-topology?revision=2013-10-21)node-id=ovsdb://uuid/e9806896-8dc2-4f17-83ea-c1c957608915}]} has no owner, cleaning up the operational data store

Thanks,
--Sridhar.
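To illustrate the race a bit more concretely, below is a small, self-contained Java sketch. The types and method names (OperationalStore, OwnershipCleanup, onEntityOwnerLost) are hypothetical, not the actual OvsdbConnectionManager or MD-SAL entity-ownership code; it only shows the kind of guard that would avoid deleting a node from the operational store after the switch has already reconnected.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the operational datastore; not the real MD-SAL API.
interface OperationalStore {
    void deleteNode(String nodeId);
}

// Conceptual sketch of the race: the "entity has no owner" cleanup runs as part of
// the old disconnection chain, while the rebooted switch may already have reconnected
// and re-registered the same node. Checking for a live connection before deleting
// would leave the reconciled node in place instead of removing it, which is what
// breaks port-binding until openvswitch is restarted.
class OwnershipCleanup {
    private final Set<String> connectedNodes = ConcurrentHashMap.newKeySet();
    private final OperationalStore store;

    OwnershipCleanup(OperationalStore store) {
        this.store = store;
    }

    void onNodeConnected(String nodeId)    { connectedNodes.add(nodeId); }
    void onNodeDisconnected(String nodeId) { connectedNodes.remove(nodeId); }

    // Called when entity ownership reports that nodeId has no owner.
    void onEntityOwnerLost(String nodeId) {
        if (connectedNodes.contains(nodeId)) {
            // The switch reconnected while the old cleanup was still in flight:
            // deleting the node now is exactly the race seen in the logs, so skip it.
            return;
        }
        store.deleteNode(nodeId);
    }
}

Whether such a check is the right fix inside the southbound plugin is a separate question; the sketch only captures the ordering problem described above.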
On Thu, Jul 27, 2017 at 1:22 PM, Anil Vishnoi <vishnoianil@...> wrote:
Sridhar Gaddam
++ netvirt-dev (in case anyone has observed a similar issue)

On Tue, Aug 1, 2017 at 10:31 PM, Sridhar Gaddam <sgaddam@...> wrote: