Re: [netvirt-dev] how to address ovsdb node connection flap

Jamo Luhrsen <jluhrsen@...>

yeah, we have 3node ovsdb CSIT we can enhance to cover this.

would it be as simple as this:

- 3 node ODL cluster
- create a single ovsdb connection to one of the ODL nodes
- check operational store for node
- disconnect and reconnect
- check operational store that the node is still there
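
The steps above could be sketched roughly as below. This is only an illustration in Python, not the actual CSIT suite: `ovsdb_flap_check`, `FakeCluster` and the node id are hypothetical names, and in a real test the three callables would wrap the OVS manager configuration and a read of the operational store.

```python
NODE = "ovsdb://uuid/example-switch"  # hypothetical node id, for illustration only


def ovsdb_flap_check(connect, disconnect, node_present):
    """Run the connect / check / flap / check sequence and report the outcome."""
    connect()
    if not node_present():
        return "node missing after initial connect"
    disconnect()
    connect()
    if not node_present():
        return "node missing after reconnect"
    return "ok"


class FakeCluster:
    """In-memory stand-in for the cluster's operational store (assumption).

    With late_cleanup=True it mimics a controller whose cleanup of the old
    connection is processed only after the switch has reconnected.
    """

    def __init__(self, late_cleanup=False):
        self.oper = set()
        self.late_cleanup = late_cleanup
        self._pending_cleanup = False

    def connect(self):
        self.oper.add(NODE)
        if self._pending_cleanup:
            # The delayed cleanup lands after the node was re-created.
            self.oper.discard(NODE)
            self._pending_cleanup = False

    def disconnect(self):
        self.oper.discard(NODE)
        self._pending_cleanup = self.late_cleanup

    def node_present(self):
        return NODE in self.oper


healthy = FakeCluster()
racy = FakeCluster(late_cleanup=True)
healthy_result = ovsdb_flap_check(healthy.connect, healthy.disconnect, healthy.node_present)
racy_result = ovsdb_flap_check(racy.connect, racy.disconnect, racy.node_present)
```

Passing the operations in as callables keeps the check itself independent of how the connection is made or how the store is queried.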

I can repeat it in CSIT once for the leader and again on a


On 2/19/18 11:54 PM, Vishal Thapar wrote:
Replaced netvirt-dev with genius-dev.


Hi all,


Reviving this discussion because we are now hitting a similar issue in the Genius 3-node CSIT. It is not very consistent, but we
occasionally run into it. We have some workarounds for it, and some improvements are needed in the Genius 3-node CSIT itself, but
the issue is still there in OVSDB.


Can we conclude on what to do? Suneelu’s patch for HWVTEP is already in; is the same design [I believe Anil’s suggestion of
Option 3] good enough for OVSDB?



Do we have cluster CSIT for OVSDB? It would probably be good to add a test case that reproduces this issue and test the fix against it.





*From:* ovsdb-dev-bounces@... *On Behalf Of* Vishal
*Sent:* 19 December 2017 12:43
*To:* K.V Suneelu Verma <k.v.suneelu.verma@...>; Anil Vishnoi <vishnoianil@...>
*Cc:* netvirt-dev@...; ovsdb-dev@...
*Subject:* Re: [ovsdb-dev] [netvirt-dev] how to address ovsdb node connection flap


Some inputs inline.


*From:* netvirt-dev-bounces@... *On Behalf Of* K.V Suneelu Verma
*Sent:* 19 December 2017 12:09
*To:* Anil Vishnoi <vishnoianil@...>
*Cc:* netvirt-dev@...; ovsdb-dev@...
*Subject:* Re: [netvirt-dev] how to address ovsdb node connection flap


Thanks Anil.

Please find my comments in line.

I would vote for option 3 which you suggested.





*From:* Anil Vishnoi <vishnoianil@...>
*Sent:* Saturday, December 16, 2017 2:31 PM
*To:* K.V Suneelu Verma
*Cc:* ovsdb-dev@...; netvirt-dev@...
*Subject:* Re: [netvirt-dev] how to address ovsdb node connection flap




On Fri, Dec 15, 2017 at 3:55 AM, K.V Suneelu Verma <k.v.suneelu.verma@...> wrote:


   I have created the following jira <>


The client connects to only one ODL controller via an HA proxy.

Sometimes, when a client OVSDB connection flap happens, its node goes missing from the operational datastore.

The following scenarios are possible when a connection flap happens:

1) client disconnects and connects back to the same controller after some delay
2) client disconnects and connects back to the same controller immediately
3) client disconnects and connects to another controller after some delay
4) client disconnects and connects to another controller immediately
5) client disconnects and never connects back

When the client disconnects, all the ODL controllers try to clean up the operds node.
When the client connects, the owner ODL controller tries to create the operds node.

When the processing on one ODL controller that is trying to clean up the operds node is delayed, we end up with the client
node missing from the oper topology.

Just to understand it better: I believe you are talking about a race condition across the cluster nodes, where once the switch
disconnects, all the controllers attempt to delete the node from the data store. Meanwhile the switch connects back to one of the
controllers, and the node added by that controller gets deleted by another controller because it is still processing
the entity ownership notification to clean up the data store.

[Suneelu] This race is exactly what I am talking about.
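
To make the ordering concrete, here is a small, deterministic replay of the race described above. It is a sketch only: the controller names, the node id and the `replay` helper are made up for illustration, and the shared set merely stands in for the operational datastore.

```python
NODE = "ovsdb://uuid/example-switch"  # hypothetical node id


def replay(events):
    """Apply (controller, action, node) events in order to a shared oper store.

    Returns the final set of nodes present in the (simulated) oper store.
    """
    oper = set()
    for _controller, action, node in events:
        if action == "delete":
            oper.discard(node)
        elif action == "create":
            oper.add(node)
        else:
            raise ValueError(f"unknown action: {action}")
    return oper


# All three controllers finish their cleanup before the owner re-creates the node.
in_order = [
    ("odl1", "delete", NODE),
    ("odl2", "delete", NODE),
    ("odl3", "delete", NODE),
    ("odl1", "create", NODE),
]

# odl3 processes its ownership notification late, after the switch reconnected.
delayed_cleanup = [
    ("odl1", "delete", NODE),
    ("odl2", "delete", NODE),
    ("odl1", "create", NODE),
    ("odl3", "delete", NODE),
]

final_in_order = replay(in_order)
final_delayed = replay(delayed_cleanup)
```

In the first sequence the node survives the flap; in the second the late delete from `odl3` wipes the node that `odl1` just re-created, which is the missing-node symptom.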


To address this issue, the following has been proposed in the review.


When the client connects back to any ODL controller, fire a delete of the node from the oper store before creating the
node in the oper store.

This will generate an unnecessary node-removed notification to all the consumer applications, even if the connection flap
happened for only one controller node, and that can trigger a lot of reconciliation logic in the consumer
applications (e.g. netvirt).


Delay the cleanup of the node in the disconnected() callback and in the other controllers, and do the cleanup only if the node
never connected back.

I believe cleanup only happens when the controller that receives the disconnect sees that there is only one
manager, or when EOS says that the respective entity does not have an owner. A timeout across the cluster is going to be a
ticking bomb; it can cause the same issue at any time.



Now the node cleanup responsibility is shifted to whichever controller the client is connected to.

You can't make that assumption, because if the switch is connected to even one controller and gets disconnected from it,
all the controller nodes will get the following notifications:

(1) Latest owner (wasOwner=true, hasOwner=false, isOwner=false)

(2) Second controller (wasOwner=false, hasOwner=false, isOwner=false)

(3) Third controller (wasOwner=false, hasOwner=false, isOwner=false)



Assuming that all the controllers are up, it is probably easy to determine that the controller that gets the
notification with wasOwner=true is the owner and should delete the node. But you can't make that assumption, because
it won't hold if the owner controller gets killed. So if you write this logic, the other two controllers won't
clean up the data store, and even though the switch is not connected to any controller, you will see that the node is still
present in the data store.
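
A small sketch of why a "delete only if wasOwner=true" rule leaves a stale node when the owner is killed. The `OwnershipChange` tuple and the two rule functions are illustrative names only, not the EOS API; the flag values are the ones listed above.

```python
from typing import NamedTuple


class OwnershipChange(NamedTuple):
    """The three flags each controller sees in an ownership notification."""
    was_owner: bool
    has_owner: bool
    is_owner: bool


def cleanup_if_was_owner(change):
    """Proposed rule: only the previous owner deletes the node."""
    return change.was_owner and not change.has_owner


def cleanup_if_no_owner(change):
    """Rule where every controller deletes once the entity has no owner."""
    return not change.has_owner


def controllers_that_cleanup(rule, notifications):
    """Names of the controllers that would delete the node under `rule`."""
    return [name for name, change in notifications.items() if rule(change)]


# Switch disconnects while all three controllers are up.
all_up = {
    "odl1": OwnershipChange(True, False, False),   # latest owner
    "odl2": OwnershipChange(False, False, False),
    "odl3": OwnershipChange(False, False, False),
}

# Owner (odl1) is killed: only the survivors receive a notification.
owner_killed = {name: c for name, c in all_up.items() if name != "odl1"}

was_owner_all_up = controllers_that_cleanup(cleanup_if_was_owner, all_up)
was_owner_killed = controllers_that_cleanup(cleanup_if_was_owner, owner_killed)
no_owner_killed = controllers_that_cleanup(cleanup_if_no_owner, owner_killed)
```

With the owner killed, nobody cleans up under the first rule. The second rule does clean up, but it is also the rule under which every controller deletes, so the late-delete race remains possible.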


That controller can predictably delete and recreate the node in oper datastore.

This also ensures that reconciliation gets triggered.

It won't, for the reason I mentioned above.

Any approach that you implement should handle these scenarios:

* Switch connected to one controller
  o Switch gets disconnected, but all controllers are up
    + Node should be removed from the data store
  o Owner controller gets killed
    + Node should be removed from the data store
  o Switch connects to another controller immediately
    + Consumer should get notifications about the node being removed and re-added

* Switch connected to multiple controllers
  o Switch gets disconnected, but all controllers are up
    + Node should not be removed from the data store while the switch is connected to at least one controller. If it gets
      disconnected from all of them, the node should be removed.
  o Owner controller gets killed
    + Same as above.

Now, this race condition happens because the existing clustering service cannot guarantee that EOS notifications are
delivered at the same time across the cluster nodes. In my opinion, to address this issue properly for production-level
deployments, we need to implement the following:


[Suneelu] Totally agree.

If EOS could somehow give a notification with info like this (wasOwner=false, hasOwner=false, isOwner=false,
original-owner=[timedout|crashed|released]),

then the other controllers could detect that the owner controller crashed and clean up the node from the oper datastore.

Basically, a reason for the EOS change.

I noticed that such a reason field is not present in the newly introduced cluster singleton service either.


[Vishal] Can we request this, at least in CSS, if EOS is going to be deprecated?


(1) Implement a connection flap damping mechanism at the library level. A simple exponential decay mechanism like the one used
for IP link flapping can be used here. The tricky part is how you determine that the connection flap is happening for the
same OVS, especially when you are behind a NAT system. To make it deterministic, you will have to look at the iid that
is coming from that OVS to determine which switch connection is flapping. You can expose various configuration parameters
to the user, so that they can configure it according to their environment.
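
As a rough sketch of option (1), the snippet below applies a penalty per flap that decays exponentially with a configurable half-life, in the spirit of route/link flap damping. The class name, the thresholds and the default values are assumptions chosen for illustration; the key would be whatever identifies the switch (for example the iid mentioned above).

```python
class FlapDamper:
    """Penalty-based flap damping with exponential decay (illustrative only)."""

    def __init__(self, penalty_per_flap=1000.0, suppress_at=2000.0,
                 reuse_at=750.0, half_life_s=15.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_at = suppress_at    # start suppressing at/above this penalty
        self.reuse_at = reuse_at          # stop suppressing once decayed below this
        self.half_life_s = half_life_s    # seconds for the penalty to halve
        self._state = {}                  # key -> (penalty, last_update, suppressed)

    def _decayed(self, key, now):
        penalty, last, suppressed = self._state.get(key, (0.0, now, False))
        penalty *= 0.5 ** ((now - last) / self.half_life_s)
        return penalty, suppressed

    def record_flap(self, key, now):
        """Register one disconnect/reconnect of `key` at time `now` (seconds)."""
        penalty, suppressed = self._decayed(key, now)
        penalty += self.penalty_per_flap
        if penalty >= self.suppress_at:
            suppressed = True
        self._state[key] = (penalty, now, suppressed)

    def is_suppressed(self, key, now):
        """True while events for `key` should be held back."""
        penalty, suppressed = self._decayed(key, now)
        if suppressed and penalty < self.reuse_at:
            suppressed = False
        self._state[key] = (penalty, now, suppressed)
        return suppressed


damper = FlapDamper()
switch = "example-switch-iid"  # hypothetical key
damper.record_flap(switch, now=0.0)
after_one_flap = damper.is_suppressed(switch, now=0.0)
damper.record_flap(switch, now=1.0)
damper.record_flap(switch, now=2.0)
after_three_flaps = damper.is_suppressed(switch, now=2.0)
one_half_life_later = damper.is_suppressed(switch, now=17.0)
after_a_minute = damper.is_suppressed(switch, now=62.0)
```

Timestamps are passed in explicitly so the behaviour is easy to test; the four constructor arguments are the kind of parameters that could be exposed to the user.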


[Vishal]: You mean the Open_vSwitch uuid? We may not be able to do this at the library level, but it should be possible at the
plugin level. We can use it to dampen OperDS cleanup, at least. And we already have the iid stamped when we push config to
the switch, unless it is a restart with a clean conf.db.


(2) When the switch connects to all the controllers, you can still hit the same scenario if a connection flap happens, so I
would suggest letting all the controllers write the node information to the data store. Given that writing connection
details to the data store is not a very frequent operation, the cost of writing it 3 times per switch will be negligible. It
will also keep the solution simple.



(3) Attach a temporary data store listener specifically for the node that the controller is writing to the data store. If
that listener receives a notification of a node delete, then you can write the node to the data store again if the plugin sees
that the local node is still connected to the switch. This approach will help with the scenarios where your switch is
connected to only one controller.
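
A minimal in-memory sketch of option (3), assuming a store that can notify per-node delete listeners. `OperStore`, `Plugin` and their methods are invented for this illustration and are not the actual data broker or OVSDB plugin classes.

```python
class OperStore:
    """Toy operational store with per-node delete listeners (assumption)."""

    def __init__(self):
        self.nodes = {}
        self._listeners = {}  # node_id -> {owner_name: callback}

    def register_delete_listener(self, node_id, owner, callback):
        self._listeners.setdefault(node_id, {})[owner] = callback

    def unregister_delete_listener(self, node_id, owner):
        self._listeners.get(node_id, {}).pop(owner, None)

    def put(self, node_id, data):
        self.nodes[node_id] = data

    def delete(self, node_id):
        if self.nodes.pop(node_id, None) is not None:
            # Copy the callbacks so listeners may change during notification.
            for callback in list(self._listeners.get(node_id, {}).values()):
                callback(node_id)


class Plugin:
    """One controller's plugin instance."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.connected = {}  # node_id -> node data for live local connections

    def on_connected(self, node_id, data):
        self.connected[node_id] = data
        self.store.put(node_id, data)
        self.store.register_delete_listener(node_id, self.name, self._on_deleted)

    def on_disconnected(self, node_id):
        self.connected.pop(node_id, None)
        self.store.unregister_delete_listener(node_id, self.name)
        self.store.delete(node_id)

    def _on_deleted(self, node_id):
        # Re-write the node only if the switch is still connected locally.
        data = self.connected.get(node_id)
        if data is not None:
            self.store.put(node_id, data)


NODE = "ovsdb://uuid/example-switch"  # hypothetical node id
store = OperStore()
odl1 = Plugin("odl1", store)
odl1.on_connected(NODE, {"remote-ip": "192.0.2.10"})
store.delete(NODE)                    # late cleanup issued by another controller
survived_late_delete = NODE in store.nodes
odl1.on_disconnected(NODE)            # genuine disconnect
present_after_disconnect = NODE in store.nodes
```

The listener is removed before the genuine cleanup, so a real disconnect still removes the node while a stray delete from another controller gets undone.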


In my opinion, (2) + (3) should work for all the scenarios that I mentioned above and are more deterministic. But when it
comes to scale, it's good to also put a preventive mechanism at the lower layer (option 1), so that you can avoid any
unnecessary data writes during warm-up time by suppressing these connection flaps at the library level.


[Suneelu] Agree with option 3; not so sure about option 2, where all the controllers will try to write to the device
(that may bring in more races, I feel).




