Re: [netvirt-dev] how to address ovsdb node connection flap

Vishal Thapar <vishal.thapar@...>

Some inputs inline.


From: netvirt-dev-bounces@... [mailto:netvirt-dev-bounces@...] On Behalf Of K.V Suneelu Verma
Sent: 19 December 2017 12:09
To: Anil Vishnoi <vishnoianil@...>
Cc: netvirt-dev@...; ovsdb-dev@...
Subject: Re: [netvirt-dev] how to address ovsdb node connection flap


Thanks Anil.

Please find my comments in line.

I would vote for option 3 which you suggested.





From: Anil Vishnoi [mailto:vishnoianil@...]
Sent: Saturday, December 16, 2017 2:31 PM
To: K.V Suneelu Verma
Cc: ovsdb-dev@...; netvirt-dev@...
Subject: Re: [netvirt-dev] how to address ovsdb node connection flap




On Fri, Dec 15, 2017 at 3:55 AM, K.V Suneelu Verma <k.v.suneelu.verma@...> wrote:


   I have created the following jira


The client connects to only one ODL controller, via HAProxy.

Sometimes, when the client's OVSDB connection flaps, its node goes missing from the operational datastore.

The following scenarios are possible when a connection flap happens:

1) client disconnects and connects back to same controller after some delay
2) client disconnects and connects back to same controller immediately
3) client disconnects and connects to another controller after some delay
4) client disconnects and connects to another controller immediately
5) client disconnects and never connects back

When the client disconnects, all the ODL controllers try to clean up the operds node.
When the client connects, the owner ODL controller tries to create the operds node.

When processing on one ODL controller that is trying to clean up the operds node is delayed, the client node ends up missing from the oper topology.

Just to understand it better: I believe you are talking about a race condition across the cluster nodes. Once the node disconnects, all the controllers attempt to delete the node from the data store, but meanwhile the switch connects back to the controllers, and the node added by that controller gets deleted by another controller because it is still processing the entity ownership notification to clean up the data store.

[Suneelu] This race is exactly what I am talking about.
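
A minimal sketch of the race described above, assuming the oper store can be modelled as a plain set and that each controller's work is a single ordered event. The names `replay`, `odl-1` to `odl-3` and `ovs-1` are hypothetical and only illustrate the ordering problem; they are not ODL APIs.

```python
def replay(events):
    """Apply (controller, action, node) events in order and return the final oper store."""
    oper_store = set()
    for _controller, action, node in events:
        if action == "create":
            oper_store.add(node)
        elif action == "cleanup":
            oper_store.discard(node)
    return oper_store


# Every controller finishes its cleanup before the reconnect is processed.
in_order = [
    ("odl-1", "cleanup", "ovs-1"),
    ("odl-2", "cleanup", "ovs-1"),
    ("odl-3", "cleanup", "ovs-1"),
    ("odl-1", "create", "ovs-1"),
]

# odl-3 processes its ownership notification late, after odl-1 re-created the node.
delayed = [
    ("odl-1", "cleanup", "ovs-1"),
    ("odl-2", "cleanup", "ovs-1"),
    ("odl-1", "create", "ovs-1"),
    ("odl-3", "cleanup", "ovs-1"),
]

print(replay(in_order))
print(replay(delayed))
```

In the second ordering the switch is connected but its node is missing from the store, which is the symptom reported in this thread.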


To address this issue, the following has been proposed in the review.


When the client connects back to any ODL controller, fire a delete of the node from the oper store before creating the node there.

This will generate an unnecessary node-removed notification to all the consumer applications, even if the connection flap happened for only one controller node, and that can cause a lot of reconciliation logic to execute in the consumer applications (e.g. netvirt).


Delay the cleanup of the node in the disconnected() callback and in the other controllers, and do the cleanup only if the node never connects back.

I believe cleanup only happens when the controller that receives the disconnect sees that there is only one manager, or when EOS says that the respective entity has no owner. A timeout across the cluster is going to be a ticking bomb; it can cause the same issue at any time.



Now the node cleanup responsibility is shifted to whichever controller the client is connected to.

You can't make that assumption, because if the switch is connected to even one controller and gets disconnected from it, all the controller nodes will get the following notifications:

(1) Latest owner (wasOwner=true, hasOwner=false, isOwner=false)

(2) Second controller (wasOwner=false, hasOwner=false, isOwner=false)

(3) Third controller (wasOwner=false, hasOwner=false, isOwner=false)



Assuming all the controllers are up, it's probably easy to determine that the controller that gets the notification with wasOwner=true is the owner and should delete the node. But you can't make that assumption, because it won't hold if the owner controller gets killed. If you write this logic, the other two controllers won't clean up the data store, and even though the switch is not connected to any controller, the node will still be present in the data store.
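
The point above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: `OwnershipChange` is a hypothetical stand-in for the EOS notification carrying the three flags quoted above, and both rule functions are made up for the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnershipChange:
    # Hypothetical stand-in for the flags carried by an EOS notification.
    was_owner: bool
    has_owner: bool
    is_owner: bool


def previous_owner_only(change):
    """Proposed rule: only the controller that was the owner deletes the node."""
    return change.was_owner and not change.has_owner


def any_controller_when_ownerless(change):
    """Rule that survives an owner crash: clean up whenever the entity has no owner."""
    return not change.has_owner


def node_removed(rule, notifications):
    """The node is removed if at least one controller decides to clean up."""
    return any(rule(n) for n in notifications)


# Switch disconnects while all three controllers are up.
all_up = [
    OwnershipChange(True, False, False),   # latest owner
    OwnershipChange(False, False, False),  # second controller
    OwnershipChange(False, False, False),  # third controller
]

# Owner controller is killed, so only the two survivors process a notification.
owner_killed = all_up[1:]

print(node_removed(previous_owner_only, all_up))
print(node_removed(previous_owner_only, owner_killed))
print(node_removed(any_controller_when_ownerless, owner_killed))
```

With the previous-owner-only rule, nobody removes the node once the owner is killed. The second rule does remove it, but it is also the rule that lets a delayed controller delete a freshly re-created node.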


That controller can predictably delete and recreate the node in oper datastore.

This also ensures that reconciliation gets triggered.

It won't, for the reason I mentioned above.

Any approach that you implement should handle these scenarios:

  • Switch connected to one controller
    • Switch gets disconnected, but all controllers are up
      • Node should be removed from the data store
    • Owner controller gets killed
      • Node should be removed from the data store
    • Switch connects to another controller immediately
      • Consumer should get notifications about the node being removed and re-added
  • Switch connected to multiple controllers
    • Switch gets disconnected, but all controllers are up
      • Node should not be removed from the data store as long as the switch is connected to one of the controllers. If it gets disconnected from all of them, the node should be removed.
    • Owner controller gets killed
      • Same as above.

Now, this race condition happens because the existing clustering service cannot guarantee that EOS notifications are delivered at the same time across the cluster nodes. In my opinion, to address this issue properly for production-level deployments, we need to implement the following:


[Suneelu] Totally agree.

Suppose EOS somehow gave a notification with info like this: (wasOwner=false, hasOwner=false, isOwner=false, original-owner=[timedout|crashed|released])

Then the other controllers could detect that the owner controller crashed and could clean up the node from the oper datastore.

Basically, the notification would carry the reason for the EOS change.

I noticed that such a reason field is not present in the newly introduced cluster singleton service either.


[Vishal] Can we request this, at least in CSS, if EOS is going to be deprecated?


(1) Implement a connection flap damping mechanism at the library level. A simple exponential decay mechanism, like the one used for IP link flapping, can be used here. The tricky part is determining that the connection flap is happening for the same OVS, especially when you are behind a NAT system. To make it deterministic, you will have to look at the iid coming from that OVS to determine which switch connection is flapping. You can expose various configuration parameters to users, so that they can tune it for their environment.
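
A rough sketch of such an exponential-decay damper, assuming a penalty-and-threshold scheme similar to route flap damping. The class name `FlapDamper`, its parameters and their default values are all hypothetical and chosen only for illustration; the key is assumed to be whatever stable identifier (such as the iid) the OVS reports.

```python
class FlapDamper:
    """Per-switch penalty that grows on every flap and decays exponentially over time."""

    def __init__(self, penalty_per_flap=1000.0, half_life_s=15.0,
                 suppress_at=1500.0, reuse_at=750.0):
        # All values are illustrative and would be user-configurable.
        self.penalty_per_flap = penalty_per_flap
        self.half_life_s = half_life_s
        self.suppress_at = suppress_at
        self.reuse_at = reuse_at
        self._state = {}  # key -> (penalty, last_update_time, suppressed)

    def _decayed(self, penalty, elapsed_s):
        # The penalty halves every half_life_s seconds.
        return penalty * 0.5 ** (elapsed_s / self.half_life_s)

    def record_flap(self, key, now):
        penalty, last, suppressed = self._state.get(key, (0.0, now, False))
        penalty = self._decayed(penalty, now - last) + self.penalty_per_flap
        if penalty >= self.suppress_at:
            suppressed = True
        self._state[key] = (penalty, now, suppressed)

    def is_suppressed(self, key, now):
        if key not in self._state:
            return False
        penalty, last, suppressed = self._state[key]
        penalty = self._decayed(penalty, now - last)
        if suppressed and penalty < self.reuse_at:
            suppressed = False
        self._state[key] = (penalty, now, suppressed)
        return suppressed


damper = FlapDamper()
damper.record_flap("ovs-1", now=0.0)
first = damper.is_suppressed("ovs-1", now=0.0)     # one flap stays below the threshold
damper.record_flap("ovs-1", now=1.0)
second = damper.is_suppressed("ovs-1", now=2.0)    # two quick flaps cross it
later = damper.is_suppressed("ovs-1", now=60.0)    # penalty has decayed below reuse_at
```

Timestamps are passed in explicitly so the behaviour is deterministic; a real implementation would read a monotonic clock.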


[Vishal]: You mean the Open_vSwitch uuid? We may not be able to do this at the library level, but it should be possible at the plugin level. We can use it to dampen OperDS cleanup, at least. And we already have the iid stamped when we push config to the switch, unless it is a case of a restart with a clean conf.db.


(2) When the switch connects to all the controllers, you can still hit the same scenario if a connection flap happens, so I would suggest letting all the controllers write the node information to the data store. Given that writing connection details to the data store is not a very frequent operation, the cost of writing it 3 times per switch will be negligible. It will also keep the solution simple.



(3) Attach a temporary data store listener specifically for the node that the controller is writing to the data store. If that listener receives a notification that the node was deleted, the plugin can write the node to the data store again if it sees that the local node is still connected to the switch. This approach will help with the scenarios where your switch is connected to only one controller.
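
Option (3) can be sketched as below, assuming an in-memory store with per-node delete callbacks. `OperStore`, `Controller` and their methods are hypothetical names used only for illustration; they do not correspond to the MD-SAL data store or listener APIs.

```python
class OperStore:
    """Toy operational store that notifies per-node listeners on delete."""

    def __init__(self):
        self.nodes = {}
        self._delete_listeners = {}

    def on_delete(self, node_id, callback):
        self._delete_listeners.setdefault(node_id, []).append(callback)

    def put(self, node_id, data):
        self.nodes[node_id] = data

    def delete(self, node_id):
        if self.nodes.pop(node_id, None) is not None:
            for callback in list(self._delete_listeners.get(node_id, [])):
                callback(node_id)


class Controller:
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.connected = {}      # node_id -> data for locally connected switches
        self._listening = set()  # node ids this controller already watches

    def switch_connected(self, node_id, data):
        self.connected[node_id] = data
        self.store.put(node_id, data)
        if node_id not in self._listening:
            self._listening.add(node_id)
            self.store.on_delete(node_id, self._restore_if_connected)

    def switch_disconnected(self, node_id):
        self.connected.pop(node_id, None)

    def _restore_if_connected(self, node_id):
        # Re-write the node only if the switch is still connected locally.
        data = self.connected.get(node_id)
        if data is not None:
            self.store.put(node_id, data)


store = OperStore()
odl1 = Controller("odl-1", store)
odl1.switch_connected("ovs-1", {"manager": "odl-1"})

store.delete("ovs-1")                  # stale cleanup from a delayed controller
restored = "ovs-1" in store.nodes      # listener wrote the node back

odl1.switch_disconnected("ovs-1")
store.delete("ovs-1")                  # genuine cleanup after a real disconnect
gone = "ovs-1" not in store.nodes
```

Consumers would still see a remove followed by an add in the stale-cleanup case; the sketch only shows that the node does not stay missing.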


In my opinion, (2) + (3) should work for all the scenarios that I mentioned above and are more deterministic. But when it comes to scale, it's good to also put a preventive mechanism at the lower layer (option 1), so that you can avoid unnecessary data writes during warm-up time by suppressing these connection flaps at the library level.


[Suneelu] Agree with option 3; not so sure about option 2, where all the controllers will try to write to the device (that may bring in more races, I feel).




