[netvirt-dev] how to address ovsdb node connection flap
Anil Vishnoi
On Fri, Dec 15, 2017 at 3:55 AM, K.V Suneelu Verma <k.v.suneelu.verma@ericsson.
Just to understand it better, I believe you are talking about a race condition across the cluster nodes: once the node disconnects, all the controllers attempt to delete the node from the data store, but meanwhile the switch connects back to the controllers, and the node added by that controller gets deleted by another controller because it is still processing the entity ownership notification to clean up the data store.
This will generate unnecessary node-removed notifications to all the consumer applications, even if the connection flap happened on only one controller node, and that can trigger a lot of reconciliation logic in the consumer applications (e.g. netvirt).
I believe cleanup only happens when the controller that receives the disconnect sees that there is only one manager, or the EOS says that the respective entity has no owner. A timeout across the cluster is going to be a ticking bomb; it can cause the same issue at any time.
You can't make that assumption, because if the switch is connected to even one controller and gets disconnected from it, all the controller nodes will get the following notifications:
(1) Latest owner (wasOwner=true, hasOwner=false, isOwner=false)
(2) Second controller (wasOwner=false, hasOwner=false, isOwner=false)
(3) Third controller (wasOwner=false, hasOwner=false, isOwner=false)
Assuming all the controllers are up, it is probably easy to determine that the controller that gets the notification with wasOwner=true is the owner and should delete the node, but you can't rely on that, because it won't hold if that owner controller gets killed. So if you write this logic, the other two controllers won't clean up the data store and, even though the switch is not connected to any controller, you will see that the node is still present in the data store.
It won't, for the reason I mentioned above. Any approach that you implement should handle these scenarios.
Now, this race condition is happening because the existing clustering service cannot guarantee EOS notification delivery at the same time across the cluster nodes. In my opinion, to address this issue properly for production-level deployments, we need to implement the following:
(1) Implement a connection-flap damping mechanism at the library level. A simple exponential-decay mechanism, like the one used for IP link flapping, can be used here. The tricky part is determining that the connection flap is happening for the same OVS, especially when you are behind a NAT system. To make it deterministic you will have to look at the iid coming from that OVS to determine which switch connection is flapping. You can expose various configuration parameters so that users can tune them for their environment (a rough sketch of such a damper follows below).
(2) When the switch connects to all the controllers, you can still hit the same scenario if a connection flap happens, so I would suggest letting all the controllers write the node information to the data store. Given that writing connection details to the data store is not a very frequent operation, the cost of writing it three times per switch is negligible. It also keeps the solution simple.
(3) Attach a temporary data store listener specifically for the node that the controller is writing to the data store. If that listener receives a node-delete notification, the plugin can write the node back if it sees that the local node is still connected to the switch. This approach helps with the scenario where your switch is connected to only one controller.
In my opinion, (2) + (3) should work for all the scenarios I mentioned above and are more deterministic. But when it comes to scale, it's good to also put a preventive mechanism at the lower layer (option 1), so that you can avoid unnecessary data writes during warm-up time by suppressing these connection flaps at the library level.
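To make option (1) concrete, here is a minimal, self-contained sketch of an exponential-decay flap damper keyed by the OVS iid. The class name, thresholds, and half-life are illustrative assumptions, not existing ovsdb library code; the thresholds and half-life would be the user-configurable parameters mentioned above.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConnectionFlapDamper {
    // All values are illustrative defaults; these would be the user-exposed configuration knobs.
    private static final double PENALTY_PER_FLAP = 1000.0;   // added on every disconnect
    private static final double SUPPRESS_THRESHOLD = 2000.0; // start suppressing above this
    private static final double REUSE_THRESHOLD = 800.0;     // accept the switch again below this
    private static final double HALF_LIFE_MS = 15000.0;      // exponential-decay half-life

    private static final class State {
        double penalty;
        long lastUpdateMs;
        boolean suppressed;
    }

    private final Map<String, State> states = new ConcurrentHashMap<>();

    // Record a flap for the switch identified by its iid; returns true if it is now suppressed.
    public synchronized boolean recordFlapAndCheckSuppressed(String ovsIid) {
        State s = states.computeIfAbsent(ovsIid, k -> new State());
        decay(s);
        s.penalty += PENALTY_PER_FLAP;
        if (s.penalty >= SUPPRESS_THRESHOLD) {
            s.suppressed = true;
        }
        return s.suppressed;
    }

    // Check (with decay applied) whether a reconnect from this switch should still be suppressed.
    public synchronized boolean isSuppressed(String ovsIid) {
        State s = states.get(ovsIid);
        if (s == null) {
            return false;
        }
        decay(s);
        if (s.suppressed && s.penalty <= REUSE_THRESHOLD) {
            s.suppressed = false;
        }
        return s.suppressed;
    }

    // Exponential decay of the accumulated penalty since the last update.
    private void decay(State s) {
        long now = System.currentTimeMillis();
        if (s.lastUpdateMs != 0) {
            double halfLives = (now - s.lastUpdateMs) / HALF_LIFE_MS;
            s.penalty *= Math.pow(0.5, halfLives);
        }
        s.lastUpdateMs = now;
    }
}

The library would call recordFlapAndCheckSuppressed() on every disconnect and isSuppressed() before surfacing a reconnect to the plugin, so a flapping switch only becomes visible again once its penalty has decayed below the reuse threshold.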
Thanks,
Anil
Suneelu
Thanks Anil. Please find my comments inline. I would vote for option 3, which you suggested.
Thanks, Suneelu
From: Anil Vishnoi [mailto:vishnoianil@...]
Sent: Saturday, December 16, 2017 2:31 PM
To: K.V Suneelu Verma
Cc: ovsdb-dev@...; netvirt-dev@...
Subject: Re: [netvirt-dev] how to address ovsdb node connection flap
On Fri, Dec 15, 2017 at 3:55 AM, K.V Suneelu Verma <k.v.suneelu.verma@...> wrote: Hi, I have created the following jira https://jira.opendaylight.org/browse/OVSDB-438 https://git.opendaylight.org/gerrit/#/c/66504/
The client connects to only one ODL controller via HA proxy. Sometimes when the client OVSDB connection flap happens, its node goes missing from the operational datastore. The following are the scenarios in which a connection flap can happen:
1) The client disconnects and connects back to the same controller after some delay.
When the client disconnects, all the ODL controllers try to clean up the oper DS node. When the cleanup processing on one of the ODL controllers is delayed, we end up with the client node missing from the oper topology. Just to understand it better, I believe you are talking about a race condition across the cluster nodes: once the node disconnects, all the controllers attempt to delete the node from the data store, but meanwhile the switch connects back to the controllers, and the node added by that controller gets deleted by another controller because it is still processing the entity ownership notification to clean up the data store. [Suneelu] This race is exactly what I am talking about.
This will generate unnecessary node-removed notifications to all the consumer applications, even if the connection flap happened on only one controller node, and that can trigger a lot of reconciliation logic in the consumer applications (e.g. netvirt).
I believe cleanup only happens when the controller that receives the disconnect sees that there is only one manager, or the EOS says that the respective entity has no owner. A timeout across the cluster is going to be a ticking bomb; it can cause the same issue at any time.
You can't make that assumption, because if the switch is connected to even one controller and gets disconnected from it, all the controller nodes will get the following notifications:
(1) Latest owner (wasOwner=true, hasOwner=false, isOwner=false)
(2) Second controller (wasOwner=false, hasOwner=false, isOwner=false)
(3) Third controller (wasOwner=false, hasOwner=false, isOwner=false)
Assuming all the controllers are up, it is probably easy to determine that the controller that gets the notification with wasOwner=true is the owner and should delete the node, but you can't rely on that, because it won't hold if that owner controller gets killed. So if you write this logic, the other two controllers won't clean up the data store and, even though the switch is not connected to any controller, you will see that the node is still present in the data store.
It won't, for the reason I mentioned above. Any approach that you implement should handle these scenarios.
Now, this race condition is happening because the existing clustering service cannot guarantee EOS notification delivery at the same time across the cluster nodes. In my opinion, to address this issue properly for production-level deployments, we need to implement the following:
[Suneelu] Totally agree. If somehow EOS gave a notification with info like this (wasOwner=false, hasOwner=false, isOwner=false, original-owner=[timedout|crashed|released]), then the other controllers could detect that the owner controller crashed and clean up the node from the oper datastore; basically, the reason for the EOS change. I noticed that such a reason field is not present in the newly introduced Cluster Singleton Service either.
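Purely as an illustration of Suneelu's suggestion, a sketch of what an ownership-change notification carrying a loss reason could look like; this is a hypothetical shape, not the existing EOS or Cluster Singleton Service API:

public final class OwnershipChangeWithReason {

    // Why the previous owner lost ownership of the entity.
    public enum OwnerLossReason { TIMED_OUT, CRASHED, RELEASED }

    private final boolean wasOwner;
    private final boolean isOwner;
    private final boolean hasOwner;
    private final OwnerLossReason reason;

    public OwnershipChangeWithReason(boolean wasOwner, boolean isOwner,
                                     boolean hasOwner, OwnerLossReason reason) {
        this.wasOwner = wasOwner;
        this.isOwner = isOwner;
        this.hasOwner = hasOwner;
        this.reason = reason;
    }

    // With a reason available, every member could clean up the oper node when the owner
    // crashed or timed out, instead of guessing from the three boolean flags alone.
    public boolean ownerGoneUnexpectedly() {
        return !hasOwner
            && (reason == OwnerLossReason.CRASHED || reason == OwnerLossReason.TIMED_OUT);
    }

    public boolean wasOwner() { return wasOwner; }
    public boolean isOwner()  { return isOwner; }
}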
(1) Implement a connection-flap damping mechanism at the library level. A simple exponential-decay mechanism, like the one used for IP link flapping, can be used here. The tricky part is determining that the connection flap is happening for the same OVS, especially when you are behind a NAT system. To make it deterministic you will have to look at the iid coming from that OVS to determine which switch connection is flapping. You can expose various configuration parameters so that users can tune them for their environment.
(2) When the switch connects to all the controllers, you can still hit the same scenario if a connection flap happens, so I would suggest letting all the controllers write the node information to the data store. Given that writing connection details to the data store is not a very frequent operation, the cost of writing it three times per switch is negligible. It also keeps the solution simple.
(3) Attach a temporary data store listener specifically for the node that the controller is writing to the data store. If that listener receives a node-delete notification, the plugin can write the node back if it sees that the local node is still connected to the switch. This approach helps with the scenario where your switch is connected to only one controller (a sketch of this follows below).
In my opinion, (2) + (3) should work for all the scenarios I mentioned above and are more deterministic. But when it comes to scale, it's good to also put a preventive mechanism at the lower layer (option 1), so that you can avoid unnecessary data writes during warm-up time by suppressing these connection flaps at the library level.
[Suneelu] Agree with option 3; not so sure about option 2, where all the controllers will try to write to the device (that may bring in more races, I feel).
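For option (3), which Suneelu favours above, a minimal sketch of the "write and guard" idea. The NodeStore and ConnectionTracker interfaces are hypothetical stand-ins for the MD-SAL data broker and the plugin's connection manager, not real OVSDB plugin APIs:

import java.util.concurrent.Executor;

public class NodeDeleteGuard {

    // Hypothetical abstraction over the operational data store for a single node.
    public interface NodeStore {
        void putNode(String nodeId, Object nodeData);
        // Registers a callback fired when the node is removed; returns a handle to close the listener.
        AutoCloseable onNodeDeleted(String nodeId, Runnable callback);
    }

    // Hypothetical view of the local plugin's connection state.
    public interface ConnectionTracker {
        boolean isLocallyConnected(String nodeId);
    }

    private final NodeStore store;
    private final ConnectionTracker connections;
    private final Executor executor;

    public NodeDeleteGuard(NodeStore store, ConnectionTracker connections, Executor executor) {
        this.store = store;
        this.connections = connections;
        this.executor = executor;
    }

    // Write the node and attach a temporary listener: if another cluster member deletes it
    // while the local plugin still holds the connection, write it back.
    public AutoCloseable writeAndGuard(String nodeId, Object nodeData) {
        store.putNode(nodeId, nodeData);
        return store.onNodeDeleted(nodeId, () -> executor.execute(() -> {
            if (connections.isLocallyConnected(nodeId)) {
                // The delete came from a stale cleanup on another member; restore our view.
                store.putNode(nodeId, nodeData);
            }
            // If we are no longer connected, the delete is legitimate; do nothing.
        }));
    }
}

The important part is that the guard only reacts to a delete that conflicts with a live local connection; a delete after a genuine disconnect is left alone, so normal cleanup still works, and the returned handle lets the plugin close the temporary listener when it disconnects.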
--
Thanks,
Anil
Vishal Thapar <vishal.thapar@...>
Some inputs inline.
From: netvirt-dev-bounces@... [mailto:netvirt-dev-bounces@...]
On Behalf Of K.V Suneelu Verma
Sent: 19 December 2017 12:09
To: Anil Vishnoi <vishnoianil@...>
Cc: netvirt-dev@...; ovsdb-dev@...
Subject: Re: [netvirt-dev] how to address ovsdb node connection flap
Thanks Anil. Please find my comments inline. I would vote for option 3, which you suggested.
Thanks, Suneelu
From: Anil Vishnoi [mailto:vishnoianil@...]
On Fri, Dec 15, 2017 at 3:55 AM, K.V Suneelu Verma <k.v.suneelu.verma@...> wrote: Hi, I have created the following jira https://jira.opendaylight.org/browse/OVSDB-438 https://git.opendaylight.org/gerrit/#/c/66504/
The client connects to only one ODL controller via HA proxy. Sometimes when the client OVSDB connection flap happens, its node goes missing from the operational datastore. The following are the scenarios in which a connection flap can happen:
1) The client disconnects and connects back to the same controller after some delay.
When the client disconnects, all the ODL controllers try to clean up the oper DS node. When the cleanup processing on one of the ODL controllers is delayed, we end up with the client node missing from the oper topology. Just to understand it better, I believe you are talking about a race condition across the cluster nodes: once the node disconnects, all the controllers attempt to delete the node from the data store, but meanwhile the switch connects back to the controllers, and the node added by that controller gets deleted by another controller because it is still processing the entity ownership notification to clean up the data store. [Suneelu] This race is exactly what I am talking about.
This will generate unnecessary node-removed notifications to all the consumer applications, even if the connection flap happened on only one controller node, and that can trigger a lot of reconciliation logic in the consumer applications (e.g. netvirt).
I believe cleanup only happens when the controller that receives the disconnect sees that there is only one manager, or the EOS says that the respective entity has no owner. A timeout across the cluster is going to be a ticking bomb; it can cause the same issue at any time.
You can't make that assumption, because if the switch is connected to even one controller and gets disconnected from it, all the controller nodes will get the following notifications:
(1) Latest owner (wasOwner=true, hasOwner=false, isOwner=false)
(2) Second controller (wasOwner=false, hasOwner=false, isOwner=false)
(3) Third controller (wasOwner=false, hasOwner=false, isOwner=false)
Assuming all the controllers are up, it is probably easy to determine that the controller that gets the notification with wasOwner=true is the owner and should delete the node, but you can't rely on that, because it won't hold if that owner controller gets killed. So if you write this logic, the other two controllers won't clean up the data store and, even though the switch is not connected to any controller, you will see that the node is still present in the data store.
It won't, for the reason I mentioned above. Any approach that you implement should handle these scenarios.
Now, this race condition is happening because the existing clustering service cannot guarantee EOS notification delivery at the same time across the cluster nodes. In my opinion, to address this issue properly for production-level deployments, we need to implement the following:
[Suneelu] Totally agree. If somehow EOS gave a notification with info like this (wasOwner=false, hasOwner=false, isOwner=false, original-owner=[timedout|crashed|released]), then the other controllers could detect that the owner controller crashed and clean up the node from the oper datastore; basically, the reason for the EOS change. I noticed that such a reason field is not present in the newly introduced Cluster Singleton Service either.
[Vishal] Can we request this, at least in CSS, if EOS is going to be deprecated?
(1) Implement a connection-flap damping mechanism at the library level. A simple exponential-decay mechanism, like the one used for IP link flapping, can be used here. The tricky part is determining that the connection flap is happening for the same OVS, especially when you are behind a NAT system. To make it deterministic you will have to look at the iid coming from that OVS to determine which switch connection is flapping. You can expose various configuration parameters so that users can tune them for their environment.
[Vishal]: You mean the Open_vSwitch uuid? We may not be able to do this at the library level, but at the plugin level it should be possible. We can use it to dampen OperDS cleanup, at least. And we already have the iid stamped when we push config to the switch, unless it is a case of restart with a clean conf.db.
(2) When the switch connects to all the controllers, you can still hit the same scenario if a connection flap happens, so I would suggest letting all the controllers write the node information to the data store. Given that writing connection details to the data store is not a very frequent operation, the cost of writing it three times per switch is negligible. It also keeps the solution simple.
(3) Attach a temporary data store listener specifically for the node that the controller is writing to the data store. If that listener receives a node-delete notification, the plugin can write the node back if it sees that the local node is still connected to the switch. This approach helps with the scenario where your switch is connected to only one controller.
In my opinion, (2) + (3) should work for all the scenarios I mentioned above and are more deterministic. But when it comes to scale, it's good to also put a preventive mechanism at the lower layer (option 1), so that you can avoid unnecessary data writes during warm-up time by suppressing these connection flaps at the library level.
[Suneelu] Agree with option 3; not so sure about option 2, where all the controllers will try to write to the device (that may bring in more races, I feel).
--
Thanks,
Anil
Vishal Thapar <vishal.thapar@...>
Replaced netvirt-dev with genius-dev.
Hi all,
Reviving this discussion because we're hitting a similar issue in Genius 3-node CSIT now. It is not very consistent, but we occasionally run into it. We have some workarounds for it and some improvements needed in the Genius 3-node CSIT itself, but the issue is still there in OVSDB.
Can we conclude on what to do? Suneelu's patch for HWVTEP is already in; is the same design [I believe Anil's suggestion of option 3] good enough for OVSDB?
Jamo, do we have cluster CSIT for OVSDB? It would probably be good to add a test case to reproduce this issue and test the fix against it.
Regards, Vishal.
Jamo Luhrsen <jluhrsen@...>
Yeah, we have 3-node OVSDB CSIT that we can enhance to cover this.
Would it be as simple as this:
- 3-node ODL cluster
- create a single OVSDB connection to one of the ODL nodes
- check the operational store for the node
- disconnect and reconnect
- check the operational store that the node is still there
I can repeat it in CSIT once for the leader and again on a follower.
JamO
On 2/19/18 11:54 PM, Vishal Thapar wrote:
Replaced netvirt-dev with genius-dev.
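As a rough, standalone illustration of the presence check in JamO's steps (not actual CSIT/Robot code), something like this could poll the operational topology over RESTCONF before and after the reconnect. The controller URL, the ovsdb:1 topology name, the operational RESTCONF path, and the admin/admin credentials are all assumptions about a default ODL setup:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class OvsdbNodePresenceCheck {

    // Assumed defaults for a stock ODL setup; adjust for the actual test environment.
    private static final String CONTROLLER = "http://127.0.0.1:8181";
    private static final String AUTH = Base64.getEncoder()
            .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));

    // Returns true if the OVSDB node is present in the operational topology (HTTP 200).
    public static boolean isNodePresent(String nodeId) throws IOException {
        String url = CONTROLLER + "/restconf/operational/network-topology:network-topology/"
                + "topology/ovsdb%3A1/node/" + URLEncoder.encode(nodeId, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", "Basic " + AUTH);
        conn.setRequestProperty("Accept", "application/json");
        int code = conn.getResponseCode();
        conn.disconnect();
        return code == 200;
    }

    public static void main(String[] args) throws Exception {
        // Pass the node-id of the connected switch as reported in the topology.
        String nodeId = args[0];
        System.out.println("node present: " + isNodePresent(nodeId));
    }
}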