Flow on switch cannot be obtained by the operational REST API


John
 

Hi,

We've found a possible bug when using OpenDaylight in the following configuration:
- OpenDaylight version 0.5.3
- We have 4 OpenDaylight instances configured as a single cluster, within a single L2 network.
- We had roughly 13 switches connected to the controller cluster at that time, all configured with the 4 controller IPs. For each switch, the OpenDaylight cluster then sets one of the controllers to role=master and the others to role=slave.
- The switches are not within the same L2 network.

What we've observed is that when we use the Operational REST API to obtain flows on the switch, the flow list returned may be missing some flows. Here are the details of the tests:

Part 1: delete flow
1. We checked the flows directly on the switch for the test flow used in these tests, cookie=55688,priority=2,in_port=4,actions=drop, and it was there.
2. We then checked the switch's flows through the operational REST API of the controller, and found that the returned flow list was missing this flow.
3. We then used the operational REST API to delete this flow, with delete_strict and a match of cookie=55688,priority=2,in_port=4.
4. We then checked the switch's flows both on the switch and through the controller's operational REST API, and the flow was absent in both cases.
5. During this test we also captured packets at the interface of the controller, filtered by switch IP. The capture shows one flow_mod message, and multiple multipart_reply flow messages before and after the flow_mod. The flows in these reply messages are correct: the flow in question is present before the flow_mod and missing after it.

The packet file for part 1 can be downloaded here:
https://drive.google.com/file/d/0B5MHAcG7UuTrMzZoU1RybjcyT2M/view?usp=sharing

Part 2: add flow
1. We checked the switch's flows both on the switch and through the controller's operational REST API, and the flow in question was absent in both cases.
2. We then used the operational REST API to add this flow twice, a few minutes apart.
3. We then checked for this flow on the switch and through the operational REST API, and we found it only on the switch, not in the list returned by the operational REST API.
4. Again we captured the packets between the controller and the switch, this time at the interface of the switch's management port. The capture shows two flow_mod messages, both adding the same flow. The multipart_reply flow messages again show that the flow list returned by the switch is correct: the flow is absent before the flow_mods and present after them.

The packet file for part 2 can be downloaded here:
https://drive.google.com/file/d/0B5MHAcG7UuTrbGw0Q1QyblR4dk0/view?usp=sharing
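
For anyone wanting to tally the flow_mod and multipart messages in captures like the two above, here is a minimal Python sketch. It assumes OpenFlow 1.3 message type numbers (14 = flow_mod, 18 = multipart_request, 19 = multipart_reply) and that the TCP payload bytes of the controller/switch connection have already been reassembled by some other tool; it does not read pcap files itself. The function name is hypothetical.

```python
import struct
from collections import Counter

# OpenFlow header: version (1 byte), type (1 byte), length (2 bytes), xid (4 bytes)
OFP_HEADER = struct.Struct("!BBHI")

# Message type numbers as defined in OpenFlow 1.3 (assumed version here).
OF13_TYPES = {14: "flow_mod", 18: "multipart_request", 19: "multipart_reply"}


def count_openflow_messages(payload: bytes) -> Counter:
    """Count OpenFlow messages by type in a reassembled TCP payload."""
    counts = Counter()
    offset = 0
    while offset + OFP_HEADER.size <= len(payload):
        _version, msg_type, length, _xid = OFP_HEADER.unpack_from(payload, offset)
        if length < OFP_HEADER.size:
            break  # malformed header; stop rather than loop forever
        counts[OF13_TYPES.get(msg_type, f"type_{msg_type}")] += 1
        offset += length  # the length field covers header plus body
    return counts
```

Comparing the multipart_request count against the number of REST queries made during the capture would show whether a REST read triggers a request to the switch or is served from the controller's data store.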

Some more details of the tests:
1. We checked the flows returned by the operational REST API multiple times before and after these tests, and they did not change even after a few hours.
2. The number of multipart_request messages for flows does not match the number of times we queried the operational REST API for flows. We queried many more times than the capture shows.
3. We ran the packet captures against only one of the switches, but we did see mismatches between the operational REST API and the flows we had added through the controller (via both the config and operational REST APIs) on some (not all) of the other switches.
4. We cannot reproduce this behavior consistently. The master controller differs from switch to switch, and mastership does change from time to time. It is possible that the problem occurred on only one of the OpenDaylight controllers in the cluster, and it has not recurred since we restarted that OpenDaylight process.
5. We did not see any obvious errors in the OpenDaylight logs.
6. The test flow we used, cookie=55688,priority=2,in_port=4,actions=drop, was only ever added to and deleted from the switch through the controller's operational REST API. It was never added to the OpenDaylight config.
7. The operational REST API URLs for request and mod we've used are:
/restconf/operational/opendaylight-inventory:nodes/node/openflow:1/table/0
/restconf/operations/sal-flow:add-flow
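
To make the comparison between the switch and the operational REST API repeatable across all cluster members, something like the following Python sketch could be used. It is only a sketch under assumptions: the RESTCONF port 8181, the JSON field names ("flow-node-inventory:table", "flow", "match", "in-port"), and all function names are assumed for illustration and may not match a given OpenDaylight release. Fetching the URL is left out; the functions work on an already-decoded JSON response.

```python
def table_url(host, node="openflow:1", table=0, port=8181):
    """Build the operational table URL for one cluster member (port is assumed)."""
    return (
        f"http://{host}:{port}/restconf/operational/"
        f"opendaylight-inventory:nodes/node/{node}/table/{table}"
    )


def flow_keys_from_operational(table_json):
    """Return a set of (cookie, priority, in_port) tuples from a table response.

    The field names below are an assumed response layout.
    """
    keys = set()
    for table in table_json.get("flow-node-inventory:table", []):
        for flow in table.get("flow", []):
            in_port = flow.get("match", {}).get("in-port")
            if in_port is not None:
                # Normalize forms such as "openflow:1:4" and 4 to "4".
                in_port = str(in_port).split(":")[-1]
            keys.add((flow.get("cookie"), flow.get("priority"), in_port))
    return keys


def missing_flows(switch_flows, operational_flows):
    """Flows seen on the switch but absent from the operational view."""
    return set(switch_flows) - set(operational_flows)
```

Running this against each of the 4 controllers in turn, with the flow set read directly from the switch as the reference, would show whether the stale view is limited to one cluster member.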

My question is: have you encountered this problem before? If not, is there any data you would like us to collect if this happens again, to help pinpoint the problem?

--
John Ai 艾鶴強


Jamo Luhrsen <jluhrsen@...>
 

+openflowplugin-dev list as I think it's more widely read.

On 10/06/2017 02:05 AM, John Ai wrote: