OF connection close during handshake when switch-idle-timeout is low


Michal Rehak -X (mirehak - Pantheon Technologies SRO@Cisco) <mirehak@...>
 

Hi Tali,
unfortunately the idle timeout is provided by netty. If we bypassed the idle notification during handshake, that would solve your current issue, but we would freeze if the device timed out during handshake. The good news is that there was already a request for modifying the idle timeout during the session lifecycle. However, this requires some adaptation in the openflowJava project, where netty is utilized.
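
To illustrate the idea of a per-phase idle timeout, here is a minimal Java sketch. It is only an assumption of how such a feature might be modelled, not the openflowJava or netty API: the class `PhaseIdleTimeoutSketch`, the `Phase` enum and both methods are hypothetical names. The point is that a generous handshake timeout still catches a device that goes silent during handshake, while the working phase can use a much lower value.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical model: one idle timeout per session phase (not a real plugin API). */
class PhaseIdleTimeoutSketch {
    enum Phase { HANDSHAKING, WORKING }

    private final Map<Phase, Long> timeoutsMs = new EnumMap<>(Phase.class);

    PhaseIdleTimeoutSketch(long handshakeTimeoutMs, long workingTimeoutMs) {
        timeoutsMs.put(Phase.HANDSHAKING, handshakeTimeoutMs);
        timeoutsMs.put(Phase.WORKING, workingTimeoutMs);
    }

    /** Idle timeout that applies while the session is in the given phase. */
    long idleTimeoutFor(Phase phase) {
        return timeoutsMs.get(phase);
    }

    /** True when the observed silence exceeds the timeout of the current phase. */
    boolean isIdle(Phase phase, long silenceMs) {
        return silenceMs > idleTimeoutFor(phase);
    }

    public static void main(String[] args) {
        // Assumed values: the 15000 msec default during handshake, 300 msec afterwards.
        PhaseIdleTimeoutSketch t = new PhaseIdleTimeoutSketch(15000, 300);
        System.out.println(t.isIdle(Phase.HANDSHAKING, 1000)); // handshake not disturbed
        System.out.println(t.isIdle(Phase.WORKING, 1000));     // failure noticed quickly
    }
}
```

With this split the handshake is never skipped by the idle check, so a device that stalls during handshake is still disconnected once the longer timeout expires.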


@Michal: could you please provide some up-to-date information regarding changing idle timeout on demand (probably per device)? Thank you.


Regards,
Michal



From: Tali Ben Meir [Tali.BenMeir@...]
Sent: Tuesday, April 14, 2015 16:48
To: Michal Rehak -X (mirehak - Pantheon Technologies SRO at Cisco)
Subject: RE: OF connection close during handshake when switch-idle-timeout is low

Hi,

 

Actually, what we are trying to do is just to detect switch failure quickly (using the NodeRemoved notification from OpenDaylightInventoryListener).

Everything is ok as long as the switch is shut down gracefully, but when the switch box is shut down forcefully, we need to wait 15 sec. until the ECHO request times out to detect the failure.

We just wanted to make the ECHO more frequent so that switch failure detection would not take more than 1 sec.

But when the switch-idle-timeout configuration was set to 1000msec, the handshake failed and the controller sent a TCP FIN to the switch.

Can you think of a way to make switch failure detection quicker without influencing the handshake?

 

Tali

 

From: Michal Rehak -X (mirehak - Pantheon Technologies SRO at Cisco) [mailto:mirehak@...]
Sent: Tuesday, April 14, 2015 4:21 PM
To: Tali Ben Meir
Subject: RE: OF connection close during handshake when switch-idle-timeout is low

 

Hi Tali,
well the timeout for idle state is meant to catch the situation when connection times out. You are trying to achieve something different - something like continuous device latency measurement. So any workaround we can think of would be just a workaround because idle timeout is meant to do something else.

So the question is: how would exposing the echo message via the ofPlugin API help you with your task?

Regards,
Michal


From: Tali Ben Meir [Tali.BenMeir@...]
Sent: Tuesday, April 14, 2015 11:53
To: Michal Rehak -X (mirehak - Pantheon Technologies SRO at Cisco)
Subject: RE: OF connection close during handshake when switch-idle-timeout is low

So is there any nice way to work around this?

-        Different timeouts for handshake phase and working phase?

-        Echo during handshake? (I guess it may be forbidden based on protocol spec or a very error prone process)

-        Other ideas?

 

Tali

 

 

From: Michal Rehak -X (mirehak - Pantheon Technologies SRO at Cisco) [mailto:mirehak@...]
Sent: Tuesday, April 14, 2015 12:20 PM
To: Tali Ben Meir
Subject: RE: OF connection close during handshake when switch-idle-timeout is low

 

Hi Tali,
the proposed change would probably mean that a timeout-based disconnection would no longer occur during the handshake state.

Echo messages are independent and you can have as many echo messages in progress as you want. Idle notification appears only if there is radio silence for more than specified interval (5 seconds). For example if there is heavy traffic then there is no echo message sent out automatically (neither by controller nor switch).
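
The "radio silence" rule can be sketched in a few lines of Java. This is a simplified model, not netty's implementation: the class `IdleDetectorSketch` and its methods are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```java
/** Hypothetical model of idle detection: any traffic resets the silence clock. */
class IdleDetectorSketch {
    private final long idleIntervalMs;
    private long lastActivityMs;

    IdleDetectorSketch(long idleIntervalMs, long startMs) {
        this.idleIntervalMs = idleIntervalMs;
        this.lastActivityMs = startMs;
    }

    /** Called for every message seen on the connection, in either direction. */
    void onMessage(long nowMs) {
        lastActivityMs = nowMs;
    }

    /** True only if nothing was seen for more than the configured interval. */
    boolean isIdle(long nowMs) {
        return nowMs - lastActivityMs > idleIntervalMs;
    }

    public static void main(String[] args) {
        IdleDetectorSketch d = new IdleDetectorSketch(5000, 0);
        // Heavy traffic: one message per second, so no idle notification and no echo.
        for (long t = 1000; t <= 10000; t += 1000) {
            d.onMessage(t);
        }
        System.out.println(d.isIdle(10000));
        // Silence for more than the interval: the idle notification would fire.
        System.out.println(d.isIdle(15001));
    }
}
```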

Regards,
Michal


From: Tali Ben Meir [Tali.BenMeir@...]
Sent: Tuesday, April 14, 2015 10:25
To: Michal Rehak -X (mirehak - Pantheon Technologies SRO at Cisco)
Subject: RE: OF connection close during handshake when switch-idle-timeout is low

Hi,

 

First, thanks for the quick reply.

 

So if I understand you correctly, can I add a piece of code before

                if (!CONDUCTOR_STATE.WORKING.equals(getConductorState())) {

Something like

                if (CONDUCTOR_STATE.HANDSHAKING.equals(getConductorState())) {
                    return;
                }

 

Would that be ok? Can this make some damage someplace else?

 

 

Additional question – I see the timeout interval between ECHO request and response is hardcoded to 2000msec.

Do I necessarily need to make it less than my switch-idle-timeout or can several pending ECHO messages co-exist?

 

Thanks again

Tali

 

From: Michal Rehak -X (mirehak - Pantheon Technologies SRO at Cisco) [mailto:mirehak@...]
Sent: Tuesday, April 14, 2015 11:12 AM
To: Tali Ben Meir
Subject: RE: OF connection close during handshake when switch-idle-timeout is low

 

Hi Tali,

the idle state has 2 phases. First, the controller is in the WORKING state and receives an idle notification from netty. The state is then changed to TIMEOUTING and an echo is sent to the device. If we do not get an echo reply within the specified timeout, the device is considered offline and is actively disconnected. So if there is some load of messages going through, the expected echo reply might get delayed and cause a disconnection.
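
The two phases can be modelled as a small state machine. The Java sketch below is a simplified illustration and not the plugin's code: `IdleConductorSketch`, its state enum and its methods are hypothetical names, timestamps are passed in explicitly, and the rule that an idle notification in any state other than WORKING leads to disconnection is taken from the ConnectionConductorImpl snippet quoted later in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the two-phase idle handling. */
class IdleConductorSketch {
    enum State { HANDSHAKING, WORKING, TIMEOUTING, DISCONNECTED }

    private State state;
    private final long echoReplyTimeoutMs;
    private long echoSentAtMs;
    /** Actions the model would have taken, recorded for inspection. */
    final List<String> actions = new ArrayList<>();

    IdleConductorSketch(State initial, long echoReplyTimeoutMs) {
        this.state = initial;
        this.echoReplyTimeoutMs = echoReplyTimeoutMs;
    }

    State getState() { return state; }

    /** Phase 1: an idle notification arrives. */
    void onIdle(long nowMs) {
        if (state == State.WORKING) {
            state = State.TIMEOUTING;
            echoSentAtMs = nowMs;
            actions.add("echo-request");
        } else if (state != State.DISCONNECTED) {
            // Idle in any other state (e.g. HANDSHAKING) is treated as a real problem.
            disconnect();
        }
    }

    /** Phase 2a: the echo reply arrived in time, so the session keeps working. */
    void onEchoReply(long nowMs) {
        if (state == State.TIMEOUTING && nowMs - echoSentAtMs <= echoReplyTimeoutMs) {
            state = State.WORKING;
        }
    }

    /** Phase 2b: periodic check; an overdue echo reply means the device is offline. */
    void onTick(long nowMs) {
        if (state == State.TIMEOUTING && nowMs - echoSentAtMs > echoReplyTimeoutMs) {
            disconnect();
        }
    }

    private void disconnect() {
        state = State.DISCONNECTED;
        actions.add("disconnect");
    }

    public static void main(String[] args) {
        // A 300 msec idle notification that lands during handshake ends the session.
        IdleConductorSketch c = new IdleConductorSketch(State.HANDSHAKING, 2000);
        c.onIdle(300);
        System.out.println(c.getState() + " " + c.actions); // prints: DISCONNECTED [disconnect]
    }
}
```

In this model the worst-case detection time for a dead device is roughly the idle timeout plus the echo reply timeout, which is why lowering only the idle timeout also makes the handshake fragile.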

HANDSHAKING is a valid state only during the initial steps, when the version is negotiated and basic device features are queried. Here hello, error and features-reply messages are accepted. After the handshake is done these messages must not occur.

And now the good news: in the new design proposal (currently in heavy development) the echo message is exposed via API to apps. So any app will be able to punt an echo message anytime.
ETA is M5 - end of April.


Regards,
Michal


From: Tali Ben Meir [Tali.BenMeir@...]
Sent: Monday, April 13, 2015 22:40
To: Michal Rehak -X (mirehak - Pantheon Technologies SRO at Cisco)
Subject: OF connection close during handshake when switch-idle-timeout is low

Hi Michal,

 

My name is Tali from ConteXtream and I have a question about the ConnectionConductorImpl behavior on SwitchIdleTimeout event.

I’m trying to make the OF heartbeat, i.e. ECHO request/response, operate at a very high rate: an ECHO request should be sent every 300msec.

I have tried setting the switch-idle-timeout to values like 300msec/1000msec/2000msec but when the OF connection is being established (handshake phase), ODL sometimes terminates the connection towards the switch. It never happens when using the default timeout (15000msec).

I have seen you placed the following protective code in ConnectionConductorImpl:

 

public void onSwitchIdleEvent(SwitchIdleEvent notification) {
    new Thread(new Runnable() {
        @Override
        public void run() {
            if (!CONDUCTOR_STATE.WORKING.equals(getConductorState())) {
                // idle state in any other conductorState than WORKING means real
                // problem and wont be handled by echoReply, but disconnection
                disconnect();
                OFSessionUtil.getSessionManager().invalidateOnDisconnect(ConnectionConductorImpl.this);

 

I fail to understand why conductor state == HANDSHAKING is considered erroneous and leads to OF session invalidation. Could you explain?

I am using Helium SR1.

 

Thanks in advance

Tali

 

Tali Ben-Meir

SW Engineer

ConteXtream

Email: tali.benmeir@...