Bug(s) in StalePassiveConnectionService
Josh Hershberg <jhershbe@...>
Slightly newer version of the text: Analysis of event handling in StalePassiveConnectionService Global structures:clientFutureMap (not technically global, but passed around): a map of old client ->SettableFuture pendingConnectionClients: Maps a new client -> clientFutureMap (where all the old clients are to that same OVS) handleNewPassiveConnectionpendingConnectionClients.put(newClient, new clientFutureMap) For each old client Add a future to the clientFutureMap Start a ping Start a 10 second timer (echoTimeoutFuture) Note: this seems to be a timeout to determine that the echo pings have failed. Why set it to 10 seconds when the rpc client’s timeout (whose future we listen to) is so much smaller? Also, why have this at all, why not just rely on the rpc client’s timeout? Events:Ping timeoutCancellation exception is logged createStaleConnectionFutureCallback’s onFailure Ping responsecreateStaleConnectionFutureCallback’s onSuccess: Remove client from clientFutureMap If clientFutureMap is empty, kill the 10 second timer Notify of new client Remove new client from pendingConnectionClients Note: this seems to be a bug. In this case, wouldn’t we want to continue with the old connections and ignore the new? Perhaps not. Perhaps we want to use the newest valid connection and the purpose of this is to get rid of the old, lingering connections by causing them to IOException (see the commit message and comments in the code). Perhaps this was the original thinking but it still does not make so much sense to me. Ping failurePresumably, the same as a timeout. However, the comment in the code (and original commit message) might indicate that the author might have thought something else, like that onSuccess would be called. Hard to understand how that could be the case. New connection closedGet the clientFutureMap in pendingConnectionClients for this client (why is this done in a loop?) Call future.set(null) on all the futures for this canceled client..this should result in a call to “Ping Response” above.???? Why? Why should we not just remove it from the data structure? 10 second timer firesCancel the clientFutures for this new client in its clientFutureMap. Huh!?!? Wouldn’t we want to also notify about the new connection? Presumably if the 10 second timer fires (it’s called the clientEchoTimeout) wouldn’t we assume those clients are dead?On Wed, Apr 25, 2018 at 11:42 AM, Josh Hershberg <jhershbe@...> wrote:
|
|
Josh Hershberg <jhershbe@...>
All, This bug [0] has me looking at StalePassiveConnectionService and aside from this specific bug, it seems there may be other issues. I took some notes about how this class seems to work and the questions are in bold. I have some fixes in mind (and a patch with a rough show of one of them that really needs to be reworked). I would REALLY appreciate at least another set of eyes on this. Please see my analysis below. -Josh Analysis of event handling in StalePassiveConnectionService Global structures:clientFutureMap (not technically global, but passed around): a map of old client ->SettableFuture pendingConnectionClients: Maps a new client -> clientFutureMap (where all the old clients are to that same OVS) handleNewPassiveConnectionpendingConnectionClients.put(newClient, new clientFutureMap) For each old client Add a future to the clientFutureMap Start a ping Start a 10 second timer (echoTimeoutFuture) Note: this seems to be a timeout to determine that the echo pings have failed. Why set it to 10 seconds when the rpc client’s timeout (whose future we listen to) is so much smaller? Also, why have this at all, why not just rely on the rpc client’s timeout? Events:Ping timeoutCancellation exception is logged createStaleConnectionFutureCallback’s onFailure Ping responsecreateStaleConnectionFutureCallback’s onSuccess: Remove client from clientFutureMap If clientFutureMap is empty, kill the 10 second timer Notify of new client Remove new client from pendingConnectionClients Note: this seems to be a bug. In this case, wouldn’t we want to continue with the old connections and ignore the new? Perhaps not. Perhaps we want to use the newest valid connection and the purpose of this is to get rid of the old, lingering connections by causing them to IOException (see the commit message and comments in the code). Perhaps this was the original thinking but it still does not make so much sense to me. Ping failurePresumably, the same as a timeout. However, the comment in the code (and original commit message) might indicate that the author might have thought something else, like that onSuccess would be called. Hard to understand how that could be the case. New connection closedGet the clientFutureMap in pendingConnectionClients for this client (why is this done in a loop?) Call future.set(null) on all the futures..this should result in a call to “Ping Response” above.???? Huh??? Shouldn’t this be a cancel instead of a set(null)?? Shouldn’t we continue with the old connections in this case? This looks to me like it will literally trigger a notification of the new client...which disconnected! 10 second timer firesCancel the clientFutures for this new client in its clientFutureMap. Huh!?!? Wouldn’t we want to also notify about the new connection? Presumably if the 10 second timer fires (it’s called the clientEchoTimeout) wouldn’t we assume those clients are dead? |
|