This group is locked. No changes can be made to the group while it is locked.
Date
1 - 6 of 6
Cannot connect to database server Exception - Sporadic Error
Devin Avery <davery@...>
Hello AAA Devs - I am seeing a sporadic error in my karaf log files stating that the database file is locked. This results in an authentication failure which unfortunetly prevents the controller from mounting itself as a netconf controller. Any hints or pointers that I could look at to try and understand what might be happening in these cases? Ubuntu 14.04 Brand new controller, fresh controller start up (features are all getting installed via features.boot) 2015-03-05 05:45:05,694 | INFO | pool-8-thread-1 | GrantStore | 232 - org.opendaylight.aaa.idmlight - 0.1.2.Helium-SR2 | grants Table does not exist, creating table 2015-03-05 05:45:06,588 | ERROR | Thread-42 | StoreException | 232 - org.opendaylight.aaa.idmlight - 0.1.2.Helium-SR2 | Cannot connect to database server java.sql.SQLException: [SQLITE_BUSY] The database file is locked (database is locked) 2015-03-05 05:45:06,589 | INFO | 1]-nio2-thread-6 | ClientUserAuthServiceNew | 139 - org.apache.sshd.core - 0.12.0 | Received SSH_MSG_USERAUTH_FAILURE 2015-03-05 05:45:06,589 | WARN | 1]-nio2-thread-6 | AsyncSshHandler | 137 - org.opendaylight.controller.netconf-netty-util - 0.2.7.Helium-SR2 | Unable to setup SSH connection on channel: [id: 0x349bae66] 2015-03-05 05:45:06,595 | DEBUG | 1]-nio2-thread-6 | AsyncSshHandler | 137 - org.opendaylight.controller.netconf-netty-util - 0.2.7.Helium-SR2 | SSH session closed on channel: [id: 0x349bae66] 2015-03-05 05:45:06,595 | WARN | 1]-nio2-thread-6 | DefaultAuthFuture | 139 - org.apache.sshd.core - 0.12.0 | Listener threw an exception java.lang.NullPointerException: cause at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:475)[121:io.netty.common:4.0.23.Final] at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:414)[121:io.netty.common:4.0.23.Final] at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:87)[120:io.netty.transport:4.0.23.Final] at org.opendaylight.controller.netconf.nettyutil.handler.ssh.client.AsyncSshHandler.handleSshSetupFailure(AsyncSshHandler.java:154)[137:org.opendaylight.controller.netconf-netty-util:0.2.7.Helium-SR2] at org.opendaylight.controller.netconf.nettyutil.handler.ssh.client.AsyncSshHandler.access$100(AsyncSshHandler.java:32)[137:org.opendaylight.controller.netconf-netty-util:0.2.7.Helium-SR2] at org.opendaylight.controller.netconf.nettyutil.handler.ssh.client.AsyncSshHandler$2.operationComplete(AsyncSshHandler.java:103)[137:org.opendaylight.controller.netconf-netty-util:0.2.7.Helium-SR2] at org.opendaylight.controller.netconf.nettyutil.handler.ssh.client.AsyncSshHandler$2.operationComplete(AsyncSshHandler.java:97)[137:org.opendaylight.controller.netconf-netty-util:0.2.7.Helium-SR2] at org.apache.sshd.common.future.DefaultSshFuture.notifyListener(DefaultSshFuture.java:261)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.future.DefaultSshFuture.notifyListeners(DefaultSshFuture.java:247)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.future.DefaultSshFuture.setValue(DefaultSshFuture.java:175)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.future.DefaultAuthFuture.setAuthed(DefaultAuthFuture.java:68)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.session.ClientUserAuthServiceNew.tryNext(ClientUserAuthServiceNew.java:206)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.session.ClientUserAuthServiceNew.processUserAuth(ClientUserAuthServiceNew.java:178)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.session.ClientUserAuthServiceNew.process(ClientUserAuthServiceNew.java:131)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.session.ClientUserAuthService.process(ClientUserAuthService.java:80)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.session.AbstractSession.doHandleMessage(AbstractSession.java:399)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.session.AbstractSession.handleMessage(AbstractSession.java:295)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.session.ClientSessionImpl.handleMessage(ClientSessionImpl.java:256)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.session.AbstractSession.decode(AbstractSession.java:731)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.session.AbstractSession.messageReceived(AbstractSession.java:277)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.AbstractSessionIoHandler.messageReceived(AbstractSessionIoHandler.java:54)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:187)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:173)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$1.run(Nio2CompletionHandler.java:32) at java.security.AccessController.doPrivileged(Native Method)[:1.7.0_75] at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:30)[139:org.apache.sshd.core:0.12.0] at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)[:1.7.0_75] at sun.nio.ch.Invoker$2.run(Invoker.java:218)[:1.7.0_75] at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)[:1.7.0_75] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_75] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_75] at java.lang.Thread.run(Thread.java:745)[:1.7.0_75] Thank you, Devin Devin Avery Devin.Avery@...
|
|
Devin Avery <davery@...>
Copying to the controller dev list as well as this issue manifests itself with the controller failing to mount itself as a netconf device. :) Perhaps others have seen it. Devin Devin Avery Devin.Avery@... From: Devin Avery <davery@...> Date: Thursday, March 5, 2015 at 9:56 AM To: "aaa-dev@..." <aaa-dev@...> Subject: [Aaa-dev] Cannot connect to database server Exception - Sporadic Error Hello AAA Devs - I am seeing a sporadic error in my karaf log files stating that the database file is locked. This results in an authentication failure which unfortunetly prevents the controller from mounting itself as a netconf controller. Any hints or pointers that I
could look at to try and understand what might be happening in these cases? Ubuntu 14.04 Brand new controller, fresh controller start up (features are all getting installed via features.boot) 2015-03-05 05:45:05,694 | INFO | pool-8-thread-1 | GrantStore | 232 - org.opendaylight.aaa.idmlight - 0.1.2.Helium-SR2 | grants Table does not exist, creating table 2015-03-05 05:45:06,588 | ERROR | Thread-42 | StoreException | 232 - org.opendaylight.aaa.idmlight - 0.1.2.Helium-SR2 | Cannot connect to database server java.sql.SQLException: [SQLITE_BUSY] The database file is locked (database
is locked) 2015-03-05 05:45:06,589 | INFO | 1]-nio2-thread-6 | ClientUserAuthServiceNew | 139 - org.apache.sshd.core - 0.12.0 | Received SSH_MSG_USERAUTH_FAILURE 2015-03-05 05:45:06,589 | WARN | 1]-nio2-thread-6 | AsyncSshHandler | 137 - org.opendaylight.controller.netconf-netty-util - 0.2.7.Helium-SR2 | Unable to setup SSH connection on channel: [id: 0x349bae66] 2015-03-05 05:45:06,595 | DEBUG | 1]-nio2-thread-6 | AsyncSshHandler | 137 - org.opendaylight.controller.netconf-netty-util - 0.2.7.Helium-SR2 | SSH session closed on channel: [id: 0x349bae66] 2015-03-05 05:45:06,595 | WARN | 1]-nio2-thread-6 | DefaultAuthFuture | 139 - org.apache.sshd.core - 0.12.0 | Listener threw an exception java.lang.NullPointerException: cause at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:475)[121:io.netty.common:4.0.23.Final] at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:414)[121:io.netty.common:4.0.23.Final] at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:87)[120:io.netty.transport:4.0.23.Final] at org.opendaylight.controller.netconf.nettyutil.handler.ssh.client.AsyncSshHandler.handleSshSetupFailure(AsyncSshHandler.java:154)[137:org.opendaylight.controller.netconf-netty-util:0.2.7.Helium-SR2] at org.opendaylight.controller.netconf.nettyutil.handler.ssh.client.AsyncSshHandler.access$100(AsyncSshHandler.java:32)[137:org.opendaylight.controller.netconf-netty-util:0.2.7.Helium-SR2] at org.opendaylight.controller.netconf.nettyutil.handler.ssh.client.AsyncSshHandler$2.operationComplete(AsyncSshHandler.java:103)[137:org.opendaylight.controller.netconf-netty-util:0.2.7.Helium-SR2] at org.opendaylight.controller.netconf.nettyutil.handler.ssh.client.AsyncSshHandler$2.operationComplete(AsyncSshHandler.java:97)[137:org.opendaylight.controller.netconf-netty-util:0.2.7.Helium-SR2] at org.apache.sshd.common.future.DefaultSshFuture.notifyListener(DefaultSshFuture.java:261)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.future.DefaultSshFuture.notifyListeners(DefaultSshFuture.java:247)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.future.DefaultSshFuture.setValue(DefaultSshFuture.java:175)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.future.DefaultAuthFuture.setAuthed(DefaultAuthFuture.java:68)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.session.ClientUserAuthServiceNew.tryNext(ClientUserAuthServiceNew.java:206)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.session.ClientUserAuthServiceNew.processUserAuth(ClientUserAuthServiceNew.java:178)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.session.ClientUserAuthServiceNew.process(ClientUserAuthServiceNew.java:131)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.session.ClientUserAuthService.process(ClientUserAuthService.java:80)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.session.AbstractSession.doHandleMessage(AbstractSession.java:399)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.session.AbstractSession.handleMessage(AbstractSession.java:295)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.client.session.ClientSessionImpl.handleMessage(ClientSessionImpl.java:256)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.session.AbstractSession.decode(AbstractSession.java:731)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.session.AbstractSession.messageReceived(AbstractSession.java:277)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.AbstractSessionIoHandler.messageReceived(AbstractSessionIoHandler.java:54)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:187)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:173)[139:org.apache.sshd.core:0.12.0] at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$1.run(Nio2CompletionHandler.java:32) at java.security.AccessController.doPrivileged(Native Method)[:1.7.0_75] at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:30)[139:org.apache.sshd.core:0.12.0] at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)[:1.7.0_75] at sun.nio.ch.Invoker$2.run(Invoker.java:218)[:1.7.0_75] at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)[:1.7.0_75] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_75] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_75] at java.lang.Thread.run(Thread.java:745)[:1.7.0_75] Thank you, Devin
|
|
Wojciech Dec
Hi, I must admit I didn't see this particular problem so far. It obviously looks like something thrown up by SQLite on reading/writing. We're going to replace SQLite with another db backend so hopefully that may redress that, in the meantime if you have more info on how to reproduce this please let us know.
On 5 March 2015 at 16:11, Devin Avery <davery@...> wrote:
|
|
Devin Avery <davery@...>
Hi Folks, Following up on this. So the problem appears to be an internal race condition inside of MD-SAL / AAA / config system combined with logic flaws about retrying connections. Basically, the way I get this to reproduce is Iist all of the features that I want to install in the featuresBoot list and then start a clean controller (I.e. delete data directory, delete /etc/opendaylight). When the controller starts up there is a race condition from the config subsystem injecting the self-referencing netconf connection and the AAA database being locked. When the problem occurs the AAA database file is locked and thus when netconf tries to connect and specifically authenticate the AAA fails because of an exception. Then because the config subsystem, or more specifically the netconf mounter is a try once and done (no retries) it fails to connect the self-referencing controller. Potential fixes from what I can tell:
I think ultimately #1 is the longer term fix. If you are trying to reproduce this note that I see this every once in a while - on 10 machines, I see it consistently on 1 of them every time I run a suite of tests. For now I have worked around this by installing the features more slowly / independently, but this is clearly not an ideal solution. I’ll see if I can’t come up with a more consistent reproduction scenario but figured I would share what I know in the mean time. Pone other comment - in the AAA code it would be really helpful it the entire exception was printed to the log, not just the message. In general the AAA code catches exceptions and logs the e.getMessage() or they use an overloaded toString on the exception to only print a simple message. It is better practice to print the stack trace too though to make it easier to debug :) log.info( “Caught exception “ + e.getMessage(), e ) <— passing the exception as a second argument will print the stack trace to the log for the exception and may provide additional information. Thank you, Devin Devin Avery Devin.Avery@... From: Wojciech Dec <wdec.ietf@...> Date: Friday, March 6, 2015 at 5:20 AM To: Devin Avery <davery@...> Cc: "aaa-dev@..." <aaa-dev@...>, controller-dev <controller-dev@...> Subject: Re: [Aaa-dev] Cannot connect to database server Exception - Sporadic Error Hi, I must admit I didn't see this particular problem so far. It obviously looks like something thrown up by SQLite on reading/writing. We're going to replace SQLite with another db backend so hopefully that may redress that, in the meantime if you have more info on how to reproduce this please let us know.
On 5 March 2015 at 16:11, Devin Avery <davery@...> wrote:
|
|
Wojciech Dec
Hi Devin, thanks for the insights. I captured them in two bugs:https://bugs.opendaylight.org/show_bug.cgi?id=2808 https://bugs.opendaylight.org/show_bug.cgi?id=2809
On 10 March 2015 at 14:19, Devin Avery <davery@...> wrote:
|
|
Devin Avery <davery@...>
Thank you. I’ll update them if I have more information to add. Devin Devin Avery Devin.Avery@... From: Wojciech Dec <wdec.ietf@...> Date: Tuesday, March 10, 2015 at 10:01 AM To: Devin Avery <davery@...> Cc: "aaa-dev@..." <aaa-dev@...>, controller-dev <controller-dev@...> Subject: Re: [Aaa-dev] Cannot connect to database server Exception - Sporadic Error Hi Devin, thanks for the insights. I captured them in two bugs:https://bugs.opendaylight.org/show_bug.cgi?id=2808 https://bugs.opendaylight.org/show_bug.cgi?id=2809
On 10 March 2015 at 14:19, Devin Avery <davery@...> wrote:
|
|