[netconf-dev] [release] netconf SFT hanging for over 3 hrs


Ryan Goulding <ryandgoulding@...>
 

It fixes it in a bandaid sort of way... the correct solution is to move to Jersey 2 so that jersey instantiates these objects through java code and not web.xml at runtime.  Then we can do correct bean instantiation and wiring.  There is a separate stopgap I was investigating through using a web.xml Listener but it appears that doesn't work.  This at least gets us over the initial bug, but the long term solution is to use Jersey 2.

Regards,

Ryan Goulding

On Thu, Jul 20, 2017 at 10:29 AM, Thanh Ha <thanh.ha@...> wrote:
At least in autorelease we haven't seen it for a few days.

Regards,
Thanh


On Thu, Jul 20, 2017 at 12:24 AM, Colin Dixon <colin@...> wrote:
Do we know if this fixed the issue?

--Colin


On Thu, Jul 13, 2017 at 1:57 PM, Tom Pantelis <tompantelis@...> wrote:

On Thu, Jul 13, 2017 at 4:32 PM, Tom Pantelis <tompantelis@...> wrote:
So if getInstance is called when INSTANCE is yet set, then it will create INSTANCE with all nulls and thus the shiroConfiguration will forever be null. 

public static AAAShiroProvider getInstance() {
        if (INSTANCE == null) {
            newInstance(null, null, null, null, null,null);
        }
        return INSTANCE;
    }

So to get a valid INSTANCE, it must be created via the aaa-shiro-impl BP which injects a non-null ShiroConfiguration. So it seems KarafIniWebEnvironment is racing with the BP container.  This is a bit fragile - I think getInstance should just return the INSTANCE and not create it and KarafIniWebEnvironment should busy wait on that (with a sleep). Also INSTANCE should be volatile.




On Thu, Jul 13, 2017 at 4:13 PM, Tom Pantelis <tompantelis@...> wrote:
It looks like this is in an endless loop:
 
        while (shiroConfiguration == null) {
            shiroConfiguration = AAAShiroProvider.getInstance().getShiroConfiguration();
        }
   

On Thu, Jul 13, 2017 at 4:09 PM, Tom Pantelis <tompantelis@...> wrote:
The shutdown thread is waiting on lock 0x00000006cac049f0:

"Framework stop" #290 daemon prio=5 os_prio=0 tid=0x00007ff33832d000 nid=0x1b36 waiting for monitor entry [0x00007ff3768b6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.ops4j.pax.web.extender.war.internal.extender.SimpleExtension.destroy(SimpleExtension.java:67)
        - waiting to lock <0x00000006cac049f0> (a org.ops4j.pax.web.extender.war.internal.WebObserver$1)
        at org.ops4j.pax.web.extender.war.internal.extender.AbstractExtender$2.run(AbstractExtender.java:307)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.ops4j.pax.web.extender.war.internal.extender.AbstractExtender.destroyExtension(AbstractExtender.java:333)
        at org.ops4j.pax.web.extender.war.internal.extender.AbstractExtender.bundleChanged(AbstractExtender.java:191)
        at org.eclipse.osgi.internal.framework.BundleContextImpl.dispatchEvent(BundleContextImpl.java:902)
        at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:230)
        at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:148)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEventPrivileged(EquinoxEventPublisher.java:165)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:75)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:67)
        at org.eclipse.osgi.internal.framework.EquinoxContainerAdaptor.publishModuleEvent(EquinoxContainerAdaptor.java:102)
        at org.eclipse.osgi.container.Module.publishEvent(Module.java:466)
        at org.eclipse.osgi.container.Module.doStop(Module.java:624)
        at org.eclipse.osgi.container.Module.stop(Module.java:488)
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.decStartLevel(ModuleContainer.java:1623)
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.doContainerStartLevel(ModuleContainer.java:1542)
        - locked <0x00000006c7fbe450> (a java.lang.Object)
        at org.eclipse.osgi.container.SystemModule.stopWorker(SystemModule.java:248)
        at org.eclipse.osgi.internal.framework.EquinoxBundle$SystemBundle$EquinoxSystemModule.stopWorker(EquinoxBundle.java:144)
        at org.eclipse.osgi.container.Module.doStop(Module.java:626)
        at org.eclipse.osgi.container.Module.stop(Module.java:488)
        at org.eclipse.osgi.container.SystemModule.stop(SystemModule.java:186)


Which is held by this thread:

"paxweb-extender-2-thread-3" #206 prio=5 os_prio=0 tid=0x00007ff34cf9d000 nid=0x1af2 runnable [0x00007ff32fda6000]
   java.lang.Thread.State: RUNNABLE
        at org.opendaylight.aaa.shiro.web.env.KarafIniWebEnvironment.init(KarafIniWebEnvironment.java:72)
        at org.apache.shiro.util.LifecycleUtils.init(LifecycleUtils.java:45)
        at org.apache.shiro.util.LifecycleUtils.init(LifecycleUtils.java:40)
        at org.apache.shiro.web.env.EnvironmentLoader.createEnvironment(EnvironmentLoader.java:221)
        at org.apache.shiro.web.env.EnvironmentLoader.initEnvironment(EnvironmentLoader.java:133)
        at org.apache.shiro.web.env.EnvironmentLoaderListener.contextInitialized(EnvironmentLoaderListener.java:58)
        at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext$3.call(HttpServiceContext.java:354)
        at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext$3.call(HttpServiceContext.java:350)
        at org.ops4j.pax.swissbox.core.ContextClassLoaderUtils.doWithClassLoader(ContextClassLoaderUtils.java:60)
        at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.callContextInitialized(HttpServiceContext.java:349)
        at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:791)
        at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:294)
        at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.startContext(HttpServiceContext.java:601)
        at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741)
        at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doStart(HttpServiceContext.java:260)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
        - locked <0x00000006ca9e43d8> (a java.lang.Object)
        at org.ops4j.pax.web.service.jetty.internal.JettyServerImpl$1.start(JettyServerImpl.java:279)
        at org.ops4j.pax.web.service.internal.HttpServiceStarted.end(HttpServiceStarted.java:1096)
        at org.ops4j.pax.web.service.internal.HttpServiceProxy.end(HttpServiceProxy.java:417)
        at org.ops4j.pax.web.extender.war.internal.RegisterWebAppVisitorWC.end(RegisterWebAppVisitorWC.java:380)
        at org.ops4j.pax.web.extender.war.internal.model.WebApp.accept(WebApp.java:692)
        at org.ops4j.pax.web.extender.war.internal.WebAppPublisher$WebAppDependencyListener.register(WebAppPublisher.java:237)
        at org.ops4j.pax.web.extender.war.internal.WebAppPublisher$WebAppDependencyListener.addingService(WebAppPublisher.java:182)
        at org.ops4j.pax.web.extender.war.internal.WebAppPublisher$WebAppDependencyListener.addingService(WebAppPublisher.java:135)
        at org.osgi.util.tracker.ServiceTracker$Tracked.customizerAdding(ServiceTracker.java:941)
        at org.osgi.util.tracker.ServiceTracker$Tracked.customizerAdding(ServiceTracker.java:870)
        at org.osgi.util.tracker.AbstractTracked.trackAdding(AbstractTracked.java:256)
        at org.osgi.util.tracker.AbstractTracked.trackInitial(AbstractTracked.java:183)
        at org.osgi.util.tracker.ServiceTracker.open(ServiceTracker.java:318)
        at org.osgi.util.tracker.ServiceTracker.open(ServiceTracker.java:261)
        at org.ops4j.pax.web.extender.war.internal.WebAppPublisher.publish(WebAppPublisher.java:101)
        at org.ops4j.pax.web.extender.war.internal.WebObserver.deploy(WebObserver.java:219)
        at org.ops4j.pax.web.extender.war.internal.WebObserver$1.doStart(WebObserver.java:179)
        at org.ops4j.pax.web.extender.war.internal.extender.SimpleExtension.start(SimpleExtension.java:59)
        - locked <0x00000006cac049f0> (a org.ops4j.pax.web.extender.war.internal.WebObserver$1)
        at org.ops4j.pax.web.extender.war.internal.extender.AbstractExtender$1.run(AbstractExtender.java:281)


So it  looks like org.opendaylight.aaa.shiro.web.env.KarafIniWebEnvironment.init is stuck/hung.

On Thu, Jul 13, 2017 at 3:52 PM, Thanh Ha <thanh.ha@...> wrote:
On Thu, Jul 13, 2017 at 11:51 AM, Robert Varga <nite@...> wrote:
On 13/07/17 17:22, Thanh Ha wrote:
> On Wed, Jul 12, 2017 at 4:35 PM, Thanh Ha <thanh.ha@...
> <mailto:thanh.ha@linuxfoundation.org>> wrote:
>
>     Hi Everyone,
>
>     I cancelled autorelease-nitrogen build #102 because SFT was hanging
>     for over 3 hrs. I'm not sure if this is normal or if it's the issue
>     we were seeing before back in Carbon?
>
>     [INFO] OpenDaylight :: Clustered Netconf Topology :: Netconf Connector + Netconf SSH Server FAILURE [  03:02 h]
>
>
>     https://jenkins.opendaylight.org/releng/view/autorelease/job/autorelease-release-nitrogen/102/
>     <https://jenkins.opendaylight.org/releng/view/autorelease/job/autorelease-release-nitrogen/102/>
>
>     Regards,
>     Thanh
>
>
> So this is a serious problem. Build #103 was hanging too on "ODL ::
> neutron :: integration-test 0.9.0-Nitrogen". and build #104 is currently
> hanging on the same thing.
>
> Not sure if it's related but appears we are consistently hanging tests
> now. How do we troubleshoot this?

Logs have these two last items:

2017-07-12 17:40:12,572 | INFO  | Framework stop   | ContextHandler
             | 275 - org.eclipse.jetty.util - 9.2.19.v20160908 | Stopped
HttpServiceContext{httpContext=WebAppHttpContext{org.opendaylight.aaa.shiro
- 294}}
2017-07-12 17:40:12,574 | INFO  | Framework stop   | Activator
             | 326 - org.ops4j.pax.web.pax-web-extender-war - 4.3.0 |
Destroying extension for bundle org.opendaylight.aaa.idmlight

So the framework is actually stopping, but halted on something related
to either idmlight...

As Michael noted, a thread dump would shed more light on this.


The attached zip contains the jstacks for all the java processes I could find. I wasn't sure which process was the correct one but I printed their `ps` statement at the top of each log which I hope helps pinpoint the correct one for those who know what to look for.

Regards,
Thanh


_______________________________________________
release mailing list
release@...
https://lists.opendaylight.org/mailman/listinfo/release






_______________________________________________
release mailing list
release@...
https://lists.opendaylight.org/mailman/listinfo/release




_______________________________________________
netconf-dev mailing list
netconf-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/netconf-dev