[controller-dev] Integration distribution failing Single Feature test in master


Luis Gomez <ecelgp@...>
 

Thanks, Tom, for your analysis. lisp and ofjava people, would you mind taking a look at these comments?

BR/Luis


On Oct 13, 2015, at 2:31 PM, Tom Pantelis <tompantelis@...> wrote:

Compared to the last successful run on Oct 5th, there are a couple of errors now appearing:

2015-10-13 15:41:51,778 | ERROR | bundle-tracker-0 | ModuleInfoBundleTracker          | 131 - org.opendaylight.controller.config-manager - 0.4.0.SNAPSHOT | Failed to process bundleentry://188.fwk532513438/META-INF/services/org.opendaylight.yangtools.yang.binding.YangModelBindingProvider for bundle org.opendaylight.openflowjava.openflow-protocol-api_0.7.0.SNAPSHOT [188]
java.lang.IllegalStateException: Error while executing getModuleInfo on org.opendaylight.yang.gen.v1.urn.opendaylight.openflow.protocol.rev130731.$YangModelBindingProvider@4a0236de
...
Caused by: java.lang.IllegalStateException: Resource '/META-INF/yang/openflow-instruction.yang' is missing
	at org.opendaylight.yang.gen.v1.urn.opendaylight.openflow.common.instruction.rev130731.$YangModuleInfoImpl.<init>($YangModuleInfoImpl.java:31)[188:org.opendaylight.openflowjava.openflow-protocol-api:0.7.0.SNAPSHOT]
	

This one has been happening for a while but it *seems* to be benign.


2015-10-13 15:48:11,064 | ERROR | rint Extender: 3 | BlueprintContainerImpl           | 15 - org.apache.aries.blueprint.core - 1.4.2 | Unable to start blueprint container for bundle org.opendaylight.lispflowmapping.mappingservice.shell due to unresolved dependencies [(objectClass=org.opendaylight.lispflowmapping.interfaces.mappingservice.IMappingServiceShell)]
java.util.concurrent.TimeoutException
	at org.apache.aries.blueprint.container.BlueprintContainerImpl$1.run(BlueprintContainerImpl.java:336)[15:org.apache.aries.blueprint.core:1.4.2]
	at org.apache.aries.blueprint.utils.threading.impl.DiscardableRunnable.run(DiscardableRunnable.java:48)[15:org.apache.aries.blueprint.core:1.4.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)[:1.7.0_85]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_85]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)[:1.7.0_85]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)[:1.7.0_85]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_85]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_85]
	at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]

This one appears to emanate from lispflowmapping. It uses blueprint and appears to import an OSGi service, IMappingServiceShell, that wasn't found. The default blueprint timeout is 5 min, but I don't know whether the test blocks on this blueprint container and fails as a result. Either way, it seems this should be looked at by a lispflowmapping contributor.
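
For readers unfamiliar with this failure mode, the following is a minimal sketch of the general pattern behind the TimeoutException: a consumer waits a bounded time for a dependency to be registered and gives up if it never shows up. It is only an analogy built on java.util.concurrent and does not use the Aries blueprint or OSGi APIs. The class name GracePeriodSketch, the method awaitService and the 100 ms timeouts are made up for illustration, and CompletableFuture assumes Java 8 or later (the logs above come from a 1.7.0_85 JVM).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only: this is not the Aries blueprint implementation.
final class GracePeriodSketch {

    // Waits up to timeoutMillis for the "service" to be registered.
    // Returns the service, or throws TimeoutException if it does not arrive in time.
    static String awaitService(CompletableFuture<String> registration, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        return registration.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Case 1: the provider registers the service before the deadline.
        CompletableFuture<String> registered = new CompletableFuture<>();
        registered.complete("IMappingServiceShell implementation");
        System.out.println("resolved: " + awaitService(registered, 100));

        // Case 2: the provider never registers, e.g. because it is still initializing.
        CompletableFuture<String> neverRegistered = new CompletableFuture<>();
        try {
            awaitService(neverRegistered, 100);
        } catch (TimeoutException e) {
            System.out.println("unresolved dependency: gave up after 100 ms");
        }
    }
}
```

The second case mirrors what the log shows: the dependency is not wrong, it is simply not available before the deadline, so anything that slows down the provider can surface as a timeout in the consumer.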

There may be other stuff going on. It would be useful to run it by hand and, when/if it appears stuck, use jstack to get a thread dump. Also it's hard to tell why the OOM errors are occurring - one of the tests indicates they started to occur after shutdown was started. For that it would be useful to get a heap dump via jmap or, better yet, run the test with the -XX:+HeapDumpOnOutOfMemoryError option enabled if possible.
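
If attaching jstack to the test JVM is awkward, a similar thread dump can be produced from inside the process with the standard java.lang.management API. The sketch below is an assumption about how one might do that, not something the test harness provides today; the class name ThreadDumpSketch and the output layout are hypothetical, and the result is less detailed than real jstack output.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: an in-process approximation of a jstack thread dump.
final class ThreadDumpSketch {

    // Builds a textual dump of all live threads in the current JVM.
    static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Request lock details only when this JVM supports reporting them.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            if (info == null) {
                continue; // defensive: skip threads that ended while dumping
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(threadDump());
    }
}
```

Calling threadDump() from a watchdog when a feature install exceeds its expected time would show which threads are blocked and where, which is the information the jstack suggestion is after.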

On Tue, Oct 13, 2015 at 4:43 PM, Luis Gomez <ecelgp@...> wrote:
Looking at more failing distribution jobs, there is always a timeout in the Single Feature test. The distribution used to build in 8 mins, while now it takes more than 1 hour.

BR/Luis


On Oct 13, 2015, at 12:13 PM, Luis Gomez <ecelgp@...> wrote:

Hi all,

I just observed that the distribution in master has been failing since Oct 6th [1]. The latest errors [2] show memory issues like the ones below, but I am not sure this is the root cause. Can anyone help identify the problem here?

Thanks/Luis 


Exception in thread "qtp1815616686-79" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.HashMap.newKeyIterator(HashMap.java:968)
	at java.util.HashMap$KeySet.iterator(HashMap.java:1002)
	at java.util.HashSet.iterator(HashSet.java:170)
	at sun.nio.ch.Util$2.iterator(Util.java:303)
	at org.eclipse.jetty.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:600)
	at org.eclipse.jetty.io.nio.SelectorManager$1.run(SelectorManager.java:290)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)

Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
	at java.io.BufferedReader.<init>(BufferedReader.java:98)
	at java.io.BufferedReader.<init>(BufferedReader.java:109)
	at java.io.LineNumberReader.<init>(LineNumberReader.java:72)
	at org.apache.felix.utils.properties.Properties$PropertiesReader.<init>(Properties.java:748)
	at org.apache.felix.utils.properties.Properties.loadLayout(Properties.java:352)
	at org.apache.felix.utils.properties.Properties.load(Properties.java:142)
	at org.apache.felix.utils.properties.Properties.load(Properties.java:138)
	at org.apache.felix.utils.properties.Properties.load(Properties.java:122)
	at org.apache.felix.utils.properties.Properties.<init>(Properties.java:107)
	at org.apache.felix.utils.properties.Properties.<init>(Properties.java:96)
	at org.apache.karaf.jaas.modules.properties.AutoEncryptionSupport$1.run(AutoEncryptionSupport.java:63)
	at java.util.TimerThread.mainLoop(Timer.java:555)
	at java.util.TimerThread.run(Timer.java:505)
Exception in thread "INT-2,ISPN,rk-c7-merge-6c0-16483" java.lang.OutOfMemoryError: Java heap space
	at org.jgroups.util.Util.readLongSequence(Util.java:2235)
	at org.jgroups.util.Digest.readFrom(Digest.java:166)
	at org.jgroups.util.Digest.readFrom(Digest.java:154)
	at org.jgroups.util.Util.readStreamable(Util.java:1105)
	at org.jgroups.util.Util.streamableFromBuffer(Util.java:773)
	at org.jgroups.protocols.pbcast.STABLE.readDigest(STABLE.java:695)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:237)
	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:448)
	at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:636)
	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147)
	at org.jgroups.protocols.FD.up(FD.java:255)
	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:301)
	at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
	at org.jgroups.protocols.Discovery.up(Discovery.java:379)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1399)
	at org.jgroups.protocols.TP$4.run(TP.java:1327)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Exception in thread "qtp431119273-318" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp431119273-85" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI RenewClean-[10.30.11.239:44444]" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
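
When chasing OutOfMemoryError reports like these, it can help to confirm how much heap the JVM was actually given and whether it is set up to write a heap dump when it runs out. The sketch below reads both through standard management beans. It assumes a HotSpot-based JVM (HotSpotDiagnosticMXBean is HotSpot-specific), and the class name HeapCheckSketch and the output format are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports heap limits and the heap-dump-on-OOM setting.
final class HeapCheckSketch {

    // Summarizes current heap usage; getMax() is -1 when no maximum is defined.
    static String heapSummary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        return "heap used=" + (heap.getUsed() / mb) + "MB"
                + " committed=" + (heap.getCommitted() / mb) + "MB"
                + " max=" + (heap.getMax() < 0 ? "undefined" : (heap.getMax() / mb) + "MB");
    }

    // Reads the HotSpot flag that makes the JVM write a heap dump on OutOfMemoryError.
    // Assumption: a HotSpot-based JVM; other JVMs may not provide this MXBean.
    static String heapDumpOnOomSetting() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (hotspot == null) {
            return "unknown";
        }
        return hotspot.getVMOption("HeapDumpOnOutOfMemoryError").getValue();
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
        System.out.println("HeapDumpOnOutOfMemoryError=" + heapDumpOnOomSetting());
    }
}
```

Logging these two lines at startup of the test JVM would make it easier to tell a heap that is simply too small for the installed features from one that is leaking.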





_______________________________________________
controller-dev mailing list
controller-dev@...
https://lists.opendaylight.org/mailman/listinfo/controller-dev




Lori Jakab <lojakab@...>
 

On 10/14/15 3:28 AM, Luis Gomez wrote:
Thanks, Tom, for your analysis. lisp and ofjava people, would you mind
taking a look at these comments?

Hi Luis, all,

We just pushed a patch to disable the timeout on the blueprint
container, to allow more time for the services to come up. We've seen
this exception in the past, for example when a patch slowed down the
config subsystem as a side effect. It means that within the default
5-minute timeout the core mappingservice did not initialize, and so
didn't register an implementation of IMappingServiceShell with OSGi.

I have no idea how the test environment handles the exception, so I hope
disabling the timeout will help.

From the other analysis on the thread it looks like lisp is only causing
issues in terms of more heap needed after the additional features were
added, but not in terms of functionality.

-Lori


[1] https://jenkins.opendaylight.org/releng/view/yangtools/job/yangtools-distribution-beryllium/
[2] https://jenkins.opendaylight.org/releng/view/yangtools/job/yangtools-distribution-beryllium/444/testReport/



Luis Gomez <ecelgp@...>
 

So something must have changed today, because I do not see the memory issues anymore, but I do see the 5-minute blueprint timeout happening with both lispflowmapping and dlux [1]. Can we do anything about it? It has already been 1 week since we last ran any system test in master, because the distribution fails the single feature test...

BR/Luis


I will bring this issue to TSC tomorrow if we do not find a solution today.

BR/Luis





Luis Gomez <ecelgp@...>
 

OK, I think I managed to unblock the master integration distribution [1] by removing:

- sdninterface feature: I am not sure what is wrong here, but please ENABLE the singlefeature test in your project and then reapply to integration.
- sfc-netconf feature: this feature was taking a long time to install and was therefore timing out the singlefeature test. Please work with the netconf group to debug the issue.

BR/Luis



On Oct 14, 2015, at 4:04 PM, Luis Gomez <ecelgp@...> wrote:

So something must have changed today because I do not see the memory issues anymore but I see 5 mins blueprint timeout happening with both lispflowmapping and dlux [1]. Can we do anything about? It is already 1 week we do not run any system test in master because distribution fails the single feature test...

BR/Luis


I will bring this issue to TSC tomorrow if we do not find a solution today.

BR/Luis


On Oct 14, 2015, at 2:33 AM, Lori Jakab <lojakab@...> wrote:

On 10/14/15 3:28 AM, Luis Gomez wrote:
Thanks Tom for your analysis, lisp and ofjava people, would you mind
taking a look at these comments?

Hi Luis, all,

We just pushed a patch to disable the timeout on the blueprint
container, to allow more time for the services to come up.  We've seen
this exception in the past for example when a patch slowed down the
config subsystem as a side effect.  It means that in the 5 minutes
default timeout the core mappingservice did not initialize, and didn't
register an implementation of IMappingServiceShell with OSGi.

I have no idea how the test environment handles the exception, so I hope
disabling the timeout will help.

From the other analysis on the thread it looks like lisp is only causing
issues in terms of more heap needed after the additional features were
added, but not in terms of functionality.

-Lori


BR/Luis


On Oct 13, 2015, at 2:31 PM, Tom Pantelis <tompantelis@...
<mailto:tompantelis@...>> wrote:

Comparing to the last successful run on Oct 5th, there's a couple
errors now appearing:

2015-10-13 15:41:51,778 | ERROR | bundle-tracker-0 | ModuleInfoBundleTracker          | 131 - org.opendaylight.controller.config-manager - 0.4.0.SNAPSHOT | Failed to process bundleentry://188.fwk532513438/META-INF/services/org.opendaylight.yangtools.yang.binding.YangModelBindingProvider for bundle org.opendaylight.openflowjava.openflow-protocol-api_0.7.0.SNAPSHOT [188]
java.lang.IllegalStateException: Error while executing getModuleInfo on org.opendaylight.yang.gen.v1.urn.opendaylight.openflow.protocol.rev130731.$YangModelBindingProvider@4a0236de
...
Caused by: java.lang.IllegalStateException: Resource '/META-INF/yang/openflow-instruction.yang' is missing
at org.opendaylight.yang.gen.v1.urn.opendaylight.openflow.common.instruction.rev130731.$YangModuleInfoImpl.<init>($YangModuleInfoImpl.java:31)[188:org.opendaylight.openflowjava.openflow-protocol-api:0.7.0.SNAPSHOT]

This one has been happening for a while but it *seems* to be benign.

2015-10-13 15:48:11,064 | ERROR | rint Extender: 3 | BlueprintContainerImpl           | 15 - org.apache.aries.blueprint.core - 1.4.2 | Unable to start blueprint container for bundle org.opendaylight.lispflowmapping.mappingservice.shell due to unresolved dependencies [(objectClass=org.opendaylight.lispflowmapping.interfaces.mappingservice.IMappingServiceShell)]
java.util.concurrent.TimeoutException
at org.apache.aries.blueprint.container.BlueprintContainerImpl$1.run(BlueprintContainerImpl.java:336)[15:org.apache.aries.blueprint.core:1.4.2]
at org.apache.aries.blueprint.utils.threading.impl.DiscardableRunnable.run(DiscardableRunnable.java:48)[15:org.apache.aries.blueprint.core:1.4.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)[:1.7.0_85]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)[:1.7.0_85]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)[:1.7.0_85]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)[:1.7.0_85]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_85]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_85]
at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
This one appears to emanate from lispflowmapping. It's using
blueprint and appears to import an OSGi
service, IMappingServiceShell, that wasn't found. The default timeout
for blueprint is 5 min but I don't know if the test blocks on this
blueprint container and will fail as a result. Either way it seems
this should be looked at by a lispflowmapping contributor.
There may be other stuff going on. It would be useful to run it by
hand and, when/if it appears stuck, use jstack to get a thread dump.
Also it's hard to tell why the OOM errors are occurring - one of the
tests indicates they started to occur after shutdown was started. For
that it would be useful to get a heap dump via jmap or, better yet,
run the test with the -XX:+HeapDumpOnOutOfMemoryError option enabled
if possible.

On Tue, Oct 13, 2015 at 4:43 PM, Luis Gomez <ecelgp@...
<mailto:ecelgp@...>> wrote:

   Look in at more failing distribution jobs, there is always a
   timeout in the Single Feature, the distribution used to build in
   8 mins while now it takes more than 1 hour.

   BR/Luis


   On Oct 13, 2015, at 12:13 PM, Luis Gomez <ecelgp@...
   <mailto:ecelgp@...>> wrote:

   Hi all,

   I just observed the distribution in master is failing since Oct
   6th [1]. Last errors [2] show memory issues like below but I am
   not sure this is the root cause for this. Can anyone help
   identifying the problem here?

   Thanks/Luis

   [1] https://jenkins.opendaylight.org/releng/view/yangtools/job/yangtools-distribution-beryllium/
   [2]
   https://jenkins.opendaylight.org/releng/view/yangtools/job/yangtools-distribution-beryllium/444/testReport/

   Exception in thread "qtp1815616686-79" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.newKeyIterator(HashMap.java:968)
    at java.util.HashMap$KeySet.iterator(HashMap.java:1002)
    at java.util.HashSet.iterator(HashSet.java:170)
    at sun.nio.ch.Util$2.iterator(Util.java:303)
    at org.eclipse.jetty.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:600)
    at org.eclipse.jetty.io.nio.SelectorManager$1.run(SelectorManager.java:290)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)

   Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
    at java.io.BufferedReader.<init>(BufferedReader.java:98)
    at java.io.BufferedReader.<init>(BufferedReader.java:109)
    at java.io.LineNumberReader.<init>(LineNumberReader.java:72)
    at org.apache.felix.utils.properties.Properties$PropertiesReader.<init>(Properties.java:748)
    at org.apache.felix.utils.properties.Properties.loadLayout(Properties.java:352)
    at org.apache.felix.utils.properties.Properties.load(Properties.java:142)
    at org.apache.felix.utils.properties.Properties.load(Properties.java:138)
    at org.apache.felix.utils.properties.Properties.load(Properties.java:122)
    at org.apache.felix.utils.properties.Properties.<init>(Properties.java:107)
    at org.apache.felix.utils.properties.Properties.<init>(Properties.java:96)
    at org.apache.karaf.jaas.modules.properties.AutoEncryptionSupport$1.run(AutoEncryptionSupport.java:63)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)
   Exception in thread "INT-2,ISPN,rk-c7-merge-6c0-16483" java.lang.OutOfMemoryError: Java heap space
    at org.jgroups.util.Util.readLongSequence(Util.java:2235)
    at org.jgroups.util.Digest.readFrom(Digest.java:166)
    at org.jgroups.util.Digest.readFrom(Digest.java:154)
    at org.jgroups.util.Util.readStreamable(Util.java:1105)
    at org.jgroups.util.Util.streamableFromBuffer(Util.java:773)
    at org.jgroups.protocols.pbcast.STABLE.readDigest(STABLE.java:695)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:237)
    at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:448)
    at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:636)
    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147)
    at org.jgroups.protocols.FD.up(FD.java:255)
    at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:301)
    at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
    at org.jgroups.protocols.Discovery.up(Discovery.java:379)
    at org.jgroups.protocols.TP.passMessageUp(TP.java:1399)
    at org.jgroups.protocols.TP$4.run(TP.java:1327)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
   Exception in thread "qtp431119273-318" java.lang.OutOfMemoryError: GC overhead limit exceeded
   Exception in thread "qtp431119273-85" java.lang.OutOfMemoryError: GC overhead limit exceeded
   Exception in thread "RMI RenewClean-[10.30.11.239:44444]" java.lang.OutOfMemoryError: GC overhead limit exceeded
   Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space





   _______________________________________________
   controller-dev mailing list
   controller-dev@...
   <mailto:controller-dev@...>
   https://lists.opendaylight.org/mailman/listinfo/controller-dev









Colin Dixon
 

Thanks for all the hard work. Do we have an idea what the root cause here was?

--Colin


On Wed, Oct 14, 2015 at 11:42 PM, Luis Gomez <ecelgp@...> wrote:
OK, I think I managed to unblock the master integration distribution [1] by removing:

- sdninterface feature: I am not sure what is wrong here, but please ENABLE the singlefeature test in your project and then reapply to integration.
- sfc-netconf feature: this feature was taking a long time to install and was therefore timing out the singlefeature test. Please work with the netconf group to debug the issue.

BR/Luis



On Oct 14, 2015, at 4:04 PM, Luis Gomez <ecelgp@...> wrote:

So something must have changed today, because I do not see the memory issues anymore, but I do see the 5 min blueprint timeout happening with both lispflowmapping and dlux [1]. Can we do anything about it? It has already been 1 week since we last ran any system test in master, because the distribution fails the single feature test...

BR/Luis


I will bring this issue to TSC tomorrow if we do not find a solution today.

BR/Luis


On Oct 14, 2015, at 2:33 AM, Lori Jakab <lojakab@...> wrote:

On 10/14/15 3:28 AM, Luis Gomez wrote:
Thanks Tom for your analysis, lisp and ofjava people, would you mind
taking a look at these comments?

Hi Luis, all,

We just pushed a patch to disable the timeout on the blueprint
container, to allow more time for the services to come up.  We've seen
this exception in the past for example when a patch slowed down the
config subsystem as a side effect.  It means that within the 5-minute
default timeout the core mappingservice did not initialize, and so did
not register an implementation of IMappingServiceShell with OSGi.

I have no idea how the test environment handles the exception, so I hope
disabling the timeout will help.

From the other analysis on the thread it looks like lisp is only causing
issues in terms of more heap needed after the additional features were
added, but not in terms of functionality.

-Lori





Luis Gomez <ecelgp@...>
 

Actually, I do not. Since I did not find any relevant information in the karaf log to guide me to the cause, I used brute force: removing features until the issue disappeared.

After doing that I came up with the 2 features impacting the single feature test and a couple of conclusions:

- The blueprint timeout errors we saw in the karaf log were happening because the karaf container was not starting properly.
- I am almost sure the memory issues we saw in the karaf log were produced by a hanging karaf container (left over from an earlier single feature test) still running in the VM.
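
The brute-force elimination described above can be sketched roughly as follows. This is only an illustration of the idea, not the procedure actually used: the class FeatureElimination, its methods and the testPasses predicate are hypothetical, and in practice each predicate call would stand for a full distribution build plus single feature test run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: find the features whose removal makes a failing test pass.
public class FeatureElimination {

    // Shrinks a failing feature list: a feature is dropped whenever the test still fails without it.
    static List<String> minimize(List<String> failing, Predicate<List<String>> testPasses) {
        List<String> current = new ArrayList<>(failing);
        for (String feature : failing) {
            List<String> trial = new ArrayList<>(current);
            trial.remove(feature);
            if (!testPasses.test(trial)) {
                current = trial; // still failing without it, so it is not needed to reproduce
            }
        }
        return current;
    }

    // Repeatedly isolates a minimal failing subset and removes it until the rest passes.
    public static List<String> findCulprits(List<String> features, Predicate<List<String>> testPasses) {
        List<String> remaining = new ArrayList<>(features);
        List<String> culprits = new ArrayList<>();
        while (!testPasses.test(remaining)) {
            List<String> suspects = minimize(remaining, testPasses);
            if (suspects.isEmpty()) {
                break; // the test fails even with no features, so no feature is to blame
            }
            culprits.addAll(suspects);
            remaining.removeAll(suspects);
        }
        return culprits;
    }

    public static void main(String[] args) {
        List<String> features = Arrays.asList("dlux", "lispflowmapping", "sdninterface", "sfc-netconf");
        // Stand-in for the real test: passes only when neither problem feature is present.
        Predicate<List<String>> testPasses =
                fs -> !fs.contains("sdninterface") && !fs.contains("sfc-netconf");
        System.out.println(findCulprits(features, testPasses));
    }
}
```

With the stand-in predicate in main, both removed features are reported. The sketch assumes the test result is deterministic for a given feature set, which a hanging container from an earlier run would violate.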

BR/Luis


On Oct 15, 2015, at 8:00 AM, Colin Dixon <colin@...> wrote:

Thanks for all the hard work. Do we have an idea what the root cause here was?

--Colin

