Valentina Krasnobaeva <valentina.krasnobaeva@...>
Hi OVSDB Team, Hi All
Since this morning, with the latest builds of Oxygen and Fluorine (all branches rebased), I always see this error in the ovsdb log when devstack executes "sudo ovs-vsctl set-manager tcp:10.100.0.51:6640":

2018-03-26T16:37:14,404 | WARN | nioEventLoopGroup-5-1 | ChannelInitializer | 60 - io.netty.common - 4.1.16.Final | Failed to initialize a channel. Closing: [id: 0x2976135c, L:/10.100.0.51:6640 - R:/10.100.0.11:34774]
java.lang.NullPointerException: null
at org.opendaylight.ovsdb.lib.impl.OvsdbConnectionService$2.initChannel(OvsdbConnectionService.java:325) [400:org.opendaylight.ovsdb.library:1.6.1.SNAPSHOT]
at org.opendaylight.ovsdb.lib.impl.OvsdbConnectionService$2.initChannel(OvsdbConnectionService.java:321) [400:org.opendaylight.ovsdb.library:1.6.1.SNAPSHOT]
at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:113) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:105) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:606) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline.access$000(DefaultChannelPipeline.java:46) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1412) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1131) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:656) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:510) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482) [63:io.netty.transport:4.1.16.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [60:io.netty.common:4.1.16.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) [60:io.netty.common:4.1.16.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) [63:io.netty.transport:4.1.16.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [60:io.netty.common:4.1.16.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) [60:io.netty.common:4.1.16.Final]
at java.lang.Thread.run(Thread.java:748) [?:?]
2018-03-26T16:37:15,410 | INFO | nioEventLoopGroup-4-1 | LoggingHandler | 60 - io.netty.common - 4.1.16.Final | [id: 0x57fd99d7, L:/0:0:0:0:0:0:0:0:6640] READ: [id: 0x1c79d0b8, L:/10.100.0.51:6640 - R:/10.100.0.11:34776]

ovsdb process:
root 16311 0.0 0.0 28768 2152 ? S<s 17:05 0:00 ovsdb-server: monitoring pid 16312 (healthy)
root 16312 0.0 0.0 29040 5304 ? S< 17:05 0:00 \_ ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor

from its log:
2018-03-26T15:09:00.632Z|00005|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
2018-03-26T15:09:00.632Z|00006|reconnect|INFO|tcp:10.100.0.51:6640: connected
2018-03-26T15:09:00.637Z|00007|reconnect|INFO|tcp:10.100.0.51:6640: connection closed by peer
2018-03-26T15:09:01.639Z|00008|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
2018-03-26T15:09:01.640Z|00009|reconnect|INFO|tcp:10.100.0.51:6640: connected
2018-03-26T15:09:01.642Z|00010|reconnect|INFO|tcp:10.100.0.51:6640: connection closed by peer
2018-03-26T15:09:01.642Z|00011|reconnect|INFO|tcp:10.100.0.51:6640: waiting 2 seconds before reconnect
2018-03-26T15:09:03.643Z|00012|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
2018-03-26T15:09:03.644Z|00013|reconnect|INFO|tcp:10.100.0.51:6640: connected
2018-03-26T15:09:03.648Z|00014|reconnect|INFO|tcp:10.100.0.51:6640: connection closed by peer
2018-03-26T15:09:03.648Z|00015|reconnect|INFO|tcp:10.100.0.51:6640: waiting 4 seconds before reconnect
2018-03-26T15:09:07.648Z|00016|reconnect|INFO|tcp:10.100.0.51:6640: connecting...

and of course, there is no br-int on ovsdb:
# ovs-vsctl show
d3b8068d-fcfb-4654-b6fa-5fea41d9922d
Manager "tcp:10.100.0.51:6640"
ovs_version: "2.6.1"

I suppose that this latest patch from ovsdb could provoke the issue:

$ git show 77f0ef93babd
commit 77f0ef93babd9a1fd4a7f3508675830882c374aa (HEAD, origin/stable/oxygen)
Author: Tim Rozet <trozet@...>
Date: Fri Mar 23 14:23:40 2018 -0400
OVSDB-449 Fixes loading keystores dynamically per connection
Currently when OVSDB manager is started with SSL, the SSL Context is
only fetched once, which inherently means the keystores are only read
that one time. This patch changes the behavior so that the SSL Context
is fetched everytime a new connection is opened, which in turn will
update the keystore data.
Change-Id: Iaafbd34afcb5f4708b48eb3d64eca14ef0a107e8
Signed-off-by: Tim Rozet <trozet@...>
(cherry picked from commit e4a170988175b98e2411782f2a85b63edb64d4cf)

Has anyone already had this issue? If so, please share your workaround. I use openvswitch 2.6.1-1.
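For context on where that NPE sits: the failure is inside a Netty ChannelInitializer, and the commit above makes the SSL context be looked up per connection, so one plausible reading is that the lookup returned null for this plain-TCP setup and initChannel() dereferenced it. A minimal sketch of that per-connection pattern, with a guard, might look like the following (class and provider names here are illustrative, not the actual OvsdbConnectionService code):

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.ssl.SslHandler;

// Illustrative only: a server-side ChannelInitializer that looks up the SSLContext
// for every accepted connection instead of caching it once, as the commit message
// describes. If the lookup can return null (e.g. no keystore configured), initChannel()
// has to guard for it, otherwise the channel init fails exactly like the
// NullPointerException shown in the karaf log above.
public class PerConnectionSslInitializer extends ChannelInitializer<SocketChannel> {

    // Hypothetical provider interface; in OVSDB this role is played by whatever
    // supplies the SSL context (certificate manager, configuration, ...).
    public interface SslContextProvider {
        SSLContext getSslContext();
    }

    private final SslContextProvider provider;

    public PerConnectionSslInitializer(SslContextProvider provider) {
        this.provider = provider;
    }

    @Override
    protected void initChannel(SocketChannel channel) {
        // Fetch a fresh context per connection so updated keystores are picked up.
        SSLContext sslContext = provider.getSslContext();
        if (sslContext == null) {
            // Without this guard, sslContext.createSSLEngine() below would throw the NPE
            // and Netty would log "Failed to initialize a channel. Closing: ...".
            throw new IllegalStateException("No SSLContext available; rejecting connection");
        }
        SSLEngine engine = sslContext.createSSLEngine();
        engine.setUseClientMode(false);
        channel.pipeline().addLast("ssl", new SslHandler(engine));
        // ... the OVSDB JSON-RPC decoder/handler would be added after this ...
    }
}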
Thanks for any help,
Regards
Yes, that patch was the issue. Here is a fix that should merge soon.
Vishal Thapar <vishal.thapar@...>
Don’t see the socket issue, but Sathwick is now seeing some other issue. InfraUtils is throwing lots of Deadlock Exceptions:
2018-03-27T05:53:29,133 | INFO | awaitility[checkBundleDiagInfos] | SystemReadyImpl | 360 - org.opendaylight.infrautils.ready-impl - 1.4.0.SNAPSHOT | checkBundleDiagInfos: Elapsed time 206s, remaining time 93s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=1, Waiting=0, Starting=0, Active=557, Stopping=0, Failure=0}
2018-03-27T05:53:29,245 | INFO | opendaylight-cluster-data-notification-dispatcher-66 | SfcOvsNodeDataListener | 468 - org.opendaylight.sfc.ovs - 0.8.0.SNAPSHOT | SfcOvsNodeDataListener::add() bridge name [OvsdbBridgeName{_value=BR1}] DPID [DatapathId{_value=00:00:f6:3e:dc:fc:7b:4e}]
2018-03-27T05:53:29,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Oh nose - there are 3 deadlocked threads!! :-(
2018-03-27T05:53:29,253 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace: opendaylight-cluster-data-notification-dispatcher-55 locked on java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86 (owned by opendaylight-cluster-data-akka.actor.default-dispatcher-2):
at sun.misc.Unsafe.park(Native Method)
https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/genius-csit-3node-sathwikgate-all-fluorine/5/odl1_karaf.log.gz
Regards,
Vishal.
Adding Tim.
Are the logs below from the original job Valentina mentioned or the latest? The original failure was from a bad fix which we did fix, so we need to understand if this is another issue.
Vishal Thapar <vishal.thapar@...>
Hi Sam, Michael,
This is post Tim's fix and is a different issue. Yes, Genius CSIT is using -all; we probably need to fix Genius CSIT to not use -all but Genius only. The issue is coming from some deadlock in bgpcep start/close. Adding more of the stack trace:
2018-03-27T05:52:15,752 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace: opendaylight-cluster-data-akka.actor.default-dispatcher-2 locked on org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl@23621442 (owned by opendaylight-cluster-data-notification-dispatcher-55):
at org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl.instantiateServiceInstance(ProgrammingServiceImpl.java:175)
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lambda$startServices$0(ClusterSingletonServiceGroupImpl.java:648)
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl$$Lambda$1108/2009038775.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1257)
2018-03-27T05:52:16,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Oh nose - there are 3 deadlocked threads!! :-(
2018-03-27T05:52:16,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace: opendaylight-cluster-data-notification-dispatcher-55 locked on java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86 (owned by opendaylight-cluster-data-akka.actor.default-dispatcher-2):
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.unregisterService(ClusterSingletonServiceGroupImpl.java:393)
at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.removeRegistration(AbstractClusterSingletonServiceProviderImpl.java:151)
at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl$1.removeRegistration(AbstractClusterSingletonServiceProviderImpl.java:123)
at org.opendaylight.yangtools.concepts.AbstractRegistration.close(AbstractRegistration.java:41)
at org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl.close(ProgrammingServiceImpl.java:449)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyProviderBean$PCEPTopologyProviderBeanCSS.close(PCEPTopologyProviderBean.java:190)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyProviderBean.close(PCEPTopologyProviderBean.java:86)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.closeTopology(PCEPTopologyDeployerImpl.java:151)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.updateTopologyProvider(PCEPTopologyDeployerImpl.java:92)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.lambda$onDataTreeChanged$0(PCEPTopologyDeployerImpl.java:75)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl$$Lambda$1076/552124597.accept(Unknown Source)
So, technically bgpcep is not coming up correctly in -all, but since autorelease/distribution checks don't test for functionality, it wasn't caught until Genius ran a CSIT with -all. It could also be some poorly written start/close in BGPCEP; probably Michael can give more input on this. Then we can decide what to raise a bug on.
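For illustration, the two traces together look like a classic lock-ordering inversion: one thread holds the singleton group's ReentrantLock (startServices) and waits on the ProgrammingServiceImpl monitor (instantiateServiceInstance), while the other holds that monitor (close) and waits on the same ReentrantLock (unregisterService). A stripped-down sketch of the pattern, with purely illustrative names rather than the real mdsal/bgpcep classes:

import java.util.concurrent.locks.ReentrantLock;

// Stripped-down illustration (not the real mdsal/bgpcep code) of the lock-ordering
// pattern the two stack traces above suggest:
//   Thread A: group.startServices() -> holds groupLock -> calls a synchronized service method
//   Thread B: service.close()       -> holds the service monitor -> calls group.unregisterService() -> wants groupLock
// With unlucky timing each thread ends up waiting for the lock the other one holds.
public class SingletonDeadlockSketch {

    static class ServiceGroup {
        // Fair lock, matching the ReentrantLock$FairSync seen in the trace.
        private final ReentrantLock groupLock = new ReentrantLock(true);

        void startServices(Service service) {
            groupLock.lock();                         // Thread A takes the group lock first...
            try {
                service.instantiateServiceInstance(); // ...then blocks on the service monitor held by Thread B
            } finally {
                groupLock.unlock();
            }
        }

        void unregisterService(Service service) {
            groupLock.lock();                         // Thread B blocks here while still holding the service monitor
            try {
                // remove the service from the group
            } finally {
                groupLock.unlock();
            }
        }
    }

    static class Service {
        private final ServiceGroup group;

        Service(ServiceGroup group) {
            this.group = group;
        }

        synchronized void instantiateServiceInstance() {
            // start the service instance; requires the service monitor
        }

        synchronized void close() {
            group.unregisterService(this);            // lock order inverted relative to startServices()
        }
    }

    public static void main(String[] args) {
        ServiceGroup group = new ServiceGroup();
        Service service = new Service(group);
        new Thread(() -> group.startServices(service), "cluster-singleton-start").start();
        new Thread(service::close, "cluster-singleton-close").start();
        // If the two threads interleave badly they deadlock, which is the situation
        // ThreadsWatcher reports above as "there are 3 deadlocked threads".
    }
}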
Thanks and Regards,
Vishal.
From: Michael Vorburger [mailto:vorburger@...]
Sent: 27 March 2018 23:30
On Tue, Mar 27, 2018 at 6:17 PM, Sam Hague <shague@...> wrote:
Curious but why is the genius 3node bringing in sfc features, is it because this is an integration-all test? Even so, I don't think it makes sense to do this.
Not 100% sure what you mean, but yes my understanding is that the distribution integration-all feature in distribution includes *ALL* features.
If you don't think that makes sense, add integration-dev in reply and take it up with the maintainers of integration/distribution.
On Tue, Mar 27, 2018, 9:10 AM Michael Vorburger <vorburger@...> wrote:
On Tue, Mar 27, 2018 at 5:51 PM, Sam Hague <shague@...> wrote:
Are the logs below from the original job valentina mentioned or the latest ? The original failure was from a bad fix which we did fix, so need to understand if this is another issue.
On Mar 26, 2018 11:10 PM, "Vishal Thapar" <vishal.thapar@...> wrote:
Don’t see the socket issue, but Sathwick is now seeing some other issue. InfraUtils is throwing lots
of Deadlock Exceptions:
Cool. Could you please open a new JIRA and include the full stack trace it (should) show after (or before?) that "Deadlocked thread stack trace: " ERROR log? From what little I understand so far, this JIRA should probably be opened in project controller. Although it could be caused by a mistake made in application code. Can you please clarify exactly what caused this to start happening? If it's a particular change, it would be interesting to look more closely at whether it does anything special (wrong) re. locking and threading etc.
Vishal Thapar <vishal.thapar@...>
Missed Luis' mail, which suggests not using -all for 3-node tests. I think we should prioritize that ASAP.
Regards,
Vishal.
toggle quoted message
Show quoted text
From: ovsdb-dev-bounces@... [mailto:ovsdb-dev-bounces@...]
On Behalf Of Vishal Thapar
Sent: 28 March 2018 10:50
To: Michael Vorburger <vorburger@...>; Sam Hague <shague@...>
Cc: Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...> <ovsdb-dev@...>; Tim Rozet <trozet@...>; Valentina Krasnobaeva <valentina.krasnobaeva@...>
Subject: Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS
Hi Sam, Michael,
This is post Tim’s fix and is different issue. Yes, Genius CSIT is using -all, probably need to fix Genius CSIT to not use -all but
Genius only. Issue is coming from some deadlock in bgpcep start/close. Adding more of the stack trace:
2018-03-27T05:52:15,752 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT
| Deadlocked thread stack trace: opendaylight-cluster-data-akka.actor.default-dispatcher-2 locked on
org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl@23621442 (owned by opendaylight-cluster-data-notification-dispatcher-55):
at org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl.instantiateServiceInstance(ProgrammingServiceImpl.java:175)
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lambda$startServices$0(ClusterSingletonServiceGroupImpl.java:648)
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl$$Lambda$1108/2009038775.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1257)
2018-03-27T05:52:16,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Oh nose - there are 3 deadlocked threads!! :-(
2018-03-27T05:52:16,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace: opendaylight-cluster-data-notification-dispatcher-55 locked on java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86 (owned by opendaylight-cluster-data-akka.actor.default-dispatcher-2):
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.unregisterService(ClusterSingletonServiceGroupImpl.java:393)
at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.removeRegistration(AbstractClusterSingletonServiceProviderImpl.java:151)
at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl$1.removeRegistration(AbstractClusterSingletonServiceProviderImpl.java:123)
at org.opendaylight.yangtools.concepts.AbstractRegistration.close(AbstractRegistration.java:41)
at org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl.close(ProgrammingServiceImpl.java:449)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyProviderBean$PCEPTopologyProviderBeanCSS.close(PCEPTopologyProviderBean.java:190)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyProviderBean.close(PCEPTopologyProviderBean.java:86)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.closeTopology(PCEPTopologyDeployerImpl.java:151)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.updateTopologyProvider(PCEPTopologyDeployerImpl.java:92)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.lambda$onDataTreeChanged$0(PCEPTopologyDeployerImpl.java:75)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl$$Lambda$1076/552124597.accept(Unknown Source)
So, technically bgpcep is not coming up in -all correctly but since autorelease/distribution check don’t test for functionality it
wasn’t caught till Genius ran a CSIT with -all. Could be some poorly written start/close in BGPCEP also, probably Michael can give more inputs on this. Then we can decide what to raise bug on.
Thanks and Regards,
Vishal.
From: Michael Vorburger [mailto:vorburger@...]
Sent: 27 March 2018 23:30
To: Sam Hague <shague@...>
Cc: Vishal Thapar <vishal.thapar@...>; Valentina Krasnobaeva <valentina.krasnobaeva@...>; Kiran N Upadhyaya <kiran.n.upadhyaya@...>;
Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...> <ovsdb-dev@...>;
Tim Rozet <trozet@...>
Subject: Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS
On Tue, Mar 27, 2018 at 6:17 PM, Sam Hague <shague@...> wrote:
Curious but why is the genius 3node bringing in sfc features, is it because this is an integration-all test? Even so, I dont think it makes sense to do this.
Not 100% sure what you mean, but yes my understanding is that the distribution integration-all feature in distribution includes *ALL* features.
If you don't think that makes sense, add integration-dev in reply and take it up with the maintainers of integration/distribution..
On Tue, Mar 27, 2018, 9:10 AM Michael Vorburger <vorburger@...> wrote:
On Tue, Mar 27, 2018 at 5:51 PM, Sam Hague <shague@...> wrote:
Are the logs below from the original job valentina mentioned or the latest ? The original failure was from a bad fix which we did fix, so need to understand if this is another issue.
On Mar 26, 2018 11:10 PM, "Vishal Thapar" <vishal.thapar@...> wrote:
Don’t see the socket issue, but Sathwick is now seeing some other issue. InfraUtils is throwing lots
of Deadlock Exceptions:
Cool. Could you please open a new JIRA and include the full stack trace it (should) show after (or before?) that "Deadlocked thread stack trace: " ERROR log? From what little I understand so far, this JIRA should probably be openend in
project controller. Although it could be caused by a mistake made in application code. Can you please clarify exactly what caused this to start happening? If it's a particular change, it would be interesting to look more closely at it does anything special
(wrong) re. locking and threading etc.
2018-03-27T05:53:29,133 | INFO | awaitility[checkBundleDiagInfos] | SystemReadyImpl | 360 - org.opendaylight.infrautils.ready-impl
- 1.4.0.SNAPSHOT | checkBundleDiagInfos: Elapsed time 206s, remaining time 93s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=1, Waiting=0, Starting=0, Active=557, Stopping=0, Failure=0}
2018-03-27T05:53:29,245 | INFO | opendaylight-cluster-data-notification-dispatcher-66 | SfcOvsNodeDataListener
| 468 - org.opendaylight.sfc.ovs - 0.8.0.SNAPSHOT | SfcOvsNodeDataListener::add() bridge name [OvsdbBridgeName{_value=BR1}] DPID [DatapathId{_value=00:00:f6:3e:dc:fc:7b:4e}]
2018-03-27T05:53:29,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl
- 1.4.0.SNAPSHOT | Oh nose - there are 3 deadlocked threads!! :-(
2018-03-27T05:53:29,253 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl
- 1.4.0.SNAPSHOT | Deadlocked thread stack trace: opendaylight-cluster-data-notification-dispatcher-55 locked on
java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86 (owned by opendaylight-cluster-data-akka.actor.default-dispatcher-2):
at sun.misc.Unsafe.park(Native Method)
https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/genius-csit-3node-sathwikgate-all-fluorine/5/odl1_karaf.log.gz
Regards,
Vishal.
From:
ovsdb-dev-bounces@... [mailto:ovsdb-dev-bounces@...]
On Behalf Of Sam Hague
Sent: 26 March 2018 21:17
To: Valentina Krasnobaeva <valentina.krasnobaeva@...>
Cc: Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>;
<ovsdb-dev@...> <ovsdb-dev@...>
Subject: Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS
Yes, that patch was the issue. Here is a fix that should merge soon.
On Mon, Mar 26, 2018 at 8:17 AM, Valentina Krasnobaeva <valentina.krasnobaeva@...> wrote:
Since today's morning with latest builds of OXYGEN and Fluorine (all branches were rebased) I always see this error in ovsdb log,
when devstack executes "sudo ovs-vsctl set-manager tcp:10.100.0.51:6640":
2018-03-26T16:37:14,404 | WARN | nioEventLoopGroup-5-1 | ChannelInitializer | 60 - io.netty.common - 4.1.16.Final | Failed to initialize a channel. Closing: [id: 0x2976135c, L:/10.100.0.51:6640
- R:/10.100.0.11:34774]
java.lang.NullPointerException: null
at org.opendaylight.ovsdb.lib.impl.OvsdbConnectionService$2.initChannel(OvsdbConnectionService.java:325) [400:org.opendaylight.ovsdb.library:1.6.1.SNAPSHOT]
at org.opendaylight.ovsdb.lib.impl.OvsdbConnectionService$2.initChannel(OvsdbConnectionService.java:321) [400:org.opendaylight.ovsdb.library:1.6.1.SNAPSHOT]
at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:113) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:105) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:606) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline.access$000(DefaultChannelPipeline.java:46) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1412) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1131) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:656) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:510) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423) [63:io.netty.transport:4.1.16.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482) [63:io.netty.transport:4.1.16.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [60:io.netty.common:4.1.16.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) [60:io.netty.common:4.1.16.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) [63:io.netty.transport:4.1.16.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [60:io.netty.common:4.1.16.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) [60:io.netty.common:4.1.16.Final]
at java.lang.Thread.run(Thread.java:748) [?:?]
2018-03-26T16:37:15,410 | INFO | nioEventLoopGroup-4-1 | LoggingHandler | 60 - io.netty.common - 4.1.16.Final | [id: 0x57fd99d7, L:/0:0:0:0:0:0:0:0:6640] READ: [id: 0x1c79d0b8, L:/10.100.0.51:6640
- R:/10.100.0.11:34776]
ovsdb process:
root 16311 0.0 0.0 28768 2152 ? S<s 17:05 0:00 ovsdb-server: monitoring pid 16312 (healthy)
root 16312 0.0 0.0 29040 5304 ? S< 17:05 0:00 \_ ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate
--bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
from its log:
2018-03-26T15:09:00.632Z|00005|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
2018-03-26T15:09:00.632Z|00006|reconnect|INFO|tcp:10.100.0.51:6640: connected
2018-03-26T15:09:00.637Z|00007|reconnect|INFO|tcp:10.100.0.51:6640: connection closed by peer
2018-03-26T15:09:01.639Z|00008|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
2018-03-26T15:09:01.640Z|00009|reconnect|INFO|tcp:10.100.0.51:6640: connected
2018-03-26T15:09:01.642Z|00010|reconnect|INFO|tcp:10.100.0.51:6640: connection closed by peer
2018-03-26T15:09:01.642Z|00011|reconnect|INFO|tcp:10.100.0.51:6640: waiting 2 seconds before reconnect
2018-03-26T15:09:03.643Z|00012|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
2018-03-26T15:09:03.644Z|00013|reconnect|INFO|tcp:10.100.0.51:6640: connected
2018-03-26T15:09:03.648Z|00014|reconnect|INFO|tcp:10.100.0.51:6640: connection closed by peer
2018-03-26T15:09:03.648Z|00015|reconnect|INFO|tcp:10.100.0.51:6640: waiting 4 seconds before reconnect
2018-03-26T15:09:07.648Z|00016|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
and of course, there is no br-int on ovsdb:
# ovs-vsctl show
d3b8068d-fcfb-4654-b6fa-5fea41d9922d
Manager "tcp:10.100.0.51:6640"
ovs_version: "2.6.1"
I suppose, that this latest patch from ovsdb could provoke the issue:
$ git show 77f0ef93babd
commit 77f0ef93babd9a1fd4a7f3508675830882c374aa (HEAD, origin/stable/oxygen)
Author: Tim Rozet <trozet@...>
Date: Fri Mar 23 14:23:40 2018 -0400
OVSDB-449 Fixes loading keystores dynamically per connection
Currently when OVSDB manager is started with SSL, the SSL Context is
only fetched once, which inherently means the keystores are only read
that one time. This patch changes the behavior so that the SSL Context
is fetched everytime a new connection is opened, which in turn will
update the keystore data.
Change-Id: Iaafbd34afcb5f4708b48eb3d64eca14ef0a107e8
Signed-off-by: Tim Rozet <trozet@...>
(cherry picked from commit e4a170988175b98e2411782f2a85b63edb64d4cf)
Has anyone already had this issue ? If so, please, share you workaround. I use openvswitch 2.6.1-1.
_______________________________________________
ovsdb-dev mailing list
ovsdb-dev@...
https://lists.opendaylight.org/mailman/listinfo/ovsdb-dev
Jamo Luhrsen <jluhrsen@...>
Isn't this a new problem just uncovered/introduced? If so, then doesn't that mean we probably found it easier *because* we are running the -all- job.
JamO
Vishal Thapar <vishal.thapar@...>
Hi Jamo,
It did help uncover the problem, but I’d say the main question is: when we say compatible-with-all, does it inherently imply compatibility
in a cluster setup too? Not sure how many compatible-with-all features even have a cluster job running. Maybe we want to make a 3-node cluster CSIT a requirement for compatible-with-all?
Sam,
I believe Genius 3-node is just re-using the 3-node CSIT, and probably something needs to be tweaked to use -only instead of -all. We don’t
need and shouldn’t be using -all.
Regards,
Vishal.
From: Sam Hague [mailto:shague@...]
Sent: 29 March 2018 01:41
To: Jamo Luhrsen <jluhrsen@...>
Cc: Vishal Thapar <vishal.thapar@...>; Michael Vorburger <vorburger@...>; Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...> <ovsdb-dev@...>;
Tim Rozet <trozet@...>; Valentina Krasnobaeva <valentina.krasnobaeva@...>
Subject: Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS
On Wed, Mar 28, 2018 at 12:26 PM, Jamo Luhrsen <jluhrsen@...> wrote:
Isn't this a new problem just uncovered/introduced? If so, then doesn't that mean we
probably found it easier *because* we are running the -all- job.
I don't know if it has anything to do with the -all job - my original questions were just why are
we using -all and why is sfc in a -all distro
- We know 3node has issues so as Luis mentioned, so I thought no one used -all in the 3node jobs
- sfc ovs is firing in this job, but I don't know how anything using ovs can truly be independent from
other bundles. Maybe this sfcovsdblistener is agnostic but it still seems problematic. genius
is supposed to allow coexistence so maybe this is OK. But moving away from -all distro would
also make this issue go away.
JamO
On 3/27/18 10:21 PM, Vishal Thapar wrote:
Missed Luis’ mail, which suggests not using -all for 3 node tests. I think we should prioritize that ASAP.
Regards,
Vishal.
*From:* ovsdb-dev-bounces@... [mailto:ovsdb-dev-bounces@...] *On Behalf Of* Vishal Thapar
*Sent:* 28 March 2018 10:50
*To:* Michael Vorburger <vorburger@...>; Sam Hague <shague@...>
*Cc:* Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...>
<ovsdb-dev@...>; Tim Rozet <trozet@...>; Valentina Krasnobaeva <valentina.krasnobaeva@...>
*Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS
Hi Sam, Michael,
This is post Tim’s fix and is a different issue. Yes, Genius CSIT is using -all; we probably need to fix Genius CSIT to not use -all but Genius only. The issue is coming from some deadlock in bgpcep start/close. Adding more of the stack trace:
2018-03-27T05:52:15,752 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 -
org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace: opendaylight-cluster-data-akka.actor.default-dispatcher-2 locked on
org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl@23621442
(owned by opendaylight-cluster-data-notification-dispatcher-55):
at org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl.instantiateServiceInstance(ProgrammingServiceImpl.java:175)
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lambda$startServices$0(ClusterSingletonServiceGroupImpl.java:648)
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl$$Lambda$1108/2009038775.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1257)
2018-03-27T05:52:16,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 -
org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Oh nose - there are 3 deadlocked threads!! :-(
2018-03-27T05:52:16,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 -
org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace: opendaylight-cluster-data-notification-dispatcher-55 locked on
java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86
(owned by opendaylight-cluster-data-akka.actor.default-dispatcher-2):
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.unregisterService(ClusterSingletonServiceGroupImpl.java:393)
at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.removeRegistration(AbstractClusterSingletonServiceProviderImpl.java:151)
at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl$1.removeRegistration(AbstractClusterSingletonServiceProviderImpl.java:123)
at org.opendaylight.yangtools.concepts.AbstractRegistration.close(AbstractRegistration.java:41)
at org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl.close(ProgrammingServiceImpl.java:449)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyProviderBean$PCEPTopologyProviderBeanCSS.close(PCEPTopologyProviderBean.java:190)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyProviderBean.close(PCEPTopologyProviderBean.java:86)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.closeTopology(PCEPTopologyDeployerImpl.java:151)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.updateTopologyProvider(PCEPTopologyDeployerImpl.java:92)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.lambda$onDataTreeChanged$0(PCEPTopologyDeployerImpl.java:75)
at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl$$Lambda$1076/552124597.accept(Unknown Source)
So technically bgpcep is not coming up correctly in -all, but since autorelease/distribution checks don’t test for functionality it wasn’t caught till Genius ran a CSIT with -all. Could be some poorly written start/close in BGPCEP also; probably Michael can
give more inputs on this. Then we can decide what to raise a bug on.
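For what it's worth, the two traces above read like a classic ABBA deadlock: one thread holds the singleton group's ReentrantLock and blocks on the ProgrammingServiceImpl monitor inside instantiateServiceInstance(), while the other holds that monitor inside close() and blocks on the same ReentrantLock in unregisterService(). A simplified sketch of the pattern, with made-up class names rather than the real mdsal/bgpcep code:

import java.util.concurrent.locks.ReentrantLock;

// Simplified illustration of the ABBA deadlock suggested by the traces;
// names are invented, this is not the actual ODL code.
public class SingletonGroupDeadlockSketch {

    // Stands in for the group's internal ReentrantLock.
    private final ReentrantLock groupLock = new ReentrantLock(true);

    // Stands in for a service whose lifecycle methods synchronize on "this".
    static class Service {
        synchronized void instantiateServiceInstance() {
            // real work happens while holding the Service monitor
        }

        synchronized void close(SingletonGroupDeadlockSketch group) {
            // close() still holds the Service monitor and now wants the group lock
            group.groupLock.lock();
            try {
                // unregister from the group
            } finally {
                group.groupLock.unlock();
            }
        }
    }

    // Thread A (akka dispatcher): takes groupLock, then needs the Service monitor.
    void startServices(Service service) {
        groupLock.lock();
        try {
            service.instantiateServiceInstance(); // blocks if thread B owns the monitor
        } finally {
            groupLock.unlock();
        }
    }

    // Thread B (notification dispatcher): takes the Service monitor, then needs groupLock.
    void shutdown(Service service) {
        service.close(this); // blocks on groupLock if thread A owns it
    }
}

If the two paths interleave, each thread ends up waiting for the lock the other one holds, which matches the ThreadsWatcher report. The usual cure is to take the locks in a consistent order, or not to hold the service monitor while calling back into the group.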
Thanks and Regards,
Vishal.
*From:* Michael Vorburger [mailto:vorburger@...]
*Sent:* 27 March 2018 23:30
*To:* Sam Hague <shague@...>
*Cc:* Vishal Thapar <vishal.thapar@...>; Valentina Krasnobaeva <valentina.krasnobaeva@...>; Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...> <ovsdb-dev@...>;
Tim Rozet <trozet@...>
*Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS
On Tue, Mar 27, 2018 at 6:17 PM, Sam Hague <shague@...> wrote:
Curious, but why is the genius 3node bringing in sfc features? Is it because this is an integration-all test? Even
so, I don't think it makes sense to do this.
Not 100% sure what you mean, but yes my understanding is that the distribution integration-all feature in distribution includes *ALL* features.
If you don't think that makes sense, add integration-dev in reply and take it up with the maintainers of integration/distribution.
On Tue, Mar 27, 2018, 9:10 AM Michael Vorburger <vorburger@...> wrote:
On Tue, Mar 27, 2018 at 5:51 PM, Sam Hague <shague@...> wrote:
Adding Tim.
Are the logs below from the original job Valentina mentioned or the latest? The original failure was from a
bad fix which we did fix, so we need to understand if this is another issue.
On Mar 26, 2018 11:10 PM, "Vishal Thapar" <vishal.thapar@...>
wrote:
Don’t see the socket issue, but Sathwick is now seeing some other issue. InfraUtils is throwing lots of
Deadlock Exceptions:
Cool. Could you please open a new JIRA and include the full stack trace it (should) show after (or before?) that
"Deadlocked thread stack trace: " ERROR log? From what little I understand so far, this JIRA should probably be
opened in project controller. Although it could be caused by a mistake made in application code. Can you please
clarify exactly what caused this to start happening? If it's a particular change, it would be interesting to
look more closely at whether it does anything special (wrong) re. locking and threading etc.
Tx,
M.
--
Michael Vorburger, Red Hat
vorburger@... | IRC: vorburger @freenode | ~ = http://vorburger.ch
2018-03-27T05:53:29,133 | INFO | awaitility[checkBundleDiagInfos] | SystemReadyImpl |
360 - org.opendaylight.infrautils.ready-impl - 1.4.0.SNAPSHOT | checkBundleDiagInfos: Elapsed time 206s,
remaining time 93s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=1, Waiting=0,
Starting=0, Active=557, Stopping=0, Failure=0}
2018-03-27T05:53:29,245 | INFO | opendaylight-cluster-data-notification-dispatcher-66 |
SfcOvsNodeDataListener | 468 - org.opendaylight.sfc.ovs - 0.8.0.SNAPSHOT | SfcOvsNodeDataListener::add()
bridge name [OvsdbBridgeName{_value=BR1}] DPID [DatapathId{_value=00:00:f6:3e:dc:fc:7b:4e}]
2018-03-27T05:53:29,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 -
org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Oh nose - there are 3 deadlocked
threads!! :-(
2018-03-27T05:53:29,253 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 -
org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace:
opendaylight-cluster-data-notification-dispatcher-55 locked on
java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86 (owned by
opendaylight-cluster-data-akka.actor.default-dispatcher-2):
at sun.misc.Unsafe.park(Native Method)
https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/genius-csit-3node-sathwikgate-all-fluorine/5/odl1_karaf.log.gz
Regards,
Vishal.
*From:* ovsdb-dev-bounces@... [mailto:ovsdb-dev-bounces@...] *On Behalf Of* Sam Hague
*Sent:* 26 March 2018 21:17
*To:* Valentina Krasnobaeva <valentina.krasnobaeva@...>
*Cc:* Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...> <ovsdb-dev@...>
*Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest
OXYGEN/Fluorine) in attempt to connect to OVS
Yes, that patch was the issue. Here is a fix that should merge soon.
https://git.opendaylight.org/gerrit/#/c/70106/
Jamo Luhrsen <jluhrsen@...>
Since the beginning the idea of compatible-with-all is just that. If your feature is in that list, then we should be able to have it installed and do anything we want with any other feature that is installed.
at some point, there were features that were found to be compatible in a single node installation, but not in a clustered installation. That was messy and it sounds like OFP is still working around that by not installing all in 3node jobs.
personally, I prefer that we keep the noise and let these things happen so bugs can be found and fixed. I also know that others prefer it the other way where we only bring in the exact things we care about. Doing the latter means we will miss bugs.
JamO
On Mar 29, 2018, at 11:18 AM, Jamo Luhrsen <jluhrsen@...> wrote:
personally, I prefer that we keep the noise and let these things happen so bugs can be found and fixed. I also know that others prefer it the other way where we only bring in the exact things we care about. Doing the latter means we will miss bugs.
Fair enough, any project is free to test whatever it feels. For OFP I do not think we are at a point where the cluster is stable with just the OFP feature, so there is no point adding more noise.
On 3/28/18 10:11 PM, Vishal Thapar wrote:
Hi Jamo, It did help uncover the problem, but I’d say main question is when we say compatible-with-all, does it inherently imply compatibility in a cluster setup too? Not sure how many of compatible with all even have a cluster job running. Maybe we want to make 3 node cluster csit a requirement for compatible with all? Sam, I believe Genius 3-node is just re-using the 3node csit and probably something needs to be tweaked to not use all but -only. We don’t need and and shouldn’t be using -all. Regards, Vishal. *From:*Sam Hague [mailto:shague@...] *Sent:* 29 March 2018 01:41 *To:* Jamo Luhrsen <jluhrsen@...> *Cc:* Vishal Thapar <vishal.thapar@...>; Michael Vorburger <vorburger@...>; Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...> <ovsdb-dev@...>; Tim Rozet <trozet@...>; Valentina Krasnobaeva <valentina.krasnobaeva@...> *Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS On Wed, Mar 28, 2018 at 12:26 PM, Jamo Luhrsen <jluhrsen@... <mailto:jluhrsen@...>> wrote: Isn't this a new problem just uncovered/introduced? If so, then doesn't that mean we probably found it easier *because* we are running the -all- job. I don't know if it has anything to do with the -all job - my original questions were just why are we using -all and why is sfc in a -all distro - We know 3node has issues so as Luis mentioned, so I thought no one used -all in the 3node jobs - sfc ovs is firing in this job, but I don't know how anything using ovs can truly be independent from other bundles. Maybe this sfcovsdblistener is agnostic but it still seems problematic. genius is supposed to allow coexistence so maybe this is OK. But moving away from -all distro would also make this issue go away. JamO On 3/27/18 10:21 PM, Vishal Thapar wrote: Missed Luis’ mail who suggests not using -all for 3 node tests. I think we should prioritize that ASAP. Regards, Vishal. *From:*ovsdb-dev-bounces@... <mailto:ovsdb-dev-bounces@...> [mailto:ovsdb-dev-bounces@... <mailto:ovsdb-dev-bounces@...>] *On Behalf Of *Vishal Thapar *Sent:* 28 March 2018 10:50 *To:* Michael Vorburger <vorburger@... <mailto:vorburger@...>>; Sam Hague <shague@... <mailto:shague@...>> *Cc:* Kiran N Upadhyaya <kiran.n.upadhyaya@... <mailto:kiran.n.upadhyaya@...>>; Guillaume Gaudonville <guillaume.gaudonville@... <mailto:guillaume.gaudonville@...>>; <ovsdb-dev@... <mailto:ovsdb-dev@...>> <ovsdb-dev@... <mailto:ovsdb-dev@...>>; Tim Rozet <trozet@... <mailto:trozet@...>>; Valentina Krasnobaeva <valentina.krasnobaeva@... <mailto:valentina.krasnobaeva@...>> *Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS Hi Sam, Michael, This is post Tim’s fix and is different issue. Yes, Genius CSIT is using -all, probably need to fix Genius CSIT to not use -all but Genius only. Issue is coming from some deadlock in bgpcep start/close. 
Adding more of the stack trace: 2018-03-27T05:52:15,752 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.me <http://org.opendaylight.infrautils.me>trics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace: opendaylight-cluster-data-akka.actor.default-dispatcher-2 locked on org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl@23621442 <mailto:org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl@23621442> <mailto:org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl@23621442 <mailto:org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl@23621442>> (owned by opendaylight-cluster-data-notification-dispatcher-55): at org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl.instantiateServiceInstance(ProgrammingServiceImpl.java:175) at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lambda$startServices$0(ClusterSingletonServiceGroupImpl.java:648) at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl$$Lambda$1108/2009038775.accept(Unknown Source) at java.util.ArrayList.forEach(ArrayList.java:1257) 2018-03-27T05:52:16,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.me <http://org.opendaylight.infrautils.me>trics-impl - 1.4.0.SNAPSHOT | Oh nose - there are 3 deadlocked threads!! :-( 2018-03-27T05:52:16,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.me <http://org.opendaylight.infrautils.me>trics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace: opendaylight-cluster-data-notification-dispatcher-55 locked on java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86 <mailto:java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86> <mailto:java.util.concurrent.locks.ReentrantLock <mailto:java.util.concurrent.locks.ReentrantLock>$FairSync@2da5ef86> (owned by opendaylight-cluster-data-akka.actor.default-dispatcher-2): at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.unregisterService(ClusterSingletonServiceGroupImpl.java:393) at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.removeRegistration(AbstractClusterSingletonServiceProviderImpl.java:151) at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl$1.re <http://1.re>moveRegistration(AbstractClusterSingletonServiceProviderImpl.java:123) at org.opendaylight.yangtools.concepts.AbstractRegistration.close(AbstractRegistration.java:41) at org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl.close(ProgrammingServiceImpl.java:449) at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyProviderBean$PCEPTopologyProviderBeanCSS.close(PCEPTopologyProviderBean.java:190) at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyProviderBean.close(PCEPTopologyProviderBean.java:86) at 
org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.closeTopology(PCEPTopologyDeployerImpl.java:151) at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.updateTopologyProvider(PCEPTopologyDeployerImpl.java:92) at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.lambda$onDataTreeChanged$0(PCEPTopologyDeployerImpl.java:75) at org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl$$Lambda$1076/552124597.accept(Unknown Source) So, technically bgpcep is not coming up in -all correctly but since autorelease/distribution check don’t test for functionality it wasn’t caught till Genius ran a CSIT with -all. Could be some poorly written start/close in BGPCEP also, probably Michael can give more inputs on this. Then we can decide what to raise bug on. Thanks and Regards, Vishal. *From:*Michael Vorburger [mailto:vorburger@... <mailto:vorburger@...>] *Sent:* 27 March 2018 23:30 *To:* Sam Hague <shague@... <mailto:shague@...> <mailto:shague@... <mailto:shague@...>>> *Cc:* Vishal Thapar <vishal.thapar@... <mailto:vishal.thapar@...> <mailto:vishal.thapar@... <mailto:vishal.thapar@...>>>; Valentina Krasnobaeva <valentina.krasnobaeva@... <mailto:valentina.krasnobaeva@...> <mailto:valentina.krasnobaeva@... <mailto:valentina.krasnobaeva@...>>>; Kiran N Upadhyaya <kiran.n.upadhyaya@... <mailto:kiran.n.upadhyaya@...> <mailto:kiran.n.upadhyaya@... <mailto:kiran.n.upadhyaya@...>>>; Guillaume Gaudonville <guillaume.gaudonville@... <mailto:guillaume.gaudonville@...> <mailto:guillaume.gaudonville@... <mailto:guillaume.gaudonville@...>>>; <ovsdb-dev@... <mailto:ovsdb-dev@...> <mailto:ovsdb-dev@... <mailto:ovsdb-dev@...>>> <ovsdb-dev@... <mailto:ovsdb-dev@...> <mailto:ovsdb-dev@... <mailto:ovsdb-dev@...>>>; Tim Rozet <trozet@... <mailto:trozet@...> <mailto:trozet@... <mailto:trozet@...>>> *Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS On Tue, Mar 27, 2018 at 6:17 PM, Sam Hague <shague@... <mailto:shague@...> <mailto:shague@... <mailto:shague@...>>> wrote: Curious but why is the genius 3node bringing in sfc features, is it because this is an integration-all test? Even so, I dont think it makes sense to do this. Not 100% sure what you mean, but yes my understanding is that the distribution integration-all feature in distribution includes *ALL* features. If you don't think that makes sense, add integration-dev in reply and take it up with the maintainers of integration/distribution.. On Tue, Mar 27, 2018, 9:10 AM Michael Vorburger <vorburger@... <mailto:vorburger@...> <mailto:vorburger@... <mailto:vorburger@...>>> wrote: On Tue, Mar 27, 2018 at 5:51 PM, Sam Hague <shague@... <mailto:shague@...> <mailto:shague@... <mailto:shague@...>>> wrote: Adding Tim. Are the logs below from the original job valentina mentioned or the latest ? The original failure was from a bad fix which we did fix, so need to understand if this is another issue. On Mar 26, 2018 11:10 PM, "Vishal Thapar" <vishal.thapar@... <mailto:vishal.thapar@...> <mailto:vishal.thapar@... <mailto:vishal.thapar@...>>> wrote: Don’t see the socket issue, but Sathwick is now seeing some other issue. InfraUtils is throwing lots of Deadlock Exceptions: Cool. Could you please open a new JIRA and include the full stack trace it (should) show after (or before?) that "Deadlocked thread stack trace: " ERROR log? 
From what little I understand so far, this JIRA should probably be openend in project controller. Although it could be caused by a mistake made in application code. Can you please clarify exactly what caused this to start happening? If it's a particular change, it would be interesting to look more closely at it does anything special (wrong) re. locking and threading etc. Tx, M. -- Michael Vorburger, Red Hat vorburger@... <mailto:vorburger@...> <mailto:vorburger@... <mailto:vorburger@...>> | IRC: vorburger @freenode | ~ = http://vorburger.ch <http://vorburger.ch/> 2018-03-27T05:53:29,133 | INFO | awaitility[checkBundleDiagInfos] | SystemReadyImpl | 360 - org.opendaylight.infrautils.re <http://org.opendaylight.infrautils.re>ady-impl - 1.4.0.SNAPSHOT | checkBundleDiagInfos: Elapsed time 206s, remaining time 93s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=1, Waiting=0, Starting=0, Active=557, Stopping=0, Failure=0} 2018-03-27T05:53:29,245 | INFO | opendaylight-cluster-data-notification-dispatcher-66 | SfcOvsNodeDataListener | 468 - org.opendaylight.sfc.ovs - 0.8.0.SNAPSHOT | SfcOvsNodeDataListener::add() bridge name [OvsdbBridgeName{_value=BR1}] DPID [DatapathId{_value=00:00:f6:3e:dc:fc:7b:4e}] 2018-03-27T05:53:29,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.me <http://org.opendaylight.infrautils.me>trics-impl - 1.4.0.SNAPSHOT | Oh nose - there are 3 deadlocked threads!! :-( 2018-03-27T05:53:29,253 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.me <http://org.opendaylight.infrautils.me>trics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace: opendaylight-cluster-data-notification-dispatcher-55 locked on java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86 <mailto:java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86> <mailto:java.util.concurrent.locks.ReentrantLock <mailto:java.util.concurrent.locks.ReentrantLock>$FairSync@2da5ef86> (owned by opendaylight-cluster-data-akka.actor.default-dispatcher-2): at sun.misc.Unsafe.park(Native Method) https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/genius-csit-3node-sathwikgate-all-fluorine/5/odl1_karaf.log.gz Regards, Vishal. *From:*ovsdb-dev-bounces@... <mailto:ovsdb-dev-bounces@...> <mailto:ovsdb-dev-bounces@... <mailto:ovsdb-dev-bounces@...>> [mailto:ovsdb-dev-bounces@... <mailto:ovsdb-dev-bounces@...> <mailto:ovsdb-dev-bounces@... <mailto:ovsdb-dev-bounces@...>>] *On Behalf Of *Sam Hague *Sent:* 26 March 2018 21:17 *To:* Valentina Krasnobaeva <valentina.krasnobaeva@... <mailto:valentina.krasnobaeva@...> <mailto:valentina.krasnobaeva@... <mailto:valentina.krasnobaeva@...>>> *Cc:* Kiran N Upadhyaya <kiran.n.upadhyaya@... <mailto:kiran.n.upadhyaya@...> <mailto:kiran.n.upadhyaya@... <mailto:kiran.n.upadhyaya@...>>>; Guillaume Gaudonville <guillaume.gaudonville@... <mailto:guillaume.gaudonville@...> <mailto:guillaume.gaudonville@... <mailto:guillaume.gaudonville@...>>>; <ovsdb-dev@... <mailto:ovsdb-dev@...> <mailto:ovsdb-dev@... <mailto:ovsdb-dev@...>>> <ovsdb-dev@... <mailto:ovsdb-dev@...> <mailto:ovsdb-dev@... <mailto:ovsdb-dev@...>>> *Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS Yes, that patch was the issue. Here is a fix that should merge soon. https://git.opendaylight.org/gerrit/#/c/70106/ On Mon, Mar 26, 2018 at 8:17 AM, Valentina Krasnobaeva <valentina.krasnobaeva@... 
<mailto:valentina.krasnobaeva@...> <mailto:valentina.krasnobaeva@... <mailto:valentina.krasnobaeva@...>>> wrote: Hi OVSDB Team, Hi All Since today's morning with latest builds of OXYGEN and Fluorine (all branches were rebased) I always see this error in ovsdb log, when devstack executes "sudo ovs-vsctl set-manager tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>": 2018-03-26T16:37:14,404 | WARN | nioEventLoopGroup-5-1 | ChannelInitializer | 60 - io.netty.common - 4.1.16.Final | Failed to initialize a channel. Closing: [id: 0x2976135c, L:/10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640> - R:/10.100.0.11:34774 <http://10.100.0.11:34774> <http://10.100.0.11:34774>] java.lang.NullPointerException: null at org.opendaylight.ovsdb.lib.impl.OvsdbConnectionService$2.initChannel(OvsdbConnectionService.java:325) [400:org.opendaylight.ovsdb.li <http://org.opendaylight.ovsdb.li>brary:1.6.1.SNAPSHOT] at org.opendaylight.ovsdb.lib.impl.OvsdbConnectionService$2.initChannel(OvsdbConnectionService.java:321) [400:org.opendaylight.ovsdb.li <http://org.opendaylight.ovsdb.li>brary:1.6.1.SNAPSHOT] at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:113) [63:io.netty.transport:4.1.16.Final] at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:105) [63:io.netty.transport:4.1.16.Final] at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:606) [63:io.netty.transport:4.1.16.Final] at io.netty.channel.DefaultChannelPipeline.access$000(DefaultChannelPipeline.java:46) [63:io.netty.transport:4.1.16.Final] at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1412) [63:io.netty.transport:4.1.16.Final] at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1131) [63:io.netty.transport:4.1.16.Final] at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:656) [63:io.netty.transport:4.1.16.Final] at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:510) [63:io.netty.transport:4.1.16.Final] at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423) [63:io.netty.transport:4.1.16.Final] at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482) [63:io.netty.transport:4.1.16.Final] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [60:io.netty.common:4.1.16.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) [60:io.netty.common:4.1.16.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) [63:io.netty.transport:4.1.16.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [60:io.netty.common:4.1.16.Final] at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) [60:io.netty.common:4.1.16.Final] at java.lang.Thread.run(Thread.java:748) [?:?] 2018-03-26T16:37:15,410 | INFO | nioEventLoopGroup-4-1 | LoggingHandler | 60 - io.netty.common - 4.1.16.Final | [id: 0x57fd99d7, L:/0:0:0:0:0:0:0:0:6640] READ: [id: 0x1c79d0b8, L:/10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640> - R:/10.100.0.11:34776 <http://10.100.0.11:34776> <http://10.100.0.11:34776>] ovsdb process: root 16311 0.0 0.0 28768 2152 ? 
S<s 17:05 0:00 ovsdb-server: monitoring pid 16312 (healthy) root 16312 0.0 0.0 29040 5304 ? S< 17:05 0:00 \_ ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor from its log: 2018-03-26T15:09:00.632Z|00005|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: connecting... 2018-03-26T15:09:00.632Z|00006|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: connected 2018-03-26T15:09:00.637Z|00007|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: connection closed by peer 2018-03-26T15:09:01.639Z|00008|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: connecting... 2018-03-26T15:09:01.640Z|00009|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: connected 2018-03-26T15:09:01.642Z|00010|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: connection closed by peer 2018-03-26T15:09:01.642Z|00011|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: waiting 2 seconds before reconnect 2018-03-26T15:09:03.643Z|00012|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: connecting... 2018-03-26T15:09:03.644Z|00013|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: connected 2018-03-26T15:09:03.648Z|00014|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: connection closed by peer 2018-03-26T15:09:03.648Z|00015|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: waiting 4 seconds before reconnect 2018-03-26T15:09:07.648Z|00016|reconnect|INFO|tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>: connecting... and of course, there is no br-int on ovsdb: # ovs-vsctl show d3b8068d-fcfb-4654-b6fa-5fea41d9922d Manager "tcp:10.100.0.51:6640 <http://10.100.0.51:6640> <http://10.100.0.51:6640>" ovs_version: "2.6.1" I suppose, that this latest patch from ovsdb could provoke the issue: $ git show 77f0ef93babd commit 77f0ef93babd9a1fd4a7f3508675830882c374aa (HEAD, origin/stable/oxygen) Author: Tim Rozet <trozet@... <mailto:trozet@...> <mailto:trozet@... <mailto:trozet@...>>> Date: Fri Mar 23 14:23:40 2018 -0400 OVSDB-449 Fixes loading keystores dynamically per connection Currently when OVSDB manager is started with SSL, the SSL Context is only fetched once, which inherently means the keystores are only read that one time. This patch changes the behavior so that the SSL Context is fetched everytime a new connection is opened, which in turn will update the keystore data. Change-Id: Iaafbd34afcb5f4708b48eb3d64eca14ef0a107e8 Signed-off-by: Tim Rozet <trozet@... <mailto:trozet@...> <mailto:trozet@... <mailto:trozet@...>>> (cherry picked from commit e4a170988175b98e2411782f2a85b63edb64d4cf) Has anyone already had this issue ? If so, please, share you workaround. I use openvswitch 2.6.1-1. Thanks for any help, Regards _______________________________________________ ovsdb-dev mailing list ovsdb-dev@... <mailto:ovsdb-dev@...> <mailto:ovsdb-dev@... 
<mailto:ovsdb-dev@...>> https://lists.opendaylight.org/mailman/listinfo/ovsdb-dev _______________________________________________ ovsdb-dev mailing list ovsdb-dev@... <mailto:ovsdb-dev@...> https://lists.opendaylight.org/mailman/listinfo/ovsdb-dev _______________________________________________ ovsdb-dev mailing list ovsdb-dev@... https://lists.opendaylight.org/mailman/listinfo/ovsdb-dev
|
On Thu, Mar 29, 2018, 1:49 PM Luis Gomez <ecelgp@...> wrote:
> On Mar 29, 2018, at 11:18 AM, Jamo Luhrsen <jluhrsen@...> wrote:
>
> Since the beginning the idea of compatible-with-all is just that. If your feature
> is in that list, then we should be able to have it installed and do anything we
> want with any other feature that is installed.
>
> at some point, there were features that were found to be compatible in a single
> node installation, but not in a clustered installation. That was messy and it
> sounds like OFP is still working around that by not installing all in 3node jobs.
>
> personally, I prefer that we keep the noise and let these things happen so bugs
> can be found and fixed. I also know that others prefer it the other way where
> we only bring in the exact things we care about. Doing the latter means we will
> miss bugs.
> Fair enough, any project is free to test whatever it feels. For OFP I do not think we are at a point where the cluster is stable with just the OFP feature, so there is no point in adding more noise.
Same thinking, too much noise. Once we get basic 3node working, then we can add in -all if it makes sense, but right now it doesn't make sense to test a huge blob of unrelated features for use cases that may never happen.
>
> JamO
>
> On 3/28/18 10:11 PM, Vishal Thapar wrote:
>> Hi Jamo,
>> It did help uncover the problem, but I'd say the main question is: when we say compatible-with-all, does it inherently imply compatibility in a cluster setup too? Not sure how many compatible-with-all projects even have a cluster job running. Maybe we want to make 3-node cluster CSIT a requirement for compatible-with-all?
>> Sam,
>> I believe Genius 3-node is just re-using the 3node csit and probably something needs to be tweaked to not use -all but -only. We don't need and shouldn't be using -all.
>> Regards,
>> Vishal.
>> *From:*Sam Hague [mailto:shague@...]
>> *Sent:* 29 March 2018 01:41
>> *To:* Jamo Luhrsen <jluhrsen@...>
>> *Cc:* Vishal Thapar <vishal.thapar@...>; Michael Vorburger <vorburger@...>; Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...> <ovsdb-dev@...>; Tim Rozet <trozet@...>; Valentina Krasnobaeva <valentina.krasnobaeva@...>
>> *Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest OXYGEN/Fluorine) in attempt to connect to OVS
>> On Wed, Mar 28, 2018 at 12:26 PM, Jamo Luhrsen <jluhrsen@...> wrote:
>> Isn't this a new problem just uncovered/introduced? If so, then doesn't that mean we
>> probably found it easier *because* we are running the -all- job.
>> I don't know if it has anything to do with the -all job - my original questions were just why are
>> we using -all and why is sfc in a -all distro
>> - We know 3node has issues, as Luis mentioned, so I thought no one used -all in the 3node jobs
>> - sfc ovs is firing in this job, but I don't know how anything using ovs can truly be independent from
>> other bundles. Maybe this sfcovsdblistener is agnostic, but it still seems problematic. Genius
>> is supposed to allow coexistence, so maybe this is OK. But moving away from the -all distro would
>> also make this issue go away.
>> JamO
>> On 3/27/18 10:21 PM, Vishal Thapar wrote:
>> Missed Luis’ mail who suggests not using -all for 3 node tests. I think we should prioritize that ASAP.
>> Regards,
>> Vishal.
>> *From:* ovsdb-dev-bounces@... [mailto:ovsdb-dev-bounces@...] *On Behalf Of* Vishal Thapar
>> *Sent:* 28 March 2018 10:50
>> *To:* Michael Vorburger <vorburger@...>; Sam Hague <shague@...>
>> *Cc:* Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...> <ovsdb-dev@...>; Tim Rozet <trozet@...>; Valentina Krasnobaeva <valentina.krasnobaeva@...>
>> *Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest
>> OXYGEN/Fluorine) in attempt to connect to OVS
>> Hi Sam, Michael,
>> This is post Tim’s fix and is different issue. Yes, Genius CSIT is using -all, probably need to fix Genius CSIT
>> to not use -all but Genius only. Issue is coming from some deadlock in bgpcep start/close. Adding more of the
>> stack trace:
>> 2018-03-27T05:52:15,752 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 -
>> org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Deadlocked
>> thread stack trace: opendaylight-cluster-data-akka.actor.default-dispatcher-2 locked on
>> org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl@23621442 (owned by
>> opendaylight-cluster-data-notification-dispatcher-55):
>> at
>> org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl.instantiateServiceInstance(ProgrammingServiceImpl.java:175)
>> at
>> org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.lambda$startServices$0(ClusterSingletonServiceGroupImpl.java:648)
>> at
>> org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl$$Lambda$1108/2009038775.accept(Unknown Source)
>> at java.util.ArrayList.forEach(ArrayList.java:1257)
>> 2018-03-27T05:52:16,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 -
>> org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Oh nose -
>> there are 3 deadlocked threads!! :-(
>> 2018-03-27T05:52:16,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 -
>> org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Deadlocked
>> thread stack trace: opendaylight-cluster-data-notification-dispatcher-55 locked on
>> java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86 (owned by
>> opendaylight-cluster-data-akka.actor.default-dispatcher-2):
>> at sun.misc.Unsafe.park(Native Method)
>> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>> at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>> at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>> at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
>> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>> at
>> org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.unregisterService(ClusterSingletonServiceGroupImpl.java:393)
>> at
>> org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.removeRegistration(AbstractClusterSingletonServiceProviderImpl.java:151)
>> at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl$1.removeRegistration(AbstractClusterSingletonServiceProviderImpl.java:123)
>> at org.opendaylight.yangtools.concepts.AbstractRegistration.close(AbstractRegistration.java:41)
>> at org.opendaylight.bgpcep.programming.impl.ProgrammingServiceImpl.close(ProgrammingServiceImpl.java:449)
>> at
>> org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyProviderBean$PCEPTopologyProviderBeanCSS.close(PCEPTopologyProviderBean.java:190)
>> at
>> org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyProviderBean.close(PCEPTopologyProviderBean.java:86)
>> at
>> org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.closeTopology(PCEPTopologyDeployerImpl.java:151)
>> at
>> org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.updateTopologyProvider(PCEPTopologyDeployerImpl.java:92)
>> at
>> org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl.lambda$onDataTreeChanged$0(PCEPTopologyDeployerImpl.java:75)
>> at
>> org.opendaylight.bgpcep.pcep.topology.provider.config.PCEPTopologyDeployerImpl$$Lambda$1076/552124597.accept(Unknown
>> Source)
>> So technically bgpcep is not coming up correctly in -all, but since autorelease/distribution checks don't test
>> for functionality, it wasn't caught until Genius ran a CSIT with -all. It could also be some poorly written start/close in
>> BGPCEP; Michael can probably give more input on this. Then we can decide what to raise a bug on.
>> Thanks and Regards,
>> Vishal.
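The two "Deadlocked thread stack trace" excerpts quoted above look like a classic lock-ordering (ABBA) deadlock: the singleton group's startServices path holds the group's ReentrantLock and then needs the service's monitor (instantiateServiceInstance), while the close path holds the service's monitor and then needs the group's ReentrantLock (unregisterService). Below is a minimal, self-contained Java sketch of that pattern, using hypothetical class and method names rather than the actual mdsal/bgpcep code, just to show why the two threads end up waiting on each other forever:

    import java.util.concurrent.locks.ReentrantLock;

    // ABBA deadlock sketch with hypothetical names (not the ODL code):
    // "group-startServices" takes the group lock, then needs the service monitor;
    // "service-close" takes the service monitor, then needs the group lock.
    public class SingletonDeadlockSketch {

        static final ReentrantLock groupLock = new ReentrantLock(true); // fair, like the FairSync in the trace

        static class Service {
            synchronized void instantiateServiceInstance() {
                // invoked by the group while it still holds groupLock
            }

            synchronized void close() {
                // close path: holds the Service monitor, then unregisters from the group
                groupLock.lock(); // blocks forever if the group thread already holds it
                try {
                    // unregisterService(...)
                } finally {
                    groupLock.unlock();
                }
            }
        }

        static void sleep(long millis) {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Service service = new Service();

            Thread groupThread = new Thread(() -> {
                groupLock.lock();                         // (A) group lock first
                try {
                    sleep(100);                           // give the close path time to grab the Service monitor
                    service.instantiateServiceInstance(); // (B) then the Service monitor: blocks forever
                } finally {
                    groupLock.unlock();
                }
            }, "group-startServices");

            Thread closeThread = new Thread(() -> {
                sleep(50);
                service.close();                          // (B) Service monitor first, (A) group lock second
            }, "service-close");

            groupThread.start();
            closeThread.start();
            groupThread.join(); // never returns: both threads are now deadlocked
        }
    }

Whichever component turns out to be at fault, the usual cure for this shape of deadlock is to avoid calling into the other component while holding one's own lock, or to always take the two locks in the same order.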
>> *From:* Michael Vorburger [mailto:vorburger@...]
>> *Sent:* 27 March 2018 23:30
>> *To:* Sam Hague <shague@...>
>> *Cc:* Vishal Thapar <vishal.thapar@...>; Valentina Krasnobaeva <valentina.krasnobaeva@...>; Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...> <ovsdb-dev@...>; Tim Rozet <trozet@...>
>> *Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest
>> OXYGEN/Fluorine) in attempt to connect to OVS
>> On Tue, Mar 27, 2018 at 6:17 PM, Sam Hague <shague@...> wrote:
>> Curious, but why is the genius 3node bringing in sfc features, is it because this is an integration-all
>> test? Even so, I don't think it makes sense to do this.
>> Not 100% sure what you mean, but yes, my understanding is that the integration-all feature in
>> distribution includes *ALL* features.
>> If you don't think that makes sense, add integration-dev in reply and take it up with the maintainers of
>> integration/distribution.
>> On Tue, Mar 27, 2018, 9:10 AM Michael Vorburger <vorburger@...> wrote:
>> On Tue, Mar 27, 2018 at 5:51 PM, Sam Hague <shague@...> wrote:
>> Adding Tim.
>> Are the logs below from the original job Valentina mentioned or the latest? The original failure
>> was from a bad fix which we did fix, so we need to understand if this is another issue.
>> On Mar 26, 2018 11:10 PM, "Vishal Thapar" <vishal.thapar@...> wrote:
>> Don't see the socket issue, but Sathwick is now seeing some other issue. InfraUtils is throwing
>> lots of Deadlock Exceptions:
>> Cool. Could you please open a new JIRA and include the full stack trace it (should) show after (or
>> before?) that "Deadlocked thread stack trace: " ERROR log? From what little I understand so far, this
>> JIRA should probably be opened in project controller, although it could be caused by a mistake made in
>> application code. Can you please clarify exactly what caused this to start happening? If it's a
>> particular change, it would be interesting to look more closely at whether it does anything special
>> (wrong) re. locking and threading etc.
>> Tx,
>> M.
>> --
>> Michael Vorburger, Red Hat
>> vorburger@... | IRC: vorburger @freenode | ~ = http://vorburger.ch
>> 2018-03-27T05:53:29,133 | INFO | awaitility[checkBundleDiagInfos] |
>> SystemReadyImpl |
>> 360 - org.opendaylight.infrautils.ready-impl -
>> 1.4.0.SNAPSHOT | checkBundleDiagInfos: Elapsed time 206s,
>> remaining time 93s, diag: Booting {Installed=0, Resolved=7, Unknown=0, GracePeriod=1, Waiting=0,
>> Starting=0, Active=557, Stopping=0, Failure=0}
>> 2018-03-27T05:53:29,245 | INFO | opendaylight-cluster-data-notification-dispatcher-66 |
>> SfcOvsNodeDataListener | 468 - org.opendaylight.sfc.ovs - 0.8.0.SNAPSHOT |
>> SfcOvsNodeDataListener::add()
>> bridge name [OvsdbBridgeName{_value=BR1}] DPID [DatapathId{_value=00:00:f6:3e:dc:fc:7b:4e}]
>> 2018-03-27T05:53:29,252 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Oh nose - there are 3 deadlocked
>> threads!! :-(
>> 2018-03-27T05:53:29,253 | ERROR | infrautils.metrics.ThreadsWatcher-0 | ThreadsWatcher | 358 - org.opendaylight.infrautils.metrics-impl - 1.4.0.SNAPSHOT | Deadlocked thread stack trace:
>> opendaylight-cluster-data-notification-dispatcher-55 locked on
>> java.util.concurrent.locks.ReentrantLock$FairSync@2da5ef86 (owned by
>> opendaylight-cluster-data-akka.actor.default-dispatcher-2):
>> at sun.misc.Unsafe.park(Native Method)
>> https://logs.opendaylight.org/sandbox/vex-yul-odl-jenkins-2/genius-csit-3node-sathwikgate-all-fluorine/5/odl1_karaf.log.gz
>> Regards,
>> Vishal.
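For whoever opens the JIRA Michael asked for: the "Deadlocked thread stack trace" ERROR entries above describe a deadlock the JVM itself can report, so the full picture can also be captured with jstack <pid> or programmatically via the JDK's ThreadMXBean. A small sketch (plain JDK API, not the infrautils ThreadsWatcher implementation):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Minimal sketch: dump the stack traces of deadlocked threads in the running JVM,
    // similar in spirit to the "Deadlocked thread stack trace" ERROR logs above.
    public final class DeadlockDump {

        public static void dumpDeadlocks() {
            ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
            long[] ids = mxBean.findDeadlockedThreads(); // null when no deadlock is detected
            if (ids == null) {
                System.out.println("No deadlocked threads");
                return;
            }
            // true, true = include locked monitors and locked ownable synchronizers (ReentrantLocks)
            for (ThreadInfo info : mxBean.getThreadInfo(ids, true, true)) {
                System.out.println(info); // thread name, lock it waits on, lock owner, stack trace
            }
        }

        public static void main(String[] args) {
            dumpDeadlocks();
        }
    }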
>> *From:* ovsdb-dev-bounces@... [mailto:ovsdb-dev-bounces@...] *On
>> Behalf Of *Sam Hague
>> *Sent:* 26 March 2018 21:17
>> *To:* Valentina Krasnobaeva <valentina.krasnobaeva@...>
>> *Cc:* Kiran N Upadhyaya <kiran.n.upadhyaya@...>; Guillaume Gaudonville <guillaume.gaudonville@...>; <ovsdb-dev@...> <ovsdb-dev@...>
>> *Subject:* Re: [ovsdb-dev] [ovsdb] NullPointerException in OvsdbConnectionService.java:325 (latest
>> OXYGEN/Fluorine) in attempt to connect to OVS
>> Yes, that patch was the issue. Here is a fix that should merge soon.
>> https://git.opendaylight.org/gerrit/#/c/70106/
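For context on the failure mode (I have not reviewed the linked Gerrit change, so the snippet below is only an illustration and not the actual fix): OVSDB-449 moved the SSL context lookup into the per-connection ChannelInitializer so that refreshed keystores are picked up, and the NullPointerException at OvsdbConnectionService.java:325 is consistent with that lookup returning null, for example on a plain tcp: manager connection with no keystore configured, while an SSL handler is still being built. A guarded version of the pattern might look roughly like the following, where getSslContextIfConfigured() is a hypothetical helper rather than an OVSDB API:

    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.handler.ssl.SslHandler;
    import javax.net.ssl.SSLContext;
    import javax.net.ssl.SSLEngine;

    // Illustrative only: fetch the SSL context for every new connection (so refreshed
    // keystores are picked up), but guard against it being null before adding an SslHandler.
    public class GuardedSslChannelInitializer extends ChannelInitializer<SocketChannel> {

        @Override
        protected void initChannel(SocketChannel channel) {
            SSLContext sslContext = getSslContextIfConfigured(); // per-connection lookup
            if (sslContext != null) {
                SSLEngine engine = sslContext.createSSLEngine();
                engine.setUseClientMode(false); // we are the passive/listening side
                channel.pipeline().addLast("ssl", new SslHandler(engine));
            }
            // ... add the JSON-RPC decoder and OVSDB handlers here in either case ...
        }

        private SSLContext getSslContextIfConfigured() {
            // Hypothetical helper: returns null when no certificate manager or keystore is
            // configured, e.g. for plain tcp: manager connections.
            return null;
        }
    }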
>> On Mon, Mar 26, 2018 at 8:17 AM, Valentina Krasnobaeva <valentina.krasnobaeva@...> wrote:
>> Hi OVSDB Team, Hi All
>> Since today's morning with latest builds of OXYGEN and Fluorine (all branches were rebased) I
>> always see this error in ovsdb log,
>> when devstack executes "sudo ovs-vsctl set-manager tcp:10.100.0.51:6640":
>> 2018-03-26T16:37:14,404 | WARN | nioEventLoopGroup-5-1 | ChannelInitializer | 60 -
>> io.netty.common - 4.1.16.Final | Failed to initialize a channel. Closing: [id: 0x2976135c,
>> L:/10.100.0.51:6640 - R:/10.100.0.11:34774]
>> java.lang.NullPointerException: null
>> at
>> org.opendaylight.ovsdb.lib.impl.OvsdbConnectionService$2.initChannel(OvsdbConnectionService.java:325)
>> [400:org.opendaylight.ovsdb.library:1.6.1.SNAPSHOT]
>> at
>> org.opendaylight.ovsdb.lib.impl.OvsdbConnectionService$2.initChannel(OvsdbConnectionService.java:321)
>> [400:org.opendaylight.ovsdb.library:1.6.1.SNAPSHOT]
>> at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:113)
>> [63:io.netty.transport:4.1.16.Final]
>> at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:105)
>> [63:io.netty.transport:4.1.16.Final]
>> at
>> io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:606)
>> [63:io.netty.transport:4.1.16.Final]
>> at io.netty.channel.DefaultChannelPipeline.access$000(DefaultChannelPipeline.java:46)
>> [63:io.netty.transport:4.1.16.Final]
>> at
>> io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1412)
>> [63:io.netty.transport:4.1.16.Final]
>> at
>> io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1131)
>> [63:io.netty.transport:4.1.16.Final]
>> at
>> io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:656)
>> [63:io.netty.transport:4.1.16.Final]
>> at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:510)
>> [63:io.netty.transport:4.1.16.Final]
>> at
>> io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423)
>> [63:io.netty.transport:4.1.16.Final]
>> at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482)
>> [63:io.netty.transport:4.1.16.Final]
>> at
>> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>> [60:io.netty.common:4.1.16.Final]
>> at
>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
>> [60:io.netty.common:4.1.16.Final]
>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
>> [63:io.netty.transport:4.1.16.Final]
>> at
>> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>> [60:io.netty.common:4.1.16.Final]
>> at
>> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>> [60:io.netty.common:4.1.16.Final]
>> at java.lang.Thread.run(Thread.java:748) [?:?]
>> 2018-03-26T16:37:15,410 | INFO | nioEventLoopGroup-4-1 | LoggingHandler | 60 -
>> io.netty.common - 4.1.16.Final | [id: 0x57fd99d7, L:/0:0:0:0:0:0:0:0:6640] READ: [id:
>> 0x1c79d0b8,
>> L:/10.100.0.51:6640 - R:/10.100.0.11:34776]
>> ovsdb process:
>> root 16311 0.0 0.0 28768 2152 ? S<s 17:05 0:00 ovsdb-server: monitoring
>> pid 16312
>> (healthy)
>> root 16312 0.0 0.0 29040 5304 ? S< 17:05 0:00 \_ ovsdb-server
>> /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info
>> --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key
>> --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert
>> --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log
>> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
>> from its log:
>> 2018-03-26T15:09:00.632Z|00005|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
>> 2018-03-26T15:09:00.632Z|00006|reconnect|INFO|tcp:10.100.0.51:6640: connected
>> 2018-03-26T15:09:00.637Z|00007|reconnect|INFO|tcp:10.100.0.51:6640: connection closed by peer
>> 2018-03-26T15:09:01.639Z|00008|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
>> 2018-03-26T15:09:01.640Z|00009|reconnect|INFO|tcp:10.100.0.51:6640: connected
>> 2018-03-26T15:09:01.642Z|00010|reconnect|INFO|tcp:10.100.0.51:6640: connection closed by peer
>> 2018-03-26T15:09:01.642Z|00011|reconnect|INFO|tcp:10.100.0.51:6640: waiting 2 seconds before reconnect
>> 2018-03-26T15:09:03.643Z|00012|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
>> 2018-03-26T15:09:03.644Z|00013|reconnect|INFO|tcp:10.100.0.51:6640: connected
>> 2018-03-26T15:09:03.648Z|00014|reconnect|INFO|tcp:10.100.0.51:6640: connection closed by peer
>> 2018-03-26T15:09:03.648Z|00015|reconnect|INFO|tcp:10.100.0.51:6640: waiting 4 seconds before reconnect
>> 2018-03-26T15:09:07.648Z|00016|reconnect|INFO|tcp:10.100.0.51:6640: connecting...
>> and of course, there is no br-int on ovsdb:
>> # ovs-vsctl show
>> d3b8068d-fcfb-4654-b6fa-5fea41d9922d
>> Manager "tcp:10.100.0.51:6640"
>> ovs_version: "2.6.1"
>> I suppose, that this latest patch from ovsdb could provoke the issue:
>> $ git show 77f0ef93babd
>> commit 77f0ef93babd9a1fd4a7f3508675830882c374aa (HEAD, origin/stable/oxygen)
>> Author: Tim Rozet <trozet@...>
>> Date: Fri Mar 23 14:23:40 2018 -0400
>> OVSDB-449 Fixes loading keystores dynamically per connection
>> Currently when OVSDB manager is started with SSL, the SSL Context is
>> only fetched once, which inherently means the keystores are only read
>> that one time. This patch changes the behavior so that the SSL Context
>> is fetched everytime a new connection is opened, which in turn will
>> update the keystore data.
>> Change-Id: Iaafbd34afcb5f4708b48eb3d64eca14ef0a107e8
>> Signed-off-by: Tim Rozet <trozet@...>
>> (cherry picked from commit e4a170988175b98e2411782f2a85b63edb64d4cf)
>> Has anyone already had this issue? If so, please share your workaround. I use openvswitch 2.6.1-1.
>> Thanks for any help,
>> Regards
> _______________________________________________
> ovsdb-dev mailing list
> ovsdb-dev@...
> https://lists.opendaylight.org/mailman/listinfo/ovsdb-dev
|