Re: [openflowplugin-dev] openflowjava IOB Exception in netvirt csit


Tomáš Slušný <tomas.slusny@...>
 


That is great then, based on your flow that you found to be causing this and content of that Gerrit patch it seems that after it will be finished (if it isn't yet) it should fix this issue. I was suspecting that there is also possibility that some extension can cause this, but that warning message what you was getting was really misleading.

From user perspective only problem is that flow wouldn't be visible in operational, but it will not affect reconcilation, because it would be in config so it will be reconciled properly.

Regards,
Tomas


Od: Jamo Luhrsen <jluhrsen@...>
Odoslané: sobota, 13. mája 2017 0:24
Komu: Tomáš Slušný; openflowplugin-dev@...; openflowjava-dev@...; netvirt-dev@...
Predmet: Re: [openflowplugin-dev] openflowjava IOB Exception in netvirt csit
 
And, now I see a bug was filed:
https://bugs.opendaylight.org/show_bug.cgi?id=8408

and a patch is in progress:
https://git.opendaylight.org/gerrit/#/c/56731

JamO

On 05/12/2017 11:56 AM, Jamo Luhrsen wrote:
> Tomas,
>
> I think I got it. I was only looking at the flows that were added to our control node
> (this is an openstack environment). I see now, that we are also adding some flows to
> one of our compute nodes. you can see that full flow table here [0], and just look for
> flows with short durations (less than 3s) to know what is new.
>
> looking at that and mucking around with one of my ovs setups, I was able to reproduce
> the IOBE with this flow:
>
> ovs-ofctl -O OpenFlow13 add-flow br-int "in_port=1,ip,action=ct(commit,zone=5002,table=47,nat(src=10.10.10.4)),2"
>
> shows up like this:
>
> OFPST_FLOW reply (OF1.3) (xid=0x2):
>  cookie=0x0, duration=2.328s, table=0, n_packets=0, n_bytes=0, ip,in_port=1
> actions=ct(commit,table=47,zone=5002,nat(src=10.10.10.4)),output:2
>
> IOBE:
>
> 2017-05-12 18:52:31,693 | WARN  | entLoopGroup-7-1 | OFDecoder                        | 295 -
> org.opendaylight.openflowjava.openflow-protocol-impl - 0.9.0.SNAPSHOT | Message deserialization failed
> java.lang.IndexOutOfBoundsException: readerIndex(142) + length(4) exceeds writerIndex(142): UnpooledSlicedByteBuf(ridx: 142,
> widx: 142, cap: 142/142, unwrapped: PooledUnsafeDirectByteBuf(ridx: 2632, widx: 2632, cap: 65536))
>         at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1389)[144:io.netty.buffer:4.1.8.Final]
>         at io.netty.buffer.AbstractByteBuf.readInt(AbstractByteBuf.java:766)[144:io.netty.buffer:4.1.8.Final]
>         at io.netty.buffer.AbstractByteBuf.readUnsignedInt(AbstractByteBuf.java:782)[144:io.netty.buffer:4.1.8.Final]
>
>
> BTW, what is the full effect of this from a user perspective? I think it's just that we
> will not have it in operational, right? Are we going to be in trouble in a reconciliation
> scenario?
>
> Thanks,
> JamO
>
> [0]
> https://logs.opendaylight.org/sandbox/jenkins091/netvirt-csit-1node-openstack-newton-jamo-upstream-stateful-snat-conntrack-carbon/3/archives/log.html.gz#s1-t9-k3-k1-k3-k1-k15-k4
>
>
> On 05/12/2017 05:05 AM, Tomáš Slušný wrote:
>> So I was able to succesfully add both flows you mentioned via RPC, and I do not see any Message derialization failed messages
>> in log. Here is output from dump-flows:
>>
>>
>> $ sudo ovs-ofctl dump-flows s1 -O Openflow13
>> OFPST_FLOW reply (OF1.3) (xid=0x2):
>>  cookie=0x80000006, duration=183.352s, table=21, n_packets=0, n_bytes=0, priority=10,ip,metadata=0x30d52/0xfffffe
>> actions=goto_table:26
>>  cookie=0x80000006, duration=292.436s, table=26, n_packets=0, n_bytes=0, priority=5,ip,metadata=0x30d52/0xfffffe
>> actions=set_field:0x186a9->tun_id,group:1
>>
>> And here is goto_table flow RPC:
>> http://ix.io/tuP
>>
>> And here is set_field flow RPC:
>> http://ix.io/tuQ
>>
>> So maybe there is problem somewhere else?
>>
>> -----------------------------------------------------------------------------------------------------------------------------
>> *Od:* Tomáš Slušný <tomas.slusny@...>
>> *Odoslané:* piatok, 12. mája 2017 10:05
>> *Komu:* Jamo Luhrsen; openflowplugin-dev@...; openflowjava-dev@...;
>> netvirt-dev@...
>> *Predmet:* Re: [openflowplugin-dev] openflowjava IOB Exception in netvirt csit
>> 
>>
>> After looking at it again, it is probably not issue in padding, because it is added in parent class, but it should be some
>> similar issue, so I am trying to fix it.
>>
>>
>> -----------------------------------------------------------------------------------------------------------------------------
>> *Od:* Tomáš Slušný
>> *Odoslané:* piatok, 12. mája 2017 10:01
>> *Komu:* Jamo Luhrsen; openflowplugin-dev@...; openflowjava-dev@...;
>> netvirt-dev@...
>> *Predmet:* Re: [openflowplugin-dev] openflowjava IOB Exception in netvirt csit
>> 
>>
>> Jamo,
>>
>>
>> thanks for your response, it helped me to probably find cause of this issue. Issue is that set-field action deserializer in
>> single layer serialization is not skipping padding at end of set-field action, so then it have some left-over zeros, and
>> based on logs it then tries to deserialize OutputAction (what type is 0) but body does not match, so it fails. I will try to
>> make fix for this as soon as possible. But until then, with single-layer-serialization disabled it will work.
>>
>>
>> Regards,
>>
>> Tomas
>>
>>
>> -----------------------------------------------------------------------------------------------------------------------------
>> *Od:* Jamo Luhrsen <jluhrsen@...>
>> *Odoslané:* piatok, 12. mája 2017 6:42
>> *Komu:* Tomáš Slušný; openflowplugin-dev@...; openflowjava-dev@...;
>> netvirt-dev@...
>> *Predmet:* Re: [openflowplugin-dev] openflowjava IOB Exception in netvirt csit
>> 
>> Tomas,
>>
>> I spent some time trying to figure this out (too much probably...) :) and correct me if
>> I'm wrong but this IOBE is coming from a flow that already exists on the switch and not
>> one we are trying to program. I do notice in a test I did that the IOBE comes every
>> ~3s which I think is our default polling of operational.
>>
>> anyway, I *think* narrowed down the exact test case that introduces this, and then compared
>> the flow table before and after that test case. The only difference are these two flows:
>>
>>  cookie=0x8000006, duration=0.810s, table=21, n_packets=0, n_bytes=0, priority=10,ip,metadata=0x30d52/0xfffffe
>> actions=goto_table:26
>>  cookie=0x8000006, duration=0.810s, table=26, n_packets=0, n_bytes=0, priority=5,ip,metadata=0x30d52/0xfffffe
>> actions=set_field:0x186a9->tun_id,group:225003
>>
>> full flow table here [0]
>>
>> I am only able to check these things with our CSIT stuff. I tried to manually create a flow that
>> looks like the table=26 flow above (via NB rest) but couldn't figure it out. I think my trouble
>> was trying to give a mask to the metadata.
>>
>> Anything help?
>>
>> I didn't try with serialization disabled yet, as that is gonna take some time to figure out
>> how to toggle it with our CSIT.
>>
>> Thanks,
>> JamO
>>
>> [0]
>> https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-upstream-stateful-snat-conntrack-carbon/21/archives/log.html.gz#s1-s1-s3-t9-k3-k1-k1-k1-k15-k4
>>
>> On 05/10/2017 01:04 AM, Tomáš Slušný wrote:
>>> Hi Jamo,
>>>
>>>
>>> can you please try if this issue also occurs with single-layer serialization disabled? Here is link to wiki on how to disable
>>> it: https://wiki.opendaylight.org/view/OpenDaylight_OpenFlow_Plugin:OF13%2B_Single_Layer_Serialization#Enabling.2FDisabling_Single_Layer_Serialization
>>>
>>> Also, can you send dump of flows (or ideally only that single flow that is failing when OFP is tryining to deserialize it)
>>> from ovs?
>>>
>>>
>>> Regards,
>>>
>>> Tomas
>>>
>>>
>>> -----------------------------------------------------------------------------------------------------------------------------
>>> *Od:* Jamo Luhrsen <jluhrsen@...>
>>> *Odoslané:* utorok, 9. mája 2017 21:24
>>> *Komu:* openflowplugin-dev@...; openflowjava-dev@...; netvirt-dev@...
>>> *Predmet:* [openflowplugin-dev] openflowjava IOB Exception in netvirt csit
>>> 
>>> Hi OFP, OFJ, Netvirt,
>>>
>>> I'm debugging some failures in netvirt CSIT and saw an exception I don't remember
>>> seeing before (not that it's new, I just haven't seen it. although I dig at these
>>> things daily)
>>>
>>> 2017-05-09 15:52:45,124 | WARN  | entLoopGroup-7-5 | OFDecoder                        | 268 -
>>> org.opendaylight.openflowjava.openflow-protocol-impl - 0.9.0.SNAPSHOT | Message deserialization failed
>>> java.lang.IndexOutOfBoundsException: readerIndex(166) + length(4) exceeds writerIndex(166): UnpooledSlicedByteBuf(ridx: 166,
>>> widx: 166, cap: 166/166, unwrapped: PooledUnsafeDirectByteBuf(ridx: 13896, widx: 13896, cap: 16384))
>>>         at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1389)[117:io.netty.buffer:4.1.8.Final]
>>>         at io.netty.buffer.AbstractByteBuf.readInt(AbstractByteBuf.java:766)[117:io.netty.buffer:4.1.8.Final]
>>>         at io.netty.buffer.AbstractByteBuf.readUnsignedInt(AbstractByteBuf.java:782)[117:io.netty.buffer:4.1.8.Final]
>>>
>>>
>>> There are a lot of them (~1200) over a 90m system test job.
>>>
>>> I don't know yet if they are related to the failures I am debugging, but something is not right.
>>>
>>> Any comments? Need a bug?
>>>
>>> here is the full karaf log:
>>> https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-newton-upstream-stateful-snat-conntrack-carbon/21/archives/odl1_karaf.log.gz
>>>
>>> Thanks,
>>> JamO
>>> _______________________________________________
>>> openflowplugin-dev mailing list
>>> openflowplugin-dev@...
>>> https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
>>>
>>> 
>>>
>>> TomášSlušný
>>>
>>> Software Developer
>>>
>>>
>>> Sídlo / Mlynské Nivy 56 / 821 05 Bratislava / Slovakia
>>> R&D centrum / Janka Kráľa 9 /  974 01 Banská Bystrica / Slovakia
>>> +421 911 083 902/ tomas.slusny@...
>>> reception: +421 2 206 65 114 / www.pantheon.tech <http://www.pantheon.tech>
>>>
>>> logo
>>>
>>> 
>>>
>> 
>> Tomáš Slušný
>> Software Developer
>> 
>> PANTHEONtechnologies s.r.o.
>> Janka Kráľa 9, 974 01 Banská Bystrica
>> Slovakia
>> Tel / +421 220 665 111
>> 
>> MAIL /tomas.slusny@...
>> WEB /www.pantheon.tech
>> 
>> 
>> 
>> Tomáš Slušný
>> Software Developer
>> 
>> PANTHEONtechnologies s.r.o.
>> Janka Kráľa 9, 974 01 Banská Bystrica
>> Slovakia
>> Tel / +421 220 665 111
>> 
>> MAIL /tomas.slusny@...
>> WEB /www.pantheon.tech
>> 
>> 
 
Tomáš Slušný
Software Developer
 
PANTHEON technologies s.r.o.
Janka Kráľa 9, 974 01 Banská Bystrica
Slovakia
Tel / +421 220 665 111
 
MAIL / tomas.slusny@...
WEB / www.pantheon.tech
 
 

Join z.archive.openflowjava-dev@lists.opendaylight.org to automatically receive all group messages.