Sounds good to me, Vishal. Is anybody planning to push a patch for this?
On Wed, Jan 18, 2017 at 12:45 AM, Vishal Thapar <vishal.thapar@...> wrote:
Yes, you’re right. I was under the wrong impression that inactivity_probe is configured as other_config, and so would be covered by the existing
YANG. Just confirmed: we *will* need YANG changes in OVSDB too. While at it, I will also add max_backoff. I think we will want this in Boron too?
Any objections to the plan?
From: Sela, Guy [mailto:guy.sela@...]
Sent: 18 January 2017 14:02
I mean the API that OVSDB is exposing to its clients (in this case Netvirt).
It seems to me that OVSDB didn’t model inactivity_probe in the YANG model itself.
Once it is modeled, OVSDB should expose a way to define it when calling addBridge.
From the Netvirt perspective, we need the ability to configure it in a configuration file.
Aha, you mean the API that OVSDB is exposing? That is more of a convenience, and we can add the fix without it too. I’ll add code for it. If needed, we can add a util to OVSDB
accordingly. Either way, the fix will go into Netvirt; an OVSDB change may or may not be needed.
So it seems there should be 2 bugs?
OVSDB needs to expose this in its addBridge interface, and Netvirt should allow configuring it.
I couldn’t find a way to configure it either in OvsdbBridgeAugmentationBuilder or in SouthboundUtils.addBridge.
If this knob is needed for the Controller’s probe timer, the bug should be on Netvirt, as Netvirt is the one adding the controller. In the case of the Manager, it should be set by whoever
configures the manager, which never comes from OVSDB.
Any configuration that goes into the OVSDB on the switch should ideally come from consumers of OVSDB, not the plugin itself.
I can change this to a Netvirt bug, but I want to make sure we are in agreement on the nature of the change coming in, i.e. a knob to set the default inactivity probe for the Controller
that gets created by autobridge.
Aha! I meant for the Manager. This field is present in both Manager and Controller, and I was specifically talking about the Manager. For the Controller, we were also creating the
controller manually, not using the autobridge code. I’d recommend creating an enhancement bug for this. We should add this knob and code to autobridge.
When you say initial configuration, do you mean before the OVS established an OpenFlow connection?
Configuration for the Controller table in OVSDB is being set by ODL.
The CLI configuration for the inactivity probe looks like this, for example:
sudo ovs-vsctl add Controller 8383a19f-4899-4808-ba0b-c970af081c3e inactivity_probe 10000
So it looks like this can only be set after the connection is established.
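For reference, the same knob can also be updated in place with set, which overwrites the column rather than appending to it (a sketch; the UUID is the one from the example above, and the value is in milliseconds):
# update the probe on an existing Controller record
sudo ovs-vsctl set Controller 8383a19f-4899-4808-ba0b-c970af081c3e inactivity_probe=10000
# verify the result
sudo ovs-vsctl --columns=target,inactivity_probe list Controller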
We don’t have configuration for this today. We can add it, or change the Netvirt code to add an API. We changed it directly on OVS as part of initial configuration. We
were using scripts to configure OVSes with the manager, so we just added one more command. Another parameter you may want to look at is stats_interval, which governs how frequently stat updates come, though we later disabled stats by default.
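For reference, the switch-side commands were along these lines (a sketch; the address placeholder and UUID are illustrative, record UUIDs come from 'ovs-vsctl list Manager', and stats-update-interval is my best recollection of the actual other_config key):
# point OVS at the controller's OVSDB manager
sudo ovs-vsctl set-manager tcp:<odl_ip>:6640
# raise the manager-session probe (milliseconds)
sudo ovs-vsctl set Manager <manager_uuid> inactivity_probe=30000
# slow down stat updates to the database (milliseconds)
sudo ovs-vsctl set Open_vSwitch . other_config:stats-update-interval=10000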
Finally, if you’re using HAProxy between OVS and ODL for the manager connection, or are using a single node (i.e. OVS connects to only one manager at a time), you can tweak
a flag captured in this:
In a deployment where each OVS connects to each ODL node in the cluster [multiple manager connections to the cluster], changing this flag can have a
functional impact, so be careful.
Can we set this value via an OVSDB configuration file? Or do we need to change code to use some API for this?
Close: it’s inactivity_probe. Sorry for the delay, I had to fish through old logs.
Do you recall where you tweaked this configuration? A quick Google search didn’t help me.
Good catch, Anil. I forgot that we *did* increase the timeout to 30-60 seconds from the default of 5. I say 30-60 because we did different rounds of testing to reduce the number
of echo messages going back and forth. With 5 we used to see frequent disconnects, so yes, I’d agree with Anil that increasing timeouts is the better solution. In fact, the default of 5 is terrible: as you start scaling up, you’ll be processing too many
echo messages, and a full GC on an 8G-16G heap takes about 10+ seconds.
From: Anil Vishnoi [mailto:vishnoianil@...]
Sent: Tuesday, January 17, 2017 11:25 AM
To: Sela, Guy <guy.sela@...>
Cc: Vishal Thapar <vishal.thapar@...>; Muthukumaran K <muthukumaran.k@...>; Jamo Luhrsen <jluhrsen@...>;
Pearl, Tomer <tomer.pearl@...>;
Subject: Re: [openflowjava-dev] [ovsdb-dev] OVSDB scale
I think we should look at why OVS is getting disconnected during the GC. Is it because of the echo timeout? Tuning GC will help, but I don't think it will fix the root cause. If we can increase the echo timeouts,
the disconnections probably won't happen, at least not because of GC.
On Tue, Jan 17, 2017 at 1:15 AM, Sela, Guy <guy.sela@...> wrote:
So, a couple of questions:
Did you hit a full GC? If so, did the OVSs disconnect, and did everything continue working smoothly afterwards?
Do you have a script or mechanism you can share that allows quickly counting the number of flows in the data store?
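(For reference, a quick way to count flows against the Boron-era RESTCONF config store; a sketch, assuming the default admin:admin credentials and port 8181:)
# fetch the config inventory and sum the flows across all tables of all nodes
curl -s -u admin:admin http://localhost:8181/restconf/config/opendaylight-inventory:nodes \
  | python -c 'import sys, json; d = json.load(sys.stdin); print(sum(len(t.get("flow", [])) for n in d.get("nodes", {}).get("node", []) for t in n.get("flow-node-inventory:table", [])))'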
Yeah, we had used similar heap and GC settings when testing ITM, which added OVSDB and all the Netvirt
models to the mix; I couldn’t recall exactly what those were.
We had focused mainly on the baseline feature of OFPlugin, so some of our changes specific to that drive-test
may not be applicable to Guy’s case. However, increasing the heap and using G1GC is something he had already accounted for.
For the scenario we were chasing (only openflowplugin + a load-driver app – bulk-o-matic), we had used
the settings mentioned in the last reply of this bug.
There are a few more tweaks in Openflowplugin, but they are all related to specifics of OFPlugin (Helium).
I believe we did have to do some tweaks with heap size, GC settings etc., right? Do you recall?
Did you manage to survive full GCs at all?
If I don’t avoid it, a full GC causes all OVSs to disconnect from the ODL, which results
in a bit of chaos. Is there any way around this other than avoiding full GC? I managed to avoid it in my testing using a 16G heap size and the G1 collector.
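(For reference, a sketch of what those settings look like in Karaf's bin/setenv; the pause-time target is an assumption, not something I tuned deliberately:)
# bin/setenv in the Karaf distribution
export JAVA_MIN_MEM=8g
export JAVA_MAX_MEM=16g
export EXTRA_JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=100"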
We tested not just OVSDB but OVSDB+Netvirt/VPNService, at a scale of about 80 OVSs at the time with a full
mesh. Scale limits come more from the size of the datastore than anything else, so how many devices you can scale to depends on the extent of the features you’re testing. Is it just OVSDB, or Netvirt with multiple VMs per compute across multiple networks?
If you’re running into memory issues, it would be good to increase memory and capture memory usage. While
provisioning you may hit a high peak, but it will come down once provisioning is done. I’ll check if I can get details of the numbers we tested; they should be lying somewhere in archived mails.
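For capturing usage over time, the stock JDK tools against the Karaf JVM are usually enough (a sketch; the PID lookup is illustrative):
KARAF_PID=$(pgrep -f karaf | head -1)
jstat -gcutil "$KARAF_PID" 10000          # heap/GC utilization sampled every 10 s
jmap -histo:live "$KARAF_PID" | head -20  # top heap consumers; note this forces a full GC, so use sparingly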
Sent: 17 January 2017 12:09
To: Jamo Luhrsen <jluhrsen@...>
Cc: Pearl, Tomer <tomer.pearl@...>;
ovsdb-dev@....org; Sela, Guy <guy.sela@...>; Vishal Thapar <vishal.thapar@...>
Subject: Re: [openflowjava-dev] [ovsdb-dev] OVSDB scale
I believe the team from Ericsson also did some testing with it, and we made some more performance improvements in Boron.
@vishal: do you have any numbers from your OVSDB testing?
On Tue, Jan 3, 2017 at 10:05 PM, Jamo Luhrsen <jluhrsen@...> wrote:
Back in Beryllium there was a performance report released. You can see on page 31 that we
saw OVSDB scale up to 1800 nodes. There may be more recent tests done, and I think Marcus
may have some idea. But I think your 200 number should be achievable.
On 01/02/2017 06:02 AM, Pearl, Tomer wrote:
> I’m trying to bring up a setup with one ODL controller and 200+ OVSs.
> I’m testing with Boron SR1 code.
> Are there any reports about ODL scale tests that I can look at?
> Is 200 OVSs an amount that is supposed to work?
> Tomer P.