I forgot to mention: OpenFlow is our first target for performance, but we will be addressing other protocols in the future.
On May 28, 2014, at 9:43 AM, Luis Gomez <ecelgp@...
Hi Prasanna and all,
The Integration team will be measuring performance, and so far we have some ideas that involve Mininet and CBench:
What is important to understand is that the performance tests we do in Integration are at the system level, so they involve the ofplugin, the oflibrary, and the controller's MD-SAL components. If a project wants to do its own performance tests involving only its own components, it is free to do so, and I would encourage it.
Can we have the Integration team do the benchmark performance tests on the Hydrogen release?
Since some enhancements have already gone into MD-SAL, the plugin, and the library,
we will right away see the difference between the performance of the Hydrogen release and the latest mainline code.
And as and when we make any enhancement, the Integration team can re-run the performance test cases and document the numbers.
This way we can track the performance improvements in the ODL controller in a central place instead of everyone doing it in their own environment.
For codec debuggability, I submitted a patch to dump the "shadow" codec class source to files in a "generated-sources" dir. You can't step into the source with the Eclipse debugger, but at least you can see the source code. For full debuggability, i.e., stepping into the source code and seeing line numbers in stack traces, I think the codec classes would actually have to be compiled from source by the Java compiler instead of being built with javassist, i.e., generate the full source at run time and invoke javac. I believe that's what Tomcat does with JSPs.
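A minimal sketch of the "generate the source, then invoke javac" idea, using the standard javax.tools API. RuntimeSourceCompiler, the directory layout, and the assumption that className is a simple name in the default package are all illustrative; this is not how the yangtools codecs are produced today, and it only works on a JDK, because ToolProvider.getSystemJavaCompiler() returns null on a plain JRE.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical sketch (not the yangtools implementation): write generated
// source to a "generated-sources" directory, compile it with javac through
// javax.tools, and load the resulting class.
final class RuntimeSourceCompiler {

    /** className is assumed to be a simple name in the default package. */
    static Class<?> compileAndLoad(Path srcDir, Path outDir, String className, String source)
            throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            // A plain JRE ships without javac; this only works on a JDK.
            throw new IllegalStateException("no system Java compiler available");
        }
        Files.createDirectories(srcDir);
        Files.createDirectories(outDir);

        // Keep the source on disk so an IDE pointed at srcDir can open it.
        Path srcFile = srcDir.resolve(className + ".java");
        Files.write(srcFile, source.getBytes(StandardCharsets.UTF_8));

        // -g emits full debug info (source file, line numbers, local variables),
        // so debuggers can step through the code and show locals.
        int exitCode = javac.run(null, null, null,
                "-g", "-d", outDir.toString(), srcFile.toString());
        if (exitCode != 0) {
            throw new IllegalStateException("javac exited with code " + exitCode);
        }

        // The loader is deliberately left open: closing it would prevent it from
        // loading further classes that the generated class references.
        URLClassLoader loader = new URLClassLoader(
                new URL[] {outDir.toUri().toURL()},
                RuntimeSourceCompiler.class.getClassLoader());
        return loader.loadClass(className);
    }
}
```

The trade-off versus javassist is speed: running javac is typically much slower than emitting bytecode directly, so this would mostly make sense for a debug mode.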
On Thu, May 22, 2014 at 9:03 PM, Jan Medved (jmedved) <jmedved@...> wrote:
Thanks a lot for the great comments & feedback. Please see more inline.
Thank you for initiating this thread. We at Ericsson have been using the MD-SAL infrastructure and have noticed the following (good and bad) about MD-SAL.
First the positives:
1. Easy and uniform interface to the controller configuration and operational state through the data store
2. A largely technology-independent interface made available to applications.
3. Versioning encoded in the YANG models makes upgrades easier.
4. Architecture provides good foundation to build advanced services like clustering etc.
Some of the problems have already been identified by the community.
1. Performance: This is a key issue with MD-SAL, which, I am afraid, makes ODL somewhat inferior to controllers like Beacon, Floodlight, etc.
Agreed that performance is key; we are nowhere near where we want to be, or where we can be. Performance has been actively worked on since Hydrogen: a lot of things have improved since then, and more are in the works. For example, there was the first phase of the new Data Store implementation (it has to be done in phases due to its size and the need to do intermediate performance benchmarking & tuning). Other things are in the works too, for example tuning Netty and the base I/O. There will be a few performance improvement patches going in during the next couple of weeks. I am planning to have a session on a TWS and/or MD-SAL call a couple of weeks from now about the lessons we learned from the performance tuning exercise; I’d like to do it once the merges are in and we have the “before” and “after” numbers.
a. With the Hashtable Datastore implementation in Hydrogen, the flow setup rate deteriorated exponentially with the number of flows; with 100 flows in the data store, we observed the flow setup rate slow to a crawl of 1 flow/second.
The HashTable data store was clearly not the right implementation.
In any case, what environment did you use for this benchmark? What hardware/VM/OS? What performance test did you use, and in which setting? I have been using cbench natively on my Mac, but I have not been driving the controller from the NBI.
The new TreeDataStore is aimed precisely at solving most of the performance issues with the Hashtable DataStore. However, I believe this approach also gets slower as the number of flows in the DataStore increases.
Yes. This is a known problem, and is being worked on. Robert will have a patch in a couple of days, and more work is planned. We will share the ideas in a separate email.
On another note, did you measure the throughput of the new data store in the same environment where you did the measurements for Hydrogen? If possible, could you please share the results?
b. For deployments based on OpenFlow, the flow setup rate will be a key metric. Similarly, latency, i.e., the time taken to fulfill a request from the moment the controller receives it, will be critical. Due to the existing threading model for requests from the NBI, the latency is significantly higher than with a controller like Beacon.
I haven’t done the latency measurements from the NBI to the OF Plugin; would it be possible to share the test setup and/or the tests? For cbench latency testing, though, the numbers for ODL in my environment are in the same ballpark as Floodlight. I could not get Beacon to run reliably in my environment, so I don’t have results for Beacon.
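One way to make latency numbers comparable across setups is to report percentiles rather than a single average. This is a hypothetical sketch: LatencyStats is not an ODL class, and measure() assumes the request call blocks until the request is fulfilled, which is not true of an asynchronous NBI-to-OF-plugin path (there you would record a timestamp at receipt and another at completion instead).

```java
import java.util.Arrays;

// Hypothetical helper: collect per-request latencies and summarize them.
final class LatencyStats {

    /** Runs a blocking request n times and returns each latency in nanoseconds. */
    static long[] measure(Runnable request, int n) {
        long[] samples = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            request.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Nearest-rank percentile of the samples, for 0 < p <= 100. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }
}
```

Reporting, say, the 50th and 99th percentiles side by side makes it easier to see whether a threading model hurts typical requests or only the tail.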
c. 4-phase commit: The requirement for ODL to support an SBI like Netconf, which is binding-independent, is well understood. The current design, which introduces 2 additional commit phases to convert binding-aware data to binding-independent data, makes debugging difficult and also introduces some performance overhead (~5-8%).
The commit phases are the same for binding-independent and binding-aware data. There is a translation from the binding-aware format (Java DTOs) to the binding-independent DOM format and vice versa, but these translations are not commit phases. Any app that wants to use Java POJOs (which the DTOs are) would have to do this translation. You can avoid it if your app uses the DOM data format used by the data store.
On another note, did you do the measurements on a recent master? My profiler results do not show this kind of utilization for the translations - would it be possible to get together and compare the methodologies & results?
Can we explore different code paths for Binding Aware and Binding Independent services, rather than perform this conversion to unify them into a single code path?
But there are advantages to using the Java DTO approach. If we don’t use it, each app will have to do its own translation (with varying results). It may be better to identify the bottlenecks in the current approach and make performance improvements that everybody will benefit from. As I said above, can we get together, compare results, and run benchmarks on a recent master?
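To make the translation cost concrete, here is a toy round trip between a binding-aware DTO and a binding-independent, map-based node. FlowDto, FlowCodec, and the leaf names are hypothetical stand-ins, not the yangtools generated DTOs or the MD-SAL DOM classes; the point is only that each conversion allocates a new object and copies every field, which is the work an app using the DOM format directly skips.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical binding-aware DTO (stand-in for a generated Java class).
final class FlowDto {
    final String id;
    final int priority;

    FlowDto(String id, int priority) {
        this.id = id;
        this.priority = priority;
    }
}

// Hypothetical codec between the DTO and a generic, binding-independent node.
final class FlowCodec {

    /** Binding-aware to binding-independent: copy each field into a named leaf. */
    static Map<String, Object> toNode(FlowDto dto) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("id", dto.id);
        node.put("priority", dto.priority); // autoboxed to Integer
        return node;
    }

    /** Binding-independent to binding-aware: read the leaves back into a new DTO. */
    static FlowDto fromNode(Map<String, Object> node) {
        return new FlowDto((String) node.get("id"), (Integer) node.get("priority"));
    }
}
```

Centralizing this in one shared codec, rather than having each app hand-roll it, is the argument made above for keeping the DTO approach and optimizing it.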
d. Just to provide an example of the performance issues we have observed with Hydrogen: the hashtable-based data store stabilizes at 1 flow/sec after a few flows are in the data store, while the same test using the SalFlowService RPC from an NSF bundle, i.e., bypassing the data store, achieves > 20K flows/sec. These measurements were made on the ODL Hydrogen release.
Data Store performance is vastly improved in a recent master. For RPCs, it’s about the same as it was in Hydrogen, but we have identified a few bottlenecks that are being fixed; the merges should be coming soon.
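A simple way to reproduce the flows/sec comparison in point d is to drive each path with the same harness. In this sketch, writer is a hypothetical hook that installs one flow through whichever path is under test (a data store write or a direct SalFlowService-style RPC call); FlowRateMeter is not an ODL class. ratePerBatch() measures consecutive batches, so a rate that drops as the store fills up becomes visible.

```java
import java.util.function.IntConsumer;

// Hypothetical harness: writer.accept(i) installs flow number i through the
// path being measured and is assumed to return once the flow is installed.
final class FlowRateMeter {

    /** Installs flows [0, count) through writer and returns flows per second. */
    static double flowsPerSecond(IntConsumer writer, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            writer.accept(i);
        }
        long elapsed = Math.max(1, System.nanoTime() - start); // avoid division by zero
        return count * 1e9 / elapsed;
    }

    /** Measures each batch separately; flow ids keep increasing across batches. */
    static double[] ratePerBatch(IntConsumer writer, int batches, int batchSize) {
        double[] rates = new double[batches];
        for (int b = 0; b < batches; b++) {
            final int offset = b * batchSize;
            rates[b] = flowsPerSecond(i -> writer.accept(offset + i), batchSize);
        }
        return rates;
    }
}
```

The first batch also includes JIT warm-up, so in practice you would discard a warm-up run before comparing the two paths.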
2. Debuggability: Runtime-generated code makes debugging fairly complicated. Can this be avoided?
For static systems, where all models are known upfront, it can be done. The YANG tools can be run at run time or at compile time (in fact, some of them, like Java API generation, are only run at compile time; only codec generation is run at run time). However, support for runtime code generation is required for runtime loading of plugins and runtime discovery of models in network devices.
Maybe the problem is not so much runtime/compile time generation, but the overall debuggability of the generated code. Maybe we should improve the tools to make debugging easier (logs? Generation of “shadow” source code that can be stepped through in an IDE? ...)
3. Clustering Services: Hydrogen release was missing MD-SAL clustering, but this is getting good attention in Helium & beyond with initiatives like the MD-SAL clustering using Akka.
4. Controller Readiness Status: Since YANG models are loaded lazily, it is almost impossible to say when all the YANG models have converged and the controller is ready to accept all requests (from the NBI and SBI). This problem is solved for RESTConf by having it register for YANG convergence events and reload its cache based on these events. However, for the controller as a whole, it is impossible to say when it is really ready.
This is not a problem of loading YANG models, but a problem of nondeterministic loading/activation of OSGi modules. The best way to fix this may be at the container level (some work is already going on in Karaf, and there is also a proposed solution in the configuration subsystem).
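As a sketch of the "know when the controller is ready" idea, assuming the set of expected modules can be declared up front: each module reports activation in whatever order OSGi happens to start it, and callers wait on a single readiness signal instead of guessing. ReadinessTracker and the module names are hypothetical; this is not the Karaf or configuration-subsystem mechanism mentioned above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical readiness tracker: ready once every expected module has
// reported activation, regardless of activation order.
final class ReadinessTracker {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final CountDownLatch ready = new CountDownLatch(1);

    ReadinessTracker(Set<String> expectedModules) {
        pending.addAll(expectedModules);
        if (pending.isEmpty()) {
            ready.countDown();
        }
    }

    /** Called by a module once it has activated; unknown or repeated names are ignored. */
    void moduleActivated(String name) {
        // If two threads both observe an empty set, counting down twice is harmless.
        if (pending.remove(name) && pending.isEmpty()) {
            ready.countDown();
        }
    }

    boolean isReady() {
        return ready.getCount() == 0;
    }

    /** Blocks until every expected module has activated or the timeout expires. */
    boolean awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
        return ready.await(timeout, unit);
    }
}
```

The hard part in practice is the constructor argument: something (the container or the config subsystem) has to know the expected set of modules.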
These are some points that I could think of. I hope we can address some of these for Helium.
Definitely - when would be a good time to get together about the performance issues?
I would like to keep up the momentum we have gathered thus far in bringing the community together around MD-SAL. One thing we have discussed a few times is identifying where the gaps in features, functionality, and usability are, and starting to look at how we can get them addressed. So, on that note, here is the question/request I pose to all of you:
What is preventing you from using MD-SAL? What works well? What could be improved?
From our original meeting, I recall the following items that were discussed (plus I added some of my own here):
1. Testability – it is difficult to test (JUnit test) code that depends on MD-SAL.
[Proposal] Work on designing, implementing, and explaining with examples a meaningful and useful suite of testing helpers.
2. Usability – A large number of generated classes share the same name and differ only by package name. This makes it really difficult to find and use the right version of a class.
[Proposal] Make generated file names unique by adding logical prefixes/suffixes to their names, for example BindingAwareInstanceIdentifier and BindingIndependentInstanceIdentifier (instead of just InstanceIdentifier).
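For item 1 (testability), here is one shape such testing helpers could take: application code depends on a narrow interface, and JUnit tests inject an in-memory fake instead of a running MD-SAL. SimpleDataStore, InMemoryDataStore, FlowProgrammer, and the path format are all hypothetical and are not MD-SAL APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical narrow interface covering the slice of the data store an app uses.
interface SimpleDataStore {
    void put(String path, Object data);

    Optional<Object> read(String path);
}

// In-memory fake for unit tests: no controller, no OSGi container.
final class InMemoryDataStore implements SimpleDataStore {
    private final Map<String, Object> data = new HashMap<>();
    private int writes;

    @Override
    public void put(String path, Object value) {
        data.put(path, value);
        writes++;
    }

    @Override
    public Optional<Object> read(String path) {
        return Optional.ofNullable(data.get(path));
    }

    /** Lets a test assert how many writes the code under test performed. */
    int writeCount() {
        return writes;
    }
}

// Example code under test: it depends only on the interface, so it is unit-testable.
final class FlowProgrammer {
    private final SimpleDataStore store;

    FlowProgrammer(SimpleDataStore store) {
        this.store = store;
    }

    void addFlow(String node, String flowId) {
        store.put("/nodes/" + node + "/flows/" + flowId, flowId);
    }
}
```

In a JUnit test you would construct an InMemoryDataStore, pass it to the code under test, and assert on read(...) and writeCount().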
Please share your thoughts via e-mail (think of this as a brainstorming session – there are no wrong answers :) ) and, if you are really inspired, please open a bug to track the request and facilitate more discussion (ultimately everything will become bugs).
controller-dev mailing list
Discuss mailing list
Discuss@...
https://lists.opendaylight.org/mailman/listinfo/discuss