[openflowplugin-dev] Statistics Manager performance


Luis Gomez
 



FYI, they use YourKit to plot the CPU graphs. Does anybody in Integration have experience with this tool?


Begin forwarded message:

From: Anil Vishnoi <vishnoianil@...>
Subject: Re: [openflowplugin-dev] Fw: Statistics Manager performance
Date: March 31, 2014 at 10:40:22 PM PDT
To: Luis Gomez <ecelgp@...>
Cc: Abhijit Kumbhare <abhijitk@...>

Hi Luis,

I am using the YourKit Java Profiler. I just took a manual snapshot of the CPU usage graph; I don't think it allows exporting the graph for a particular time range, if that's what you are looking for.

But yes, it allows you to dump the profile report as a snapshot.

Thanks
Anil


On Tue, Apr 1, 2014 at 6:59 AM, Luis Gomez <ecelgp@...> wrote:
Hi Anil,

Quick question: how do you plot these CPU graphs? Are you using the hypervisor or some other tool?

Thanks/Luis



On Mar 31, 2014, at 9:00 AM, Abhijit Kumbhare <abhijitk@...> wrote:

Forwarding to the openflowplugin-dev list.


----- Forwarded by Abhijit Kumbhare/San Jose/IBM on 03/31/2014 08:59 AM -----

From: Anilkumar Vishnoi/India/IBM@IBMIN
To: Abhijit Kumbhare/San Jose/IBM@IBMUS, "Jan Medved (jmedved)" <jmedved@...>, "michal polkorab" <michal.polkorab@...>, "Prasanna Huddar" <prasanna.huddar@...>, "Robert Varga -X (rovarga - Pantheon Technologies SRO at Cisco)" <rovarga@...>, "Tony Tkacik -X (ttkacik - Pantheon Technologies SRO at Cisco)" <ttkacik@...>
Cc: "Anil Vishnoi" <vishnoianil@...>
Date: 03/26/2014 03:50 PM
Subject: Re: Statistics Manager performance




Hi All,

I made a minor enhancement over the previous version of the changes and pushed the code to the controller repo through the following gerrit:

https://git.opendaylight.org/gerrit/#/c/5785/

In the current version, each node submits its individual statistics requests to the request scheduler, rather than submitting the whole node. This avoids sending all of a node's statistics requests in one chunk.
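
The per-request submission described above could look roughly like the following. This is only a minimal sketch: the class, the enum values and the method names are hypothetical and are not taken from the gerrit change.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical names throughout; not the classes from the gerrit change.
class PerRequestSchedulingSketch {

    // Illustrative subset of statistics request kinds (an assumption).
    enum StatsType { FLOW, PORT, TABLE, GROUP, METER }

    // One queue entry is one statistics request for one node.
    static final class StatsRequest {
        final String nodeId;
        final StatsType type;

        StatsRequest(String nodeId, StatsType type) {
            this.nodeId = nodeId;
            this.type = type;
        }

        @Override
        public String toString() {
            return nodeId + ":" + type;
        }
    }

    private final Queue<StatsRequest> queue = new ArrayDeque<>();

    // Each node enqueues its requests one by one instead of enqueuing itself.
    void submitNode(String nodeId) {
        for (StatsType type : StatsType.values()) {
            queue.add(new StatsRequest(nodeId, type));
        }
    }

    // The scheduler takes a single request per step; returns null when idle.
    StatsRequest dispatchNext() {
        return queue.poll();
    }

    int pending() {
        return queue.size();
    }
}
```

With this shape the scheduler never has to send a whole node's worth of requests at once; how many requests it dispatches per step is a separate policy decision.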

Overall, the results look better compared to the results I sent in my previous mail. I ran the test with 16, 32 and 64 nodes; in each case I was able to see the topology, and ping-all worked fine as well. I did minimal functional testing to check that the stats are getting updated properly.

Following are the CPU graphs for the 16, 32 and 64 node tests:

16 nodes:
<23715775.jpg>

32 nodes:
<23953736.jpg>

64 nodes:
<23524216.jpg>


I also tried with 128 nodes, but the controller started acting weird (switch disconnections, exceptions from the Infinispan event notify classes, CPU spikes). Overall, from these 3 results, it looks like every time the number of nodes doubles, CPU usage increases by 10% or so on average.
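
As a rough reading of that observation, the arithmetic can be written down as below. This is only a sketch: it assumes "10%" means about 10 percentage points of average CPU per doubling, the class and method names are hypothetical, and it is only meaningful within the 16 to 64 node range that was actually measured.

```java
// Hypothetical helper; the 10-points-per-doubling figure is the rough
// average quoted in the mail, not a measured constant.
class CpuScalingSketch {

    // Estimated increase in average CPU (percentage points) when going
    // from baseNodes to nodes, assuming about 10 points per doubling.
    static double estimatedIncrease(int baseNodes, int nodes) {
        double doublings = Math.log((double) nodes / baseNodes) / Math.log(2.0);
        return 10.0 * doublings;
    }
}
```

For example, going from 16 to 64 nodes is two doublings, so this reading gives an increase of about 20 points.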

Robert, can you please review the gerrit and let me know if you have any suggestions?

Jan, if you have time, can you please try this patch on your machine and see if it gives similar results?

Thanks
Anil


From: Anilkumar Vishnoi/India/IBM
Date: 03/26/2014 01:39 AM
Subject: Statistics Manager performance

Hi Robert/Jan/Abhijit,

I pushed the first-cut code changes to the repo for the statistics manager performance improvement. Robert, can you please review the following draft gerrit?

https://git.opendaylight.org/gerrit/#/c/5753/

I made the following changes in the code:

1) Added a StatisticsRequestSchedular class, which
*.* keeps track of pending MD-SAL transaction requests using DataTransactionListener. It only keeps track of requests submitted by the Statistics-Manager.
*.* implements a queue, to which each node submits a request for scheduling.
*.* whenever the number of pending transaction requests drops to zero, picks the next node in the queue and sends all of that node's statistics requests.

2) All the nodes periodically put themselves in the scheduler queue for execution. If a node is already present in the queue, the scheduler won't accept a duplicate request.

3) Removal of stale statistics is now done based on counters, rather than timestamps, because timestamps can create issues in a clustered environment. Removal of stale transaction IDs is still time-based.
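
The scheduling behaviour in points 1 and 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the gerrit: the class and method names are hypothetical, and the two transaction callbacks stand in for whatever the DataTransactionListener actually reports.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch of a node-level request scheduler.
class NodeSchedulerSketch {

    // LinkedHashSet keeps submission order and rejects duplicates.
    private final LinkedHashSet<String> nodeQueue = new LinkedHashSet<>();
    private int pendingTransactions = 0;

    // Records which nodes were picked, in order (for illustration only).
    final List<String> executed = new ArrayList<>();

    // A node asks to be scheduled; returns false if it is already queued.
    synchronized boolean submit(String nodeId) {
        return nodeQueue.add(nodeId);
    }

    // Stand-in for "a Statistics-Manager transaction was submitted".
    synchronized void onTransactionSubmitted() {
        pendingTransactions++;
    }

    // Stand-in for "a Statistics-Manager transaction finished".
    synchronized void onTransactionCompleted() {
        if (pendingTransactions > 0) {
            pendingTransactions--;
        }
        runNextIfIdle();
    }

    // Picks the next queued node only when nothing is pending.
    synchronized void runNextIfIdle() {
        if (pendingTransactions != 0 || nodeQueue.isEmpty()) {
            return;
        }
        Iterator<String> it = nodeQueue.iterator();
        String next = it.next();
        it.remove();
        // A real scheduler would send all statistics requests for this node here.
        executed.add(next);
    }
}
```

A node that is picked leaves the queue, so it can put itself back in on its next periodic tick; while it is still waiting, a second submit is simply ignored.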

I ran the performance test for both the existing implementation and the new changes on my laptop (i5 processor, 8 GB RAM, also running an Ubuntu Mininet VM and a few other processes). Following are the graphs for both runs. I do see a performance improvement, but it is not up to expectations.

With existing implementation (16 node tree network):

<23196841.jpg>


With new code changes (16 node tree network):
<23894718.jpg>


I am working on the following changes to further improve the code:

*.* As of now I am submitting the "node" to the scheduler queue, and whenever the scheduler picks a node, it sends all the statistics requests for that node. I feel that's the reason behind these CPU peaks. I am now planning to submit individual stats requests to the scheduler queue, so that it sends one statistics request at a time, rather than sending all the requests for that "node" in one shot.

*.* Modifying the interval dynamically, based on how many nodes are connected.

Please feel free to share your suggestions.

Thanks
Anil
_______________________________________________
openflowplugin-dev mailing list
openflowplugin-dev@...
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev




--
Thanks
Anil

