Forwarding to the Integration group.
Subject: [openflowplugin-dev] Fw: Statistics Manager performance
Date: March 31, 2014 at 9:00:25 AM PDT
Forwarding to the openflowplugin-dev list.
----- Forwarded by Abhijit Kumbhare/San Jose/IBM on 03/31/2014 08:59 AM -----
From: Anilkumar Vishnoi/India/IBM@IBMIN
To: Abhijit Kumbhare/San Jose/IBM@IBMUS, "Jan Medved (jmedved)" <jmedved@...>, "michal polkorab" <michal.polkorab@...>, "Prasanna Huddar" <prasanna.huddar@...>, "Robert Varga -X (rovarga - Pantheon Technologies SRO at Cisco)" <rovarga@...>, "Tony Tkacik -X (ttkacik - Pantheon Technologies SRO at Cisco)" <ttkacik@...>
Cc: "Anil Vishnoi" <vishnoianil@...>
Date: 03/26/2014 03:50 PM
Subject: Re: Statistics Manager performance
I made a minor enhancement over the previous version of the changes and pushed the code to the controller repo through the following gerrit:
In the current version, each node submits its individual statistics requests to the request scheduler, rather than submitting the whole node. This avoids sending all the statistics requests of a node in one chunk.
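To illustrate the per-request submission, here is a minimal sketch in Python. It is not the Java code from the gerrit: the class name `RequestScheduler`, the helper `submit_node`, the list of statistics types and the node IDs are all assumptions made for the example.

```python
from collections import deque

# Hypothetical statistics types; the real set comes from the OpenFlow plugin models.
STAT_TYPES = ("flow", "port", "queue", "group", "meter")


class RequestScheduler:
    """Queue of individual (node, stat_type) requests, handed out one at a time."""

    def __init__(self):
        self._queue = deque()
        self._queued = set()  # keys currently waiting, used to reject duplicates

    def submit(self, node_id, stat_type):
        """Enqueue one request; return False if the same request is already waiting."""
        key = (node_id, stat_type)
        if key in self._queued:
            return False
        self._queued.add(key)
        self._queue.append(key)
        return True

    def next_request(self):
        """Return the next request to send, or None if the queue is empty."""
        if not self._queue:
            return None
        key = self._queue.popleft()
        self._queued.discard(key)
        return key


def submit_node(scheduler, node_id):
    """Submit each statistics request of a node separately, not as one chunk."""
    for stat_type in STAT_TYPES:
        scheduler.submit(node_id, stat_type)


scheduler = RequestScheduler()
submit_node(scheduler, "openflow:1")
submit_node(scheduler, "openflow:2")
```

Because the scheduler hands out one request per call to `next_request()`, the sender can pace the requests instead of issuing a whole node's worth at once.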
Overall, the results look better compared to the results I sent in the previous mail. I ran the test with 16, 32 and 64 nodes; in each case I was able to see the topology, and ping all worked fine as well. I did minimal functional testing to check that the stats are getting updated properly.
Following are the CPU graphs for the 16, 32 and 64 node tests:
I also tried with 128 nodes, but the controller started acting weird (switch disconnections, exceptions from Infinispan event notify classes, CPU spikes). Overall, from these 3 results, it looks like every time the number of nodes doubles, CPU usage increases by 10% or so on average.
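As a rough way to read that trend, the sketch below models CPU usage as growing by a fixed step per doubling of the node count. It assumes "10%" means ten percentage points and that the trend is logarithmic in the number of nodes; the function name and the baseline value passed in are placeholders, not measured numbers.

```python
import math


def estimated_cpu(nodes, base_cpu, base_nodes=16, step=10.0):
    """Estimate CPU % assuming `step` points are added per doubling of nodes.

    base_cpu is the CPU % observed at base_nodes; it must be supplied by the
    caller because no absolute figure is assumed here.
    """
    doublings = math.log2(nodes / base_nodes)
    return base_cpu + step * doublings


# Placeholder baseline of 20.0 % at 16 nodes, purely for illustration.
estimate_at_64 = estimated_cpu(64, base_cpu=20.0)
```

Given the behaviour seen at 128 nodes, such an extrapolation should not be trusted beyond the 16 to 64 node range that was actually tested.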
Robert, can you please review the gerrit and let me know if you have any suggestions?
Jan, if you have time, can you please try this patch on your machine and see if it gives similar results?
Anilkumar Vishnoi---03/26/2014 01:39:55 AM---Anilkumar Vishnoi/India/IBM
I pushed first-cut code changes to the repo for the statistics manager performance improvement. Robert, can you please review the following draft gerrit:
I made the following changes in the code:
1) Added the StatisticsRequestSchedular class, which
*.* keeps track of pending MD-SAL transaction requests using DataTransactionListener. It only keeps track of requests submitted by the Statistics-Manager.
*.* implements a queue to which each node submits a request for scheduling
*.* whenever the number of pending transaction requests goes to zero, picks the next node in the queue and executes it, sending all the statistics requests for that node.
2) All the nodes periodically put themselves in the scheduler queue for execution. If a node is already present in the queue, the scheduler won't accept a duplicate request.
3) Removal of stale statistics is now based on counters rather than timestamps, because timestamps can create issues in a clustered environment. Removal of stale transaction IDs is still time based.
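The scheduling behaviour in points 1) and 2) can be sketched roughly as follows. This is an illustration in Python, not the Java class from the gerrit: the name `NodeScheduler`, the `send_all_requests` callback and the `on_transaction_finished` hook (standing in for the DataTransactionListener notification) are assumptions.

```python
from collections import deque


class NodeScheduler:
    """Runs one queued node at a time, only when no transactions are pending."""

    def __init__(self, send_all_requests):
        # Hypothetical callback: sends all statistics requests for a node and
        # returns how many transactions it started.
        self._send_all_requests = send_all_requests
        self._queue = deque()
        self._queued = set()  # nodes currently waiting in the queue
        self._pending = 0     # transactions submitted but not yet finished

    def submit(self, node_id):
        """Queue a node for execution; duplicates already in the queue are rejected."""
        if node_id in self._queued:
            return False
        self._queued.add(node_id)
        self._queue.append(node_id)
        self._run_next_if_idle()
        return True

    def on_transaction_finished(self):
        """Called once per completed transaction (the listener's role)."""
        if self._pending > 0:
            self._pending -= 1
        self._run_next_if_idle()

    def _run_next_if_idle(self):
        # Pick the next node only when the pending count has dropped to zero.
        while self._pending == 0 and self._queue:
            node_id = self._queue.popleft()
            self._queued.discard(node_id)
            self._pending += self._send_all_requests(node_id)
```

In this sketch a node runs immediately if nothing is pending; otherwise it waits until every outstanding transaction has been reported as finished.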
I ran the performance test for both the existing implementation and the new changes on my laptop (i5 processor, 8 GB RAM, also running an Ubuntu Mininet VM and a few other processes). Following are the graphs for both runs. I do see a performance improvement, but it's not up to expectations.
With existing implementation (16 node tree network):
With new code changes (16 node tree network):
I am working on the following changes to further improve the code:
*.* As of now I am submitting the "node" to the scheduler queue, and whenever the scheduler picks a node, it sends all the statistics requests for that node; I feel that's the reason behind these CPU peaks. I am now planning to submit individual stats requests to the scheduler queue, so that it sends individual statistics requests rather than sending all the requests for that "node" in one shot.
*.* Modifying the interval dynamically, based on how many nodes are connected.
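One possible shape for the dynamic interval is sketched below in Python. The policy itself is not decided yet, so everything here is an assumption: the function name, the base interval, the step size of 16 nodes and the upper cap are hypothetical values chosen only to show the idea.

```python
def polling_interval(connected_nodes,
                     base_interval_s=15.0,   # hypothetical base interval
                     nodes_per_step=16,      # hypothetical step size
                     max_interval_s=120.0):  # hypothetical upper cap
    """Lengthen the statistics polling interval as more nodes connect.

    The interval grows by one base interval for every additional
    `nodes_per_step` nodes and is capped at `max_interval_s`.
    """
    steps = max(0, (connected_nodes - 1) // nodes_per_step)
    return min(max_interval_s, base_interval_s * (1 + steps))
```

A linear step is only one option; the interval could equally be derived from the measured number of pending transactions.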
Please feel free to share your suggestions.