Re: [release] [OpenDaylight TSC] Magnesium CSIT check for SR3

Thanh ha <zxiiro@...>

On Wed, Dec 2, 2020 at 5:48 PM Anil Belur <abelur@...> wrote:
On Wed, Dec 2, 2020 at 3:41 AM Luis Gomez <ecelgp@...> wrote:
Hi Anil,

I think increasing the Vexxhost quota might not be the right solution here, for two reasons: 1) it may impact the Vexxhost bill (this is why you added a quota in the first place), and 2) it is hard to adjust the quota so that no CSIT job fails.

As I mentioned in my last comment in the ticket, when we hit this issue in the past, Thanh adjusted the maximum number of robot minions allowed to run in parallel, and as a result the CSIT jobs queued in Jenkins instead of failing. If we do this again, we avoid any impact on the Vexxhost bill, and we can adjust the maximum number of robot minions internally without involving Vexxhost.


Hello Luis: 

The maximum number of instances for the robot node type is currently set to 25. Shall we decrease it to 20 going forward?
This would mean that dist-test jobs take a bit longer to complete.
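To illustrate why capping instances in Jenkins below the cloud quota turns failures into queueing, here is a toy model. It is a sketch under stated assumptions, not how the Jenkins OpenStack cloud plugin actually schedules: every job needs one robot minion at the same moment, a launch refused by the cloud fails the job outright, and the quota value of 22 used in the example is hypothetical (the real Vexxhost quota is not stated in this thread).

```python
from typing import Tuple


def schedule(job_count: int, cloud_quota: int, jenkins_cap: int) -> Tuple[int, int, int]:
    """Toy model (hypothetical): each job asks for one robot minion at once.

    Returns (running, queued, failed).
    """
    running = queued = failed = 0
    for _ in range(job_count):
        if running >= jenkins_cap:
            # Jenkins will not launch past its own instance cap: the job waits in the queue.
            queued += 1
        elif running >= cloud_quota:
            # Jenkins tries to launch, but the cloud rejects the request: the job fails.
            failed += 1
        else:
            running += 1
    return running, queued, failed


# Hypothetical cloud quota of 22 instances, 30 jobs arriving together.
print(schedule(30, cloud_quota=22, jenkins_cap=25))  # cap above quota: some jobs fail
print(schedule(30, cloud_quota=22, jenkins_cap=20))  # cap below quota: extra jobs queue
```

The trade-off matches the point above: with the cap below the quota nothing fails, but queued jobs take longer to finish.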

I can see some orphaned servers in the Jenkins UI that don't appear to be getting deleted.

Also, if we look at the OpenStack cron job that's supposed to be cleaning up orphaned systems, there appear to be 3 systems that keep showing up in the logs across several different Jenkins runs:

19:34:34 Deleting orphaned server: prd-centos7-builder-4c-16g-53684
19:34:44 Deleting orphaned server: prd-centos7-robot-2c-8g-53683
19:34:53 Deleting orphaned server: prd-centos7-builder-4c-16g-53675

We can see these entries in the logs going back as far as build #31902 (which looks like Nov 26th).
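Spotting servers that reappear in every run can be automated. The sketch below assumes the console output of each openstack-cron run is available as a string and relies only on the "Deleting orphaned server: NAME" line format shown above; a server listed in every run is one whose deletion keeps silently failing.

```python
import re
from typing import Iterable, Set

# Matches the log format seen in the openstack-cron output above.
ORPHAN_RE = re.compile(r"Deleting orphaned server: (\S+)")


def orphans_in_log(log_text: str) -> Set[str]:
    """Return the server names the job tried to delete in one run."""
    return set(ORPHAN_RE.findall(log_text))


def persistent_orphans(run_logs: Iterable[str]) -> Set[str]:
    """Return servers that appear in every run, i.e. deletes that never stick."""
    per_run = [orphans_in_log(text) for text in run_logs]
    if not per_run:
        return set()
    return set.intersection(*per_run)
```

Feeding it the console logs from a handful of consecutive runs would surface the same 3 names listed above, assuming they are still failing to delete.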

It looks like these have been failing to delete for some time, but the openstack-cron job is not reporting the failures. It makes me wonder whether there are other resources, not visible here, that also need to be cleaned up in Vexxhost.
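One way to make such failures visible is for the cleanup step to check its own work and exit non-zero so Jenkins flags the run. This is a hypothetical sketch, not the actual openstack-cron script: `delete` and `still_exists` are stand-ins for whatever cloud calls the job really makes. Because OpenStack server deletion is asynchronous, a real `still_exists` check would need to poll for a while before concluding that a delete failed.

```python
import sys
from typing import Callable, Iterable, List


def cleanup_orphans(
    names: Iterable[str],
    delete: Callable[[str], None],         # hypothetical: issues the delete request
    still_exists: Callable[[str], bool],   # hypothetical: checks the server afterwards
) -> List[str]:
    """Try to delete each orphan and return the names that could not be removed."""
    failures: List[str] = []
    for name in names:
        print(f"Deleting orphaned server: {name}")
        try:
            delete(name)
        except Exception as exc:
            print(f"ERROR: delete request for {name} failed: {exc}", file=sys.stderr)
            failures.append(name)
            continue
        if still_exists(name):
            # The request was accepted but the server is still there: report it.
            print(f"ERROR: {name} still exists after delete", file=sys.stderr)
            failures.append(name)
    return failures


def exit_code(failures: List[str]) -> int:
    """Non-zero exit status makes the Jenkins build fail instead of passing silently."""
    return 1 if failures else 0
```

In the job itself this would end with something like `sys.exit(exit_code(failures))`, so repeated failures show up as red builds rather than as the same log lines week after week.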

Hope this helps,
