Magnesium CSIT check for SR3


Daniel de la Rosa
 

Hello TSC and all

Friendly reminder to help on this ASAP

Thanks


On Fri, Nov 20, 2020 at 9:35 AM Daniel de la Rosa via lists.opendaylight.org <ddelarosa0707=gmail.com@...> wrote:
Hello TSC and all

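I have picked Magnesium AR build 473 as the RC, so please help with the CSIT check ASAP:

https://docs.google.com/spreadsheets/d/1WgBX2c5DjQRfphM0sc3IKt7igW3EMxnUgANTcOlYD_U/edit?usp=sharing
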
Thanks



Srinivas Rachakonda
 

Hi Daniel,

I have updated the Excel sheet for Netvirt and Genius.

R.Srinivas
+91-9243478719










Luis Gomez
 

All the reds here:

https://jenkins.opendaylight.org/releng/job/integration-distribution-test-magnesium/451/

are infra failures:

03:48:16 WARN: Failed to initialize stack. Reason: Resource CREATE failed: Forbidden: resources.vm_1_group.resources[0].resources.instance: Quota exceeded for cores: Requested 2, but already used 356 of 350 cores (HTTP 403) (Request-ID: req-cbc3d8c8-b59b-4430-aed7-ab664be171d8)

which means there is not enough capacity to test an ODL distribution.
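As a rough illustration of why the stack creation was rejected, a per-tenant core quota boils down to a check like the one below. This is a simplified model, not OpenStack's actual quota code, and `quota_allows` is a hypothetical helper; the numbers come straight from the log line above.

```python
def quota_allows(used_cores: int, requested_cores: int, core_quota: int) -> bool:
    """Simplified per-tenant core quota check (hypothetical helper, not OpenStack code).

    A new instance is admitted only if the cores already in use plus the
    cores it requests stay within the tenant's core quota.
    """
    return used_cores + requested_cores <= core_quota


# Numbers taken from the failed stack in the log above.
print(quota_allows(used_cores=356, requested_cores=2, core_quota=350))  # False
```

Under this model, with usage (356) already above the quota (350), every further instance request from CSIT is refused until enough cores are released.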

BR/Luis







Luis Gomez
 

BTW, are we in a hurry to release Mg SR3, or can we fix the infra issues first?

Since the distribution test launches a bunch of CSIT jobs in parallel, the fix for this is one of the following:

- Increase the max number of cloud instances: I'm not sure whether there is a penalty for doing this; there should not be if we only pay for cloud usage (not capacity).
- Implement a CSIT job execution queue: instead of failing, CSIT jobs could be queued until cloud resources are available.
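The second option can be sketched as a fixed-size worker pool: at most `max_parallel` jobs run at once, and the rest wait in a queue instead of failing. This is a minimal model under stated assumptions, not how the ODL releng Jenkins setup works; `run_csit_jobs` and the `time.sleep` standing in for a real job are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_csit_jobs(job_names, max_parallel):
    """Run jobs with at most `max_parallel` in flight; the rest wait in the pool's queue."""
    lock = threading.Lock()
    active = 0  # jobs currently running
    peak = 0    # highest concurrency observed

    def run(name):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for creating the cloud stack and running the suite
        with lock:
            active -= 1
        return f"{name}: done"

    # ThreadPoolExecutor starts at most max_parallel workers; extra submissions are
    # held in its internal queue until a worker frees up, rather than being rejected.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(run, job_names))
    return results, peak


results, peak = run_csit_jobs([f"csit-job-{i}" for i in range(12)], max_parallel=4)
print(len(results), peak <= 4)  # 12 True
```

The trade-off is wall-clock time: no job fails for lack of capacity, but queued jobs wait for earlier ones to finish.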

BR/Luis










Daniel de la Rosa
 

I think it is better to fix the infra issues first, but I'll let @Robert Varga confirm. So do we need to open an LFN IT ticket?













Robert Varga
 

On 24/11/2020 18:29, Daniel de la Rosa wrote:
I think it is better to fix the infra issues first but I'm gonna
let @Robert Varga <mailto:nite@...> confirm..  So do we need to open
an LFN IT ticket ?
No hurry, I guess -- all code is in, we now just need to be confident it
is okay.

This looks like we want LF IT to take a look with some amount of urgency
-- so can we file a ticket, please?

Thanks,
Robert





On Nov 23, 2020, at 10:44 PM, Luis Gomez via lists.opendaylight.org <ecelgp=gmail.com@...> wrote:

All the reds here:

https://jenkins.opendaylight.org/releng/job/integration-distribution-test-magnesium/451/

are infra failure:
03:48:16 WARN: Failed to initialize stack. Reason: Resource CREATE failed: Forbidden: resources.vm_1_group.resources[0].resources.instance: Quota exceeded for cores: Requested 2, but already used 356 of 350 cores (HTTP 403) (Request-ID: req-cbc3d8c8-b59b-4430-aed7-ab664be171d8)

which means there is not enough capacity to test an ODL distribution.

BR/Luis



On Fri, Nov 20, 2020 at 9:35 AM Daniel de la Rosa via lists.opendaylight.org <ddelarosa0707=gmail.com@...> wrote:

Hello TSC and all

I have picked Magnesium AR 473 as RC so please help with CSIT
check ASAP

https://docs.google.com/spreadsheets/d/1WgBX2c5DjQRfphM0sc3IKt7igW3EMxnUgANTcOlYD_U/edit?usp=sharing



Thanks






Luis Gomez
 

Here it is:

https://jira.linuxfoundation.org/plugins/servlet/theme/portal/2/IT-21093#

BR/Luis


Daniel de la Rosa
 

Great, thanks! I’m guessing that without Anil, nobody from LFN IT is going to look into this... So @Anil Belur, please let us know when you can take care of this, so we can release Magnesium SR3.





Robert Varga
 

On 30/11/2020 07:57, Daniel de la Rosa wrote:
Great, thanks! I’m guessing that without Ani, nobody from LFN IT is
going to look into this... So @Anil Belur
<mailto:abelur@...> please let us know when you can take
care of this, so we can release Magnesium SR3
Well, they are supposed to have some redundancy.

Casey, there has been zero movement on the issue for 5 days -- can you
please escalate this?

Thanks,
Robert





On Tue, Nov 24, 2020 at 9:15 PM Luis Gomez <ecelgp@...> wrote:

Here it is:

https://jira.linuxfoundation.org/plugins/servlet/theme/portal/2/IT-21093#

BR/Luis






Anil Belur
 

Greetings Daniel, Luis:

I've raised a request with vexhost to increase the quota limits. The quotas were put in place primarily to make sure we don't see another spike in the invoice like the one a few months ago.

Do we know why we have started exceeding the limits recently, given that the quotas have been in place for a while?

Regards,
Anil






Luis Gomez
 

Hi Anil,

I think increasing the vexhost quota might not be the right solution here because: 1) it may impact the vexhost bill (this is why you added a quota in the first place), and 2) it is hard to adjust the quota so that no CSIT job fails.

As I mentioned in my last comment on the ticket, when we hit this issue in the past, Thanh adjusted the maximum number of robot minions allowed to run in parallel, and as a result the CSIT jobs were queued in Jenkins instead of failing. If we do this again, we avoid any impact on the vexhost bill, and we can adjust the maximum number of robot minions internally without involving vexhost.

BR/Luis






Anil Belur
 


On Wed, Dec 2, 2020 at 3:41 AM Luis Gomez <ecelgp@...> wrote:
Hi Anil,

I think increasing the vexhost quota might not be the right solution here because: 1) it may impact the vexhost bill (this is why you added a quota in the first place), and 2) it is hard to adjust the quota so that no CSIT job fails.

As I mentioned in my last comment on the ticket, when we hit this issue in the past, Thanh adjusted the maximum number of robot minions allowed to run in parallel, and as a result the CSIT jobs were queued in Jenkins instead of failing. If we do this again, we avoid any impact on the vexhost bill, and we can adjust the maximum number of robot minions internally without involving vexhost.

BR/Luis


Hello Luis: 

The max number of instances for the robot node type is set to 25; shall we decrease it to 20 going forward?
This would mean the dist-test takes a bit longer to complete.
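A back-of-the-envelope sketch of what capping robot minions buys, assuming each minion runs one CSIT job at a time and that CSIT is the only consumer of the tenant's core quota (in practice other builders share it). The 16-cores-per-job figure and the 50-job run are purely hypothetical; the thread does not state either, and all function names are illustrative.

```python
import math


def worst_case_cores(max_robot_instances: int, max_cores_per_job: int) -> int:
    """Upper bound on cores CSIT can hold at once if every robot minion runs one job."""
    return max_robot_instances * max_cores_per_job


def cap_fits_quota(max_robot_instances: int, max_cores_per_job: int, core_quota: int) -> bool:
    """True if the worst case under this cap stays within the core quota."""
    return worst_case_cores(max_robot_instances, max_cores_per_job) <= core_quota


def largest_safe_cap(max_cores_per_job: int, core_quota: int) -> int:
    """Largest robot-instance cap whose worst case still fits the quota."""
    return core_quota // max_cores_per_job


def waves(num_jobs: int, max_robot_instances: int) -> int:
    """Rough number of sequential batches needed to run num_jobs under the cap."""
    return math.ceil(num_jobs / max_robot_instances)


# Hypothetical: 16 cores per CSIT job, 350-core quota, 50 jobs in a dist-test run.
print(cap_fits_quota(25, 16, 350))   # False
print(cap_fits_quota(20, 16, 350))   # True
print(largest_safe_cap(16, 350))     # 21
print(waves(50, 25), waves(50, 20))  # 2 3
```

The last line reflects the cost noted above: a lower cap keeps peak usage under the quota, at the price of more sequential batches and a longer dist-test.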

Cheers. 

 






Daniel de la Rosa
 

@Anil Belur, I know you can't join the next TSC meeting at 9 am PST, so can you let us know whether increasing the vexhost quota is going to be your long-term solution? I'd like to release Magnesium SR3 this week if possible.

Thanks





Daniel de la Rosa
 

Hello Team

Any updates on the Magnesium SR3 issues?

Thanks

On Wed, Dec 2, 2020 at 8:01 PM Daniel de la Rosa via lists.opendaylight.org <ddelarosa0707=gmail.com@...> wrote:
@Anil Belur i know you can't join next TSC at 9 am pst, so can you let us know if increasing the vexhost is going to be your long term solution? I'd like to be able to release Magnesium SR3 this week if possible 

Thanks

On Tue, Dec 1, 2020 at 9:41 AM Luis Gomez <ecelgp@...> wrote:
Hi Anil,

I think increasing the vexhost quota might not be the right solution here because: 1) it may impact the vexhost bill (this is why you added a quota in first place) and 2) it is hard to adjust the quota so that no CSIT fails.

If you see my last comment in the ticket, I remember when we hit this issue in the past, Thanh adjusted the maximum number of robot minions that are allowed to run in parallel and by doing so the CSIT jobs where queuing in Jenkins without failing. If we do this again, we can avoid impact on vexhost bill as well as we can internally adjust the maximum number of robot minions without involving vexhost.

BR/Luis


On Dec 1, 2020, at 2:26 AM, Anil Belur <abelur@...> wrote:

Greetings Daniel, Luis:

I've raised a request with vexhost to increase the quota limits. The primary reason the quotas were put in place to make sure we don't see a spike in the invoice as seen a few months ago. 

Do we know why we are exceeding the limits recently since the quotas have been in place for a while? 

Regards,
Anil

On Wed, Nov 25, 2020 at 3:15 PM Luis Gomez <ecelgp@...> wrote:
Here it is:


BR/Luis


On Nov 24, 2020, at 3:53 PM, Robert Varga <nite@...> wrote:

On 24/11/2020 18:29, Daniel de la Rosa wrote:
I think it is better to fix the infra issues first but I'm gonna
let @Robert Varga <mailto:nite@...> confirm..  So do we need to open
an LFN IT ticket ?

No hurry, I guess -- all code is in, we now just need to be confident it
is okay.

This looks like we want LF IT to take a look with some amount of urgency
-- so can we file a ticket, please?

Thanks,
Robert




On Tue, Nov 24, 2020 at 9:05 AM Luis Gomez <ecelgp@...
<mailto:ecelgp@...>> wrote:

   BTW are we in a hurry to release Mg SR3 or can we fix the infra
   issues first?

   Considering the distribution test launches a bunch of CSIT jobs in
   parallel, the fix for this is either:

   - Increase the max number of cloud instances: not sure if there is a
   penalty in doing this, it should not if we just pay for cloud usage
   (not capacity).
   - Implement a CSIT job execution queue: Instead of failing the CSIT
   job, the jobs could be queued until the cloud resources are available.
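A job execution queue along these lines could admit jobs against the core quota instead of letting the cloud reject them. The Python sketch below is hypothetical (the `CoreBudgetQueue` class and the job names are illustrative, not part of any ODL or LF tooling): a job starts only if `used + requested <= quota`; otherwise it waits in FIFO order until finished jobs release their cores. With the figures from the error in this thread (356 of 350 cores used, 2 requested), the check fails, so the job would wait in the queue rather than receive an HTTP 403.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class CoreBudgetQueue:
    """Hypothetical FIFO admission queue for CSIT jobs under a core quota."""

    quota: int                    # e.g. 350 cores, as in the quota error
    used: int = 0                 # cores currently in use (including other VMs)
    waiting: deque[tuple[str, int]] = field(default_factory=deque)
    running: dict[str, int] = field(default_factory=dict)

    def fits(self, cores: int) -> bool:
        """Same condition the cloud enforces: used + requested must not exceed quota."""
        return self.used + cores <= self.quota

    def submit(self, name: str, cores: int) -> list[str]:
        """Enqueue a job; return the names of jobs that started as a result."""
        self.waiting.append((name, cores))
        return self._drain()

    def finish(self, name: str) -> list[str]:
        """Release a finished job's cores and start any queued jobs that now fit."""
        self.used -= self.running.pop(name)
        return self._drain()

    def _drain(self) -> list[str]:
        started = []
        # Strict FIFO: stop at the first job that does not fit, so large
        # jobs are not starved by a stream of smaller ones.
        while self.waiting and self.fits(self.waiting[0][1]):
            name, cores = self.waiting.popleft()
            self.used += cores
            self.running[name] = cores
            started.append(name)
        return started


if __name__ == "__main__":
    # The situation from the error message: 356 of 350 cores used, 2 requested.
    print(CoreBudgetQueue(quota=350, used=356).fits(2))  # False: the job would wait

    q = CoreBudgetQueue(quota=350)
    print(q.submit("csit-a", 200))  # ['csit-a']
    print(q.submit("csit-b", 200))  # [] because 200 + 200 > 350
    print(q.finish("csit-a"))       # ['csit-b']
```

Unlike a fixed cap on parallel jobs, this admits work by its actual core demand, but it needs an accurate, up-to-date view of current usage (for example, by querying the cloud), which adds a moving part.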

   BR/Luis


   On Nov 23, 2020, at 10:44 PM, Luis Gomez via lists.opendaylight.org
   <ecelgp=gmail.com@...> wrote:

   All the reds here:

   https://jenkins.opendaylight.org/releng/job/integration-distribution-test-magnesium/451/

   are infra failures:

   03:48:16 WARN: Failed to initialize stack. Reason: Resource CREATE failed: Forbidden: resources.vm_1_group.resources[0].resources.instance: Quota exceeded for cores: Requested 2, but already used 356 of 350 cores (HTTP 403) (Request-ID: req-cbc3d8c8-b59b-4430-aed7-ab664be171d8)

   which means there is not enough capacity to test an ODL distribution.

   BR/Luis



   On Nov 22, 2020, at 10:21 PM, Daniel de la Rosa
   <ddelarosa0707@...> wrote:

   Hello TSC and all

   Friendly reminder to help on this ASAP

   Thanks

   On Fri, Nov 20, 2020 at 9:35 AM Daniel de la Rosa via
   lists.opendaylight.org <ddelarosa0707=gmail.com@...> wrote:

       Hello TSC and all

       I have picked Magnesium AR 473 as RC so please help with CSIT
       check ASAP

       https://docs.google.com/spreadsheets/d/1WgBX2c5DjQRfphM0sc3IKt7igW3EMxnUgANTcOlYD_U/edit?usp=sharing



       Thanks








   








Luis Gomez
 

Infra is better now and the missing BGPCEP patch is merged, so I think we can pick the next AR build as RC.

BR/Luis

On Dec 7, 2020, at 8:00 AM, Daniel de la Rosa <ddelarosa0707@...> wrote:

Hello Team

Any updates on the Magnesium SR3 issues?

Thanks

On Wed, Dec 2, 2020 at 8:01 PM Daniel de la Rosa via lists.opendaylight.org <ddelarosa0707=gmail.com@...> wrote:
@Anil Belur I know you can't join the next TSC meeting at 9 AM PST, so can you let us know if increasing the vexhost quota is going to be your long-term solution? I'd like to release Magnesium SR3 this week if possible.

Thanks


Daniel de la Rosa
 

Thank you all. I have picked Magnesium AR 491 as RC, and here is the updated tracking sheet to review...


Please review it ASAP so we can release Magnesium SR3.

Thanks






Daniel de la Rosa
 

Thanks, Luis, for taking care of controller, distribution and lispflowing... @Srinivas Rachakonda, or anybody from Genius and NetVirt, can you help?




Luis Gomez
 

Sorry, but I cannot take credit for lispflowing; Lori did :)




Srinivas <srinivas.rachakonda@...>
 

Hi Daniel,

Updated the spreadsheet for Magnesium SR3.

Thanks,
Srinivas
+91-9243478719

From: Daniel de la Rosa <ddelarosa0707@...>
Sent: 10 December 2020 03:39
To: Luis Gomez <ecelgp@...>; Robert Varga <nite@...>; Srinivas Rachakonda <srinivas.rachakonda@...>
Cc: Anil Belur <abelur@...>; Release <release@...>; TSC <tsc@...>
Subject: Re: [OpenDaylight TSC] Magnesium CSIT check for SR3

 

Thanks Luis for taking care of controller, distribution and lispflowing... @Srinivas Rachakonda or anybody from the genius and netvirt can help? 

 
