[E] Re: [integration-dev] integration/distribution version issues


Sangwook Ha
 

Looking at the release & version bump cycle, some of the steps may be simplified, and hopefully automated:

- For each release of managed projects, the release & version bump are done separately, a few days apart - is there any reason why they cannot be done together?
- For 'opendaylight/pom.xml' there are multiple manual steps: can steps 2 & 3 be merged & done automatically?
1) activate profiles for self-managed projects
2) release
3) update versions and deactivate profiles for self-managed projects (sometimes the version bump is done in two separate steps: one for the artifacts/karaf and one for the self-managed projects)
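
The version arithmetic in step 3 is simple enough to script. Below is a minimal sketch in Python. It assumes the minor-bump convention discussed in this thread, where a release of 14.3.0 is followed by 14.4.0-SNAPSHOT. The helper name is hypothetical, and the sketch does not touch profiles or POM files:

```python
import re

# Plain MAJOR.MINOR.PATCH, optionally followed by -SNAPSHOT.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(-SNAPSHOT)?$")


def next_snapshot(release: str) -> str:
    """Return the development version that follows a release.

    Hypothetical helper assuming the minor-bump convention:
    14.3.0 -> 14.4.0-SNAPSHOT (patch number reset to 0).
    """
    match = _VERSION.match(release)
    if match is None or match.group(4):
        # Refuse SNAPSHOTs and anything that is not a plain release version.
        raise ValueError(f"not a release version: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return f"{major}.{minor + 1}.0-SNAPSHOT"


if __name__ == "__main__":
    print(next_snapshot("14.3.0"))  # 14.4.0-SNAPSHOT
```

With a helper like this, merging steps 2 & 3 would mean running the release and the bump in one job instead of as separate manual steps.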

And there are two different types of release tags: one for managed projects (e.g. release/silicon-sr3) and one for the common release (e.g. 14.3.0). The former does not update 'opendaylight/pom.xml', but the latter updates all the POM files. This is confusing because, by the time the release is made, all the versions except for 'opendaylight/pom.xml' have already been bumped, so they are ahead of what the tag says (e.g. for the tag '14.3.0', all the versions except for 'opendaylight/pom.xml' are '14.4.0').
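
One way to make this mismatch visible is to compare each POM's own <version> against the tag. Below is a rough sketch using Python's standard xml.etree.ElementTree. The function names are hypothetical. The sketch reads only the project's own <version> element and skips <parent><version>, and it assumes the standard Maven POM 4.0.0 namespace:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Standard Maven POM namespace; assumed to be declared on <project>.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def project_version(pom_xml: str) -> Optional[str]:
    """Return the project's own <version>, or None if it is inherited."""
    root = ET.fromstring(pom_xml)
    # "m:version" matches direct children of <project> only,
    # so <parent><version> is not picked up.
    node = root.find("m:version", POM_NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def mismatches(poms: Dict[str, str], tag: str) -> Dict[str, Optional[str]]:
    """Map each POM path to its version, for POMs whose version != tag."""
    result: Dict[str, Optional[str]] = {}
    for path, xml_text in poms.items():
        version = project_version(xml_text)
        if version != tag:
            result[path] = version
    return result


if __name__ == "__main__":
    # Illustrative contents only, mirroring the 14.3.0 example above.
    ns = 'xmlns="http://maven.apache.org/POM/4.0.0"'
    poms = {
        "opendaylight/pom.xml": f"<project {ns}><version>14.3.0</version></project>",
        "karaf/pom.xml": f"<project {ns}><version>14.4.0</version></project>",
    }
    print(mismatches(poms, "14.3.0"))  # {'karaf/pom.xml': '14.4.0'}
```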

Thanks,
Sangwook

On Wed, Nov 17, 2021 at 1:18 AM Robert Varga <nite@...> wrote:
On 17/11/2021 08:26, Luis Gomez wrote:
> I thought this was clear, at least to ODL old folks, the int/dist
> project holds 2 distributions:

Three, actually.

> - Karaf distribution (karaf/pom.xml) only containing Managed projects is
> also a Managed project and integrated with autorelease (automatic
> release & bump).
> - Common distribution (opendaylight/pom.xml) containing Managed and Self
> Managed projects. This is a Self Managed project and therefore it has to
> be manually released, bumped, etc, just like any other SM project.

Yes, and therefore the release lifecycle of int/dist is unlike any other
project I have come across.

> AFAIR the sanity test you are pointing out is the only test that uses
> the common distribution, all of our CSIT uses Karaf distribution. I hope
> this explains.

Right-o, but unfortunately you are explaining something completely
off-topic, so let me try to reiterate.

1. Ever since the dawn of autorelease, MSI projects have agreed to bump
the minor version, i.e. 1.2.0 -> 1.3.0

2. On Feb 22 this year, Anil branched stable/silicon, correctly bumping
opendaylight/pom.xml from 0.14.0-SNAPSHOT to 0.15.0-SNAPSHOT:
https://git.opendaylight.org/gerrit/c/integration/distribution/+/95287

3. On Apr 3 this year, you changed the versioning scheme on
stable/silicon to 14.0.0-SNAPSHOT:
https://git.opendaylight.org/gerrit/c/integration/distribution/+/95655

4. On Apr 22 this year, Guillaume made a similar change on then-master:
https://git.opendaylight.org/gerrit/c/integration/distribution/+/95789

5. On Sep 21 this year, Anil branched stable/phosphorus, but unlike all
the previous times, opendaylight/pom.xml's version was NOT updated:
https://git.opendaylight.org/gerrit/c/integration/distribution/+/97551

6. On Sep 24 this year, the first successful
distribution-merge-full-sulfur job run:
https://s3-logs.opendaylight.org/logs/releng/vex-yul-odl-jenkins-1/distribution-merge-full-sulfur/2/console.log.gz,
happily doing:

> Deploying the main artifact opendaylight-15.0.0-SNAPSHOT.tar.gz
> Uploading to opendaylight-snapshot: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/opendaylight/15.0.0-SNAPSHOT/opendaylight-15.0.0-20210924.032546-1083.tar.gz
> Uploaded to opendaylight-snapshot: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/opendaylight/15.0.0-SNAPSHOT/opendaylight-15.0.0-20210924.032546-1083.tar.gz (266 MB at 28 MB/s)
> Uploading to opendaylight-snapshot: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/opendaylight/15.0.0-SNAPSHOT/maven-metadata.xml
> Uploaded to opendaylight-snapshot: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/opendaylight/15.0.0-SNAPSHOT/maven-metadata.xml (982 B at 18 kB/s)

7. On Sep 24 this year, run-of-the-mill
distribution-merge-full-phosphorus ran:
https://s3-logs.opendaylight.org/logs/releng/vex-yul-odl-jenkins-1/distribution-merge-full-phosphorus/1170/console.log.gz,
happily doing this:

> Deploying the main artifact opendaylight-15.0.0-SNAPSHOT.tar.gz
> Uploading to opendaylight-snapshot: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/opendaylight/15.0.0-SNAPSHOT/opendaylight-15.0.0-20210924.050736-1084.tar.gz
> Uploaded to opendaylight-snapshot: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/opendaylight/15.0.0-SNAPSHOT/opendaylight-15.0.0-20210924.050736-1084.tar.gz (266 MB at 28 MB/s)
> Uploading to opendaylight-snapshot: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/opendaylight/15.0.0-SNAPSHOT/maven-metadata.xml
> Uploaded to opendaylight-snapshot: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/integration/opendaylight/15.0.0-SNAPSHOT/maven-metadata.xml (982 B at 20 kB/s)

8. This continued for quite some time, i.e. the contents of
opendaylight-15.0.0-SNAPSHOT flip-flopped between Sulfur and Phosphorus

9. On Oct 24 distribution-merge-full-phosphorus started publishing
opendaylight-15.1.0-SNAPSHOT:
https://s3-logs.opendaylight.org/logs/releng/vex-yul-odl-jenkins-1/distribution-merge-full-phosphorus/1322/console.log.gz

10. On Oct 24 distribution-merge-full-sulfur started failing:
https://s3-logs.opendaylight.org/logs/releng/vex-yul-odl-jenkins-1/distribution-merge-full-sulfur/149/console.log.gz

11. On Nov 13 the last published opendaylight-15.0.0-SNAPSHOT expired in
Nexus

12. On Nov 14 distribution-merge-full-sulfur started failing:
https://s3-logs.opendaylight.org/logs/releng/vex-yul-odl-jenkins-1/distribution-merge-full-sulfur/275/console.log.gz
with

> ERROR: Failed to parse POMs
> hudson.remoting.ProxyException: hudson.maven.MavenModuleSetBuild$MavenExecutionException: org.apache.maven.project.ProjectBuildingException: Some problems were encountered while processing the POMs:
> [FATAL] Non-resolvable parent POM for org.opendaylight.integration:opendaylight:15.0.0-SNAPSHOT: Could not find artifact org.opendaylight.integration:karaf:pom:0.15.0-SNAPSHOT in opendaylight-snapshot (https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/) and 'parent.relativePath' points at no local POM @ line 14, column 13
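
The root cause is point 5. After the branch cut, master and stable/phosphorus carried the same opendaylight SNAPSHOT version, so both merge jobs deployed to the same Nexus coordinates. This condition is cheap to catch before anything is deployed. Below is a minimal sketch in Python. It assumes a hypothetical input that maps each branch to the version in its opendaylight/pom.xml, and it leaves out collecting that mapping from git. The function name is made up for illustration:

```python
from collections import defaultdict
from typing import Dict, List


def snapshot_collisions(branch_versions: Dict[str, str]) -> Dict[str, List[str]]:
    """Return each SNAPSHOT version that more than one branch would deploy.

    `branch_versions` maps a branch name to the <version> found in that
    branch's opendaylight/pom.xml. Only -SNAPSHOT versions are checked,
    since those are what the per-branch merge jobs keep redeploying.
    """
    owners: Dict[str, List[str]] = defaultdict(list)
    for branch, version in branch_versions.items():
        if version.endswith("-SNAPSHOT"):
            owners[version].append(branch)
    # Two or more owners means the artifact's contents flip-flop in Nexus.
    return {v: sorted(b) for v, b in owners.items() if len(b) > 1}


if __name__ == "__main__":
    # The state described in points 5-8: master (Sulfur) and
    # stable/phosphorus both at 15.0.0-SNAPSHOT.
    print(snapshot_collisions({
        "master": "15.0.0-SNAPSHOT",
        "stable/phosphorus": "15.0.0-SNAPSHOT",
    }))
    # {'15.0.0-SNAPSHOT': ['master', 'stable/phosphorus']}
```

If this check ran at branch-cut time, a non-empty result would flag the collision before either job deployed anything.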


The bottom line is:

- int/dist setup was busted for almost two months
- job failures have been indicating this clearly for a month
- now finally everything fell apart
- it was a downstream user who detected this mess

I equally hope this explains.

Bye,
Robert

>
> For more information check this doc:
> https://docs.opendaylight.org/projects/integration-distribution/en/latest/add-project-distribution.html
>
> BR/Luis
>
>> On Nov 16, 2021, at 9:08 PM, Robert Varga <nite@...> wrote:
>>
>> On 17/11/2021 00:37, Daniel de la Rosa wrote:
>>> Let me add more Robert and team to make sure that they see this email
>>
>> This boils down to interaction between autorelease and int/dist.
>>
>> autorelease assumes branch cutting involves bumping minor version,
>> which has been true for all MSI projects since forever.
>>
>> int/dist started violating that assumption by changing versioning
>> scheme here:
>> https://git.opendaylight.org/gerrit/c/integration/distribution/+/95655
>>
>> I have no skin in this particular game, sorry.
>>
>> Regards,
>> Robert
>>
>>
>>> On Tue, Nov 16, 2021 at 1:47 PM Sangwook Ha via lists.opendaylight.org
>>> <sangwook.ha=verizon.com@...> wrote:
>>>    It appears that the versions in
>>>    integration/distribution/opendaylight have not been updated, and
>>>    some Jenkins jobs are failing: e.g.
>>>    openflowplugin-csit-1node-sanity-only-sulfur
>>>    <https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-sanity-only-sulfur/>
>>>    I submitted two patches to fix up the version issues for Sulfur &
>>>    Silicon - Phosphorus seems okay.
>>>    Shouldn't this be done automatically when it's released? It looks like
>>>    the release version bump that removes SNAPSHOT is done automatically,
>>>    but the SNAPSHOT version is not updated for the opendaylight artifact.
>>>    Thanks,
>>>    Sangwook
>>>
>


Robert Varga
 

On 17/11/2021 17:42, Ha, Sangwook wrote:
> Looking at the release & version bump cycle, some of the steps may be simplified, and hopefully automated:
> - For each release of managed projects, release & version bump, are done separately a few days apart - is there any reason why this cannot be done together?
Yes, there is, and there is quite a bit of history attached to it.

Our governance clearly states that each project is independent, which in this context means it is free to release whenever it wants and can decide to release outside of the Simultaneous Release.

During our initial (Hydrogen, 2014) release, we had all projects integrating on SNAPSHOTs, and the release party was ... not awesome. If you see git tags like "jenkins-controller-bulk-release-prepare-only-2-1", those are from that period. There were multiple technical reasons why it did not go so well.

As a reaction to that, autorelease was created: all projects were still integrated on SNAPSHOTs, but they gave up their right to release on their own, and instead all projects were released by LFN personnel in one large chunk. We are still doing this for MSI projects.

The experience of being integrated on SNAPSHOTs was ... not exactly great. It led to the creation of the validate-autorelease and distribution-check jobs, which gate every patch to make sure it does not break downstream projects.

We had mostly addressed the technical issues by 2017 and started peeling projects away from autorelease with odlparent-2.0.0 (IIRC). Fast forward to today, and autorelease does not carry a single kernel project.

With that context, I will never agree to "release together": that amounts to autorelease hell. I have endured it for years upon years and have shed sweat and blood to get out of it. I am *NEVER* going back to that, period.


Can we do better than we are doing today?

Certainly.

There are multiple reasons why kernel project releases happen "a few days apart". It is mostly a manual process and you can probably guess whose time is being spent on it.

There is a silver lining, though: Nexus promotions now take ~30 minutes for most projects, not 2+ hours like they did just two months ago. Now if only we had reliable automation taking advantage of that...

Regards,
Robert