[OpenDaylight Discuss] Stable branches etc.


Robert Varga
 

On 30/01/2019 15:30, Sam Hague wrote:


On Wed, Jan 30, 2019 at 5:03 AM <guillaume.lambert@...> wrote:

Hi Stephen

I share your feedback and think it would make a lot of sense.
Many linux distros use something similar to deal with staging
packages in their repo, e.g. Fedora with stable/branched/rawhide
repos or Debian with stable/testing/unstable repos.
With only one (master) branch, it is difficult for downstream
projects to deal with both the new features to develop and needed
migrations for the next release at the same time.
An intermediate branch may allow better synchronization with the
upstream projects, as long as ongoing evolutions are made available
through Nexus.
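The distro-style split described above could map onto git branches roughly as follows. This is only a sketch; the branch names (stable/fluorine, branched/neon) are hypothetical, not an agreed ODL convention:

```shell
# Hypothetical three-tier branch layout, mirroring Fedora's
# stable/branched/rawhide split (branch names are illustrative only)
mkdir -p demo && git -C demo init -q
git -C demo -c user.name=demo -c user.email=demo@example.org \
    commit --allow-empty -qm "initial commit"
git -C demo branch stable/fluorine   # released branch: critical fixes only
git -C demo branch branched/neon     # intermediate: stabilization for next GA
# the default branch stays open for feature development ("rawhide")
git -C demo branch --list
```

Downstreams could then track the intermediate branch for migration work while master churns, analogous to Debian's testing repo.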

This is a good point and was a similar reason for needing a branch. This
has hit us every release where the stable branch is pulled and master
goes forward, but downstreams still want to continue working. They can't
since the stable branch is locked and master becomes the wild west.
We really need to address the 'wild west' aspect here -- and we need
concrete examples from the past two releases.

Based on the release schedule, the next release is not open, which
is certainly not a wildcard to wreak havoc on downstreams -- so who
is causing it, and why?!

Some
of this could be alleviated with more reliable planning - getting code
in earlier and tested - but that is hard with limited resources. An
intermediate branch would provide a place to keep working to finish
things and make it into the sr1 and not try to cram something in at the
last minute on the stable branch.
Well, I take the position that limited resources dictate limited code
churn and more incremental feature delivery. This includes the hard task
of culling deliverables early.

The additional branch really works in exactly the opposite direction:
rather than the features being postponed to the next GA release, they
are pushed out to SR1 (and SR2, etc.). That breaks the strong reading of
the SimRel schedule, really.

What I mean is that in the past the GA release was postponed by up to
three months to cope with "things just happening", with the hope being
that such events would become ever rarer. That has not generally
happened and today we have SimRel which is not time-flexible.

That time-flexibility meant that feature delivery problems would get
masked (i.e. you'd get 8-12 weeks more time in a particular cycle).

With that flexibility gone, though, there are only two options I can see:

1) the SimRel schedule works for a particular project
2) the SimRel schedule does not work for a particular project

I think we are dealing with a case of 2) here, and the question is whether:

a) SimRel schedule needs to be fixed
b) the project's upstreams need to be fixed
c) the project needs to be fixed

One final note: unlike MRI and self-managed projects, projects in
autorelease build have no control over how/when they consume upstream
changes nor when they release. I believe this is a major component of
the pain here.

Regards,
Robert


Sam Hague <shague@...>
 



On Mon, Feb 11, 2019 at 8:19 AM Robert Varga <nite@...> wrote:
On 30/01/2019 15:30, Sam Hague wrote:
>
>
> On Wed, Jan 30, 2019 at 5:03 AM <guillaume.lambert@...> wrote:
>
>     Hi Stephen
>
>     I share your feedback and think it would make a lot of sense.
>     Many linux distros use something similar to deal with staging
>     packages in their repo, e.g. Fedora with stable/branched/rawhide
>     repos or Debian with stable/testing/unstable repos.
>     With only one (master) branch, it is difficult for downstream
>     projects to deal with both the new features to develop and needed
>     migrations for the next release at the same time.
>     An intermediate branch may allow better synchronization with the
>     upstream projects, as long as ongoing evolutions are made available
>     through Nexus.
>
> This is a good point and was a similar reason for needing a branch. This
> has hit us every release where the stable branch is pulled and master
> goes forward, but downstreams still want to continue working. They can't
> since the stable branch is locked and master becomes the wild west.

We really need to address the 'wild west' aspect here -- and we need
concrete examples from the past two releases.
A possible example: the sodium branch is broken for NetVirt because the karaf.shell dependency is missing. [1] was pushed to add the pom dependency. Possibly something else caused the issue, but the point is that when things go unstable and are not fixed, you get days or weeks of breakage leaking in.

Based on the release schedule, the next release is not open, which
is certainly not a wildcard to wreak havoc on downstreams -- so who
is causing it, and why?!
The projects all work independently of each other with no protection, so it is very easy for one project to break another. This typically happens right around the time the stable branch is cut and master is opened up. Projects start merging stuff. Not too bad when you are upstream, but downstream you catch all the issues.

True, you could mitigate this by effectively stopping master development and focusing on the stable branch, but the schedule doesn't allow this since it puts pressure on the downstreams to catch up later. Maybe a change would help here. The downstreams are at the mercy of the upstreams, though. I think better protection/verification and working together would be more effective, since today all that happens is pointing fingers at whoever caused a breakage.

> Some
> of this could be alleviated with more reliable planning - getting code
> in earlier and tested - but that is hard with limited resources. An
> intermediate branch would provide a place to keep working to finish
> things and make it into the sr1 and not try to cram something in at the
> last minute on the stable branch.

Well, I take the position that limited resources dictate limited code
churn and more incremental feature delivery. This includes the hard task
of culling deliverables early.
Agreed, we need to do a much better job of this. It is very hard when resources are not consistent, so the planning becomes worthless. Nothing in the schedule accounts for corrections, though, beyond dropping the feature. In times past this was something we tried to avoid, since there were further downstreams wanting to consume the features and we cared.

I think Neon will make it out on time. We pushed many features out to Sodium, so the idea of culling does help considerably.

The additional branch really works in exactly the opposite direction:
rather than the features being postponed to the next GA release, they
are pushed out to SR1 (and SR2, etc.). That breaks the strong reading of
the SimRel schedule, really.

What I mean is that in the past the GA release was postponed by up to
three months to cope with "things just happening", with the hope being
that such events would become ever rarer. That has not generally
happened and today we have SimRel which is not time-flexible.

That time-flexibility meant that feature delivery problems would get
masked (i.e. you'd get 8-12 weeks more time in a particular cycle).

With that flexibility gone, though, there are only two options I can see:

1) the SimRel schedule works for a particular project
2) the SimRel schedule does not work for a particular project

I think we are dealing with a case of 2) here, and the question is whether:

a) SimRel schedule needs to be fixed
b) the project's upstreams need to be fixed
Yes, as mentioned above, upstreams have to truly care about the downstreams. A simple change from an upstream is a pain to the downstream, but the downstream doesn't really get a say or the option to ignore the change.
c) the project needs to be fixed
Yes, downstreams are bad. I would say much of it is related to not being in the upstream community on a regular basis and to limited upstream resources. The resources do not align with the upstream SimRel schedule. Some projects are in maintenance mode and are easy to keep on schedule. Others get large code dumps in chunks that are not regular; depending on when the branches are cut, it makes a mess. This is an area where we need to manage better - but again, that is hard when the resources are not consistent.

One final note: unlike MRI and self-managed projects, projects in
autorelease build have no control over how/when they consume upstream
changes nor when they release. I believe this is a major component of
the pain here.
Exactly. No control, so you are constantly reacting. Most of the time you don't have the resources to react anyway, and things just get worse. So you end up trying to find ways to protect the project at earlier points - which means pushing to the upstreams. Or asking for a stable master branch :)

Regards,
Robert


Robert Varga
 

On 11/02/2019 15:01, Sam Hague wrote:
We really need to address the 'wild west' aspect here -- and we need
concrete examples from the past two releases.

A possible example: the sodium branch is broken for NetVirt because the
karaf.shell dependency is missing. [1] was pushed to add the pom
dependency. Possibly something else caused the issue, but the point is
that when things go unstable and are not fixed, you get days or weeks of
breakage leaking in.
[1] https://git.opendaylight.org/gerrit/#/c/80253/
Alright, this is a transitive dependency not being declared at its point
of use -- and yes, it was broken by genius correcting their usage
(moving it out of the API, using scope=provided).

We explicitly do not guard against this kind of breakage, because that
would require a full autorelease build on each verify. It is caught by
autorelease, though.

What you can do on the netvirt side is clean up your build system to not
rely on transitives, like bgpcep does:

https://github.com/opendaylight/bgpcep/blob/master/binding-parent/pom.xml#L60

It is by no means perfect and subject to breakage when things change
upstream, but I think that occurs only in case of what would be
considered an API change...
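For illustration, the kind of fix [1] applies amounts to declaring the dependency directly in the consumer's pom instead of inheriting it transitively. This is only a sketch; the exact coordinates NetVirt needs may differ:

```xml
<!-- Sketch: declare the karaf shell dependency at the point of use
     rather than relying on it arriving transitively via genius;
     coordinates here are illustrative -->
<dependencies>
  <dependency>
    <groupId>org.apache.karaf.shell</groupId>
    <artifactId>org.apache.karaf.shell.core</artifactId>
    <!-- provided: the karaf container supplies it at runtime -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Build tooling can enforce this, for example maven-dependency-plugin's analyze goals, which flag used-but-undeclared dependencies so they fail the build early instead of leaking through as transitives.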

Regards,
Robert


Guillaume Lambert
 

Hi

 

Robert, I had more or less the same experience Sam describes. Last September, we moved the master branch from Fluorine to Neon.

https://git.opendaylight.org/gerrit/#/c/75746/

But one month later, we had to downgrade our dependencies because of too many runtime problems and upstream dependencies that were obviously not available.

https://git.opendaylight.org/gerrit/#/c/76794/

I tried bumping to Neon again in December by following the various emails you sent about the yangtools and ODLparent bumps.

I had fewer issues, but still hit a runtime problem with karaf, without much clue where to start digging.

https://git.opendaylight.org/gerrit/#/c/78458/3

I finally followed the platform reference given at https://docs.opendaylight.org/projects/integration-distribution/en/latest/platform-versions.html

All of that was really helpful but obviously not enough to get something working.

I finally got some feedback on the mdsal-dev mailing list explaining that the models and the blueprint migration are not referenced in the docs but on the mdsal wiki…

https://lists.opendaylight.org/pipermail/tsc/2019-January/010942.html

 

 

Hope this helps

Guillaume

 

 

From: Sam Hague [mailto:shague@...]
Sent: lundi 11 février 2019 15:01
To: Robert Varga
Cc: LAMBERT Guillaume TGI/OLN; Stephen Kitt; tsc@...; netvirt-dev@...; discuss@...
Subject: Re: [OpenDaylight Discuss] [OpenDaylight TSC] Stable branches etc.

 

 

 


_________________________________________________________________________________________________________________________


This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.