global-jjb vs. packer vs. Jenkins jobs
As the (still current) failure to start Jenkins jobs shows, our current
way of integrating with external dependencies (global-jjb) is beyond
The way our jobs work is that:
1) we have a base image, created by builder-packer-* jobs on a regular
basis, which rolls up distro upgrades plus some other things (like mininet,
etc.) that we need
2) the Jenkins job launches on that base image and calls two scripts from
global-jjb, both of which end up installing more things:
3) the actual job runs
4) some more stuff invoking lf-env.sh to set up another Python
Now, it is clear that everything in 1) is invariant and updated in a
The problem is with 2), where again, everything is supposed to be
invariant for a particular version of global-jjb -- yet we reinstall
these things on every single job run.
Not only is this subject to random breakage (like now, or when pip
repositories are unavailable), it also takes around 3 minutes of each
job execution. That does not sound like much, but it is a full 30%(!)
of the runtime of yangtools-release-merge (which takes around 10 minutes).
We obviously can and must do better: global-jjb's environment-impacting
scripts must all be executed during builder-packer, so that they become
For that, global-jjb needs to grow two things:
1) a way to install *all* of its dependencies without doing anything
else, for use in packer jobs
2) compatibility checks on the environment to ensure it is up to date
enough to run a particular global-jjb version's scripts
With that, our jobs should be both faster and more reliable.
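A minimal sketch of what the compatibility check in 2) might look like, assuming the image ships a Python 3.8+ interpreter. The REQUIRED table and the function names are hypothetical and not part of global-jjb; the comparison only understands plain dotted numeric versions.

```python
from importlib import metadata

# Hypothetical minimum versions; a real global-jjb release would publish its own list.
REQUIRED = {"pip": (20, 0)}


def parse_version(text):
    """Turn '21.0.1' into (21, 0, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_environment(required, lookup=metadata.version):
    """Return a list of problems found; an empty list means the image is compatible."""
    problems = []
    for name, minimum in required.items():
        try:
            installed = parse_version(lookup(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if installed < minimum:
            have = ".".join(map(str, installed))
            need = ".".join(map(str, minimum))
            problems.append(f"{name}: {have} is older than required {need}")
    return problems


if __name__ == "__main__":
    for problem in check_environment(REQUIRED):
        print(problem)
```

A job could run such a check first and fail fast with a clear message when the image is too old, instead of installing anything at runtime.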
Does anybody see a reason why this would not work?
If not, I will be filing LFIT issues to get this done.
lf-env.sh: Creates a virtual env and sets up the environment, while the python-tools-install.sh installs the Python tools/utils during job runtime. Since releng/global-jjb is a repo of generic JJB templates (which can be used by any of the CI management repositories), it's up to the $project/$job to install the dependencies required for running the job.
We have discussed this in the past; installing PyPI dependencies during packer image build time comes with its own set of problems and added costs:
1. This requires maintaining a large number of packer images (if the project needs to support multiple versions of Python/PyPI deps).
2. All releng/global-jjb (templates) scripts do not require all of the PyPI dependencies to be installed, as these are tied to the $job or $project; binding them all into the same env risks the deps being broken more frequently.
3. PyPI libs/modules are updated more frequently.
On Mon, Jan 25, 2021 at 7:44 PM Robert Varga <nite@...> wrote:
On 03/02/2021 00:03, Anil Belur wrote:
> Greetings Robert:
Hello Anil,
> lf-env.sh: Creates a virtual env and sets up the environment, while the [...]
Understood. At the end of the day, though, we have only a few classes of
jobs and there is a ton of commonalities between them.
> We have discussed this in the past, installing PyPI dependencies during [...]
I do not believe this is the case for OpenDaylight jobs. For example
each and every job I looked at performs two things:
- python-tools-install.sh (70 seconds)
- job-cost.sh (39 seconds)
> 2. All releng/global-jjb (templates) scripts do not require all of the [...]
While that is true, this line of reasoning completely ignores the
failure mode and recovery.
As it stands any of:
- busted global-jjb
- PyPi package updates
- PyPi repository unavailability
As we have seen in these past weeks, any such failure immediately
propagates to all jobs and breaks them -- resulting in nothing working
anymore, with no real avenue for recovery without help of LF IT.
We actually went through exactly this discussion when we had Sigul
failures -- and Sigul is now part of base images.
It is deemed sufficient to update our cloud images once a month -- and
that includes all sorts of security fixes and similar. As a community we
are free to decide when to spin new images and can do that completely
without LF IT intervention.
I am sorry, but I fail to see how Python packages are special enough to inflict:
- breakages occurring at completely random times
- 2-5 minutes of infra install on *each and every job* we run[*]
I am sorry to say that the world has changed in the past 5 years and we
no longer have the attention of LF IT staff that made resolution of
these failures a matter of hours -- it really is multiple days. That
fact alone makes a huge difference when weighing pros and cons.
Just take a good look at what
Total job runtime: 9m56s
Useful build time: 7m16s
Setup/teardown time: 2m40s
That's **27%** of the time spent on infra, amounting to **37%** overhead.
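The two percentages follow directly from the three durations quoted above; a short check, where the only assumption is the helper name `seconds`:

```python
def seconds(text):
    """Convert a duration such as '9m56s' into seconds."""
    minutes, rest = text.split("m")
    return int(minutes) * 60 + int(rest.rstrip("s"))


total = seconds("9m56s")   # total job runtime
useful = seconds("7m16s")  # useful build time
setup = seconds("2m40s")   # setup/teardown time

# The two parts add up to the total runtime.
assert useful + setup == total

share_of_total = 100 * setup / total  # infra time as a share of the whole run
overhead = 100 * setup / useful       # infra time relative to the useful build time

print(f"{share_of_total:.0f}% of runtime, {overhead:.0f}% overhead")
# prints "27% of runtime, 37% overhead"
```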
On 05/02/2021 10:16, Robert Varga wrote:
> On 03/02/2021 00:03, Anil Belur wrote:
>> Greetings Robert:
> Hello Anil,
Hello again,
sorry for the self-reply, but as it happens ...
>> 2. All releng/global-jjb (templates) scripts do not require all of the [...]
> While that is true, this line of reasoning completely ignores the [...]
... we just got hit by this.
A case in point:
just failed with:
[yangtools-maven-verify-master-mvn35-openjdk11] $ /bin/bash /tmp/jenkins2774821607907773560.sh
[...]
writing manifest file 'src/cryptography.egg-info/SOURCES.txt'
python-tools-install.sh comes from global-jjb. This means jobs are
currently broken because global-jjb stopped working.
I have filed
and we are blocked on it. Let's see what sort of KPIs that issue will have.
We'll need to pin the cryptography module to < 3.4, since the Rust dependencies are broken in the upstream pyca repo.
This issue should be addressed once this is merged.
On Mon, Feb 8, 2021 at 6:58 AM Robert Varga <nite@...> wrote:
On 05/02/2021 10:16, Robert Varga wrote: