global-jjb vs. packer vs. Jenkins jobs
Robert Varga
Hello everyone,
as the (still current) failure to start Jenkins jobs shows, our current way of integrating with external dependencies (global-jjb) is beyond fragile.

The way our jobs work is that:

1) we have a base image, created by builder-packer-* jobs on a regular basis, which rolls up distro upgrades plus some other things (like mininet, etc.) that we need
2) the Jenkins job launches on that base image and calls two scripts from global-jjb, both of which end up installing more things:
   a) python-tools-install.sh
   b) lf-env.sh
3) the actual job runs
4) some more stuff runs, invoking lf-env.sh to set up another Python environment

Now, it is clear that everything in 1) is invariant and updated in a controlled way.

The problem is with 2), where again, everything is supposed to be invariant for a particular version of global-jjb -- yet we reinstall these things on every single job run.

Not only is this subject to random breakage (like now, or when pip repositories are unavailable), it also takes around 3 minutes of each job execution. That does not sound like much, but it is a full 30%(!) of the runtime of yangtools-release-merge (which takes around 10 minutes).

We obviously can and must do better: global-jjb's environment-impacting scripts must all be executed during builder-packer, so that they become proper invariants.

For that, global-jjb needs to grow two things:

1) a way to install *all* of its dependencies without doing anything else, for use in packer jobs
2) compatibility checks on the environment to ensure it is up to date enough to run a particular global-jjb version's scripts

With that, our jobs should be both faster and more reliable.

Does anybody see a reason why this would not work? If not, I will be filing LFIT issues to get this done.

Regards,
Robert
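As an illustration of the compatibility check proposed in point 2), here is a minimal sketch in Python. It is not part of global-jjb: the function names, the REQUIRED table and the version numbers in it are all hypothetical, and the version comparison only looks at the leading numeric release components (it ignores pre-release and local version tags).

```python
import re
from importlib import metadata

# Hypothetical minimum versions; NOT taken from global-jjb.
REQUIRED = {"pip": "20.0", "setuptools": "40.0"}


def release_tuple(version):
    """Return the leading numeric components, e.g. '3.3.2' -> (3, 3, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(found, minimum):
    """True if 'found' is >= 'minimum', padding the shorter one with zeros."""
    a, b = release_tuple(found), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(name):
    """Version of an installed Python distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(required, lookup=installed_version):
    """Return a list of human-readable problems; empty means compatible."""
    problems = []
    for name, minimum in sorted(required.items()):
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (need >= {minimum})")
        elif not at_least(found, minimum):
            problems.append(f"{name}: found {found}, need >= {minimum}")
    return problems
```

A job could call check_environment(REQUIRED) as its first step and fail fast with the returned messages, instead of reinstalling anything at run time.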
Anil Belur
Greetings Robert:

lf-env.sh creates a virtual env and sets up the environment, while python-tools-install.sh installs the Python tools/utils during job runtime. Since releng/global-jjb is a repo of generic JJB templates (which can be used by any of the CI management repositories), it's up to the $project/$job to install the dependencies required for running the job.

We have discussed this in the past: installing PyPI dependencies during packer image build time comes with its own set of problems and added costs:

1. This requires maintaining a large number of packer images (if the project needs to support multiple versions of Python/PyPI deps).
2. All releng/global-jjb (templates) scripts do not require all of the PyPI dependencies to be installed and are tied to the $job or $project; binding them all into the same env carries a risk of the deps being broken more frequently.
3. PyPI libs/modules are updated more frequently.

Thanks,
Anil

On Mon, Jan 25, 2021 at 7:44 PM Robert Varga <nite@...> wrote:
> Hello everyone,
Robert Varga
On 03/02/2021 00:03, Anil Belur wrote:
> Greetings Robert:

Hello Anil,

> lf-env.sh: Creates a virtual env and sets up the environment, while the [...]

Understood. At the end of the day, though, we have only a few classes of jobs and there is a ton of commonalities between them.

> We have discussed this in the past, installing PyPI dependencies during [...]

I do not believe this is the case for OpenDaylight jobs. For example, each and every job I looked at performs two things:
- python-tools-install.sh (70 seconds)
- job-cost.sh (39 seconds)

> 2. All releng/global-jjb (templates) scripts do not require all of the [...]

While that is true, this line of reasoning completely ignores the failure mode and recovery. As it stands, we are exposed to any of:
- busted global-jjb
- PyPI package updates
- PyPI repository unavailability

As we have seen in these past weeks, any such failure immediately propagates to all jobs and breaks them -- resulting in nothing working anymore, with no real avenue for recovery without the help of LF IT.

We actually went through exactly this discussion when we had Sigul failures -- and Sigul is now part of base images.

It is deemed sufficient to update our cloud images once a month -- and that includes all sorts of security fixes and similar. As a community we are free to decide when to spin new images and can do that completely without LF IT intervention.

I am sorry, but I fail to see how Python packages are special enough to inflict:
- breakages occurring at completely random times
- 2-5 minutes of infra install on *each and every job* we run[*]

I am sorry to say that the world has changed in the past 5 years and we no longer have the attention of LF IT staff that made resolution of these failures a matter of hours -- it really is multiple days. That fact alone makes a huge difference when weighing pros and cons.

Regards,
Robert

[*] Just take a good look at what
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/aaa-maven-verify-master-mvn35-openjdk11/3/console-timestamp.log.gz
did:

Total job runtime: 9m56s
Useful build time: 7m16s
Setup/teardown time: 2m40s

That's **27%** of the time spent on infra, amounting to **37%** overhead.
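The two percentages in the footnote can be reproduced from the three durations. The following is a small sketch of that arithmetic; the helper name seconds() is made up for illustration, and the inputs are the figures quoted above.

```python
import re


def seconds(duration):
    """Convert a duration such as '9m56s' into a number of seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not duration or match is None:
        raise ValueError(f"unparseable duration: {duration!r}")
    minutes, secs = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + secs


total = seconds("9m56s")   # 596 seconds
build = seconds("7m16s")   # 436 seconds
setup = seconds("2m40s")   # 160 seconds

# Share of the whole job spent on setup/teardown.
share_of_total = 100 * setup / total
# The same time expressed as overhead relative to the useful build time.
overhead_vs_build = 100 * setup / build

print(f"{share_of_total:.0f}% of total runtime, "
      f"{overhead_vs_build:.0f}% overhead on the build")
```

The first figure divides setup time by the total runtime (160 s / 596 s), the second divides it by the useful build time (160 s / 436 s), which is why the same 2m40s shows up as both 27% and 37%.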
Robert Varga
On 05/02/2021 10:16, Robert Varga wrote:
> On 03/02/2021 00:03, Anil Belur wrote:

Hello again,

>> Greetings Robert:
> Hello Anil,

sorry for the self-reply, but as it happens ...

[snip]

... we just got hit by this.

>> 2. All releng/global-jjb (templates) scripts do not require all of the [...]
> While that is true, this line of reasoning completely ignores the [snip]

A case in point:
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/yangtools-maven-verify-master-mvn35-openjdk11/3761/console.log.gz
just failed with:

[yangtools-maven-verify-master-mvn35-openjdk11] $ /bin/bash /tmp/jenkins2774821607907773560.sh
[...]
writing manifest file 'src/cryptography.egg-info/SOURCES.txt'

python-tools-install.sh comes from global-jjb. This means jobs are currently broken because global-jjb stopped working.

I have filed https://jira.linuxfoundation.org/plugins/servlet/theme/portal/2/IT-21509 and we are blocked on it. Let's see what sort of KPIs that issue will have.

Bye,
Robert
Anil Belur
Greetings Robert:

We'll need to pin the cryptography module to < 3.4, since the Rust dependencies are broken in the upstream pyca repo. This issue should be addressed once this is merged.

Cheers,
Anil

On Mon, Feb 8, 2021 at 6:58 AM Robert Varga <nite@...> wrote:
> On 05/02/2021 10:16, Robert Varga wrote: