Recent change to CI management
Yesterday we merged a major change to how Jenkins is managed. Unfortunately, due to how the change had to go in, the review was massive and a few things were missed during it, causing some issues over the last ~12 hours or so as we worked them all out.
From what I'm seeing we're back on track for the environment to be working.
So, what's changed? The way we do the managed config files, that is, all the files that are stored in Jenkins with credentials for use in the jobs.
Historically, these have been hand-managed, or, as we rolled out our self-service bits, we've had scripting in place to push in the standard configuration files as new repositories were built.
There have been several downsides to this system:
1) The community had no idea what was actually in the files
2) The community has had no easy way to influence what the files are
3) It was very easy for something to be applied to the production system, but not the sandbox system
4) It was very easy for things to be misconfigured
Late last year we started work on moving to Jenkins Configuration as Code (JCasC) for as much as possible. Internally at LF, that has meant better Puppet-based management of Jenkins for non-community tunables, but it didn't completely cover other JCasC-supported areas where the community would at least like to know how things are configured, or even influence them.
As such, when we were designing our puppet conversion to using JCasC we made sure to make it possible to also incorporate community managed bits.
So, here's what this means to the community:
The environment variables and cloud configuration continue to be managed the same way (files in the jenkins-config directory of releng/builder). The way those get pushed into Jenkins has changed, though: we now take those files, convert them into JCasC YAML, and put them where they belong on the system.
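As a rough illustration of what the converted output looks like, a global environment variable from jenkins-config would end up rendered as JCasC YAML along these lines (the variable name and value here are hypothetical examples, not our actual rendered output):

```yaml
# Illustrative sketch only; the real output depends on our conversion
# tooling. The variable below is a made-up example.
jenkins:
  globalNodeProperties:
    - envVars:
        env:
          - key: "SILO"
            value: "production"
```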
At the same time, we've also written a system that allows the community to influence the managed config files. In the jenkins-config directory you will find another couple of directories:
These directories and files are now driving the managed config files. They were also the cause of the system not working so well after the merge, due to how JCasC operates: it works in an all-or-nothing fashion for the parts of Jenkins it takes under management. That meant we had to port _all_ of the managed files in at the exact same time, generating a very, very large change, and parts of it were missed in review. We should be all good now after having worked through the issues.
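For a sense of what "under JCasC management" means for these files: managed config files come from the Config File Provider plugin, and under JCasC they can be declared roughly like this (the id, name, and content below are hypothetical, and the exact keys depend on the plugin version):

```yaml
# Hypothetical sketch of a managed config file declared via JCasC
# (Config File Provider plugin); id, name, and content are examples only.
unclassified:
  globalConfigFiles:
    configs:
      - custom:
          id: "example-settings"
          name: "Example settings"
          comment: "Managed from the jenkins-config directory"
          content: |
            example-key=example-value
```

Because JCasC takes this whole section under management, anything not present in the YAML gets dropped on apply, which is why everything had to land in one change.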
This new system has a few benefits and one negative:
* The configuration, upon merge, will be applied within 5 minutes.
* The community now has better visibility into how the systems are managed and can influence them
* We have fully versioned configuration for the parts under management this way, making it easier to roll back a problem change, or even (re)build a new system more quickly
* The cloud configuration is now _less_ brittle than when we were using the jenkins-cfg-merge job, which translated our cloud configuration files into the needed format for insertion via an on-the-fly-built groovy script. That approach was very brittle, and minor version changes of the OpenStack cloud plugin could easily break our system. JCasC is both more and less forgiving in this respect, as it's a YAML rendering of how the parts are configured and is far less concerned with object design.
* We now also support Nomad clouds (this was built for FD.io) and will in the future also provide community support for static builder management as well as AWS / EC2 cloud management
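To make the cloud side concrete, an OpenStack cloud entry under JCasC looks roughly like the following. The cloud name, endpoint, and credential id are hypothetical, and the exact keys depend on the OpenStack Cloud plugin version:

```yaml
# Hypothetical sketch of an OpenStack cloud under JCasC; the name,
# endpoint URL, and credential id are examples, not real values.
jenkins:
  clouds:
    - openstack:
        name: "example-cloud"
        endPointUrl: "https://openstack.example.org:5000/v3"
        credentialsId: "os-cloud-credential"
        zone: "nova"
```

A breaking plugin change now shows up as a clear YAML schema mismatch at apply time rather than a groovy script failing against a changed object model.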
The jenkins-cfg-merge job has been removed from the system, as it conflicts with JCasC management. This means that failures in applying the configuration will not be visible to the community, only to the LF staff who receive the output of the cron job (yes, cron job) that actually applies the changes. We believe this minor negative is far outweighed by the speed with which these configurations get applied and by the community's ability to actually see how things are configured and provide input on them.
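Mechanically, that cron-driven apply step could be sketched like this. The schedule, paths, and reload call are assumptions for illustration only, not our exact setup (JCasC does expose a reload endpoint, though authentication is omitted here for brevity):

```
# Hypothetical crontab sketch: every 5 minutes, sync the rendered
# JCasC YAML into place and ask Jenkins to reload it.
*/5 * * * * jenkins rsync -a /srv/jcasc/ /var/lib/jenkins/casc.d/ \
    && curl -fsS -X POST "http://localhost:8080/configuration-as-code/reload" \
    >> /var/log/jcasc-apply.log 2>&1
```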
Andrew J Grimberg
Manager Release Engineering
The Linux Foundation