Providing Continuous Integration at CERN

## Providing Continuous Integration at CERN

#### General overview of how we set up a CI infrastructure for CERN users

##### Daniel Juárez González - IT-CDA-WF

---

<style>
.bash {background: #327ec5 !important;}
</style>

## Continuous Integration?

> The practice of integrating code into a shared repository and building/testing each change automatically

```bash=
image: docker:latest
services:
  - docker:dind

build:
  stage: build
  script:
    - docker build -t test .
```

* GitLab-runners
* Jenkins

---

## But why?

> Open source tools such as Jenkins and GitLab will allow rationalizing hardware resource allocation and administrative operations, providing improved software development workflow for developers, accelerating innovation cycles and increasing confidence in new software deployments
> [name=A Roadmap to Continuous Integration for ATLAS Software Development; ATLAS: J Elmsheuser et al 2017 J. Phys.: Conf. Ser. 898 072009]

* We use GitLab CI to upgrade GitLab itself

---

<style>
img {border: none !important; box-shadow: none !important; background: none !important;}
.number_evol p { width: 50%; float: left;}
.number_list ul { font-size: smaller;}
.nexus ul { font-size: smaller;}
.jenkins ul { font-size: smaller;}
</style>

## What we do <div style="font-size: 14px;">in the shadows</div>

###### Making code best practices easy for CERN users

* With:
    * Source code (GitLab)
    * Docker registry
    * Artifact storage (Nexus)
    * CI/CD with GitLab runners & Jenkins

![](https://codimd.web.cern.ch/uploads/upload_32b7ba44e2a507555f125df37329ebc4.png)

---

## Some numbers

<div class="number_list">

* 15 000+ active GitLab users
* ~100 000 CI jobs per month
* ~3 500 docker-build CI jobs per month
* ~450 privileged CI jobs per month
* ~30 shared runners
* ~300 private runners

</div>
<div class="number_evol">

![](https://codimd.web.cern.ch/uploads/upload_0b99210ab938fc523f820cf5b4e43ab8.PNG =400x200)

![](https://codimd.web.cern.ch/uploads/upload_c6d6ee9ff77bad6fba32b463ed3a4a2f.PNG =400x200)

</div>

We will come back to why we divide CI jobs by type.

---

## CI/CD services we provide

* A pool of GitLab-runners shared among all users:
    * General-purpose runners
    * Docker-build runners
    * Privileged runners, i.e. with root access to the machine
* [Jenkins OpenShift template](http://cern.ch/jenkinsdocs)
* [Nexus artifact storage OpenShift template](https://cern.service-now.com/service-portal/service-element.do?name=soft-comp-rep)

---

<style>
.evolution {position: absolute; top: -20%; right: 0px; width: 20%; }
</style>

## GitLab shared-runners evolution

<span class="evolution">![](https://codimd.web.cern.ch/uploads/upload_5f4e27ad3fe2ef3d70a052475dbff041.png)</span>

<div style="text-align: left;">
We are always looking for better solutions to improve the service we provide to users.
</div>

* Old GitLab-runners were individual machines
    * This led to complicated Puppet hostgroup configuration
    * Using our own Puppet module. We still support this setup for CERN users, as it is the recommended deployment for private runners (but not the only one!)
* We had to manually deploy new machines one by one

---

<style>
tr img {max-width: 20% !important; vertical-align:middle !important;}
tr td {width: 20% !important; font-size: 14px !important;}
</style>

## Tested options

|Option|Pros|Cons|
|:-----|:--:|:--:|
|![](https://codimd.web.cern.ch/uploads/upload_94769400d3c1953ab99c5d16f65fc7bb.png)Kubernetes|<ul><li>CERN's cloud support</li></ul>|<ul><li>The Kubernetes executor lacked some features of the Docker executor</li></ul>|
|![](https://codimd.web.cern.ch/uploads/upload_351111a33b894794a792fb64d4486ef2.png) OpenStack Magnum|<ul><li>CERN's cloud support</li></ul>|<ul><li>-</li></ul>|
|![](https://codimd.web.cern.ch/uploads/upload_c35953f5ff36090afc9188c2965f9ac1.png) Docker-machine|<ul><li>CERN's cloud support</li><li>Same approach as gitlab.com</li></ul>|<ul><li>Slow provisioning</li><li>Affected by Nova API downtimes</li><li>Cannot cache images</li></ul>|

All of these options make scaling easier.

---

###### Selected alternatives

* OpenStack Magnum clusters (Swarm)
* Docker-machine

Both use the Docker executor.

---

## Current infrastructure

###### Problem: privileged runners give jobs direct access to the node they run on, so a VM has to be created to ++run just one job++

* Three types of GitLab shared runners, depending on user needs:
    * Default runners: unprivileged, with CVMFS automounting
    * Docker-build runners: privileged, restricted to our build image
    * Privileged runners: experimental, using docker-machine with CERN's [Nova service](https://docs.openstack.org/nova/latest/)
* Private GitLab runners, set up by users with a Puppet module

---

## Normal shared runners

* Deployed on a Docker Swarm cluster:
    * Each cluster node works as an individual runner
    * Configured with an [Ansible Playbook](https://gitlab.cern.ch/vcs/gitlab-ci/swarm-deployer-gitlab-runner)
* Unprivileged
* [Side-container to automount CVMFS](https://gitlab.cern.ch/vcs/cvmfs-automounter)
* [Docker image cleanup](https://gitlab.cern.ch/vcs/gitlabci-docker-cleanup)

---

## Docker-build shared runners

* Same as the normal type, but running in privileged mode, which Docker builds require
* The job image is [provided by us](https://gitlab.cern.ch/ci-tools/docker-image-builder) and cannot be changed
    * Users can only run Docker builds, so privileged mode cannot be abused
    * Users can only set a [limited set of variables](https://gitlab.cern.ch/ci-tools/docker-image-builder#controlling-the-behavior-of-the-build)
    * Jobs will not run if users try to use a different image
* Also deployed with an Ansible Playbook

---

## Advantages of using OpenStack Magnum

* Deployment spawns multiple runners at once
* Upgrading: delete and recreate the cluster
* Configuration is much simpler than with Puppet-managed machines
* Nodes come preconfigured

---

## Privileged shared runners (Beta)

* Some [use cases were not covered by the Standard and Docker-build runners](https://gitlab.cern.ch/gitlabci-examples/demo-privileged-runners)
* Performance has improved considerably since the first deployment, which used nested VMs
* Using our [custom GitLab runner](https://gitlab.cern.ch/vcs/gitlab-ci/privileged-gitlab-runner) with the latest changes from <https://github.com/docker/machine> (we need some features that are not yet released)
* VMs are spawned with CERN's [OpenStack Nova service](https://docs.openstack.org/nova/latest/) and deleted after the job finishes
* Using metadata to skip CERN's DNS registration speeds up VM creation to 1-2 minutes (instead of 15 minutes)

---

## Other services

### GitLab Registry

* Docker registry, integrated with our GitLab instance
* Uses CERN's S3 storage service

### Artifact Storage

* Integrated with our GitLab runners
* Uses CERN's S3 storage service

---

## Jenkins OpenShift template

<div class="jenkins">

* Some of our users' use cases cannot be covered by GitLab CI:
    * GitLab CI does not allow locking the CI configuration
    * Running CI on merged code
    * Historical reasons: some teams were already using this solution
* ~60 instances
* Instance owners have full control of their instances; Jenkins is not multitenant
* Provided as a template for our OpenShift service ([Platform as a Service](http://information-technology.web.cern.ch/services/PaaS-Web-App)); preconfiguration and deployment are automated

</div>

![](https://codimd.web.cern.ch/uploads/upload_57f7c4bda103139eb9d0708f49397f1a.png)

---

## Nexus artifact storage

<div class="nexus">

* Teams were already using Artifactory/Nexus; this solution lets us manage updates and configure SSO
* Stores binary artifacts for the main programming languages: Maven (Java), PyPI (Python), npm (Node.js), etc. It complements Linuxsoft, which already provides this for RPM packages
* Use cases:
    * Store and distribute components to build and deploy complex software projects
    * Keep a local copy of external libraries, so a given version remains available for future builds and downloads are faster
    * Report which libraries/versions are used in a given project (including vulnerable ones)

</div>

---

## Future plans

* Improve automation:
    * Monitoring, so the system recovers from problems automatically
    * Autoscaling, so we adapt dynamically to CI load
* Improve provisioning time for privileged runners
* Bring our GitLab-runner infrastructure as close as possible to gitlab.com's, reducing the number of runner types
* Investigate [unprivileged Docker builds with Kaniko](https://docs.gitlab.com/ee/ci/docker/using_kaniko.html)

---

## Questions?

![](https://codimd.web.cern.ch/uploads/upload_2b36eaed5400a653e8f70cc8a20a2042.png)
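
---

## Backup: unprivileged builds with Kaniko (sketch)

This is a minimal sketch of the "unprivileged Docker builds with Kaniko" idea from the future plans. It follows the pattern in GitLab's Kaniko documentation and assumes two things: a `Dockerfile` at the repository root, and pushing to the project's GitLab Registry. The job name `build_kaniko` is hypothetical, and this is not a configuration we currently run.

```yaml
# Hypothetical job (sketch): builds and pushes an image without privileged mode,
# so in principle it could run on the default unprivileged runners.
build_kaniko:
  stage: build
  image:
    # The :debug variant ships a busybox shell, which GitLab needs to run `script`
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    # Write the registry credentials where Kaniko looks for them
    - mkdir -p /kaniko/.docker
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    # Build from the repository root and push, tagged with the short commit SHA
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
```

Compared with the `docker:dind` example at the start, this job needs neither a Docker daemon nor privileged mode. That is what would let it run outside the dedicated Docker-build runners.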