<style> .present { text-align: left } .present h2 { text-align: center } </style>

# Kubernetes operators for web hosting at CERN

HEPiX fall 2022

Alex Lossent

---

## Outline

- Web hosting infrastructure:
  - Vision
  - Backbone
  - Skin
  - Heart
- Timeline & future plans

---

## The Vision

![](https://codimd.web.cern.ch/uploads/upload_921ab1ef1850ac5d3686ca709a2f3ba3.png)

[copyright - CC BY-SA 3.0](https://en.wikipedia.org/wiki/Eye#/media/File:Calliphora_vomitoria_Portrait.jpg)

----

### Legacy webservices portal (since 2004!)

![](https://codimd.web.cern.ch/uploads/upload_3ab0ac3ca0d08befb6abcde7da8667f9.png)

----

### Strategy (2018)

"strategy for web hosting at CERN is to provide a range of solutions fulfilling the diverse needs of the community, from simple/static sites to content management systems (Drupal) to complex web applications (Platform-as-a-Service) using a shared infrastructure (Kubernetes/OpenShift)"

----

### Why discontinue the old portal?

- MALT considerations:
  - webservices portal & site management logic (`ASP.NET`)
  - web site lifecycle (`FIM`)
  - Windows IIS web hosting (`DFS`)
  - replace SharePoint with Application Templates: Discourse, WordPress...
- Move 10k+ sites/apps to new SSO
- 20-year-old design choices and hardcoded behaviors (DNS domains...)


----

### Plan new architecture (end 2019)

- Common container-based infrastructure for all types of site
- Kubernetes Operators to implement site management logic
- Integrate with new [Application Portal](https://application-portal.web.cern.ch) for new SSO and lifecycle of user applications
- New web user portal front-end
  - design goal: provide more guidance to users

----

### Scope

Use this new infra for:

- static/CGI web site hosting (webeos, GitLab Pages)
- Drupal sites
- replacement of OpenShift 3 PaaS
- custom applications
- application templates

---

## The Backbone

![](https://codimd.web.cern.ch/uploads/upload_3528d2d80f4bf2ad8779572422864079.png)

Public Domain

----

## OKD4 infrastructure

- OKD4 OpenShift Kubernetes Distribution
  - Adds multi-tenancy features to Kubernetes
  - PaaS: nice automation for custom apps, stronger security/isolation
- 4 production OKD4 clusters for the different use cases
  - Sharing a common design and infrastructure

----

### Webeos (3700 projects)

Serve static/CGI web sites

- from user-provided EOS folders (e.g. `/eos/user/a/alossent/www`)
- documentation sites based on GitLab Pages

----

### Drupal (850 projects)

Content Management System

- Official content: CERN homepage...
- Help site owners with Drupal version updates
- Support the large ecosystem of modules

----

### PaaS (1250 projects)

Host users' custom web applications

- deploy from upstream or custom Docker images
- S2I: build applications from code (PHP, Python, NodeJS...)
- high level of integration with CERN computing environment: SSO, storage (EOS, CVMFS, CephFS), DNS, firewall, TN integration...
- very large applications out of scope: use a dedicated Kubernetes cluster (OpenStack Magnum)

----

### App-Catalogue (260 projects)

Provide a catalogue of self-service application templates

- Grafana, Sentry, Nexus...
- MALT: WordPress, Discourse...


----

### Common infrastructure

- One infra, multiple clusters
  - use cases: webeos, drupal, paas, app-catalogue
  - environments: prod, staging, dev, CI
  - decentralized: each cluster is 100% self-sufficient
- Clusters managed with gitops
  - ArgoCD, Helm charts
  - end-to-end tests for every change

----

### CERN computing environment

Made OKD4 work at CERN using:

- OKD4 customization (OpenStack support...)
- Extra upstream components: OPA, ExternalDNS, velero, restic...
- Shared components with Kubernetes team: EOS, CVMFS, CephFS...
- New development to integrate with LanDB, SSO...

---

## The Skin

![](https://codimd.web.cern.ch/uploads/upload_daf2450f3d6f7955f5617ce5ce905fe0.png)

[copyright - CC BY-SA 2.0](https://en.wikipedia.org/wiki/Skin#/media/File:Elephant_Skin.jpg)

----

### New webservices portal

![](https://codimd.web.cern.ch/uploads/upload_e535e6bf0d3e185017c3ecaeae5925cb.png)

----

### My sites

![](https://codimd.web.cern.ch/uploads/upload_ad2aa97f5ffece66bcc63271366ec7aa.png)

----

### Site management

![](https://codimd.web.cern.ch/uploads/upload_21c8e263b4da150dff1f4c359969ef67.png)

----

### Behind the scenes

- Each web site/application is an OKD4 project (= Kubernetes namespace) in one OKD4 cluster
  - 4 production clusters: webeos, paas, drupal, app-catalogue
- The portal is essentially a stateless front-end for the Kubernetes API
  - Aggregated view of owned projects in all clusters
- Kubernetes operators implement all the logic for site provisioning, configuration changes

----

### Application Portal integration

Each OKD4 project registers itself into the Application Portal.

The Application Portal provides:

- SSO registration for each web site/application
- Management of application roles
- Lifecycle, e.g. what happens when the owner leaves CERN

_The application in the Application Portal is the actual computing resource you own._

---

## The Heart

![](https://codimd.web.cern.ch/uploads/upload_f0306bf1e892eecac7e8e32679bc226f.png)

[copyright - CC BY 2.5](https://en.wikipedia.org/wiki/Heart#/media/File:Heart_anterior_exterior_view.jpg)

----

### Kubernetes operators

Software extensions to Kubernetes, following the same design principle as Kubernetes itself

- Declarative API: operators extend the Kubernetes API with custom resource types (e.g. `"DrupalSite"`) that describe _what we want_
- A _control loop_ implements reconciliation logic to enforce the desired state

----

### DrupalSite example

```yaml
apiVersion: drupal.webservices.cern.ch/v1alpha1
kind: DrupalSite
name: home
spec:
  configuration:
    databaseClass: critical
    diskSize: 10Gi
    qosClass: critical
    scheduledBackups: enabled
  siteUrl:
    - home.web.cern.ch
    - home.cern
    - press.cern
    - ...
  version:
    name: v9.4-1
    releaseSpec: RELEASE-2022.09.29T12-31-15Z
status:
  availableBackups:
    - backupName: home-281d-20221024024033
      date: '2022-10-24T11:53:45Z'
      drupalSiteName: home-cleanup-lenient
      expires: '2022-11-07T11:51:05Z'
    - ...
```

----

### Drupal operator control loop

- Create container deployments, database, DNS records, SSO...
- Manage site configuration
- Automate tasks: clone sites, upgrades, backup/restore...
- Leverage Kubernetes ecosystem, e.g.
  - `Tekton` to perform async tasks
  - `Velero` for resource backup

----

### Operators for site types

Every hosted site/application is described by a Kubernetes resource inside an OKD4 project, e.g.:

- `UserProvidedDirectory` describes a webeos site
- `GitlabPagesSite`, `DrupalSite`, `WordPress`, `Grafana`...

Each resource type has an associated operator. The operator's control loop brings the site/application to the desired state.

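
----

### Control loop (sketch)

The reconciliation idea behind these operators can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the slides state that the actual operators are built with the Operator SDK (Helm and Go). `SiteSpec`, `Cluster` and `reconcile` are hypothetical names that do not come from the CERN code base, and the in-memory `Cluster` object merely stands in for real deployments and DNS records.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SiteSpec:
    """Desired state, loosely modelled on the `spec` of a custom resource (assumption)."""
    version: str
    site_urls: List[str]


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the observed state of a cluster."""
    deployed_version: Optional[str] = None
    dns_records: Set[str] = field(default_factory=set)


def reconcile(spec: SiteSpec, cluster: Cluster) -> List[str]:
    """Run one pass of the control loop and return the actions that were taken."""
    actions = []

    # Deploy only when the observed version differs from the desired one.
    if cluster.deployed_version != spec.version:
        cluster.deployed_version = spec.version
        actions.append(f"deploy {spec.version}")

    wanted = set(spec.site_urls)

    # Create DNS records that are desired but missing.
    for host in sorted(wanted - cluster.dns_records):
        cluster.dns_records.add(host)
        actions.append(f"add DNS {host}")

    # Remove DNS records that exist but are no longer desired (drift).
    for host in sorted(cluster.dns_records - wanted):
        cluster.dns_records.remove(host)
        actions.append(f"remove DNS {host}")

    return actions


spec = SiteSpec(version="v9.4-1", site_urls=["home.web.cern.ch", "home.cern"])
cluster = Cluster()

first = reconcile(spec, cluster)   # creates everything that is missing
second = reconcile(spec, cluster)  # nothing left to do: the pass is idempotent

print(first)
print(second)
```

Because each pass only acts on the difference between desired and observed state, running it again after it has converged changes nothing, and any manual drift is corrected on the next pass.
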

----

### Web portal

The new web portal provides the UI to edit these resources

![](https://codimd.web.cern.ch/uploads/upload_f12a5c6a6c761ccc747e6c5d64f5dd64.png)

----

### Infrastructure operators (examples)

- LanDB operator
  - Custom Resources: `LandbSet`, `DelegatedDomain`...
- App Portal operator
  - Manages SSO registration and lifecycle for user applications
  - Custom Resources: `ApplicationRegistration`, `ProjectLifecyclePolicy`...

----

### Operator - languages

Operator SDK

- Application templates: "Helm operators"
- Go for the rest

----

### Policies

Policy enforcement by CNCF Open Policy Agent, e.g.

- unique hostname across all clusters
- security team approvals (TN visibility...)
- automation of EOS mounts, publication of hostnames in DNS...

---

## Timeline & future plans

![](https://codimd.web.cern.ch/uploads/upload_b9a670e7f4d0ff1c7456e5c948ff124f.png)

[Public domain](https://www.publicdomainpictures.net/en/view-image.php?image=276020&picture=vintage-pocket-watch)

----

```mermaid
gantt
    title OKD4 project timeline
    section OKD4 infra
    OKD4 cluster installation :a1, 2020-02-01, 35d
    SSO integration :a2, 2020-02-01, 2020-07-30
    section Webeos
    Webeos site operator :b1, 2020-02-01, 2020-05-15
    Prepare migration :b2, after b1, 2020-10-07
    Migrate from SLC6 VMs :after b2, 2020-11-27
    section Drupal
    Drupal site operator :d1, 2020-10-09, 2021-07-09
    Migration :2021-07-21, 2021-11-30
    section PaaS/App-Catalogue
    Deploy PaaS/App-Catalogue OKD4 :c1, 2021-01-07, 2021-02-28
    PaaS OKD4 pilot :c2, after c1, 2021-09-01
    PaaS Migration :after c2, 2022-07-01
```

----

### PaaS - plans

Usability improvements

- gitops for user applications
- logging, monitoring services for user apps

----

### WebEOS - plans

Managed EOS folders

- Provision & manage workspaces under `/eos/web`
- UI to manage folder properties (access control, CGI...)

Goal: facilitate migration of web sites hosted on DFS, AFS

----

### Consolidate remaining site types

- Migrate web sites still managed by legacy web services:
  - WebAFS => WebEOS
  - WebDFS (Windows IIS) => WebEOS (static, PHP), PaaS (dotnet)
  - SharePoint => SharePoint Online

---

## Conclusion

- New architecture for web hosting services
  - Built around OKD4 and Kubernetes Operators
  - Supports the range of hosting services: simple/static sites to content management to complex containerized web apps
- Kubernetes operator model: fit for use
  - Collection of small software development projects interacting through K8s API
  - Composition, reusability

{"tags":"Presentation, Hepix","type":"slide","slideOptions":{"transition":"fade","theme":"cern6"}}