<style>
.present { text-align: left }
.present h2 { text-align: center }
</style>
# Kubernetes operators for web hosting at CERN
HEPiX Fall 2022
Alex Lossent
---
## Outline
- Web hosting infrastructure:
  - Vision
  - Backbone
  - Skin
  - Heart
- Timeline & future plans
---
## The Vision
![](https://codimd.web.cern.ch/uploads/upload_921ab1ef1850ac5d3686ca709a2f3ba3.png)
[copyright - CC BY-SA 3.0](https://en.wikipedia.org/wiki/Eye#/media/File:Calliphora_vomitoria_Portrait.jpg)
----
### Legacy webservices portal (since 2004!)
![](https://codimd.web.cern.ch/uploads/upload_3ab0ac3ca0d08befb6abcde7da8667f9.png)
----
### Strategy (2018)
"strategy for web hosting at CERN is to provide a range of solutions fulfilling the diverse needs of the community, from simple/static sites to content management systems (Drupal) to complex web applications (Platform-as-a-Service) using a shared infrastructure (Kubernetes/OpenShift)"
----
### Why discontinue the old portal?
- MALT considerations:
  - webservices portal & site management logic (`ASP.NET`)
  - web site lifecycle (`FIM`)
  - Windows IIS web hosting (`DFS`)
  - replace SharePoint with Application Templates: Discourse, WordPress...
- Move 10k+ sites/apps to new SSO
- 20-year-old design choices and hardcoded behaviors (DNS domains...)
----
### Plan new architecture (end 2019)
- Common container-based infrastructure for all types of site
- Kubernetes Operators to implement site management logic
- Integrate with new [Application Portal](https://application-portal.web.cern.ch) for new SSO and lifecycle of user applications
- New web user portal front-end
  - design goal: provide more guidance to users
----
### Scope
Use this new infra for:
- static/CGI web site hosting (webeos, gitlab pages)
- Drupal sites
- replacement of OpenShift 3 PaaS
- custom applications
- application templates
---
## The Backbone
![](https://codimd.web.cern.ch/uploads/upload_3528d2d80f4bf2ad8779572422864079.png)
Public Domain
----
## OKD4 infrastructure
- OKD4: OpenShift Kubernetes Distribution
  - Adds multi-tenancy features to Kubernetes
  - PaaS: nice automation for custom apps, stronger security/isolation
- 4 production OKD4 clusters for the different use cases
  - Sharing a common design and infrastructure
----
### Webeos (3700 projects)
Serve static/CGI web sites
- from user-provided EOS folders (e.g. `/eos/user/a/alossent/www`)
- documentation sites based on GitLab Pages
----
### Drupal (850 projects)
Content Management System
- Official content: CERN homepage...
- Help site owners with Drupal version updates
- Support the large ecosystem of modules
----
### PaaS (1250 projects)
Host users' custom web applications
- deploy from upstream or custom Docker images
- S2I: build applications from code (PHP, Python, NodeJS...)
- high level of integration with CERN computing environment: SSO, storage (EOS, CVMFS, CephFS), DNS, firewall, TN integration...
- very large applications are out of scope: use a dedicated Kubernetes cluster (OpenStack Magnum)
----
### App-Catalogue (260 projects)
Provide a catalogue of self-service application templates
- Grafana, Sentry, Nexus...
- MALT: WordPress, Discourse...
----
### Common infrastructure
- One infra, multiple clusters
  - use cases: webeos, drupal, paas, app-catalogue
  - environments: prod, staging, dev, CI
  - decentralized: each cluster is 100% self-sufficient
- Clusters managed with gitops
  - ArgoCD, Helm charts
  - end-to-end tests for every change
----
### CERN computing environment
Made OKD4 work at CERN using:
- OKD4 customization (OpenStack support...)
- Extra upstream components: OPA, ExternalDNS, Velero, restic...
- Shared components with Kubernetes team: EOS, CVMFS, CephFS...
- New development to integrate with LanDB, SSO...
---
## The Skin
![](https://codimd.web.cern.ch/uploads/upload_daf2450f3d6f7955f5617ce5ce905fe0.png)
[copyright - CC BY-SA 2.0](https://en.wikipedia.org/wiki/Skin#/media/File:Elephant_Skin.jpg)
----
### New webservices portal
![](https://codimd.web.cern.ch/uploads/upload_e535e6bf0d3e185017c3ecaeae5925cb.png)
----
### My sites
![](https://codimd.web.cern.ch/uploads/upload_ad2aa97f5ffece66bcc63271366ec7aa.png)
----
### Site management
![](https://codimd.web.cern.ch/uploads/upload_21c8e263b4da150dff1f4c359969ef67.png)
----
### Behind the scenes
- Each web site/application is an OKD4 project (= Kubernetes namespace) in one OKD4 cluster
- 4 production clusters: webeos, paas, drupal, app-catalogue
- The portal is essentially a stateless front-end for the Kubernetes API
- Aggregated view of owned projects in all clusters
- Kubernetes operators implement all the logic for site provisioning, configuration changes
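----
### Aggregated view: a sketch
The aggregated view described above can be illustrated with a minimal sketch. It assumes each cluster can be asked for the projects owned by a user; the `Lister` callables and the `owned_projects` function are hypothetical stand-ins for queries to each cluster's Kubernetes API, not the portal's actual code.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Hypothetical: a Lister stands in for a query to one cluster's Kubernetes API
# and returns the names of the projects owned by the given user.
Lister = Callable[[str], Iterable[str]]


def owned_projects(owner: str, clusters: dict[str, Lister]) -> list[tuple[str, str]]:
    """Return sorted (cluster, project) pairs owned by `owner` across all clusters."""
    result = []
    for cluster_name, list_projects in clusters.items():
        for project in list_projects(owner):
            result.append((cluster_name, project))
    # Nothing is cached: every call rebuilds the view from the clusters themselves.
    return sorted(result)
```

Because nothing is stored between calls, the clusters remain the only source of truth, which is what makes a front-end of this kind stateless.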
----
### Application Portal integration
Each OKD4 project registers itself into the Application Portal.
The Application Portal provides:
- SSO registration for each web site/application
- Management of application roles
- Lifecycle, e.g. what happens when the owner leaves CERN

_The application in the Application Portal is the actual computing resource you own._
---
## The Heart
![](https://codimd.web.cern.ch/uploads/upload_f0306bf1e892eecac7e8e32679bc226f.png)
[copyright - CC BY 2.5](https://en.wikipedia.org/wiki/Heart#/media/File:Heart_anterior_exterior_view.jpg)
----
### Kubernetes operators
Software extensions to Kubernetes, following the same design principle as Kubernetes itself
- Declarative API: operators extend the Kubernetes API with custom resource types (e.g. `"DrupalSite"`) that describe _what we want_
- A _control loop_ implements reconciliation logic to enforce the desired state
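----
### Control loop: a sketch
To make the two ideas above concrete, here is a minimal sketch of one reconciliation pass: compare the declared state with the observed state and act on the difference. The `SiteSpec` and `SiteState` classes, their fields and the action strings are assumptions made for illustration; the real operators act through the Kubernetes API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SiteSpec:
    """Desired state, as declared by the user (hypothetical fields)."""
    hostnames: set[str]
    version: str


@dataclass
class SiteState:
    """Observed state of the running site (hypothetical fields)."""
    hostnames: set[str] = field(default_factory=set)
    version: str | None = None


def reconcile(spec: SiteSpec, state: SiteState) -> list[str]:
    """Run one pass of the control loop and return the actions taken."""
    actions = []
    for host in sorted(spec.hostnames - state.hostnames):
        actions.append(f"add route {host}")
    for host in sorted(state.hostnames - spec.hostnames):
        actions.append(f"remove route {host}")
    if state.version != spec.version:
        actions.append(f"deploy version {spec.version}")
    # Pretend the actions succeeded: the observed state now matches the spec.
    state.hostnames = set(spec.hostnames)
    state.version = spec.version
    return actions
```

A second call with the same spec returns no actions: reconciliation is idempotent, so the loop can run as often as needed.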
----
### DrupalSite example
```yaml
apiVersion: drupal.webservices.cern.ch/v1alpha1
kind: DrupalSite
metadata:
  name: home
spec:
  configuration:
    databaseClass: critical
    diskSize: 10Gi
    qosClass: critical
    scheduledBackups: enabled
  siteUrl:
    - home.web.cern.ch
    - home.cern
    - press.cern
    - ...
  version:
    name: v9.4-1
    releaseSpec: RELEASE-2022.09.29T12-31-15Z
status:
  availableBackups:
    - backupName: home-281d-20221024024033
      date: '2022-10-24T11:53:45Z'
      drupalSiteName: home-cleanup-lenient
      expires: '2022-11-07T11:51:05Z'
    - ...
```
----
### Drupal operator control loop
- Create container deployments, database, DNS records, SSO...
- Manage site configuration
- Automate tasks: clone sites, upgrades, backup/restore...
- Leverage Kubernetes ecosystem, e.g.
  - `Tekton` to perform async tasks
  - `Velero` for resource backup
----
### Operators for site types
Every hosted site/application is described by a Kubernetes resource inside an OKD4 project, e.g.:
- `UserProvidedDirectory` describes a webeos site
- `GitlabPagesSite`, `DrupalSite`, `WordPress`, `Grafana`...

Each resource type has an associated operator.
The operator's control loop brings the site/application to the desired state.
----
### Web portal
The new web portal provides the UI to edit these resources
![](https://codimd.web.cern.ch/uploads/upload_f12a5c6a6c761ccc747e6c5d64f5dd64.png)
----
### Infrastructure operators (examples)
- LanDB operator
  - Custom Resources: `LandbSet`, `DelegatedDomain`...
- App Portal operator
  - Manages SSO registration and lifecycle for user applications
  - Custom Resources: `ApplicationRegistration`, `ProjectLifecyclePolicy`...
----
### Operator - languages
Operator SDK
- Application templates: "Helm operators"
- Go for the rest
----
### Policies
Policy enforcement by CNCF Open Policy Agent, e.g.
- unique hostname across all clusters
- security team approvals (TN visibility...)
- automation of EOS mounts, publication of hostnames in DNS...
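----
### Hostname policy: a sketch
The "unique hostname across all clusters" rule can be illustrated with a short sketch. Open Policy Agent policies are written in Rego; the Python below only shows the decision logic, and the `registry` mapping (hostname to owning project, assumed to cover all clusters) and the `check_hostname` function are hypothetical.

```python
from __future__ import annotations


def check_hostname(hostname: str, project: str, registry: dict[str, str]) -> tuple[bool, str]:
    """Admit `hostname` for `project` unless another project already holds it."""
    # Normalize so that "Home.CERN." and "home.cern" count as the same name.
    key = hostname.strip().lower().rstrip(".")
    holder = registry.get(key)
    if holder is not None and holder != project:
        return False, f"{key} is already used by project {holder}"
    return True, "ok"
```

A project re-submitting its own hostname is admitted, so applying the same resource twice does not trip the check.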
---
## Timeline & future plans
![](https://codimd.web.cern.ch/uploads/upload_b9a670e7f4d0ff1c7456e5c948ff124f.png)
[Public domain](https://www.publicdomainpictures.net/en/view-image.php?image=276020&picture=vintage-pocket-watch)
----
```mermaid
gantt
title OKD4 project timeline
section OKD4 infra
OKD4 cluster installation :a1, 2020-02-01, 35d
SSO integration :a2, 2020-02-01, 2020-07-30
section Webeos
Webeos site operator :b1, 2020-02-01, 2020-05-15
Prepare migration :b2, after b1, 2020-10-07
Migrate from SLC6 VMs :after b2, 2020-11-27
section Drupal
Drupal site operator :d1, 2020-10-09, 2021-07-09
Migration :2021-07-21, 2021-11-30
section PaaS/App-Catalogue
Deploy PaaS/App-Catalogue OKD4 :c1, 2021-01-07, 2021-02-28
PaaS OKD4 pilot :c2, after c1, 2021-09-01
PaaS Migration :after c2, 2022-07-01
```
----
### PaaS - plans
Usability improvements
- gitops for user applications
- logging, monitoring services for user apps
----
### WebEOS - plans
Managed EOS folders
- Provision & manage workspaces under `/eos/web`
- UI to manage folder properties (access control, CGI...)

Goal: facilitate migration of web sites hosted on DFS, AFS
----
### Consolidate remaining site types
- Migrate web sites still managed by legacy web services:
  - WebAFS => WebEOS
  - WebDFS (Windows IIS) => WebEOS (static, PHP), PaaS (dotnet)
  - SharePoint => SharePoint Online
---
## Conclusion
- New architecture for web hosting services
  - Built around OKD4 and Kubernetes Operators
  - Supports the range of hosting services: simple/static sites to content management to complex containerized web apps
- Kubernetes operator model: fit for use
  - Collection of small software development projects interacting through the K8s API
  - Composition, reusability
{"tags":"Presentation, Hepix","type":"slide","slideOptions":{"transition":"fade","theme":"cern6"}}