DIRAC v9 migration
===

:::danger
NOT UPDATED
:::

:::info
:mega: :mega: :mega: :mega: :mega: The content of this note has been moved to the DIRAC wiki: https://github.com/DIRACGrid/DIRAC/wiki/DIRAC-9.0
:::

### skeleton document for migration

## PRE

### Things that have to be done in v8, before you even start considering a migration

- [ ] Install the latest DIRAC v8
- [ ] The SecurityLogging service is no longer used by default. Follow one of these two options:
  - If you use [centralized logging](https://dirac.readthedocs.io/en/latest/AdministratorGuide/ServerInstallations/centralizedLogging.html#logstash-and-elk-configurations) together with a message queue and `logstash`, follow the instructions there to configure `logstash`
  - If you do not want to use centralized logging, set the flag `/Operations/[vo]/EnableSecurityLogging = True`
- [ ] JobParameters need to be stored in OpenSearch. See the last bullet of [this documentation](https://dirac.readthedocs.io/en/latest/AdministratorGuide/Systems/WorkloadManagement/architecture.html#databases)
- [ ] If you have not done so already, you need to [install and use RSS](https://dirac.readthedocs.io/en/latest/AdministratorGuide/Systems/ResourceStatus/install.html)
  - There is no need to go through the "Advanced Configuration"
  - For data management operations to work, the StorageElements must have `Status` `Active` in the RSS
- [ ] Register a DiracX `client` in the IdP
  - This is needed for DiracX to authenticate
  - Multi-VO: one client per VO (to be confirmed)
- [ ] If you have a DIRAC extension, update it to account for the many changes. A few special notes:
  - `Setup` has been removed from many places.
- [ ] If you have a DIRAC extension, you *might* need to code an empty `vodiracx` extension, depending on what your extension does. For examples, see:
  - First, the [gubbins extension](https://github.com/DIRACGrid/diracx/tree/main/extensions)
  - [lhcbdiracx](https://gitlab.cern.ch/lhcb-dirac/lhcbdiracx) as a "real" example
- [ ] If you have a WebAppDIRAC extension, code an empty `vodiracx-web` extension. For examples, see:
  - First, the [gubbins extension](https://github.com/DIRACGrid/diracx-web/tree/main/packages/extensions)
  - [lhcbdiracx-web](https://gitlab.cern.ch/lhcb-dirac/lhcbdiracx-web) as a "real" example
- [ ] Have an S3-compatible storage for job sandboxes (MinIO is the tool suggested by DIRAC)
- [ ] Have a Kubernetes (k8s) project ready for hosting (vo)diracx
- [ ] Deploy (vo)diracx: instructions are in https://github.com/DIRACGrid/diracx/issues/331 and in https://github.com/DIRACGrid/diracx-charts/pull/126/files (to be included "properly" here)
- If you are updating the MySQL character set to `utf8mb4` and you use the standard DFC, recreate the `DirName` index with a prefix length:
  ```sql
  ALTER TABLE FC_DirectoryLevelTree DROP INDEX DirName, ADD INDEX DirName (DirName(767));
  ```

## (almost) any time before the update

- [ ] Add the DiracX section to the DIRAC CS. The `CsSync` subsection must have a subsection for each VO served by the DIRAC installation:
  ```
  DiracX
  {
    URL =
    DisabledVO =
    LegacyClientEnabled
    {
      WorkloadManagement
      {
        JobStateUpdate = True
      }
    }
    CsSync
    {
      VOs
      {
        dteam
        {
          DefaultGroup =
          IdP
          {
            ClientID =
            URL =
          }
          UserSubjects
          {
          }
        }
        one_more_VO
        {
          ...
        }
      }
    }
    LegacyExchangeApiKey = diracx:legacy:abcd123
  }
  ```
- [ ] Generate a legacy exchange API key:
  ```python
  import base64
  import hashlib
  import secrets

  # Random secret; secrets.token_bytes() picks a secure default length
  token = secrets.token_bytes()

  # This is the secret the legacy DIRAC installation includes in its requests:
  # set it as the /DiracX/LegacyExchangeApiKey CS option (in the local, secluded, dirac.cfg file)
  print(f"API key is diracx:legacy:{base64.urlsafe_b64encode(token).decode()}")

  # This is the environment variable to set on the DiracX server (only the hash is stored there)
  print(f"DIRACX_LEGACY_EXCHANGE_HASHED_API_KEY={hashlib.sha256(token).hexdigest()}")
  ```
  - Set the `/DiracX/LegacyExchangeApiKey` option to the printed API key
  - Set the environment variable `DIRACX_LEGACY_EXCHANGE_HASHED_API_KEY` under `diracx.settings` in the charts
- [ ] Add `WorkloadManagement > Services > SandboxStore > UseDiracXBackend = True` to use the S3 sandbox store
- [ ] Check the ProxyDB for 1024-bit proxies
- [ ] If your `.cfg` files (e.g. `dirac.cfg`) are managed by Puppet (or something else), prepare an update that removes the `Setup` and `instanceName` options
- [ ] [Replace ARC and ARC6 with AREX](https://github.com/DIRACGrid/DIRAC/wiki/DIRAC-9.0#resources)
- [ ] Create the MySQL DB `DiracXAuthDB`:
  ```sql
  DROP DATABASE IF EXISTS DiracXAuthDB;
  CREATE DATABASE DiracXAuthDB;
  -- On MySQL >= 8.0, GRANT can no longer set a password ("IDENTIFIED BY"):
  -- if the 'Dirac' user does not exist yet, create it first, e.g.
  -- CREATE USER 'Dirac'@'%' IDENTIFIED BY 'must_be_set';
  GRANT SELECT, INSERT, LOCK TABLES, UPDATE, DELETE, CREATE, DROP, ALTER, REFERENCES,
        CREATE VIEW, SHOW VIEW, INDEX, TRIGGER, ALTER ROUTINE, CREATE ROUTINE
    ON DiracXAuthDB.* TO 'Dirac'@'%';
  ```

## the day before the update

- [ ] Install the latest DIRAC v8
- [ ] Partially drain the system (a full drain is not possible) by stopping the Transformation/WorkflowTask agents

## a few hours before the update

- [ ] Stop the Transformation/RequestTask agents
- [ ] Stop the Transformation/Transformation agents
- [ ] Stop the RequestManagement/RequestExecuting agent

## Update phase ("deep downtime")

- [ ] Stop all DIRAC components (agents, services, executors) except:
  - Configuration Services
  - Framework/SystemAdministrator (there is one per server)
- [ ] Synchronize the CS to DiracX: https://github.com/DIRACGrid/diracx/blob/main/docs/CONFIGURATION.md#modifying-configuration
- [ ] Update the DBs with the following:
  ```sql
  GRANT CREATE TEMPORARY TABLES ON *.* TO 'Dirac'@'%';

  USE JobDB;
  ALTER TABLE `Jobs` ADD COLUMN `VO` VARCHAR(64);
  DROP TABLE IF EXISTS `JobsHistorySummary`;
  CREATE TABLE `JobsHistorySummary` (
    `ID` INT AUTO_INCREMENT PRIMARY KEY,
    `Status` VARCHAR(32),
    `Site` VARCHAR(100),
    `Owner` VARCHAR(32),
    `OwnerGroup` VARCHAR(128),
    `VO` VARCHAR(64),
    `JobGroup` VARCHAR(32),
    `JobType` VARCHAR(32),
    `ApplicationStatus` VARCHAR(255),
    `MinorStatus` VARCHAR(128),
    `JobCount` INT,
    `RescheduleSum` INT,
    UNIQUE KEY uq_summary (
      `Status`, `Site`, `Owner`, `OwnerGroup`(32), `VO`, `JobGroup`, `JobType`,
      `ApplicationStatus`(128), `MinorStatus`
    )
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

  USE PilotAgentsDB;
  ALTER TABLE `PilotAgents` ADD COLUMN `VO` VARCHAR(64);
  CREATE TABLE `PilotsHistorySummary` (
    `GridSite` VARCHAR(128),
    `ComputingElement` VARCHAR(128),
    `GridType` VARCHAR(128),
    `Status` VARCHAR(32),
    `VO` VARCHAR(64),
    `PilotCount` INT,
    PRIMARY KEY (`GridSite`, `ComputingElement`, `GridType`, `Status`, `VO`)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

  USE TaskQueueDB;
  ALTER TABLE `tq_TaskQueues` ADD COLUMN `Owner` VARCHAR(255) NOT NULL;
  ALTER TABLE `tq_TaskQueues` ADD COLUMN `VO` VARCHAR(64);

  USE SandboxMetadataDB;
  ALTER TABLE `sb_Owners` ADD COLUMN `VO` VARCHAR(64);

  USE TransformationDB;
  ALTER TABLE `Transformations` ADD COLUMN `Author` VARCHAR(255) NOT NULL;
  ALTER TABLE `Transformations` MODIFY COLUMN `AuthorDN` VARCHAR(255) DEFAULT NULL;
  ```
- [ ] Save this [script](https://gist.github.com/fstagni/5d1d52f7185fc0d2adfb197a5c921b2a), which adds "VO" info to a few DBs, in any directory of a DIRAC server machine (e.g. `/opt/dirac`), then run it with:
  ```
  python script_name.py -o /DIRAC/Security/UseServerCertificate=yes
  ```
- [ ] Update the DBs with the SQL statements in https://gist.github.com/fstagni/d977b4f3ebe5432ee7bb2743145dc837
- [ ] Update the Accounting DB by following the instructions in point 2 of https://github.com/DIRACGrid/DIRAC/wiki/DIRAC-9.0#accounting
- [ ] Install the service Monitoring/WebApp
- [ ] Remove the agent Framework/CAUpdateAgent
- [ ] Remove the agent WorkloadManagement/CloudDirector
- [ ] Remove the service WorkloadManagement/VirtualMachineManager
- [ ] Convert the Systems part of the CS to "NoSetup" by running https://gist.github.com/atsareg/080682ed97f329e65c2458e99eca89e5 (or do it by hand if you know what you are doing)
- [ ] Add the CS option `/DIRAC/NoSetup = True` for backward compatibility
- [ ] Convert the local cfg file (`/opt/dirac/etc/dirac.cfg`) to "NoSetup" (using the Puppet update prepared earlier, if needed)
- [ ] Convert the Operations part of the CS to "NoSetup" by running the script from https://github.com/DIRACGrid/DIRAC/pull/7218/files (or do it by hand if you know what you are doing)
- [ ] Install DIRAC v9 (usual procedure)
- [ ] The OpenSearch indexes used for job parameters have been renamed (e.g. from "lhcb-production_elasticjobparameters_index_1014.0m" to "job_parameters_lhcb_1014m"; the name is configurable, and this is the standard naming). Rename the old indexes accordingly.
- [ ] The OpenSearch indexes used for WMSHistory now have an additional "VO" field.

## Restart phase

- [ ] Restart the running DIRAC components
- [ ] Start all stopped DIRAC components, services before agents

## Checking phase

- DIRAC:
- DiracX:
  - `dirac diracx whoami`
- DiracX-Web:
  - should be up
  - the JobMonitoring app should be there

## Any time after

- [ ] The OpenSearch index names no longer contain the `Setup` name, so index patterns (in OpenSearch, Grafana, etc.) need to be updated to something like `*wmshistory*`

## Later:

- (optional) Enable remote pilot logging: https://github.com/DIRACGrid/DIRAC/wiki/DIRAC-9.0#enable-remote-pilot-logging
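To double-check the legacy exchange API key from the "(almost) any time before the update" step, the value of `DIRACX_LEGACY_EXCHANGE_HASHED_API_KEY` can be recomputed from the key stored in `/DiracX/LegacyExchangeApiKey`. This is a minimal sketch that assumes only what the generation snippet in this note does (a `diracx:legacy:` prefix, URL-safe base64 of random bytes, SHA-256 hex digest); `hash_legacy_api_key` and `legacy_key_matches` are hypothetical helpers, not part of DIRAC or DiracX.

```python
import base64
import hashlib
import hmac

# Prefix used by the key-generation snippet in this note
LEGACY_PREFIX = "diracx:legacy:"


def hash_legacy_api_key(api_key: str) -> str:
    """Recompute the SHA-256 hex digest expected on the DiracX side (hypothetical helper)."""
    if not api_key.startswith(LEGACY_PREFIX):
        raise ValueError(f"expected the key to start with {LEGACY_PREFIX!r}")
    # Undo the URL-safe base64 encoding to get back the raw random bytes
    token = base64.urlsafe_b64decode(api_key[len(LEGACY_PREFIX):])
    return hashlib.sha256(token).hexdigest()


def legacy_key_matches(api_key: str, hashed_env_value: str) -> bool:
    """True if the CS key and the DiracX environment variable belong together."""
    # Constant-time comparison, so the check itself does not leak timing information
    return hmac.compare_digest(hash_legacy_api_key(api_key), hashed_env_value.strip().lower())
```

For example, `legacy_key_matches(cs_value, hashed_value)` should return `True` when both values were printed by the same run of the generation snippet; note that the placeholder `diracx:legacy:abcd123` in the CS template is not a real key and will not decode.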
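For the job-parameters index rename in the update phase, the new name can be derived from the old one. The only naming rule in this note is a single example (`lhcb-production_elasticjobparameters_index_1014.0m` becomes `job_parameters_lhcb_1014m`), so the pattern below is an assumption inferred from it and only covers the standard naming. `new_job_parameters_index` and `rename_plan` are hypothetical helpers. The VO is passed explicitly because the old prefix appears to be the lower-cased setup name rather than the VO.

```python
import re

# Assumed pattern, inferred from the single example in this note:
#   lhcb-production_elasticjobparameters_index_1014.0m -> job_parameters_lhcb_1014m
OLD_INDEX = re.compile(r"^(?P<setup>.+)_elasticjobparameters_index_(?P<bucket>\d+)(?:\.0)?m$")


def new_job_parameters_index(old_name: str, vo: str) -> str:
    """Map an old job-parameters index name to the default v9 name (hypothetical helper)."""
    match = OLD_INDEX.match(old_name)
    if match is None:
        raise ValueError(f"not an old job-parameters index name: {old_name!r}")
    # OpenSearch index names must be lower case
    return f"job_parameters_{vo.lower()}_{match.group('bucket')}m"


def rename_plan(old_names: list[str], vo: str) -> dict[str, str]:
    """Build {old: new} for all matching names, skipping unrelated indexes."""
    return {
        name: new_job_parameters_index(name, vo)
        for name in old_names
        if OLD_INDEX.match(name)
    }
```

OpenSearch cannot rename an index in place, so such a plan would typically be applied with the `_reindex` API, or with `_clone` (which requires the source index to be write-blocked first).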
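The `CsSync` requirement in the "(almost) any time before the update" step (one subsection per VO) is easy to miss when many VOs are served. Below is a minimal sketch of a consistency check on the CS represented as nested Python dicts. It assumes the VOs are listed under `/Registry/VO` and that `DefaultGroup`, `IdP/ClientID` and `IdP/URL` must be non-empty, which goes beyond what this note states; `check_cssync` is hypothetical, and in practice the dict would have to be filled from the real CS.

```python
def check_cssync(cs: dict) -> dict[str, list[str]]:
    """Return {vo: [problems]} for each Registry VO not properly set up under DiracX/CsSync.

    Hypothetical helper: `cs` is a nested-dict view of the CS, not a DIRAC object.
    """
    registry_vos = cs.get("Registry", {}).get("VO", {})
    cssync_vos = cs.get("DiracX", {}).get("CsSync", {}).get("VOs", {})

    problems: dict[str, list[str]] = {}
    for vo in registry_vos:
        section = cssync_vos.get(vo)
        if section is None:
            problems[vo] = ["no DiracX/CsSync/VOs subsection"]
            continue
        issues = []
        # Assumed to be required, based on the CS template in this note
        if not section.get("DefaultGroup"):
            issues.append("DefaultGroup is empty")
        idp = section.get("IdP", {})
        for option in ("ClientID", "URL"):
            if not idp.get(option):
                issues.append(f"IdP/{option} is empty")
        if issues:
            problems[vo] = issues
    return problems
```

An empty result means every VO in the Registry has a `CsSync` subsection with the options filled in.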