DIRAC v9 migration
===
:::danger
NOT UPDATED
:::
:::info
:mega: :mega: :mega: :mega: :mega:
The content of this note has been moved to the DIRAC wiki:
https://github.com/DIRACGrid/DIRAC/wiki/DIRAC-9.0
:::
### skeleton document for migration
## PRE
### Things that have to be done in v8, before you even start considering a migration
- [ ] Install the latest DIRAC v8
- [ ] The SecurityLogging service is no longer used by default. Follow one of these two options:
  - If you use the [centralized logging](https://dirac.readthedocs.io/en/latest/AdministratorGuide/ServerInstallations/centralizedLogging.html#logstash-and-elk-configurations) together with a message queue and `logstash`, follow the linked instructions to configure `logstash`
  - If you do not want to use centralized logging, set the flag `/Operations/[vo]/EnableSecurityLogging = True`
- [ ] JobParameters need to be stored in OpenSearch. See the last bullet of [this documentation](https://dirac.readthedocs.io/en/latest/AdministratorGuide/Systems/WorkloadManagement/architecture.html#databases)
- [ ] If you have not done so already, you'll need to [install and use RSS](https://dirac.readthedocs.io/en/latest/AdministratorGuide/Systems/ResourceStatus/install.html)
  - No need to go through the "Advanced Configuration"
  - for data management operations to work, the different StorageElements must be set to `Status` `Active` in the RSS
- [ ] Register a DiracX `client` in the IdP
  - this will be needed in order for DiracX to authenticate
  - multi-VO: one client per VO (??)
- [ ] if you have a DIRAC extension, update it to account for the many changes. A few special notes to consider:
  - `Setup` has been removed from many places.
- [ ] if you have a DIRAC extension you *might* need to code an empty `vodiracx` extension, depending on what your extension does
  - for examples, see:
    - First, check the [gubbins extension](https://github.com/DIRACGrid/diracx/tree/main/extensions)
    - [lhcbdiracx](https://gitlab.cern.ch/lhcb-dirac/lhcbdiracx) as a "real" example
- [ ] if you have a WebAppDIRAC extension, code an empty `vodiracx-web` extension
  - for examples, see:
    - First, check the [gubbins extension](https://github.com/DIRACGrid/diracx-web/tree/main/packages/extensions)
    - [lhcbdiracx-web](https://gitlab.cern.ch/lhcb-dirac/lhcbdiracx-web) as a "real" example
- [ ] have an S3-compatible storage for storing job sandboxes (MinIO is the tool suggested by DIRAC)
- [ ] have a Kubernetes (k8s) project ready for hosting (vo)diracx
- [ ] deploy (vo)diracx
  - instructions in https://github.com/DIRACGrid/diracx/issues/331, and in https://github.com/DIRACGrid/diracx-charts/pull/126/files, to be included "properly" here
- If you are updating the MySQL character set to `utf8mb4` and you are using the standard DFC, run:
```sql
ALTER TABLE FC_DirectoryLevelTree
DROP INDEX DirName,
ADD INDEX DirName (DirName(767));
```
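The 767-character prefix in the statement above is presumably chosen so that the index still fits InnoDB's key size limit once each character can take up to 4 bytes under `utf8mb4`. A quick arithmetic check, assuming the 3072-byte limit that applies to the DYNAMIC and COMPRESSED row formats (the helper name is illustrative and not part of DIRAC):

```python
# Illustrative arithmetic only. The byte limits are InnoDB's documented
# index key prefix limits, not values taken from the DIRAC code base.
INNODB_LIMIT_DYNAMIC = 3072  # bytes, DYNAMIC / COMPRESSED row formats
INNODB_LIMIT_COMPACT = 767   # bytes, REDUNDANT / COMPACT row formats
UTF8MB4_BYTES_PER_CHAR = 4   # utf8mb4 needs up to 4 bytes per character


def max_prefix_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """Largest index prefix length (in characters) that fits in limit_bytes."""
    return limit_bytes // bytes_per_char


print(max_prefix_chars(INNODB_LIMIT_DYNAMIC, UTF8MB4_BYTES_PER_CHAR))  # 768
print(max_prefix_chars(INNODB_LIMIT_COMPACT, UTF8MB4_BYTES_PER_CHAR))  # 191

# The prefix used in the ALTER TABLE above stays within the DYNAMIC limit
print(767 * UTF8MB4_BYTES_PER_CHAR <= INNODB_LIMIT_DYNAMIC)  # True
```

If your tables use the COMPACT or REDUNDANT row format, the much smaller limit applies and the statement may need a shorter prefix; check the row format before running it.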
## (almost) any time before the update
- [ ] add the DiracX section to the DIRAC CS. The `CsSync` subsection must have subsections for all the VOs in the DIRAC service:
```
URL =
DisabledVO =
LegacyClientEnabled
{
  WorkloadManagement
  {
    JobStateUpdate = True
  }
}
CsSync
{
  VOs
  {
    dteam
    {
      DefaultGroup =
      IdP
      {
        ClientID =
        URL =
      }
      UserSubjects
      {
      }
    }
    one_more_VO
    {
      ...
    }
  }
}
LegacyExchangeApiKey = diracx:legacy:abcd123
```
- [ ] Generate a legacy exchange API key:
```python
import secrets
import base64
import hashlib
token = secrets.token_bytes()
# This is the secret to include in the request by setting the
# /DiracX/LegacyExchangeApiKey CS option in your legacy DIRAC installation (in the local -- secluded -- dirac.cfg file)
print(f"API key is diracx:legacy:{base64.urlsafe_b64encode(token).decode()}")
# This is the environment variable to set on the DiracX server
print(f"DIRACX_LEGACY_EXCHANGE_HASHED_API_KEY={hashlib.sha256(token).hexdigest()}")
```
- set the `/DiracX/LegacyExchangeApiKey` option (in the local `dirac.cfg` file, as noted in the snippet above)
- set the environment variable `DIRACX_LEGACY_EXCHANGE_HASHED_API_KEY` under `diracx.settings` in the charts.
- [ ] Add `WorkloadManagement > Services > SandboxStore > UseDiracXBackend = True` to use the S3 sandbox store
- [ ] Check ProxyDB for 1024-bit proxies
- [ ] if your `.cfg` files (e.g. `dirac.cfg`) are managed by Puppet (or something else), prepare an update that removes `Setup` and `instanceName`
- [ ] [replace ARC and ARC6 with AREX](https://github.com/DIRACGrid/DIRAC/wiki/DIRAC-9.0#resources)
- [ ] Create the MySQL DB `DiracXAuthDB`:
```sql
DROP DATABASE IF EXISTS DiracXAuthDB;
CREATE DATABASE DiracXAuthDB;
GRANT SELECT,INSERT,LOCK TABLES,UPDATE,DELETE,CREATE,DROP,ALTER,REFERENCES,CREATE VIEW,SHOW VIEW,INDEX,TRIGGER,ALTER ROUTINE,CREATE ROUTINE ON DiracXAuthDB.* TO Dirac@'%' IDENTIFIED BY 'must_be_set';
```
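The two values printed by the key-generation snippet earlier in this section are linked by a SHA-256 hash of the raw token. Below is a minimal sketch of how one could double-check that a `diracx:legacy:` key matches the hashed value before deploying both. `matches_hashed_key` is a hypothetical helper, not a DiracX function, and it assumes nothing beyond the encoding used in that snippet.

```python
import base64
import hashlib
import hmac
import secrets

PREFIX = "diracx:legacy:"


def matches_hashed_key(api_key: str, hashed_hex: str) -> bool:
    """Return True if api_key decodes to a token whose SHA-256 is hashed_hex."""
    if not api_key.startswith(PREFIX):
        return False
    token = base64.urlsafe_b64decode(api_key[len(PREFIX):])
    digest = hashlib.sha256(token).hexdigest()
    # Constant-time comparison, as is customary when comparing secrets
    return hmac.compare_digest(digest, hashed_hex)


# Same generation steps as in the key-generation snippet
token = secrets.token_bytes()
api_key = f"{PREFIX}{base64.urlsafe_b64encode(token).decode()}"
hashed = hashlib.sha256(token).hexdigest()

print(matches_hashed_key(api_key, hashed))    # True
print(matches_hashed_key(api_key, "0" * 64))  # False
```

Running such a check against the values actually placed in `dirac.cfg` and in the charts can catch copy-paste mistakes before the downtime.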
## the day before the update
- [ ] Install the latest DIRAC v8
- [ ] partially drain the system (a full drain is not possible) by stopping the Transformation/WorkflowTask agents
## a few hours before the update
- [ ] stop Transformation/RequestTask agents
- [ ] stop Transformation/Transformation agents
- [ ] stop RequestManagement/RequestExecuting agent
## Update phase ("deep downtime")
- [ ] stop all DIRAC components (agents, services, executors) with the exception of:
  - Configuration Services
  - Framework/SystemAdministrator (of these, there will be one per server)
- [ ] synchronize CS to DiracX: https://github.com/DIRACGrid/diracx/blob/main/docs/CONFIGURATION.md#modifying-configuration
- [ ] update DBs with the following:
```sql
GRANT CREATE TEMPORARY TABLES ON *.* TO 'Dirac'@'%';
use JobDB;
ALTER TABLE `Jobs` ADD COLUMN `VO` VARCHAR(64);
DROP TABLE IF EXISTS `JobsHistorySummary`;
CREATE TABLE `JobsHistorySummary` (
  `ID` INT AUTO_INCREMENT PRIMARY KEY,
  `Status` VARCHAR(32),
  `Site` VARCHAR(100),
  `Owner` VARCHAR(32),
  `OwnerGroup` VARCHAR(128),
  `VO` VARCHAR(64),
  `JobGroup` VARCHAR(32),
  `JobType` VARCHAR(32),
  `ApplicationStatus` VARCHAR(255),
  `MinorStatus` VARCHAR(128),
  `JobCount` INT,
  `RescheduleSum` INT,
  UNIQUE KEY uq_summary (
    `Status`,
    `Site`,
    `Owner`,
    `OwnerGroup`(32),
    `VO`,
    `JobGroup`,
    `JobType`,
    `ApplicationStatus`(128),
    `MinorStatus`
  )
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
use PilotAgentsDB;
ALTER TABLE `PilotAgents` ADD COLUMN `VO` VARCHAR(64);
CREATE TABLE `PilotsHistorySummary` (
  `GridSite` VARCHAR(128),
  `ComputingElement` VARCHAR(128),
  `GridType` VARCHAR(128),
  `Status` VARCHAR(32),
  `VO` VARCHAR(64),
  `PilotCount` INT,
  PRIMARY KEY (`GridSite`, `ComputingElement`, `GridType`, `Status`, `VO`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
use TaskQueueDB;
ALTER TABLE `tq_TaskQueues` ADD COLUMN `Owner` VARCHAR(255) NOT NULL;
ALTER TABLE `tq_TaskQueues` ADD COLUMN `VO` VARCHAR(64);
use SandboxMetadataDB;
ALTER TABLE `sb_Owners` ADD COLUMN `VO` VARCHAR(64);
use TransformationDB;
ALTER TABLE `Transformations` ADD COLUMN `Author` VARCHAR(255) NOT NULL;
ALTER TABLE `Transformations` MODIFY COLUMN `AuthorDN` VARCHAR(255) DEFAULT NULL;
```
- [ ] Save [this script](https://gist.github.com/fstagni/5d1d52f7185fc0d2adfb197a5c921b2a), which adds "VO" info to a few DBs, in any directory (e.g. `/opt/dirac`) of a DIRAC server machine, then run it with:
```
python script_name.py -o /DIRAC/Security/UseServerCertificate=yes
```
- [ ] update DBs with the following SQL statements: https://gist.github.com/fstagni/d977b4f3ebe5432ee7bb2743145dc837
- [ ] update the Accounting DB by following the instructions in point 2 of https://github.com/DIRACGrid/DIRAC/wiki/DIRAC-9.0#accounting
- [ ] install service Monitoring/WebApp
- [ ] remove agent Framework/CAUpdateAgent
- [ ] remove agent WorkloadManagement/CloudDirector
- [ ] remove service WorkloadManagement/VirtualMachineManager
- [ ] convert the Systems part of the CS to "NoSetup" by running https://gist.github.com/atsareg/080682ed97f329e65c2458e99eca89e5 (or do it by hand if you know what you are doing)
  - [ ] and add the CS option `/DIRAC/NoSetup = True` for backward compatibility
- [ ] convert the local cfg file (`/opt/dirac/etc/dirac.cfg`) to "NoSetup" (use the Puppet update prepared earlier, if needed)
- [ ] convert the Operations part of the CS to "NoSetup" by running https://github.com/DIRACGrid/DIRAC/pull/7218/files (or do it by hand if you know what you are doing)
- [ ] install DIRAC v9 (usual procedure)
- [ ] the OpenSearch indexes used for job parameters have been renamed (e.g. from "lhcb-production_elasticjobparameters_index_1014.0m" to "job_parameters_lhcb_1014m"; this name is configurable, and what is given here is the standard naming). Rename the old indexes accordingly.
- [ ] the OpenSearch indexes used for WMSHistory have an added "VO" field. I
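For the job parameters indexes, working out the new names can be scripted. The sketch below derives the new name from the old one using only the single example given in this note. The assumed old layout (`<vo>-<setup>_elasticjobparameters_index_<N>.0m`) and the helper name are guesses that should be checked against your own index names. It only computes names and does not touch OpenSearch.

```python
import re

# Assumed layout of the old names, inferred from the one example in this note
OLD_NAME = re.compile(
    r"^(?P<vo>[^-_]+)-(?P<setup>[^_]+)_elasticjobparameters_index_(?P<num>\d+)\.0m$"
)


def new_job_parameters_index(old_name: str) -> str:
    """Map an old job-parameters index name to the standard v9 naming."""
    match = OLD_NAME.match(old_name)
    if match is None:
        # Refuse to guess when the name does not follow the assumed layout
        raise ValueError(f"Unrecognised index name: {old_name}")
    return f"job_parameters_{match['vo']}_{match['num']}m"


print(new_job_parameters_index("lhcb-production_elasticjobparameters_index_1014.0m"))
# job_parameters_lhcb_1014m
```

Raising on unrecognised names is deliberate: a silent fallback could produce a wrong target name for an index that does not follow the standard naming.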
## Restart phase
- [ ] restart the running DIRAC components
- [ ] start all stopped DIRAC components, services before agents
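The ordering constraint above (services before agents) can be expressed as a simple sort. This is only a sketch: the component list is example data rather than output from a DIRAC command, and the position of executors in the order is an assumption, since this note only states that services come first.

```python
# Example data only: (component type, "System/Component")
stopped = [
    ("agent", "Transformation/WorkflowTaskAgent"),
    ("service", "WorkloadManagement/JobMonitoring"),
    ("executor", "WorkloadManagement/Optimizers"),
    ("service", "DataManagement/FileCatalog"),
]

# Assumption: services first, agents last, executors placed in between
START_ORDER = {"service": 0, "executor": 1, "agent": 2}


def start_sequence(components):
    """Return components sorted so that services start before agents."""
    # sorted() is stable, so the original order is kept within each type
    return sorted(components, key=lambda comp: START_ORDER[comp[0]])


for kind, name in start_sequence(stopped):
    print(f"start {kind:8} {name}")
```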
## Checking phase
- DIRAC:
- DiracX:
  - `dirac diracx whoami`
- DiracX-Web:
  - should be up
  - JobMonitoring app should be there
## Any time after
- [ ] the OpenSearch index names no longer include the `Setup` name. Index patterns (in OpenSearch, Grafana, etc.) therefore need to be updated to something like `*wmshistory*`
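To see why a wildcard pattern such as `*wmshistory*` covers both naming schemes, here is a small sketch using Python's `fnmatch`, whose `*` behaves like the wildcard of an index pattern in this simple case. The index names are invented for illustration; check the real ones in your OpenSearch instance.

```python
from fnmatch import fnmatch

# Invented index names: one with a Setup-style prefix, one without,
# plus an unrelated index that should never match
indexes = [
    "lhcb-production_wmshistory_index-2024.01.01",
    "lhcb_wmshistory_index-2025.01.01",
    "job_parameters_lhcb_1014m",
]

old_pattern = "lhcb-production_wmshistory*"  # pattern tied to the Setup name
new_pattern = "*wmshistory*"                 # pattern suggested above

matched_old = [name for name in indexes if fnmatch(name, old_pattern)]
matched_new = [name for name in indexes if fnmatch(name, new_pattern)]

print(matched_old)  # only the index that still carries the Setup name
print(matched_new)  # both WMSHistory indexes, old and new
```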
## Later:
- (optional) enable https://github.com/DIRACGrid/DIRAC/wiki/DIRAC-9.0#enable-remote-pilot-logging