# <img src="https://codimd.web.cern.ch/uploads/upload_45a14e417e9a8ade007f06e7b9420356.png" style="border: none;background: none;box-shadow:none"> initial deployments
[Julien Leduc](mailto:julien.leduc@cern.ch)
---
## Data archiving at CERN
<ul>
<li class="fragment">Ad aeternum storage</li>
<li class="fragment">7 tape libraries, 83 tape drives, 20k tapes</li>
<li class="fragment">Current use: <b style="color:dodgerblue;">330 PB</b></li>
<li class="fragment">Current capacity: <b style="color:coral;">0.7 EB</b></li>
<li class="fragment"><b style="color:red;">Exponentially growing</b></li>
</ul>
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_95716d3602c009e301c880b0afd4225a.png" data-background-size="80%" -->
---
<h2>Data Archiving at CERN <span class="fragment"><i style="color:blue;">Evolution</i></span></h2>
<ul>
<li class="fragment">EOS + tapes...</li>
<ul>
<li class="fragment">EOS is CERN's strategic storage platform</li>
<li class="fragment">tape is the strategic long-term archive medium</li>
</ul>
<li class="fragment">EOS + tapes = <span class="fragment" style="color:red;">♥</span></li>
<ul>
<li class="fragment">Meet CTA: CERN Tape Archive</li>
<li class="fragment">Streamline data paths, software and infrastructure</li>
</ul>
</ul>
---
<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Deployment</i></span></h2>
----
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_d361eb4b4ad42029bd3d998a1600cfa0.png" data-background-size="70%" -->
----
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_d2d164112f95cfd9fa22d4532281323e.png" data-background-size="70%" -->
----
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_7d8fb723c75a802eb77a6e53037afe26.png" data-background-size="70%" -->
---
<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Architecture</i></span></h2>
----
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_eac32c76dde5a45191434a90d54a4d5a.png" data-background-size="70%" -->
---
<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Timeline</i></span></h2>
----
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_0ae96233cb49710754263e2d780a20b6.svg" data-background-size="100%" -->
---
<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Dev&amp;Oper</i></span></h2>
<p class="fragment">
Tightly coupled software <span class="fragment">⇒ <span style="color:red;">tightly coupled developments</span></span>
</p>
<p class="fragment">
<span class="fragment highlight-blue">Extensive and systematic testing is paramount to limit regressions</span>
</p>
<p class="fragment">
<span class="fragment highlight-blue">Extensive monitoring</span> in place to <span class="fragment highlight-blue">ease debugging</span> and <span class="fragment highlight-red">target high performance from day 1</span>
</p>
----
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_0e38a1afc20ff3b7ce635b01826a4b84.png" data-background-size="70%" -->
----
## <span style="color: dodgerblue">For more information</span>
Come to my CERN IT Technical Forum presentation on 8 March 2019:
[System testing service developments using Docker and Kubernetes: EOS + CTA use case](https://indico.cern.ch/e/CERN-ITTF-2019-03-08)
---
# <span style="color: dodgerblue">CTA</span> VS <span style="color: crimson">experiment data transfers</span>
----
## ATLAS stage in
Several tests were conducted with the ATLAS DDM team using Rucio and FTS.
- 2 stage-in tests of 200TB each
- ~90k files of 2.6GB each archived to tape
- sub-optimal EOS instance (2 slow disk servers)
----
## ATLAS stage in
<img src="https://codimd.web.cern.ch/uploads/upload_dfa6cf2e22f47bff0ff9f705a6fbe419.png" class="plain">
<img src="https://codimd.web.cern.ch/uploads/upload_8d18a04f89dfd4626a3c073a48f6717e.png" class="plain">
----
## ATLAS stage out
a.k.a. the *tape carousel* test, which took place in October 2018:
- 3 × EOS disk servers (~3 × 260TB of raw JBOD space)
- 6-10 × T10KD tape drives
- 90k files retrieved from EOSCTAATLASPPS (tape) to EOSATLAS by Rucio through FTS
----
## ATLAS stage out
<img src="https://codimd.web.cern.ch/uploads/upload_cdff0f357f4522aabad54db96a12de84.png" class="plain">
----
## ATLAS stage out
<img src="https://codimd.web.cern.ch/uploads/upload_f08082d31f8d0839404ca282d05d7fa7.png" class="plain">
----
## ATLAS stage out DDM
<img src="https://codimd.web.cern.ch/uploads/upload_5a6394a3c1efa419f01d3c548edbb60e.png" class="plain">
<span class="fragment"><b style="color:crimson;">500MB/s of sustained performance per 288TB of disk...</b></span>
---
## Run3 T0 archive architecture
The 4 LHC experiments will write at <span class="fragment"><b style="color:dodgerblue;">60GB/s to the archival system.</b></span>
<span class="fragment">Scaling the current `eosctaatlaspps` setup (500MB/s per 288TB, i.e. 120 units for 60GB/s) would require approximately</span> <span class="fragment">$288\,\mathrm{TB} \times 2 \times 60 = 34.5\,\mathrm{PB}$ of disk storage.</span>
<span class="fragment"><b style="color:crimson;">This means ~70PB of 2-replica disk storage!</b></span>
<span class="fragment">Even with next-generation disk servers (1PB of raw disk delivering 4GB/s), this is still </span><span class="fragment"><b style="color:crimson;">30PB of disk storage.</b></span>
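The scaling above works out as follows; a quick back-of-the-envelope sketch, where the 500MB/s per 288TB unit is the figure from the DDM test earlier:

```python
# Back-of-the-envelope sizing check for the Run3 T0 archive buffer.
# Input figure from the ATLAS DDM test: 500 MB/s sustained per 288 TB of disk.
target_rate_gbs = 60    # combined LHC write rate into the archive, GB/s
unit_rate_gbs = 0.5     # 500 MB/s per 288 TB disk-server unit
unit_size_tb = 288      # disk capacity per unit, TB

units = target_rate_gbs / unit_rate_gbs     # 120 units needed
raw_pb = units * unit_size_tb / 1000        # ~34.5 PB of raw disk
replicated_pb = 2 * raw_pb                  # ~69 PB with 2 replicas

# Next-generation servers: 1 PB of raw disk delivering 4 GB/s each
nextgen_pb = 2 * (target_rate_gbs / 4) * 1  # 30 PB with 2 replicas

print(units, raw_pb, replicated_pb, nextgen_pb)
```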
----
## Run3 T0 archive architecture <span style="color: crimson">*evolution*</span>
A small but fast cache close to the tape drives, sized to hold $x$ hours of data traffic.
Files are aggressively removed from the buffer to free up space.
<span class="fragment">From Rucio's point of view, the CERN EOSCTA endpoint is <b style="color:crimson;">tape only</b></span>.
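As an illustration, such a cache can be sized from the ingest rate; a sketch, where 60GB/s is the Run3 figure from the previous slide and $x$ is a free parameter:

```python
def cache_size_tb(hours: float, rate_gbs: float = 60.0) -> float:
    """Capacity (TB) needed to hold `hours` of ingest at `rate_gbs` GB/s."""
    return rate_gbs * 3600 * hours / 1000

# e.g. a 4-hour buffer at the full Run3 rate
print(cache_size_tb(4))  # 864.0 TB
```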
----
## ✅ to EOSCTA = ✅ on Tape
Why is it so important for an archival endpoint?
<ul>
<li class="fragment">data integrity checked during write (Logical Block Protection)</li>
<li class="fragment">long-term stable medium</li>
</ul>
<span class="fragment"><b style="color:crimson;">Data preservation on tape is a difficult enough topic.</b></span>
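With Logical Block Protection, the drive verifies a per-block CRC as data is written. A minimal illustration of the idea (using Python's `zlib.crc32` rather than the CRC32C that tape drives actually use):

```python
import zlib

def blocks_with_crc(data: bytes, block_size: int = 256 * 1024):
    """Yield (block, crc) pairs; the drive recomputes each CRC on write."""
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        yield block, zlib.crc32(block)

payload = b"detector event data " * 50000
for block, crc in blocks_with_crc(payload):
    # A mismatch here would mean the block was corrupted in flight
    assert zlib.crc32(block) == crc
print("all blocks verified")
```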
----
## Archival
```mermaid
sequenceDiagram
participant Experiment
participant FTS
participant EOS
participant EOSCTA
participant Tape
Experiment->>FTS: archive(file)
activate EOS
FTS->>EOSCTA: xrdcp EOS:file
EOS->>+EOSCTA: file
loop until timeout
FTS->>EOSCTA: file backup_bit ?
alt backup_bit=1
activate Tape
EOSCTA->>FTS: file on tape
FTS->>Experiment: file archival OK
EOSCTA->>-EOSCTA: delete file
deactivate Tape
else backup_bit=0
activate EOSCTA
EOSCTA-xFTS: file NOT on tape
FTS->>EOSCTA: delete file
deactivate EOSCTA
FTS-xExperiment: file archival FAILED
end
end
deactivate EOS
```
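The FTS polling loop in the diagram can be sketched as follows (a sketch only: `check_on_tape` is a hypothetical callable standing in for FTS's actual backup-bit query against EOSCTA):

```python
import time

def wait_for_archive(check_on_tape, timeout_s=600.0, poll_s=30.0):
    """Poll the backup bit until the file is safely on tape or we time out.

    `check_on_tape` is a hypothetical callable returning True once
    backup_bit=1; FTS performs the equivalent query against EOSCTA.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check_on_tape():
            return True        # file on tape -> archival OK
        time.sleep(poll_s)
    return False               # timeout -> archival FAILED
```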
----
## Retrieval
```mermaid
sequenceDiagram
participant Experiment
participant FTS
participant EOS
participant EOSCTA
participant Tape
Experiment->>FTS: retrieve(file)
activate Tape
FTS->>EOSCTA: xrdfs prepare file
loop until timeout
FTS->>EOSCTA: file online ?
alt online_bit=1
Tape->>+EOSCTA: file
EOSCTA->>FTS: file is online
FTS->>EOS: xrdcp EOSCTA:file
EOSCTA->>+EOS: file
FTS->>Experiment: file retrieval OK
EOSCTA->>-EOSCTA: delete file
deactivate EOS
else online_bit=0
EOSCTA-xFTS: file is NOT online
FTS-xExperiment: file retrieval FAILED
end
end
deactivate Tape
```
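The retrieval flow above, sketched with the three operations injected as hypothetical callables (`request_recall` stands in for `xrdfs prepare`, `is_online` for the online-bit poll, `copy_out` for the `xrdcp` to EOS):

```python
import time

def retrieve(request_recall, is_online, copy_out, timeout_s=3600.0, poll_s=60.0):
    """Sketch of the FTS retrieval flow; all three callables are stand-ins."""
    request_recall()                    # FTS -> EOSCTA: xrdfs prepare file
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:  # loop until timeout
        if is_online():                 # online_bit = 1 ?
            copy_out()                  # xrdcp EOSCTA:file -> EOS
            return True                 # file retrieval OK
        time.sleep(poll_s)
    return False                        # file retrieval FAILED
```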
---
# <span style="color: dodgerblue">CTA</span> & <span style="color: crimson">Rucio</span>
## <span style="color:crimson">ATLAS & CMS</span>
- Working with respective Rucio teams
- PPS instances are <span style="color:blue">up and running</span>
- <span style="color:crimson">will be upgraded next week</span>
- More capacity will be moved to CTA
{"title":"2nd RUCIO Community Workshop CTA initial deployments","description":"2019 RUCIO Community Workshop presentation of CTA initial deployment","slideOptions":{"transition":"slide","theme":"white"}}