# <img src="https://codimd.web.cern.ch/uploads/upload_45a14e417e9a8ade007f06e7b9420356.png" style="border: none;background: none;box-shadow:none"> initial deployments

[Julien Leduc](mailto:julien.leduc@cern.ch)

---

## Data archiving at CERN

<ul>
  <li class="fragment">Ad aeternum storage</li>
  <li class="fragment">7 tape libraries, 83 tape drives, 20k tapes</li>
  <li class="fragment">Current use: <b style="color:dodgerblue;">330 PB</b></li>
  <li class="fragment">Current capacity: <b style="color:coral;">0.7 EB</b></li>
  <li class="fragment"><b style="color:red;">Exponentially growing</b></li>
</ul>

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_95716d3602c009e301c880b0afd4225a.png" data-background-size="80%" -->

---

<h2>Data Archiving at CERN <span class="fragment"><i style="color:blue;">Evolution</i></span></h2>

<ul>
  <li class="fragment">EOS + tapes...</li>
  <ul>
    <li class="fragment">EOS is CERN's strategic storage platform</li>
    <li class="fragment">tape is the strategic long-term archive medium</li>
  </ul>
  <li class="fragment">EOS + tapes = <span class="fragment" style="color:red;">&hearts;</span></li>
  <ul>
    <li class="fragment">Meet CTA: CERN Tape Archive</li>
    <li class="fragment">Streamline data paths, software and infrastructure</li>
  </ul>
</ul>

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Deployment</i></span></h2>

----

<!-- .slide: data-background="https://hackmd.web.cern.ch/uploads/upload_d361eb4b4ad42029bd3d998a1600cfa0.png" data-background-size="70%" -->

----

<!-- .slide: data-background="https://hackmd.web.cern.ch/uploads/upload_d2d164112f95cfd9fa22d4532281323e.png" data-background-size="70%" -->

----

<!-- .slide: data-background="https://hackmd.web.cern.ch/uploads/upload_7d8fb723c75a802eb77a6e53037afe26.png" data-background-size="70%" -->

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Architecture</i></span></h2>

----

<!-- .slide:
data-background="https://hackmd.web.cern.ch/uploads/upload_eac32c76dde5a45191434a90d54a4d5a.png" data-background-size="70%" -->

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Timeline</i></span></h2>

----

<!-- .slide: data-background="https://hackmd.web.cern.ch/uploads/upload_0ae96233cb49710754263e2d780a20b6.svg" data-background-size="100%" -->

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Dev&oper</i></span></h2>

<p class="fragment">
  Tightly coupled software <span class="fragment">&rArr; <span style="color:red;">tightly coupled developments</span></span>
</p>

<p class="fragment">
  <span class="fragment highlight-blue">Extensive and systematic testing is paramount to limit regressions</span>
</p>

<p class="fragment">
  <span class="fragment highlight-blue">Extensive monitoring</span> in place to <span class="fragment highlight-blue">ease debugging</span> and <span class="fragment highlight-red">target high performance from day 1</span>
</p>

----

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_0e38a1afc20ff3b7ce635b01826a4b84.png" data-background-size="70%" -->

----

## <span style="color: dodgerblue">For more information</span>

Come to my CERN IT Technical Forum presentation on 08/03/2019:

[System testing service developments using Docker and Kubernetes: EOS + CTA use case](https://indico.cern.ch/e/CERN-ITTF-2019-03-08)

---

# <span style="color: dodgerblue">CTA</span> VS <span style="color: crimson">experiment data transfers</span>

----

## ATLAS stage in

Several tests were conducted with the ATLAS DDM team using Rucio and FTS.
- 2 stage in tests of 200TB each
- ~90k files of 2.6GB archived to tape
- sub-optimal EOS instance (2 slow disk servers)

----

## ATLAS stage in

<img src="https://hackmd.web.cern.ch/uploads/upload_dfa6cf2e22f47bff0ff9f705a6fbe419.png" class="plain">

<img src="https://hackmd.web.cern.ch/uploads/upload_8d18a04f89dfd4626a3c073a48f6717e.png" class="plain">

----

## ATLAS stage out

The test, aka *Tape carousel*, took place in October 2018:

- 3 x EOS disk servers (~3x260TB of raw JBOD space)
- 6-10 x T10KD tape drives
- 90k files retrieved from EOSCTAATLASPPS (tape) to EOSATLAS by Rucio through FTS

----

## ATLAS stage out

<img src="https://hackmd.web.cern.ch/uploads/upload_cdff0f357f4522aabad54db96a12de84.png" class="plain">

----

## ATLAS stage out

<img src="https://hackmd.web.cern.ch/uploads/upload_f08082d31f8d0839404ca282d05d7fa7.png" class="plain">

----

## ATLAS stage out DDM

<img src="https://hackmd.web.cern.ch/uploads/upload_5a6394a3c1efa419f01d3c548edbb60e.png" class="plain">

<span class="fragment"><b style="color:crimson;">500MB/s of sustained performance per 288TB of disk...</b></span>

---

## Run3 T0 archive architecture

4 LHC experiments will write at <span class="fragment"><b style="color:dodgerblue;">60GB/s to the archival system.</b></span>

<span class="fragment">Scaling the current `eosctaatlaspps` would require approximately</span> <span class="fragment">$288TB \times 2 \times 60 \approx 34.5PB$ of disk storage.</span>

<span class="fragment"><b style="color:crimson;">This means 70PB of 2-replica disk storage!</b></span>

<span class="fragment">Going to next gen disk servers, with 1PB of raw disk and 4GB/s of bandwidth each, is </span><span class="fragment"><b style="color:crimson;">30PB of disk storage.</b></span>

----

## Run3 T0 archive architecture <span style="color: crimson">*evolution*</span>

A small, faster cache close to the tapes that aims to contain $x$ hours of data traffic.

Files are aggressively removed from the buffer to free up space.
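
The sizing arithmetic from the Run3 slides can be reproduced with a short sketch. It is a back-of-the-envelope helper, not a capacity model: it assumes throughput scales linearly with the number of disk servers and uses decimal units (1PB = 10^6 GB). The names `disk_needed_pb` and `buffer_pb` are made up for illustration; the figures (60GB/s, 500MB/s per 288TB, 1PB at 4GB/s) are the ones quoted on the slides, and since $x$ is left open the hours below are arbitrary examples. Under these assumptions it gives roughly 34.6PB (single copy) and 69PB (2 replicas) for the current building block, 30PB for next gen servers with 2 replicas, and about 0.216PB of buffer per hour of traffic.

```python
# Figures quoted on the slides; function and parameter names are illustrative.
TARGET_GB_S = 60.0  # aggregate Run3 write rate to the archival system


def disk_needed_pb(server_pb, server_gb_s, replicas=1, target_gb_s=TARGET_GB_S):
    """Raw disk (PB) needed to sustain target_gb_s, assuming linear scaling."""
    servers = target_gb_s / server_gb_s  # number of building blocks required
    return servers * server_pb * replicas


def buffer_pb(hours, target_gb_s=TARGET_GB_S):
    """Buffer size (PB, decimal units) needed to hold `hours` of traffic."""
    return target_gb_s * 3600 * hours / 1e6  # GB/s -> GB -> PB


if __name__ == "__main__":
    # Current PPS building block: 288TB of disk sustaining 500MB/s.
    print("current, 1 copy    :", disk_needed_pb(0.288, 0.5))
    print("current, 2 replicas:", disk_needed_pb(0.288, 0.5, replicas=2))
    # Next gen disk server: 1PB of raw disk, 4GB/s of bandwidth.
    print("next gen, 2 replicas:", disk_needed_pb(1.0, 4.0, replicas=2))
    # Example buffer sizes; the value of x is not fixed on the slide.
    for hours in (1, 6, 24):
        print(f"buffer for {hours}h:", buffer_pb(hours))
```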
<span class="fragment">From Rucio's point of view, the CERN EOSCTA endpoint is <b style="color:crimson;">tape only</b></span>.

----

## &#x2705; to EOSCTA = &#x2705; on Tape

Why is it so important for an archival endpoint?

<span class="fragment">- data integrity checked during write (Logical Block Protection)</span>

<span class="fragment">- long-term stable medium</span>

<span class="fragment"><b style="color:crimson;">Data preservation on tape is a difficult enough topic.</b></span>

----

## Archival

```mermaid
sequenceDiagram
    participant Experiment
    participant FTS
    participant EOS
    participant EOSCTA
    participant Tape
    Experiment->>FTS: archive(file)
    activate EOS
    FTS->>EOSCTA: xrdcp EOS:file
    EOS->>+EOSCTA: file
    loop until timeout
        FTS->>EOSCTA: file backup_bit ?
        alt backup_bit=1
            activate Tape
            EOSCTA->>FTS: file on tape
            FTS->>Experiment: file archival OK
            EOSCTA->>-EOSCTA: delete file
            deactivate Tape
        else backup_bit=0
            activate EOSCTA
            EOSCTA-xFTS: file NOT on tape
            FTS->>-EOSCTA: delete file
            FTS-xExperiment: file archival FAILED
        end
    end
    deactivate EOS
```

----

## Retrieval

```mermaid
sequenceDiagram
    participant Experiment
    participant FTS
    participant EOS
    participant EOSCTA
    participant Tape
    Experiment->>FTS: retrieve(file)
    activate Tape
    FTS->>EOSCTA: xrdfs prepare file
    loop until timeout
        FTS->>EOSCTA: file online ?
        alt online_bit=1
            Tape->>+EOSCTA: file
            activate EOSCTA
            EOSCTA->>FTS: file is online
            FTS->>EOS: xrdcp EOSCTA:file
            EOSCTA->>+EOS: file
            FTS->>Experiment: file retrieval OK
            EOSCTA->>-EOSCTA: delete file
            deactivate EOS
        else online_bit=0
            EOSCTA-xFTS: file is NOT online
            FTS-xExperiment: file retrieval FAILED
        end
    end
    deactivate Tape
```

---

# <span style="color: dodgerblue">CTA</span> & <span style="color: crimson">Rucio</span>

## <span style="color:crimson">ATLAS & CMS</span>

- Working with respective Rucio teams
- PPS instances are <span style="color:blue">up and running</span>
- <span style="color:crimson">will be upgraded next week</span>
- More capacity will be moved to CTA
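
----

## Archival <span style="color: crimson">*sketch*</span>

The Archival sequence diagram earlier in the deck boils down to a poll-until-timeout loop on the client side. The sketch below is one possible reading of that diagram, not the FTS implementation: `archive`, `FakeClock` and the three callables are hypothetical stand-ins for the copy to the EOSCTA buffer, the `backup_bit` query and the cleanup, and the choice to report failure only once the timeout expires is an assumption. On success the client does nothing more, since the diagram shows EOSCTA removing its own buffer copy.

```python
import time
from typing import Callable


def archive(
    copy_to_buffer: Callable[[], None],      # stand-in for the copy into EOSCTA
    on_tape: Callable[[], bool],             # stand-in for the backup_bit query
    delete_from_buffer: Callable[[], None],  # stand-in for the failure cleanup
    timeout_s: float,
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the file is reported on tape, False on timeout."""
    copy_to_buffer()
    deadline = clock() + timeout_s
    while True:
        if on_tape():
            return True           # archival OK: the file is safely on tape
        if clock() >= deadline:
            delete_from_buffer()  # archival FAILED: remove the buffer copy
            return False
        sleep(poll_s)


class FakeClock:
    """Deterministic clock so the loop can be exercised without waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    clk = FakeClock()
    answers = iter([False, False, True])  # tape copy appears on the third poll
    ok = archive(lambda: None, lambda: next(answers), lambda: None,
                 timeout_s=60, poll_s=5, clock=clk.time, sleep=clk.sleep)
    print("archival OK" if ok else "archival FAILED")
```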
