# CTA status and plans

27 September 2019

[Julien Leduc](mailto:julien.leduc@cern.ch) for the CTA team

---

## Data archiving at CERN

<ul>
  <li class="fragment">Ad aeternum storage</li>
  <li class="fragment">Current use: <b style="color:dodgerblue;">340 PB</b></li>
  <li class="fragment"><b style="color:red;">Exponentially growing</b></li>
  <li class="fragment">Run2: 7 tape libraries, 83 tape drives, 30k tapes</li>
  <li class="fragment">Run3: 4-5 tape libraries, 160+ tape drives, 150PB+/year, >40GB/s</li>
</ul>

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_95716d3602c009e301c880b0afd4225a.png" data-background-size="80%" -->

---

<h2>Data Archiving at CERN <span class="fragment"><i style="color:blue;">Evolution</i></span></h2>

<ul>
  <li class="fragment">EOS + tapes...</li>
  <ul>
    <li class="fragment">EOS is CERN's strategic storage platform</li>
    <li class="fragment">tape is the strategic long-term archive medium</li>
  </ul>
  <li class="fragment">EOS + tapes = <span class="fragment" style="color:red;">&hearts;</span></li>
  <ul>
    <li class="fragment">Meet CTA: CERN Tape Archive</li>
    <li class="fragment">Streamline data paths, consolidate software development and operations</li>
  </ul>
</ul>

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Deployment</i></span></h2>

----

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_d11477e440602657feb6144ca74b97b8.svg" data-background-size="70%" -->

----

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_38e9df2ac5b0bab55677c1ba22b045cd.svg" data-background-size="70%" -->

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Timeline</i></span></h2>

----

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_0ae96233cb49710754263e2d780a20b6.svg" data-background-size="100%" -->

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Architecture</i></span></h2>

<ul>
  <li class="fragment">CTA offers the <i>Best of Both Worlds</i></li>
  <ul>
    <li class="fragment">User interface and file access <span style="color:dodgerblue">from EOS</span></li>
    <li class="fragment">Tape system management <span style="color:crimson">from CASTOR</span></li>
    <li class="fragment">New scalable, robust queuing system to link the two</li>
  </ul>
  <li class="fragment">CTA design principles</li>
  <ul>
    <li class="fragment">Simplicity</li>
    <li class="fragment">Scalability</li>
    <li class="fragment">Performance</li>
  </ul>
</ul>

----

<h2>EOS+CTA <i style="color:blue;">Architecture</i></h2>

Main difference with CASTOR:

<span class="fragment" style="color: dodgerblue"><b>EOSCTA is a pure tape system.</b></span>

<span class="fragment">Disk cache duty is consolidated in the main <b style="color: dodgerblue">EOS instance.</b></span>

<span class="fragment">Operating tape drives efficiently at full speed, full time, requires an <b style="color: crimson">SSD-based buffer.</b></span>

----

<h2>EOS+CTA <i style="color:blue;">Architecture</i></h2>

<img src="https://codimd.web.cern.ch/uploads/upload_e764d94a4ee3ac79c328ea0d21a6a128.svg" class="plain">

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Typical operations</i></span></h2>

<ul>
  <li class="fragment">Write file to eoscta buffer</li>
  <li class="fragment">Is file on tape?</li>
  <li class="fragment">Queue file for retrieve</li>
  <li class="fragment">Is file in eoscta buffer?</li>
  <li class="fragment">Read file from eoscta buffer</li>
  <li class="fragment">Evict file from eoscta buffer</li>
  <li class="fragment">Delete file from namespace</li>
</ul>

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Dev&oper</i></span></h2>

<p class="fragment">
  Tightly coupled software <span class="fragment">&rArr; <span style="color:red;">tightly coupled developments</span></span>
</p>
<p class="fragment">
  <span class="fragment highlight-blue">Extensive and systematic testing is paramount to limit regressions</span>
</p>
<p class="fragment">
  <span class="fragment highlight-blue">Extensive monitoring</span> in place to <span class="fragment highlight-blue">ease debugging</span> and <span class="fragment highlight-red">target high performance from day 1</span>
</p>

----

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_93d85ab12e6b09b311778d3d762d9185.png" data-background-size="70%" -->

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Pre-production instances</i></span></h2>

----

## <span style="color: dodgerblue">EOSCTA</span>PPS

5 hyperconverged servers:
- 16x1TB SSDs, 10Gb/s each
- hosting **5 EOSCTA instances**

Specific bandwidth-oriented EOS setup; each server runs:
- **1 EOS MGM**
- **1 EOS NAMESPACE**
  - *quarkdb*
- **14 EOS DISKSERVERs**
  - *FSTs*

----

## <span style="color: dodgerblue">EOSCTA</span>PPS

<img src="https://codimd.web.cern.ch/uploads/upload_88f3b03ec6cf37aa8c59787b8909d6f6.svg" class="plain">

----

## <span style="color: dodgerblue">EOSCTA</span>PPS

Deployed instances:

<ul>
  <li class="fragment"><b>eosctaatlaspps</b> redundant share of CASTOR writes rule in place</li>
  <li class="fragment"><b>eosctacmspps</b> tape endpoint for CMS Rucio instance</li>
  <li class="fragment"><b>eosctaalicepps</b> for ALICE</li>
  <li class="fragment"><b>eosctapps</b> CASTOR migration instance</li>
  <li class="fragment"><b>eosctarepack</b> for CTA repack activities</li>
</ul>

---

## <span style="color: dodgerblue">CTA</span> and <span style="color: crimson">ATLAS DATA carousel</span>

Use tape as input for I/O intensive workflows.

<img src="https://codimd.web.cern.ch/uploads/upload_b121d6147230892c509c45a2af072320.png" class="plain">

----

## <span style="color: dodgerblue">CTA</span> and <span style="color: crimson">ATLAS DATA carousel</span>

Close collaboration between:
- ATLAS DDM team
- RUCIO developers
- FTS, CTA, xrootd, EOS developers

Has been key to ATLAS workflow integration work.
<span class="fragment" style="color:dodgerblue"><b>MANY THANKS!</b></span>

----

## ATLAS Archival (April 2019)

<img src="https://codimd.web.cern.ch/uploads/upload_9f8be3fca81b9dcdca64e7e87c5befed.png" class="plain">

----

## ATLAS Recall (June 2019)

<img src="https://codimd.web.cern.ch/uploads/upload_b4dd1881c1c1be7126489c41b34edce9.png" class="plain">

----

## ATLAS Recall (June 2019) inefficiencies

<img src="https://codimd.web.cern.ch/uploads/upload_54d5eb635a3fbfca6294fcabec0545ab.png" class="plain">

---

## CASTOR -> CTA migration

ATLAS needed `/castor/cern.ch/grid/atlas/rucio` for the next recall exercise.

Migration principles:
- metadata-only operation
- CASTOR data is **read-only (RO) in CTA**
- migration **by tape pool**

90M files were migrated from CASTOR to CTA

---

## ATLAS 2018 recall campaign

CTA migration instance `eosctapps`:
- 5 hyperconverged servers
- 20TB of SSDs
- 10 tape drives

----

## CTA cumulative recall volume

<img src="https://codimd.web.cern.ch/uploads/upload_56d9d3f7ba73f6b59dc8b4616e1a2c4f.png" class="plain">

----

## CTA share in daily total recall volume

<img src="https://codimd.web.cern.ch/uploads/upload_a3e2e74e123fd54914513bba10fe5c22.png" class="plain">

----

## CTA daily recall volume

<img src="https://codimd.web.cern.ch/uploads/upload_c24024734407c1aada74d3b1be28ae93.png" class="plain">

----

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_aa47013dfc372c182dd605d1f825f170.png" data-background-size="100%" -->

----

> From ATLAS side, the recall campaign went smoothly and we managed to recall all the files ( 326k files, 741 TB) in a timely manner (CERN was the first site to achieve the full recall of all the files).
This gave us additional confidence in CTA performance and in the migration strategy
>
> ATLAS DDM

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Production instances</i></span></h2>

Getting ready for the final migration:
- final buffer hardware on its way:
  - racks are cabled
  - network switches allocated (600Gb/s of BW)
  - 30 hyperconverged machines mid-October
- Production ready beginning of November

----

```graphviz
graph hierarchy {
  nodesep=1 // increases the separation between nodes
  node [color=Red, fontname=Courier, shape=box] // All nodes will have this shape and colour
  edge [color=Blue, label="25Gb/s"] // All the lines look like this
  Router [shape=circle]
  Router--{SwitchBuffer} [label="3x(2x100Gb/s)", fontsize=15, style=bold]
  Router--{SwitchTape} [label="7x20Gb/s", fontsize=15, style=bold]
  subgraph cluster_level1{
    label="EOSCTA Buffer infrastructure\n3x10 hyperconverged servers"
    color=dodgerblue
    fontcolor=dodgerblue
    SwitchBuffer
    SSD01 [color=black, shape=cylinder]
    SSDXX [color=black, shape=cylinder]
    SSD16 [color=black, shape=cylinder]
    buffersrv01
    buffersrvXX--{SSD01 SSDXX SSD16} [label=""]
  }
  subgraph cluster_level2{
    label="Tape infrastructure\nXX tapeservers"
    color=crimson
    fontcolor=crimson
    SwitchTape
    SwitchTape--{tpsrv01 tpsrvXX} [color=Blue, label="10Gb/s"]
    SwitchBuffer--{buffersrv01 buffersrvXX } [color=Blue, label="25Gb/s", style=bold]
    {rank=same; tpsrv01 tpsrvXX} // Put them on the same level
    tape [color=black, shape=Msquare]
    tpsrvXX--tape [label="360MB/s"]
  }
}
```

---

<h2>Status summary</h2>

<ul>
  <li class="fragment">Core developments finished</li>
  <li class="fragment">Workflow integration in FTS and Rucio (through xrootd API)</li>
  <li class="fragment">Core operational environment ready</li>
  <li class="fragment">Extensive internal testing and external validation</li>
  <li class="fragment">Outside institutes expressed interest and collaborated</li>
</ul>

<b class="fragment" style="color:dodgerblue;">WE ARE READY!</b>

---

## Extra slides

---

# 2018 recall exercise performance monitoring

----

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_aa47013dfc372c182dd605d1f825f170.png" data-background-size="100%" -->

---

# Workflows for Archival and Retrieval

----

## Archival

```mermaid
sequenceDiagram
    participant Experiment
    participant FTS
    participant EOS
    participant EOSCTA
    participant Tape
    Experiment->>FTS: archive(file)
    activate EOS
    FTS->>EOSCTA: xrdcp EOS:file
    EOS->>+EOSCTA: file
    loop until timeout
        FTS->>EOSCTA: file backup_bit ?
        alt backup_bit=1
            EOSCTA-->>+Tape: file
            deactivate EOSCTA
            Tape->>FTS: file on tape
            FTS->>Experiment: file archival OK
            deactivate Tape
        else backup_bit=0
            activate EOSCTA
            EOSCTA-xFTS: file NOT on tape
            FTS->>-EOSCTA: delete file
            FTS-xExperiment: file archival FAILED
        end
    end
    deactivate EOS
```

----

## Retrieval

```mermaid
sequenceDiagram
    participant Experiment
    participant FTS
    participant EOS
    participant EOSCTA
    participant Tape
    Experiment->>FTS: retrieve(file)
    activate Tape
    FTS->>EOSCTA: xrdfs prepare -s file
    loop until timeout
        FTS->>EOSCTA: file online ?
        alt online_bit=1
            Tape->>+EOSCTA: file
            activate EOSCTA
            EOSCTA->>FTS: file is online
            FTS->>EOS: xrdcp EOSCTA:file
            EOSCTA->>+EOS: file
            FTS->>EOSCTA: xrdfs prepare -e
            deactivate EOSCTA
            FTS->>Experiment: file retrieval OK
            deactivate EOS
        else online_bit=0
            EOSCTA-xFTS: file is NOT online
            FTS-xExperiment: file retrieval FAILED
        end
    end
    deactivate Tape
```

---
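## Retrieval polling, sketched

The retrieval diagram boils down to: queue a stage request, poll until the file is online or a timeout expires, copy the file out, then evict it from the buffer. Below is a minimal Python sketch of that control flow only. `FakeBuffer`, `retrieve`, `max_polls` and `poll_interval` are hypothetical names invented for illustration; they are not part of FTS, EOS, CTA or xrootd, and the sketch assumes that "timeout" means running out of polls.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeBuffer:
    """Hypothetical in-memory stand-in for an EOSCTA buffer (not a real API)."""
    polls_until_online: int                  # polls needed before the file shows as online
    log: list = field(default_factory=list)  # records the actions taken, in order
    _polls: int = 0

    def prepare_stage(self, path):           # diagram step: xrdfs prepare -s file
        self.log.append(("stage", path))

    def is_online(self, path):               # diagram step: "file online ?"
        self._polls += 1
        return self._polls >= self.polls_until_online

    def copy_out(self, path):                # diagram step: xrdcp EOSCTA:file
        self.log.append(("copy", path))

    def evict(self, path):                   # diagram step: xrdfs prepare -e
        self.log.append(("evict", path))


def retrieve(buffer, path, max_polls=5, poll_interval=0.0):
    """Return True if the file came online and was copied, False on timeout."""
    buffer.prepare_stage(path)
    for _ in range(max_polls):
        if buffer.is_online(path):
            buffer.copy_out(path)
            buffer.evict(path)               # free the SSD buffer once the copy is done
            return True
        time.sleep(poll_interval)
    return False                             # assumed meaning of "retrieval FAILED"
```

Evicting right after the copy mirrors the diagram's `xrdfs prepare -e` step: the buffer is small compared to the tape archive, so files should not linger there once delivered.

---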
{"title":"190927 ITTF/Computing Seminar","description":"CERN Tape Archive production status and plans","slideOptions":{"transition":"slide","theme":"white"}}