# Production deployment of the ATLAS CERN Tape Archive instance

14 October 2020

[Julien Leduc](mailto:julien.leduc@cern.ch) for the CTA team

---

<h2>EOS+CTA <span><i style="color:blue;">in Production</i></span></h2>

eosctaatlas has been in production since 29 June 2020

---

## Data archiving at CERN

<ul>
<li class="fragment">Run2: 7 tape libraries, 83 tape drives, 30k tapes</li>
<li class="fragment">Run3: 4-5 tape libraries, 160+ tape drives, 150PB+/year, >40GB/s</li>
</ul>

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_95716d3602c009e301c880b0afd4225a.png" data-background-size="80%" -->

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Deployment</i></span></h2>

----

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_d11477e440602657feb6144ca74b97b8.svg" data-background-size="70%" -->

----

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_38e9df2ac5b0bab55677c1ba22b045cd.svg" data-background-size="70%" -->

---

<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Timeline</i></span></h2>

----

<!-- .slide: data-background="https://hackmd.web.cern.ch/uploads/upload_0ae96233cb49710754263e2d780a20b6.svg" data-background-size="100%" -->

---

<h2>EOS+CTA <span><i style="color:blue;">Architecture</i></span></h2>

----

<h2>EOS+CTA <i style="color:blue;">Architecture</i></h2>

Main difference from CASTOR: <span class="fragment" style="color: dodgerblue"><b>EOSCTA is a pure tape system.</b></span>

<span class="fragment">Disk cache duty is consolidated in the main <b style="color: dodgerblue">EOS instance.</b></span>

<span class="fragment">Operating tape drives efficiently, at full speed and full time, requires an <b style="color: crimson">SSD-based buffer.</b></span>

----

<h2>EOS+CTA <i style="color:blue;">Architecture</i></h2>

<img src="https://codimd.web.cern.ch/uploads/upload_e764d94a4ee3ac79c328ea0d21a6a128.svg" class="plain">

---

<h2>EOSCTA <span class="fragment"><i style="color:blue;">Hardware</i></span></h2>

----

<h2>EOSCTA <span><i style="color:blue;">Hardware</i></span></h2>

Buffer servers:
- 16x 2TB SSDs, 25Gb/s network link per server
- 200GB of RAM, 500GB NVMe (OS + logs)
- hosting up to **1 EOSCTA instance per server**

Specific *bandwidth-oriented* EOS setup; each server runs:
- **1 EOS MGM**
- **1 EOS NAMESPACE** - *quarkdb*
- **15 EOS DISKSERVERs** - *FSTs*

----

```graphviz
graph hierarchy {
  nodesep=1 // increases the separation between nodes
  node [color=Red, fontname=Courier, shape=box] // all nodes will have this shape and colour
  edge [color=Blue, label="25Gb/s"] // all the lines look like this

  Router [shape=circle]
  Router--{SwitchBuffer} [label="3x(2x100Gb/s)", fontsize=15, style=bold]
  Router--{SwitchTape} [label="7x20Gb/s", fontsize=15, style=bold]

  subgraph cluster_level1{
    label="EOSCTA Buffer infrastructure\n3x10 hyperconverged servers"
    color=dodgerblue
    fontcolor=dodgerblue
    SwitchBuffer
    SSD01 [color=black, shape=cylinder]
    SSDXX [color=black, shape=cylinder]
    SSD16 [color=black, shape=cylinder]
    buffersrv01
    buffersrvXX--{SSD01 SSDXX SSD16} [label=""]
  }

  subgraph cluster_level2{
    label="Tape infrastructure\nXX tapeservers"
    color=crimson
    fontcolor=crimson
    SwitchTape
    SwitchTape--{tpsrv01 tpsrvXX} [color=Blue, label="10Gb/s"]
    SwitchBuffer--{buffersrv01 buffersrvXX} [color=Blue, label="25Gb/s", style=bold]
    {rank=same; tpsrv01 tpsrvXX} // put them on the same level
    tape [color=black, shape=Msquare]
    tpsrvXX--tape [label="360MB/s"]
  }
}
```

---

<h2>EOSCTA <span class="fragment"><i style="color:blue;">to Production</i></span></h2>

----

<h2>EOS+CTA <span><i style="color:blue;">to Production</i></span></h2>

###### Tests on PPS instance:
✅ write 200TB to the PPS instance
✅ read back 200TB from the PPS instance

###### Tests on RO instance:
✅ import the CASTOR experiment namespace into the eoscta RO instance
✅ read CASTOR data back from the eoscta RO instance

###### More tests?
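
----

## Buffer bandwidth, back of the envelope

The hardware slide and the network diagram give enough figures for a quick consistency check of the SSD-based buffer design. This is a minimal Python sketch, not part of CTA: every input is read off the slides (25Gb/s per buffer server, 3x10 servers, 3x(2x100Gb/s) router uplinks, 16x 2TB SSDs, 360MB/s per drive), and it assumes, as a simplification, that every drive sustains the 360MB/s diagram label. The helper names are hypothetical.

```python
# Back-of-the-envelope bandwidth figures for the EOSCTA buffer layer.
# Inputs are copied from the hardware slide and the network diagram.
# Assumption: every tape drive sustains the 360MB/s shown in the diagram.

GBIT = 1e9 / 8  # bytes per second in 1 Gb/s

BUFFER_NIC_GBPS = 25                 # per buffer server (diagram edge label)
BUFFER_SERVERS = 3 * 10              # "3x10 hyperconverged servers"
ROUTER_TO_BUFFER_GBPS = 3 * 2 * 100  # "3x(2x100Gb/s)"
SSDS_PER_SERVER, SSD_TB = 16, 2      # "16x 2TB SSDs"
DRIVE_MB_S = 360                     # tape drive label (assumed sustained)


def drives_fed_per_server(nic_gbps: float, drive_mb_s: float) -> int:
    """Whole number of drives one server's NIC could keep streaming."""
    return int(nic_gbps * GBIT // (drive_mb_s * 1e6))


def buffer_oversubscription(servers: int, nic_gbps: float, uplink_gbps: float) -> float:
    """Ratio of summed server NIC bandwidth to router uplink bandwidth."""
    return servers * nic_gbps / uplink_gbps


if __name__ == "__main__":
    print("raw SSD per server (TB):", SSDS_PER_SERVER * SSD_TB)
    print("drives fed per server:", drives_fed_per_server(BUFFER_NIC_GBPS, DRIVE_MB_S))
    print("NIC/uplink ratio:", buffer_oversubscription(
        BUFFER_SERVERS, BUFFER_NIC_GBPS, ROUTER_TO_BUFFER_GBPS))
```

Under those assumptions one 25Gb/s buffer server can keep about 8 drives streaming, and the 30 servers together expose 1.25x more NIC bandwidth than the router uplinks toward the buffer switches.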

----

<h2>EOS+CTA <span><i style="color:blue;">to Production</i></span></h2>

```mermaid
gantt
    title CTA timeline close-up
    dateFormat YYYY-MM-DD
    section ATLAS
    eosctaatlaspps :2020-02-01, 2020-06-29
    eosctaatlasro :2020-02-15, 2020-06-12
    Migration to CTA :2020-06-12, 2020-06-29
    Production :crit, 2020-06-29, 2020-12-01
    section ALICE
    eosctaalicepps :2020-02-01, 2020-09-30
    eosctaalicero :2020-05-01, 2020-09-30
    Migration to CTA :2020-09-30, 2020-10-12
    Production :crit, 2020-10-12, 2020-12-01
    section CMS
    eosctacmspps :2020-02-01, 2020-11-12
    eosctacmsro :2020-10-12, 2020-11-12
    Production :crit, 2020-11-12, 2020-12-01
    section LHCB
    eosctalhcbpps :2020-09-01, 2020-12-01
```

----

## CTA and the ATLAS Carousel

- **320k files/820TB**
- 20 enterprise IBM tape drives
- 7 LTO tape drives
- 3 buffer servers used
- 90TB of SSD buffer
- up to 8.6 GB/s of cumulative buffer bandwidth
- <font color="crimson">Bandwidth capped by tape drive speed</font>

----

## Rucio rule submissions

![](https://codimd.web.cern.ch/uploads/upload_441f0b4dbb54c3a0a397e4ddd6c95e7c.png)

----

## Actual transfer speed (FTS)

![](https://codimd.web.cern.ch/uploads/upload_f8ce256fab959f40d87eb65b9823e272.png)

----

## ATLAS carousel

![](https://codimd.web.cern.ch/uploads/upload_397db272dd09fec88c4ce9c69d4fa42e.png)

----

## ATLAS carousel and Rucio WS report (cont.)

- <font color="dodgerblue">low error rate for CTA: **only ~350 errors**</font>
  - 108 due to tape drive failure
  - 140 due to a scheduled intervention on eosatlas
  - 100 due to slow streams (marginal rate of slow writes)

**Standard production incidents were handled smoothly by CTA software and service operations**

----

## More CTA torture tests...
Last week: archived 1PB of data to tape

![](https://codimd.web.cern.ch/uploads/upload_01c26bd91ec51128848b29571f081b8b.png)

---

<h2>EOSCTA <span class="fragment"><i style="color:blue;">in Production</i></span></h2>

----

<h2>EOS+CTA <span><i style="color:blue;">in Production</i></span></h2>

eosctaatlas has been in production since 29 June 2020

![](https://codimd.web.cern.ch/uploads/upload_234206e35d3818d0f837b6423a74d0f3.png)

----

![](https://codimd.web.cern.ch/uploads/upload_6dbee536e2e7c69ea0e6c52927386504.png)

----

![](https://codimd.web.cern.ch/uploads/upload_6be19454e92ee3f939d585b4620016af.png)

----

![](https://codimd.web.cern.ch/uploads/upload_32a86bf2c4fefff887e81638ffd5352e.png)

---
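
## ATLAS carousel, back of the envelope

The carousel slides quote a peak buffer bandwidth, a drive count and an error breakdown; a few lines of arithmetic relate them. This is a minimal Python sketch, not CTA tooling: the inputs are taken from the slides, the helper names are hypothetical, the error fraction assumes one archive attempt per file, and applying the diagram's 360MB/s drive rate to the LTO drives as well is an assumption.

```python
# Back-of-the-envelope figures for the ATLAS carousel slides.
# Inputs come from the slides. Assumptions: one attempt per file for the
# error fraction, and 360MB/s (diagram label) for every drive, LTO included.

ERRORS = {
    "tape drive failure": 108,
    "scheduled eosatlas intervention": 140,
    "slow streams": 100,
}
FILES = 320_000
PEAK_GB_S = 8.6
DRIVES = 20 + 7               # enterprise IBM + LTO
BUFFER_SERVERS, NIC_GBPS = 3, 25
DRIVE_MB_S = 360


def error_fraction(errors: dict[str, int], files: int) -> float:
    """Errors per archived file, assuming one attempt per file."""
    return sum(errors.values()) / files


def per_drive_mb_s(total_gb_s: float, drives: int) -> float:
    """Average MB/s per drive if the peak were spread evenly over all drives."""
    return total_gb_s * 1000 / drives


if __name__ == "__main__":
    print("total errors:", sum(ERRORS.values()))  # 348
    print(f"error fraction: {error_fraction(ERRORS, FILES):.3%}")
    print(f"average per drive at peak: {per_drive_mb_s(PEAK_GB_S, DRIVES):.0f} MB/s")
    print(f"buffer NIC ceiling: {BUFFER_SERVERS * NIC_GBPS / 8:.3f} GB/s")
    print(f"nominal drive ceiling: {DRIVES * DRIVE_MB_S / 1000:.2f} GB/s")
```

With these inputs the ~350 errors amount to roughly 0.1% of the files, and the observed 8.6 GB/s peak sits below both the 3x25Gb/s buffer network ceiling (9.375 GB/s) and the nominal 27x360MB/s drive ceiling (9.72 GB/s); the real per-drive rates, which the slides do not give, decide which limit actually binds.

---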
{"title":"201014 Hepix Autumn 2020 Production deployment of the CERN Tape Archive (CTA) for Atlas","description":"Production deployment of the CERN Tape Archive (CTA) for Atlas","slideOptions":{"transition":"slide","theme":"white"}}