# Production deployment of the ATLAS CERN Tape Archive instance
14 October 2020
[Julien Leduc](mailto:julien.leduc@cern.ch) for the CTA team
<h2>EOS+CTA <span><i style="color:blue;">in Production</i></span></h2>
eosctaatlas has been in production since 29 June 2020
## Data archiving at CERN
<ul>
<li class="fragment">Run 2: 7 tape libraries, 83 tape drives, 30k tapes</li>
<li class="fragment">Run 3: 4-5 tape libraries, 160+ tape drives, 150PB+/year, >40GB/s</li>
</ul>
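The two Run 3 figures describe different things: a yearly volume and a peak rate. A quick calculation, assuming decimal units and a 365-day year, relates them and spreads the peak over the drive count:

```python
# Back-of-the-envelope check of the Run 3 figures quoted on the slide.
# Assumptions: decimal units (1 PB = 1e15 bytes, 1 GB = 1e9 bytes), a
# 365-day year, and "150PB+" / "160+" taken at their lower bounds.

SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_gbps(petabytes_per_year: float) -> float:
    """Average rate in GB/s needed to write the given yearly volume."""
    return petabytes_per_year * 1e15 / SECONDS_PER_YEAR / 1e9


def per_drive_mbps(peak_gbps: float, drives: int) -> float:
    """Share of the peak rate each drive must carry, in MB/s."""
    return peak_gbps * 1e9 / drives / 1e6


avg = average_rate_gbps(150)     # yearly volume averaged over the year
share = per_drive_mbps(40, 160)  # 40 GB/s peak spread over 160 drives
print(f"average: {avg:.2f} GB/s, per drive at peak: {share:.0f} MB/s")
```

On these assumptions the yearly volume alone averages under 5 GB/s, so the >40GB/s figure reads as a peak-capacity requirement, which works out to about 250 MB/s per drive.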
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_95716d3602c009e301c880b0afd4225a.png" data-background-size="80%" -->
<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Deployment</i></span></h2>
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_d11477e440602657feb6144ca74b97b8.svg" data-background-size="70%" -->
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_38e9df2ac5b0bab55677c1ba22b045cd.svg" data-background-size="70%" -->
<h2>EOS+CTA <span class="fragment"><i style="color:blue;">Timeline</i></span></h2>
<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_0ae96233cb49710754263e2d780a20b6.svg" data-background-size="100%" -->
<h2>EOS+CTA <span><i style="color:blue;">Architecture</i></span></h2>
<h2>EOS+CTA <i style="color:blue;">Architecture</i></h2>
Main difference from CASTOR: <span class="fragment" style="color: dodgerblue"><b>EOSCTA is a pure tape system.</b></span>
<span class="fragment">Disk cache duties are consolidated in the main <b style="color: dodgerblue">EOS instance.</b></span>
<span class="fragment">Operating tape drives efficiently, at full speed all the time, requires an <b style="color: crimson">SSD-based buffer.</b></span>
<h2>EOS+CTA <i style="color:blue;">Architecture</i></h2>
<img src="https://codimd.web.cern.ch/uploads/upload_e764d94a4ee3ac79c328ea0d21a6a128.svg" class="plain">
<h2>EOSCTA <span class="fragment"><i style="color:blue;">Hardware</i></span></h2>
<h2>EOSCTA <span><i style="color:blue;">Hardware</i></span></h2>
Buffer servers:
- 16x 2TB SSDs, one 25Gb/s network link each
- 200GB of RAM, 500GB NVMe (OS + logs)
- hosting up to **1 EOSCTA instance per server**
Specific *bandwidth-oriented* EOS setup; each server runs:
- **1 EOS MGM**
- **1 EOS NAMESPACE** - *QuarkDB*
- **15 EOS DISKSERVERs** - *FSTs*
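A rough per-server calculation from the figures above, assuming decimal units, raw SSD capacity with no redundancy overhead, and the 25Gb/s link as the only throughput limit:

```python
# Rough per-server figures for one EOSCTA buffer server.
# Assumptions: decimal units, raw SSD capacity (no RAID/replication
# overhead), and the 25Gb/s network link as the only throughput limit.

SSD_COUNT = 16
SSD_TB = 2
LINK_GBIT = 25

raw_tb = SSD_COUNT * SSD_TB    # raw SSD capacity per server, in TB
link_gbyte_s = LINK_GBIT / 8   # network link in GB/s
# Time to fill (or drain) the whole buffer at full line rate, in hours.
fill_hours = raw_tb * 1e12 / (link_gbyte_s * 1e9) / 3600

print(f"{raw_tb} TB raw, {link_gbyte_s} GB/s link, "
      f"{fill_hours:.1f} h to fill or drain at line rate")
```

Three such servers give 96 TB raw, in line with the 90 TB of SSD buffer quoted for the ATLAS carousel.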
```graphviz
graph hierarchy {
nodesep=1 // increases the separation between nodes
node [color=Red, fontname=Courier, shape=box] // all nodes will have this shape and colour
edge [color=Blue, label="25Gb/s"] // all the lines look like this
Router [shape=circle]
Router--{SwitchBuffer} [label="3x(2x100Gb/s)", fontsize=15, style=bold]
Router--{SwitchTape} [label="7x20Gb/s", fontsize=15, style=bold]
subgraph cluster_level1{
label="EOSCTA Buffer infrastructure\n3x10 hyperconverged servers"
SSD01 [color=black, shape=cylinder]
SSDXX [color=black, shape=cylinder]
SSD16 [color=black, shape=cylinder]
buffersrvXX--{SSD01 SSDXX SSD16} [label=""]
}
SwitchBuffer--{buffersrv01 buffersrvXX} [color=Blue, label="25Gb/s", style=bold]
subgraph cluster_level2{
label="Tape infrastructure\nXX tapeservers"
SwitchTape--{tpsrv01 tpsrvXX} [color=Blue, label="10Gb/s"]
{rank=same; tpsrv01 tpsrvXX} // put them on the same level
tape [color=black, shape=Msquare]
tpsrvXX--tape [label="360MB/s"]
}
}
```
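Summing the nominal link rates in the diagram gives a rough picture of where aggregate bandwidth could be bounded. This sketch assumes decimal units, links running at their nominal rate with no protocol overhead, and reads "3x10" as 30 buffer servers:

```python
# Aggregate nominal bandwidth at each tier of the diagram, in GB/s.
# Assumptions: decimal units, nominal link rates, no protocol overhead,
# and "3x10 hyperconverged servers" read as 30 buffer servers.


def gbyte_s(gbit: float) -> float:
    """Convert a link rate from Gb/s to GB/s."""
    return gbit / 8


tiers = {
    "router -> buffer switch (3x2x100Gb/s)": gbyte_s(3 * 2 * 100),
    "buffer switch -> 30 servers (25Gb/s each)": gbyte_s(30 * 25),
    "router -> tape switch (7x20Gb/s)": gbyte_s(7 * 20),
}
for name, rate in tiers.items():
    print(f"{name}: {rate:.2f} GB/s")

# A single tape server: 10Gb/s link versus one drive at 360MB/s.
tpsrv_link = gbyte_s(10)  # GB/s
drive = 0.360             # GB/s, as labelled in the diagram
print(f"tape server link / drive speed: {tpsrv_link / drive:.1f}x")
```

On these nominal figures the router-to-tape-switch uplink (17.5 GB/s) is the narrowest aggregate tier, and each tape server link has roughly 3.5x headroom over a single drive; measured throughput will depend on overheads not modelled here.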
<h2>EOSCTA <span class="fragment"><i style="color:blue;">to Production</i></span></h2>
<h2>EOS+CTA <span><i style="color:blue;">to Production</i></span></h2>
###### Tests on PPS (pre-production) instance:
✅ write 200TB to the PPS instance
✅ read back 200TB from the PPS instance
###### Tests on RO (read-only) instance:
✅ import the CASTOR experiment namespace into the eoscta RO instance
✅ read CASTOR data back from the eoscta RO instance
###### More tests?
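The slides do not say how read-back data was validated. As an illustration only, here is a local sketch of a write-then-read-back check that compares checksums; the "archive" is just a second directory, and adler32 is an assumed checksum choice, not a statement about what the PPS tests used:

```python
# Illustrative write-then-read-back check. Everything here is a local
# stand-in: the "archive" is a second directory, and adler32 is used as
# the checksum by assumption.
import os
import shutil
import tempfile
import zlib


def adler32_of(path: str, chunk: int = 1 << 20) -> int:
    """Streaming adler32 of a file, read in fixed-size chunks."""
    value = 1  # adler32 initial value
    with open(path, "rb") as f:
        while block := f.read(chunk):
            value = zlib.adler32(block, value)
    return value


def archive_and_verify(src_dir: str, archive_dir: str) -> list[str]:
    """Copy each file to archive_dir, re-read it, return mismatching names."""
    mismatches = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        expected = adler32_of(src)
        dst = os.path.join(archive_dir, name)
        shutil.copyfile(src, dst)        # stand-in for "write to tape"
        if adler32_of(dst) != expected:  # stand-in for "read back"
            mismatches.append(name)
    return mismatches


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for i in range(3):
        with open(os.path.join(src, f"file{i}.dat"), "wb") as f:
            f.write(os.urandom(4096))
    bad = archive_and_verify(src, dst)
    print("mismatches:", bad)
```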
<h2>EOS+CTA <span><i style="color:blue;">to Production</i></span></h2>
```mermaid
gantt
    title CTA timeline closeup
    dateFormat YYYY-MM-DD
    section ATLAS
    eosctaatlaspps :2020-02-01, 2020-06-29
    eosctaatlasro :2020-02-15, 2020-06-12
    Migration to CTA :2020-06-12, 2020-06-29
    Production :crit, 2020-06-29, 2020-12-01
    section ALICE
    eosctaalicepps :2020-02-01, 2020-09-30
    eosctaalicero :2020-05-01, 2020-09-30
    Migration to CTA :2020-09-30, 2020-10-12
    Production :crit, 2020-10-12, 2020-12-01
    section CMS
    eosctacmspps :2020-02-01, 2020-11-12
    eosctacmsro :2020-10-12, 2020-11-12
    Production :crit, 2020-11-12, 2020-12-01
    section LHCB
    eosctalhcbpps :2020-09-01, 2020-12-01
```
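The chart shows phase lengths only visually; computing them from the same dates gives the numbers directly:

```python
# Phase lengths per experiment, using the dates from the timeline chart.
from datetime import date

# (PPS start, production start) per experiment.
to_production = {
    "ATLAS": (date(2020, 2, 1), date(2020, 6, 29)),
    "ALICE": (date(2020, 2, 1), date(2020, 10, 12)),
    "CMS": (date(2020, 2, 1), date(2020, 11, 12)),
}
# Migration windows are only listed for ATLAS and ALICE in the chart.
migration = {
    "ATLAS": (date(2020, 6, 12), date(2020, 6, 29)),
    "ALICE": (date(2020, 9, 30), date(2020, 10, 12)),
}


def days(span: tuple[date, date]) -> int:
    """Number of days between the start and end of a span."""
    start, end = span
    return (end - start).days


for exp, span in to_production.items():
    print(f"{exp}: {days(span)} days from PPS start to production")
for exp, span in migration.items():
    print(f"{exp}: {days(span)}-day migration window")
```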
## CTA and ATLAS Carousel
- **320k files/820TB**
- 20 enterprise IBM tape drives
- 7 LTO tape drives
- 3 buffer servers used
- 90TB of SSD buffer
- up to 8.6 GB/s of aggregate buffer bandwidth
- <font color="crimson">Bandwidth capped by tape drive speed</font>
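A few derived figures for this run, assuming decimal units, taking "320k" and "820TB" as exact, and reusing the 25Gb/s per-server link from the hardware slide:

```python
# Derived figures for the ATLAS carousel run, from the numbers above.
# Assumptions: decimal units, "320k" and "820TB" taken as exact, and a
# 25Gb/s network link per buffer server.
FILES = 320_000
VOLUME_TB = 820
SERVERS = 3
LINK_GBIT = 25
PEAK_GBYTE_S = 8.6

avg_file_gb = VOLUME_TB * 1e12 / FILES / 1e9  # mean file size in GB
net_ceiling = SERVERS * LINK_GBIT / 8         # nominal network limit, GB/s
print(f"average file size: {avg_file_gb:.2f} GB")
print(f"peak is {PEAK_GBYTE_S / net_ceiling:.0%} of the 3-server network ceiling")
```

For comparison only: the observed peak sits at about 92% of the nominal network limit of the three buffer servers, while the slide attributes the cap to tape drive speed.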
## Rucio rules submissions

## Actual transfer speed (FTS)

## ATLAS carousel

## ATLAS carousel and Rucio WS report (cont.)
- <font color="dodgerblue">low error rate for CTA: **only ~350 errors**</font>
- 108 due to tape drive failure
- 140 due to a scheduled intervention on eosatlas
- 100 due to slow streams (marginal rate of slow writes)
**Standard production incidents were handled smoothly by CTA software and service operations**
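Tallying the listed causes and relating them to the 320k files of the run gives an indicative rate. This assumes each error corresponds to one failed transfer attempt, which the slide does not state:

```python
# Tally of the CTA errors listed above, relative to the 320k files of the
# carousel run. Assumption: one error per failed transfer attempt, so the
# rate is only indicative.
errors = {
    "tape drive failure": 108,
    "scheduled eosatlas intervention": 140,
    "slow streams": 100,
}
total = sum(errors.values())
rate = total / 320_000
print(f"{total} errors, about {rate:.2%} of 320k files")
```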
## More CTA torture tests...
Last week: archived 1PB of data to tape
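The rate achieved during this 1PB test is not stated. For scale only, here is how long 1PB takes at two rates taken from elsewhere in the deck, the 8.6 GB/s carousel peak and the 40 GB/s Run 3 target, assuming decimal units and a constant sustained rate:

```python
# Time to move 1PB at a constant rate. The rates are illustrative: the
# ATLAS carousel peak and the Run 3 target, not measurements of this test.
PB = 1e15  # bytes, decimal units


def hours_for(volume_bytes: float, rate_gbyte_s: float) -> float:
    """Hours needed to move volume_bytes at rate_gbyte_s GB/s."""
    return volume_bytes / (rate_gbyte_s * 1e9) / 3600


for rate in (8.6, 40.0):
    print(f"1PB at {rate} GB/s: {hours_for(PB, rate):.1f} h")
```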

<h2>EOSCTA <span class="fragment"><i style="color:blue;">in Production</i></span></h2>
<h2>EOS+CTA <span><i style="color:blue;">in Production</i></span></h2>
eosctaatlas has been in production since 29 June 2020




{"title":"201014 Hepix Autumn 2020 Production deployment of the CERN Tape Archive (CTA) for Atlas","description":"Production deployment of the CERN Tape Archive (CTA) for Atlas","slideOptions":{"transition":"slide","theme":"white"}}