---

# Global Computing
###### at the heart of
# Universal Science

*Hannah Short, hannah.short@cern.ch*

Note: We've just heard all about the physics that's going on at CERN, but one of the less well known aspects is the computing that goes on behind the scenes and under the surface to get that physics data into the hands of researchers around the world. I'm going to give you a glimpse into the global computing infrastructures that really are at the heart of science.

---

## What do I do?

Collaborate with other Research Communities to make Science more open and secure.

![](https://codimd.web.cern.ch/uploads/upload_531d77579e08352a6ccaf74d232968c9.jpg =200x)

*Specialist in Digital Trust and Identity*

---

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_c8495c24adc2562de5e0367af0db6e6f.png" data-background-size="contain" -->

Note: I'd like to start by taking us back to the early 2000s. A new particle accelerator, the Large Hadron Collider, was being constructed in the existing tunnel at CERN in Geneva, Switzerland. It was going to have four experiments producing data that would need to be stored. They knew that in 2008, when it was switched on, the expected output would be tens of petabytes per year.

---

## Supply - Demand = ?

Note: There were certain constraints, namely funding and space, meaning that demand exceeded supply by some petabytes. 1 petabyte is about 4,000 MacBook Pros (by today's standards!). In the early 2000s, storing and processing 4,000 MacBook Pros' worth of data was a much larger problem than it would be today. If you're not an Apple user, 1 PB is roughly the amount of data you would generate if you took about 4,000 photos per day for your entire life. These scientific challenges are not new; they constantly evolve, and it's technology that needs to keep up. I'm going to come back to how the scientific community solved this challenge, but first I think we need to put it in context.
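The back-of-the-envelope comparisons above can be checked with a quick sketch; the laptop SSD size and average photo size are my own assumed figures, not from the talk's sources:

```python
# Rough check of the "1 PB ≈ 4,000 MacBook Pros" analogy (assumed figures).
GB_PER_PB = 1_000_000          # decimal units: 1 PB = 10^6 GB

macbook_ssd_gb = 256           # assumed typical laptop SSD of the era of the comparison
laptops_per_pb = GB_PER_PB / macbook_ssd_gb
print(f"1 PB ≈ {laptops_per_pb:,.0f} laptops of {macbook_ssd_gb} GB")   # ≈ 3,906

photo_mb = 8.5                 # assumed average photo size
photos_per_day = 4_000
years = 80
lifetime_pb = photos_per_day * 365 * years * photo_mb / 1_000 / GB_PER_PB
print(f"{photos_per_day:,} photos/day for {years} years ≈ {lifetime_pb:.2f} PB")  # ≈ 0.99 PB
```

With these assumptions both analogies land within a few percent of 1 PB, which is close enough for a slide.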
---

## Data Sharing

Data starts pouring out of experiments and needs to be made available to scientists...

- How can we store it?
- How can we transfer it?
- How can they analyse it?

---

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_c6d4717d454fc92298184a8fe681af72.jpg" data-background-size="contain" -->

## 1960s

---

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_11aa6aed37a828deb29649233df42c2d.jpg" data-background-size="contain" -->

## 1960s

Note: Here we have a man collecting radioactivity readings around the laboratory, and a woman recording particle tracks from a bubble chamber. One aspect they have in common is the data collection: it's written down, on paper. It was then a complicated process of storing that paper according to a logical system, and making sure that the data got to the correct people. Data volumes were relatively low and transfer was done physically; often via the post, or stuffed into a suitcase and taken by the scientists themselves to their own universities. Within CERN, bicycles were often used. Since researchers physically came to CERN, manual data transfer was feasible.

---

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_655bd148510e8caebb2efeb316eaac3a.JPG" data-background-size="contain" -->

## 1980s

Note: We enter the days of the large scale high energy physics experiments. The Large Electron Positron collider occupied the tunnel before the LHC. Over its entire run, LEP generated around 400 TB, peanuts by today's standards. At the same time, working remotely was becoming much more commonplace with the advent of email. They entered a period where more and more information (project updates, designs, plans), in addition to the data coming from the experiments, was relevant to a wider audience. How could they efficiently publish information, keep it updated and provide everyone with the same facts?
1989

---

## <!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_def99c9df6a62350ecf5dfe311153520.jpg" data-background-size="contain" -->

Note: The World Wide Web was built exactly for this problem of data sharing. It was a fundamental shift in the way data was distributed. You no longer needed to replicate the information and go around on your bike to make sure the correct people received it. In this new model, you could update your local copy once and interested people could simply read it, from wherever they were (as long as they were connected to the internet!). This image contains a note from the supervisor of Tim Berners-Lee; he may have underestimated quite how much of an impact the web would have on our lives!

---

`info.cern.ch`

![firstpage](https://codimd.web.cern.ch/uploads/upload_5f5bc388c3420a8827567c0031e980bd.png =600x)

Note: This page was the first website. It's about the World Wide Web. It was a very humble beginning, but this technology has spread beyond science and has impacted you all. It was recently brought back and you can visit it at info.cern.ch

---

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_ad955f4af2eee75f907dd4b5c2ff4d8f.png" data-background-size="contain" data-background-color="#000" data-background-position="right" -->

## 2000s

Note: Back to where we started: we have a deficit of storage for tens of petabytes of data annually. We're missing physical resources, and limited by money. However, a main asset of CERN is the collaboration of hundreds of institutions and universities from around the world. Around this time there was an idea for a computing grid. Like a power grid that lets users draw electricity from multiple sources, computing grids provide computing from multiple data centres to researchers; they don't need to know or care where their code and data are actually located! CERN decided to leverage its collaborations and create one of the very first large scale computing grids.
If this all sounds a bit familiar, you may be thinking of "Clouds", which are effectively the next generation.

---

<!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_d35a5d5c5bfac070be4f3f6ff4cc1e77.png" data-background-size="contain" data-background-color="#FFF" -->

Note: In 1999 an idea for a computing grid was proposed, and this met the needs of CERN, which itself provides only about 20% of the computing needed. Unfortunately the networking backbone was not completely in place to allow CERN's collaborators to share data between them. Mention that specific lines had to be set up; a lot of effort went in to physically connecting universities and laboratories around the world with high-speed internet connections.

---

## 2000s

| | Field | Users | Countries | Sites |
| ---- | ---- | ----- | ----- | ----- |
| LIGO | Gravitational Waves | 1,200 | 20 | 9 |
| WLCG (CERN) | High Energy Physics | 13,000 | 43 | 170 |
| ESGF | Climate Science | 17,000 | 13 | 18 |

Note: Mention that WLCG is us!

---

## <!-- .slide: data-background="https://codimd.web.cern.ch/uploads/upload_42194dab56ef1faf5fdbdc08cfc5714e.png" data-background-size="contain" data-background-color="#000" data-background-position="center" -->

Note: Users in general are problematic - I've been both a user and a software developer so hopefully am allowed to complain. Having users distributed around the world is a specific challenge. We need to trust the users who are accessing our computing infrastructure. We need to know who they are, and that they are affiliated with the Research Community. When you have a user group of thousands of users around the world, it's not feasible for them all to come to a central place to have their identity checked. In the early 2000s they decided to use certificates.

---

## Researcher Access

![](https://codimd.web.cern.ch/uploads/upload_93a22e5d7e9216a7d31a62319e81f332.png =500x)

Note: Certificates, like for websites but for individuals.
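As a minimal sketch of the idea: a user certificate carries the researcher's identity in its subject (distinguished name), which the infrastructure reads to authenticate them. The DN below is hypothetical, and a real grid certificate would be issued by an accredited certificate authority rather than self-signed:

```shell
# Create a throwaway self-signed certificate standing in for a CA-issued one.
# The subject DN (organisation, name) is a made-up example.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout userkey.pem -out usercert.pem -days 1 \
  -subj "/DC=ch/DC=cern/OU=Users/CN=Jane Researcher"

# The infrastructure authenticates the user by reading the certificate's subject:
openssl x509 -in usercert.pem -noout -subject
```

In practice the trust comes from the issuing CA's signature, not the certificate alone: a grid site accepts the certificate because it chains back to a CA the community has agreed to trust.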
The community set up local certificate authorities where you could have your identity verified and be issued with a certificate that would grant you access to the infrastructure. This allowed the experiments to outsource authentication. Built up distributed CAs.

---

## Researcher Access

![](https://codimd.web.cern.ch/uploads/upload_c2f66d492e5258afd610e7a232f913e5.png =500x)

Note: The problem with certificates is that they are complicated to manage well. These days you're all used to logging in with Google etc.; maybe we can do the same by logging in through home organisations?

---

## 2020s

- High-luminosity LHC
- 500 PB expected annually
- How will CERN and its collaborators cope?

![](https://codimd.web.cern.ch/uploads/upload_d55ee73a69437f54a1f495a483febae3.jpeg =400x)

Note: With a higher number of collisions, more data will be collected as more particles are produced. Particles are detected by different layers of the experiments, and their position and energy are recorded so that what happened at the point of impact can be reconstructed. The data output will rise tenfold.

---

## 2020s

- Improved data selection and filtering
- Efficient coding
- Joint R&D with academia and private sector
- Volunteer computing

![](https://codimd.web.cern.ch/uploads/upload_eef548a18a3442783a5fee4e5ae39400.png =400x)

Note: Check on data selection. There are ongoing partnerships with commercial organisations, universities and funders such as the European Commission. Although CERN envisages keeping all data stored within the research community, using opportunistic commercial cloud resources for times of peak demand may prove cost effective. A key aspect will be to ensure that only the necessary data is maintained. And we need to take a step back and analyse whether we are making the best use of the computing resources available. This includes data centres at CERN and large institutions around the world, commercial cloud, and volunteer computing. Opportunistic resources.
Volunteers are not necessarily individuals.

---

###### Why does this matter?

## Collaboration and Diversity are at the heart of Global Computing for Universal Science

Note: The computing behind science has allowed data to be available to researchers around the world. CERN supports the mission for open data, open software, even open hardware. Distribution has allowed 170 sites in dozens of countries to actively collaborate on a joint computing problem, which brings with it a certain diversity of thought. Different cultures' approaches to problems often help to drive progress and highlight areas that may have been overlooked.

---

## Credits

- Pictures from CERN CDS and from GEANT
- Statistics from other experiments: fim4r.org
- WLCG: wlcg-public.web.cern.ch

---
{"title":"Global Computing at the Heart of Universal Science 2019","slideOptions":{"theme":"cern3","transition":"slide"}}