# Student projects
## Open:
Contact Thea Aarrestad (thea.aarrestad@cern.ch) for more details.
### Analyse collected AXOL1TL data with neural embeddings and NPLM (LHC)
The AXOL1TL anomaly detection algorithm has been running live in CMS since 2024, collecting interesting physics data ready for analysis. In this project, the student will develop novel ML pattern recognition and signal extraction algorithms for BSM searches.
**Level:** Master
### Generative ML for 40 MHz scouting Monte Carlo
For HL-LHC, CMS will collect data at the full bunch crossing rate. This will generate an overwhelming amount of data, requiring a correspondingly large amount of simulation, and we do not have the CPU power to afford this. The student will develop simulation surrogates: generative models we can oversample from, allowing us to do performance studies for HL-LHC.
**Level:** Master
### Event-level neural embeddings for anomaly detection at 40 MHz (HL-LHC)
Use ML-based contrastive metric learning techniques to design high-fidelity neural embeddings for real-time outlier detection (see [this paper](https://journals.aps.org/prd/accepted/10.1103/5n77-ynsp)).
**Level:** Master thesis / master semester thesis
### Jet-level transformers for anomaly detection at 40 MHz
Use ML-based contrastive metric learning techniques to design high-fidelity neural embeddings for outlier detection (see [this paper](https://journals.aps.org/prd/accepted/10.1103/5n77-ynsp)).
**Level:** Master thesis / master semester thesis
### Real-time, sub-pixel resolution on FPGA for electron microscopy
Start from an existing CNN for sub-pixel resolution and compress the model with high-granularity quantization. Deploy and benchmark it on an FPGA accelerator.
**Level:** Master thesis / master semester thesis
### Jet substructure tagging in the CMS Level-1 trigger
Continue development of an ML-based jet tagging algorithm for the identification of jet substructure in the CMS Level-1 trigger for HL-LHC.
**Level:** Master thesis / master semester thesis
### LLMs on FPGAs for fast AI agent filtering
Hardware/ML oriented.
**Co-supervised with Claudionor Coelho (Zscaler), Benjamin Ramhorst (ETH CS)**
**Level:** Master Thesis
### QONNX: Resource and latency modeling for Neural Networks on FPGAs
Implement and commission tools for in-software estimation of FPGA resource consumption and latency in the QONNX library. Mostly a software project (designing tools for hardware acceleration), but will include a physics use-case component.
**Co-supervised with Benjamin Ramhorst.**
**Level:** Master Thesis
## Ongoing:
### Towards all-hadronic final state anomaly searches at 40 MHz
Tamara Leuthold (master, tleuthold@student.ethz.ch)
### A foundation model for the HL-LHC Level-1 trigger
Philip Ploner (semester)
### Front-end aware clustering algorithms for the CMS High Granularity Calorimeter
Lorenzo Asfour (semester, lasfour@student.ethz.ch)
## Completed:
### An end-to-end pipeline for uncertainty-aware validation of generative Artificial Intelligence
Density estimation with generative Artificial Intelligence (AI) is a common task in the physical sciences, with applications ranging from particle physics to gravitational-wave parameter estimation. Many of the existing methods, however, do not provide a way to estimate epistemic uncertainties, which are essential for reliable hypothesis testing. We propose an end-to-end framework combining generative modeling with principled uncertainty quantification. A normalizing-flow ensemble is trained to synthesize events; ensemble-based epistemic uncertainties are computed and propagated into a learned likelihood-ratio goodness-of-fit (GoF) test. This yields robust distributional estimates that make it possible to synthesize significantly more events than those in the original training dataset and enable uncertainty-aware scientific discovery.
**Master Thesis of Giada Badaracco**
**Status: Completed Fall 2025**
### Physics-inspired dynamic graph neural networks embedding approximate symmetries for the CMS Experiment at CERN
The project is to implement certain graph neural networks that are invariant to different symmetry groups, relevant to physics, and test them for jet tagging, which is a classification task in particle physics. Co-supervised with Prof. Siddhartha Mishra (Professor of Applied Mathematics, ETH).
**Level: Semester Thesis of Stelea Sanziana**
**Status: Completed Spring 2025**
### Normalizing Flows for MC generation and New Physics searches
As searches at the LHC probe increasingly rare signals against an overwhelming background of Standard Model events, progressively tighter selection criteria are applied to enhance signal-rich regions. Simulated background samples serve as the basis for hypothesis testing, enabling comparisons between observed data and expected Standard Model backgrounds. However, this approach becomes challenging when the available background statistics are insufficient. This talk presents an end-to-end framework for estimating background models endowed with uncertainties. We train a generative model, explore different approaches to assigning a shape uncertainty, and check its compatibility with the underlying ground truth using NPLM, a machine-learning-based goodness-of-fit test. This procedure allows us to assess to what extent generative AI models are safe for sampling. By incorporating well-defined uncertainties, we ensure the framework can perform effectively even in data-limited scenarios and provide robust and reliable anomaly detection.
**Level: Semester Thesis of Giada Badaracco**
**Status: Completed Spring 2025**
**Reference: [EuCAIF talk](https://agenda.infn.it/event/43565/contributions/260019/)**
### COLLIDE-2V: A Comprehensive LHC Collision Dataset for Foundation Model Development
**Semester Thesis of Phillip Ploner**
We present COLLIDE-2V, an all-encompassing, high-fidelity dataset designed to serve as a cornerstone for the development of foundation models in high-energy physics. Generated under realistic High-Luminosity LHC (HL-LHC) conditions, COLLIDE-2V encapsulates a wide spectrum of physics processes, detector responses, and experimental complexities representative of the HL-LHC environment, including high pile-up, rare event topologies, and detector effects. The dataset spans multiple levels of event representation—parton-level, particle-level, and detector-level. With a special dual view, the events are reconstructed at both the trigger level and offline, with different realistic object resolutions. With about a billion simulated events of Standard Model processes and new physics scenarios, and accompanying metadata for conditioning and tagging, COLLIDE-2V is structured to support scalable pretraining and transfer learning across a broad range of physics tasks, from reconstruction to anomaly detection and generative modeling. COLLIDE-2V is openly accessible and designed for interoperability with modern deep learning frameworks, laying the foundation for the next era of AI-native physics discovery.
**Status: Completed Spring 2025**
**Reference: [Fast ML 2025 talk](https://indico.cern.ch/event/1496673/contributions/6637964/attachments/3128419/5549918/FM_Collide2V_EMoreno.pdf)**
Contact: plonerp@student.ethz.ch
### Optimal Transport and Model Independent Statistical Tests for New Physics searches
Design an ML-based, model-independent New Physics analysis for Phase 2 scouting in CMS. Co-supervised with Gaia Grosso, Katya Govorkova and Phil Harris (MIT).
**Master Thesis of Zhengting He**
**Status: Completed Fall 2024**
### Physics-inspired dynamic graph neural networks embedding approximate symmetries for the CMS Experiment at CERN
The project is to implement graph neural networks that are invariant under different physically relevant symmetry groups, and to test them on jet tagging, a classification task in particle physics. The network will be analysed: the number of FLOPs, tentative mathematical guarantees, and a comparison with the current best models (LorentzNet, ParticleTransformer, PELICAN) will be determined. Furthermore, following Walters and Wang, "Approximately Equivariant Networks for Imperfectly Symmetric Dynamics", the network equivariance is relaxed to better fit the real-world data produced in the CMS detector at CERN. Particle physics is an ideal playground for testing equivariant networks, as the Standard Model is full of symmetries. The input data can consist of the transverse momentum and two angles of a constituent particle. Implementing these networks is therefore not a simple application of existing architectures, as the equivariance should hold for each input specifically and not globally on the entries. Restricting the network to learning only symmetry-equivariant functions should in theory improve performance; it could also make the network easier to analyse mathematically and more efficient at inference. Low-latency models are particularly important in particle physics for selecting which events are stored in the CMS Experiment, so a solution for jet tagging could help to construct an algorithm that registers only interesting events during a collision. It is a nice illustration of the use of approximately equivariant networks in a real-world scenario. Co-supervised with Prof. Siddhartha Mishra (Professor of Applied Mathematics, ETH).
**Master Thesis of Matthias Bonvin**
**Status: Completed Fall 2024**
### Jet tagging for HL-LHC
Get inspiration from the work here and implement one of these algorithms for real data taking in CMS. Co-supervised with Sioni Summers (CERN).
**Semester Thesis of Asra Serinken**
**Status: Completed Spring 2025**
### Incorporating physics-motivated symmetries into Neural Networks for high-energy particle physics experiments
**Semester thesis of Matthias Bonvin**
Co-supervised with Günther Dissertori at ETH Zurich
**Status: Completed Fall 2023**
### Scouting for anomalous events with unsupervised AI in the CMS hardware trigger
**PhD thesis of Patrick Odagiu**
Co-supervised with Günther Dissertori at ETH Zurich
**Status: Ongoing**
### AXOL1TL: Real-time anomaly detection in the CMS hardware trigger
**Master thesis of Chang Sun**
Co-supervised with Günther Dissertori at ETH Zürich
Presented at Fast Machine Learning for Science 2023
**Status: Completed Fall 2023**
### Latency and resource-aware decision trees for faster FPGA inference at the LHC
Master thesis of Andrew Oliver
Co-supervised with Sioni Summers (CERN), M. Guillame-Bert (Google) and Prof. Dr. G. Dissertori (ETHZ)
Presented at Fast Machine Learning for Science 2023
**Status: Completed Fall 2023**
### Deep Neural Network to Identify High-Energy B Hadrons via their Hit Multiplicity Increase through Pixel Detection Layers
UZH Bachelor Thesis by M. Sommerhalder
Main supervisor: M. Sommerhalder
Feb-Aug 2018, github.com/msommerh/bTag_HitCount
**Status: Completed Fall 2018**
### Explainable Anomaly Detection for New Physics searches at the LHC with PIDForest
Jessica Prendi
Co-supervised with Prof. Dr. G. Dissertori (ETHZ), Dr. S. Summers (CERN), Dr. M. Guillame-Bert and Dr. R. Stotz (Google)
**Status: Completed Spring 2023**
### Detecting long-lived particles trapped in detector material at the LHC
CERN summer student project by Jasmine Simms
Co-supervised with Juliette Alimena
Published in Phys. Rev. D 105, L051701
**Status: Completed Summer 2021**
### Convolutional Autoencoders for Anomaly Detection in the L1 Trigger
CERN Student 2020, Sierra Weyhmiller
Co-supervisor
https://indico.cern.ch/event/947570/
**Status: Completed Summer 2020**