# CMSSW Tutorial for Newcomers
## Table of Contents
1. [Overview of CMSSW Releases](#1-Overview-of-CMSSW-Releases)
2. [Setting up a CMSSW Environment](#2-Setting-up-a-CMSSW-Environment)
3. [CMSSW Package Structure](#3-CMSSW-Package-Structure)
   1. [src/include vs. plugins](#31-srcinclude-vs-plugins)
   2. [BuildFile.xml usage](#32-BuildFilexml-usage)
   3. [classes_def.xml for persistent/transient classes](#33-classes_defxml-for-persistenttransient-classes)
   4. [Some typical top-level packages](#34-Some-typical-top-level-packages)
4. [Compiling and Build Tools](#4-Compiling-and-Build-Tools)
5. [CMS Application Framework Basics](#5-CMS-Application-Framework-Basics)
   1. [The Event Data Model (`edm::Event`)](#51-The-Event-Data-Model-edmEvent)
   2. [EventSetup](#52-EventSetup)
   3. [Producers, Analyzers, and Filters](#53-Producers-Analyzers-and-Filters)
   4. [Parameter Sets (Configurations)](#54-Parameter-Sets-Configurations)
6. [Job Configuration: Modules, Tasks, Paths, Sequences, and Processes](#6-Job-Configuration-Modules-Tasks-Paths-Sequences-and-Processes)
7. [Generating and Running Configurations](#7-Generating-and-Running-Configurations)
   1. [cmsDriver.py](#71-cmsDriverpy)
   2. [runTheMatrix.py](#72-runTheMatrixpy)
   3. [cmsRun command](#73-cmsRun-command)
   4. [MessageLogger](#74-MessageLogger)
   5. [Skipping Events](#75-Skipping-Events)
8. [Git Workflow in CMSSW](#8-Git-Workflow-in-CMSSW)
   1. [Checking Out and Managing Packages](#81-Checking-Out-and-Managing-Packages)
   2. [Merging, Rebasing, and Other Common Tasks](#82-Merging-Rebasing-and-Other-Common-Tasks)
9. [Useful Commands](#9-Useful-Commands)
10. [Examples and References](#10-Examples-and-References)
---
## 1. Overview of CMSSW Releases
- **Official Release**: Fully validated, recommended for stable analyses (e.g., `CMSSW_13_3_0`, `CMSSW_14_0_7_patch2`).
- **Pre-release**: A step before the official release (e.g., `CMSSW_13_0_0_pre2`), suitable for development or testing.
- **IB (Integration Build)**: Built **twice a day** (at 11:00 and 23:00), including the most recently merged PRs, and kept available for about two weeks (e.g., `CMSSW_14_0_X_2025-01-06-2300`). Cutting-edge but not fully validated.
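The naming scheme above is regular enough to classify a release name mechanically. The helper below is a hypothetical sketch (not part of CMSSW or SCRAM) that assumes only the patterns visible in the examples: `_preN` for pre-releases, `_X_YYYY-MM-DD-HHMM` for IBs, and an optional `_patchN` suffix for official releases.

```python
import re

# Hypothetical helper: the regexes encode only the naming patterns quoted above.
IB_RE = re.compile(r"^CMSSW_(\d+)_(\d+)_X_(\d{4}-\d{2}-\d{2})-(\d{2})(\d{2})$")
PRE_RE = re.compile(r"^CMSSW_(\d+)_(\d+)_(\d+)_pre(\d+)$")
REL_RE = re.compile(r"^CMSSW_(\d+)_(\d+)_(\d+)(?:_patch(\d+))?$")


def classify_release(name: str) -> str:
    """Return 'IB', 'pre-release', 'official' or 'unknown' for a release name."""
    if IB_RE.match(name):
        return "IB"
    if PRE_RE.match(name):
        return "pre-release"
    if REL_RE.match(name):
        return "official"
    return "unknown"


def ib_timestamp(name: str) -> str:
    """Return the 'YYYY-MM-DD HH:MM' build time encoded in an IB name."""
    m = IB_RE.match(name)
    if m is None:
        raise ValueError(f"not an IB name: {name}")
    return f"{m.group(3)} {m.group(4)}:{m.group(5)}"


for rel in ["CMSSW_13_3_0", "CMSSW_14_0_7_patch2",
            "CMSSW_13_0_0_pre2", "CMSSW_14_0_X_2025-01-06-2300"]:
    print(f"{rel:32} {classify_release(rel)}")
print(ib_timestamp("CMSSW_14_0_X_2025-01-06-2300"))  # 2025-01-06 23:00
```

Checking the IB pattern first keeps the three regexes simple, since none of them has to exclude the others.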
You can run
```bash
scram list CMSSW
```
to see which releases are available on your machine or via CVMFS.
---
## 2. Setting up a CMSSW Environment
### 2.1 First step for non-CERN machines with cvmfs installed
When working on a non-CERN machine, most of the commonly used commands (like `scram`) are not available by default. If CVMFS is mounted on the machine, however, it is enough to source the script `cmsset_default.sh`:
```bash
source /cvmfs/cms.cern.ch/cmsset_default.sh
```
This script has to be sourced in every new shell, so it is convenient to add the line above to the bottom of your `~/.bashrc`.
After this step you can choose a CMSSW release and set up a working area as described below.
### 2.2 Create a working area for a CMSSW release
1. **Choose a release** (e.g., `CMSSW_14_0_0`).
2. Create a CMSSW working area:
```bash
cmsrel CMSSW_14_0_0
cd CMSSW_14_0_0/src
cmsenv
```
- `cmsenv` sets up necessary environment variables (paths, libraries, etc.).
3. (Optional) Initialize Git:
```bash
git cms-init
```
---
## 3. CMSSW Package Structure
A **typical CMSSW package** (e.g., `RecoLocalTracker/SiPixelRecHits`) might have:
- **`interface/`**: Header files (`.h`) with class definitions and declarations (CMSSW's equivalent of an `include/` directory).
- **`src/`**: Source files (`.cc`, `.cpp`) implementing algorithms or helper classes.
- **`plugins/`**: Module code that defines CMSSW modules (EDProducer, EDAnalyzer, EDFilter, etc.).
- **`test/`**: Scripts, tests, config files.
- **`python/`**: Python config fragments (`_cfi.py`, `_cff.py`) for use with `cmsRun`.
To check out a package's source code into your working area:
```bash
git cms-addpkg RecoLocalTracker/SiPixelRecHits
```
### 3.1 src/include vs. plugins
- **`src` + `interface`**: Code meant to be **used by other packages**. It is compiled into the package's shared library, which other packages can link against.
- **`plugins`**: Code loaded by the framework **at runtime** as CMSSW plugins, typically the modules themselves (registered with `DEFINE_FWK_MODULE(MyProducer)`). Other packages do not link against it.
### 3.2 BuildFile.xml usage
A package may contain several `BuildFile.xml` files. The most common locations are:
- `BuildFile.xml` (package top level) compiles the general-purpose classes in `src/` into a shared library.
- `plugins/BuildFile.xml` produces a plugin library with the EDM modules.
- `test/BuildFile.xml` compiles the test source code in the `test` folder.
Each `BuildFile.xml` defines how SCRAM should build, link, and export that subdirectory’s content.
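As a rough illustration of the layout above, this hypothetical helper maps a file path inside a package to the `BuildFile.xml` responsible for it, using only the locations listed. It is a toy model, not SCRAM's actual logic, and the file names in the usage line are made up.

```python
from pathlib import PurePosixPath

# Only the locations described above; anything else is out of scope here.
BUILDFILE_FOR_DIR = {
    "src": "BuildFile.xml",              # general-purpose classes -> shared library
    "interface": "BuildFile.xml",        # headers belonging to that same library
    "plugins": "plugins/BuildFile.xml",  # plugin library with the EDM modules
    "test": "test/BuildFile.xml",        # test code
}


def governing_buildfile(path: str) -> str:
    """Map 'Subsystem/Package/<dir>/<file>' to the BuildFile.xml that builds it."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError(f"expected Subsystem/Package/<dir>/<file>, got {path!r}")
    subsystem, package, subdir = parts[:3]
    if subdir not in BUILDFILE_FOR_DIR:
        raise ValueError(f"directory {subdir!r} is not covered by this sketch")
    return f"{subsystem}/{package}/{BUILDFILE_FOR_DIR[subdir]}"


print(governing_buildfile("RecoLocalTracker/SiPixelRecHits/plugins/MyProducer.cc"))
# RecoLocalTracker/SiPixelRecHits/plugins/BuildFile.xml
```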
### 3.3 classes_def.xml for persistent/transient classes
- If you have C++ classes that need to be **persisted** in ROOT files, you must declare them in `classes_def.xml` (placed in `src/`, together with a `classes.h` that includes the corresponding headers).
- Example:
```xml
<lcgdict>
  <class name="MyPackage::MyData"/>
  <class name="std::vector<MyPackage::MyData>"/>
  <class name="edm::Wrapper<std::vector<MyPackage::MyData>>"/>
</lcgdict>
```
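A persistent collection typically needs the three entries shown: the class, the `std::vector` of it, and the `edm::Wrapper` around that vector, which is how products are stored in the event. The hypothetical helper below only generates those `<class>` lines for a given type, to be pasted inside the file's root element; it is a convenience sketch, not a CMSSW tool, and real entries often carry extra attributes (e.g. `ClassVersion`) that it does not produce.

```python
def classes_def_entries(cls: str) -> list:
    """Return the <class> lines for a type, its std::vector and the
    edm::Wrapper of that vector, following the pattern in the example above."""
    vector = f"std::vector<{cls}>"
    wrapper = f"edm::Wrapper<{vector}>"
    return [f'  <class name="{name}"/>' for name in (cls, vector, wrapper)]


print("\n".join(classes_def_entries("MyPackage::MyData")))
```

Generating the wrapper from the vector string guarantees the angle brackets stay balanced, which is easy to get wrong by hand.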
### 3.4 Some typical top-level packages
- **Reco***: Tracking, jets, muons, etc. (reconstruction code).
- **Sim***: Geant4-based simulation, random services, etc.
- **DQM***: Data Quality Monitoring.
- **DataFormats***: Data structures (e.g., `Track`, `Jet`, etc.) used across the framework.
---
## 4. Compiling and Build Tools
After editing or adding packages, compile with:
```bash
scram b -j`nproc`
```
- `-j N` runs up to N compilation jobs in parallel; `nproc` returns the number of available CPU cores.
To clean up:
```bash
scram b clean
```
---
## 5. CMS Application Framework Basics
### 5.1 The Event Data Model (`edm::Event`)
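- `edm::Event` is the container for all data products (RAW and reconstructed) belonging to a single collision or simulated event.
- Modules retrieve or put data products identified by the product's C++ type, module label, product instance name and, optionally, process name (an `edm::InputTag` carries the last three).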
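The lookup rule can be illustrated with a toy Python model: products are stored under a (type, module label, instance, process) key, and a request that leaves the process name empty returns the product from the most recent process that made it. Everything here (`ToyEvent`, `MyTrack`, `myTracks`) is hypothetical and only mimics the idea; the real `edm::Event` is a C++ class accessed through tokens.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# (C++ type, module label, product instance name, process name)
ProductKey = Tuple[str, str, str, str]


@dataclass
class ToyEvent:
    process_history: List[str]                 # oldest process first
    products: Dict[ProductKey, Any] = field(default_factory=dict)

    def put(self, type_name: str, label: str, instance: str, process: str, value: Any) -> None:
        key = (type_name, label, instance, process)
        if key in self.products:
            raise KeyError(f"product already in the event: {key}")
        self.products[key] = value

    def get(self, type_name: str, label: str, instance: str = "", process: str = "") -> Any:
        # A given process name must match exactly; an empty one means
        # "take the product from the most recent process that made it".
        processes = [process] if process else list(reversed(self.process_history))
        for proc in processes:
            key = (type_name, label, instance, proc)
            if key in self.products:
                return self.products[key]
        raise KeyError(f"no {type_name} with tag {label}:{instance}:{process}")


event = ToyEvent(process_history=["HLT", "RECO"])
event.put("std::vector<MyTrack>", "myTracks", "", "HLT", ["hltTrack"])
event.put("std::vector<MyTrack>", "myTracks", "", "RECO", ["recoTrack"])
print(event.get("std::vector<MyTrack>", "myTracks"))             # ['recoTrack']
print(event.get("std::vector<MyTrack>", "myTracks", "", "HLT"))  # ['hltTrack']
```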
### 5.2 EventSetup
- Holds conditions data (geometry, calibration constants, alignment).
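- Accessed through tokens: declare an `edm::ESGetToken<Product, Record>` in the module constructor with `esConsumes()`, then read the data with `iSetup.getData(token)`.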
### 5.3 Producers, Analyzers, and Filters
- **EDProducer**:
- Reads input from the event, produces new data products, and puts them back into the event.
- Declares its products in the constructor with `produces<T>()` (and its inputs with `consumes<T>()`).
- **EDAnalyzer**:
- Reads data from the event, but does **not** produce new data or filter events.
- Typically writes histograms, prints info, etc.
- **EDFilter**:
- Its `filter()` method returns a boolean. If it returns `false`, the event fails the filter and the remaining modules on that path are not run for it.
### 5.4 Parameter Sets (Configurations)
- Each module is configured by a Python parameter set.
- Example in a `.py` config:
```python
process.myProducer = cms.EDProducer("MyProducer",
threshold = cms.double(0.7),
inputTag = cms.InputTag("someOtherModule")
)
```
- In the C++ constructor, you read these parameters:
```cpp
MyProducer::MyProducer(edm::ParameterSet const& iConfig)
    : threshold_(iConfig.getParameter<double>("threshold")) {
  // ... read the other parameters in the same way
}
```
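On the C++ side a parameter is requested by name *and* type, and a type mismatch is an error rather than a silent conversion. The toy class below mimics that contract in plain Python; `ToyParameterSet` and `get_parameter` are hypothetical names, not the `FWCore.ParameterSet` API.

```python
class ToyParameterSet:
    """Toy model: each parameter is stored together with its declared type."""

    def __init__(self, **params):
        # params: name -> (type, value), e.g. threshold=(float, 0.7)
        self._params = {}
        for name, (ptype, value) in params.items():
            if not isinstance(value, ptype):
                raise TypeError(f"{name}: {value!r} is not a {ptype.__name__}")
            self._params[name] = (ptype, value)

    def get_parameter(self, ptype, name):
        """Return the value of `name`, requiring its declared type to be `ptype`."""
        if name not in self._params:
            raise KeyError(f"missing required parameter '{name}'")
        declared, value = self._params[name]
        if declared is not ptype:
            raise TypeError(f"parameter '{name}' is {declared.__name__}, "
                            f"requested {ptype.__name__}")
        return value


pset = ToyParameterSet(threshold=(float, 0.7), inputTag=(str, "someOtherModule"))
threshold = pset.get_parameter(float, "threshold")
print(threshold)  # 0.7
```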
---
## 6. Job Configuration: Modules, Tasks, Paths, Sequences, and Processes
### Overview
- **`cms.Process("MyProcess")`**
The top-level Python object defining your job configuration. It collects all modules, paths, tasks, and other settings under a single “Process” namespace.
- **Modules**
A **module** is a self-contained piece of code that performs a specific event-processing function. Whenever you do:
```python
process.myProducer = cms.EDProducer("MyProducer", ...)
```
you create a module instance named `myProducer` of type `MyProducer`. The same pattern applies for `EDFilter` or `EDAnalyzer`.
- **Producers** (`EDProducer`) create new data products and place them into the `edm::Event`.
- **Filters** (`EDFilter`) decide whether an event passes or fails a selection.
- **Analyzers** (`EDAnalyzer`) read data to analyze or plot but do not affect event flow.
---
### Paths
- A **path** is an ordered list of modules (or sequences) that defines part of the event-processing chain.
- Only **filters** can change the flow of events along a path (by rejecting an event). **Producers** add data products; **analyzers** may also appear on a path, but they neither add products nor affect which modules run after them.
- **Example**:
```python
process.myPath = cms.Path(process.moduleA * process.moduleB)
```
Here, `moduleA` runs first, then `moduleB`.
**Important**: The event passes along the path in the specified order. If a filter returns “false,” the path stops processing that event at that filter.
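A toy model of that rule: each module is a Python callable, and a path runs them in order until a module flagged as a filter returns `False`. The names (`ToyModule`, `run_path`, `ptFilter`) are hypothetical, and the real multithreaded scheduler is far more involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyModule:
    label: str
    action: Callable[[Dict], bool]   # return value only matters for filters
    is_filter: bool = False


def run_path(modules: List[ToyModule], event: Dict) -> List[str]:
    """Run modules in order; stop after the first filter that returns False.
    Returns the labels of the modules that actually ran."""
    ran = []
    for module in modules:
        passed = module.action(event)
        ran.append(module.label)
        if module.is_filter and not passed:
            break  # the rest of the path is skipped for this event
    return ran


def produce_pt(event):
    event["pt"] = 42.0
    return True  # producers never stop the path


def pt_filter(event):
    return event["pt"] > 50.0


def analyze(event):
    print("analyzing", event)
    return True


path = [ToyModule("moduleA", produce_pt),
        ToyModule("ptFilter", pt_filter, is_filter=True),
        ToyModule("moduleB", analyze)]
print(run_path(path, {}))  # ['moduleA', 'ptFilter']
```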
---
### Sequences
- A **sequence** groups modules for clarity or reusability; you can place a sequence on a path instead of listing modules individually.
- **All modules** in a sequence run **in order** when that sequence is included on a path (unless a filter in the sequence, or earlier on the path, rejects the event).
- For example:
```python
process.mySequence = cms.Sequence(process.moduleC * process.moduleD)
process.myPath = cms.Path(process.mySequence)
```
- A sequence **does not** run on its own. You **must** either:
1. Place it on a path (or an EndPath), or
2. Nest it inside another sequence that is placed on one.
Tasks cannot contain sequences: they hold modules and other Tasks.
---
### Tasks
- A **Task** is a container for modules that are **not** explicitly on a path and run **on demand**, i.e. only when another module consumes one of their products.
- Useful for “helper” producers or calibration modules that feed multiple other modules without needing to appear in the direct path.
- **Example**:
```python
process.myTask = cms.Task(process.helperProducer1, process.helperProducer2)
```
If `myTask` is included in a path:
```python
process.myPath = cms.Path(process.someFilter, process.myTask)
```
the framework ensures those helper producers get scheduled if their outputs are consumed.
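The on-demand behaviour can be sketched with a toy Python model: each task module is registered as the producer of a named product and runs only the first time some module asks for that product. All names are hypothetical; the real framework uses the modules' `consumes` declarations to prefetch products, which this sketch does not model.

```python
from typing import Callable, Dict, List


class ToyUnscheduled:
    """Toy model of Task modules: each runs only when its product is requested."""

    def __init__(self) -> None:
        self._producers: Dict[str, Callable[["ToyUnscheduled"], object]] = {}
        self._cache: Dict[str, object] = {}
        self.ran: List[str] = []          # products actually produced, in order

    def add(self, product: str, producer: Callable[["ToyUnscheduled"], object]) -> None:
        self._producers[product] = producer

    def get(self, product: str) -> object:
        if product not in self._cache:    # produce at most once per event
            value = self._producers[product](self)
            self._cache[product] = value
            self.ran.append(product)
        return self._cache[product]


task = ToyUnscheduled()
task.add("hits", lambda ev: [1.0, 2.0, 3.0])
task.add("clusters", lambda ev: [sum(ev.get("hits"))])   # consumes "hits"
task.add("unusedHelper", lambda ev: "never requested")

# The only module on the path consumes "clusters":
print(task.get("clusters"))  # [6.0]
print(task.ran)              # ['hits', 'clusters']
```

Note that `unusedHelper` never runs: being in the task only makes it *available*.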
---
### Schedule
- By default, **all** paths you define (`process.pathName = cms.Path(...)`) become part of the schedule.
- To restrict the job to specific paths, or to fix their order explicitly, define a schedule:
```python
process.schedule = cms.Schedule(process.myPath, process.anotherPath)
```
---
### Fragments and Merging Configurations
- A **configuration fragment** is a smaller piece of Python config (e.g., `myProducer_cfi.py`) that you import into a larger config.
- **Example**:
```python
from MyPackage.MySubPackage.myProducer_cfi import myProducer
```
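imports the `myProducer` module object with its default parameters. Attach it to the process, e.g. with `process.myProducer = myProducer.clone()` (optionally overriding parameters, as in `myProducer.clone(threshold = 1.0)`), or load the whole fragment with `process.load("MyPackage.MySubPackage.myProducer_cfi")`.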
- This modular approach keeps configs maintainable and allows re-use across different workflows.
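The import-then-clone pattern is worth spelling out: a `_cfi` file defines a module with default parameters, and each configuration takes its own copy and overrides only what it needs, leaving the shared defaults untouched. The toy below imitates that with plain dictionaries; `ToyConfigModule.clone` is a hypothetical stand-in for the `clone()` method of CMSSW configuration objects.

```python
import copy


class ToyConfigModule:
    """Toy stand-in for a configured module: a C++ type name plus parameters."""

    def __init__(self, type_name: str, **params):
        self.type_name = type_name
        self.params = dict(params)

    def clone(self, **overrides) -> "ToyConfigModule":
        # Deep copy, so that editing the clone never changes the shared defaults.
        new = copy.deepcopy(self)
        new.params.update(overrides)
        return new


# Roughly what a hypothetical myProducer_cfi.py would provide:
myProducer = ToyConfigModule("MyProducer", threshold=0.7, inputTag="someOtherModule")

# Two configurations reusing it:
tight = myProducer.clone(threshold=1.0)
plain = myProducer.clone()
print(tight.params["threshold"], plain.params["threshold"], myProducer.params["threshold"])
# 1.0 0.7 0.7
```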
---
### Process Modifiers
- **Process modifiers** let you conditionally alter your configuration for special scenarios (e.g., GPU-enabled, phase-2 geometry, fast timing).
- Declared in Python, for instance:
```python
from Configuration.Eras.Modifier_phase2_common_cff import phase2_common
phase2_common.toModify(process.myProducer, threshold=1.0)
```
This sets `process.myProducer.threshold = 1.0` **only** when `phase2_common` is active, i.e. when the job is configured with one of the Phase-2 eras (for example `--era Phase2C17I13M9` in `cmsDriver.py`).
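Conceptually, a modifier is a named switch whose changes take effect only when the switch is on for the job. The toy `ToyModifier` below illustrates that idea with plain dictionaries; it is a hypothetical sketch, not how `cms.Modifier` is implemented.

```python
from typing import Any, Dict


class ToyModifier:
    """Toy modifier: a named switch whose changes apply only when it is on."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active = False     # set when the job is configured with this era/modifier

    def to_modify(self, params: Dict[str, Any], **overrides: Any) -> None:
        if self.active:
            params.update(overrides)
        # inactive: the configuration is left untouched


phase2_like = ToyModifier("phase2_like")             # hypothetical name

run3_cfg = {"threshold": 0.7}
phase2_like.to_modify(run3_cfg, threshold=1.0)       # modifier off: no change

phase2_like.active = True                            # e.g. a Phase-2 era was chosen
phase2_cfg = {"threshold": 0.7}
phase2_like.to_modify(phase2_cfg, threshold=1.0)     # modifier on: threshold changed

print(run3_cfg, phase2_cfg)  # {'threshold': 0.7} {'threshold': 1.0}
```

The benefit of this design is that a single configuration file serves every scenario: the era-specific tweaks live next to the module they modify instead of in separate copies of the config.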
---
### Customisation Functions
- A **customisation function** is a Python function that takes `process` as input, modifies it, then returns it.
- **Example**:
```python
def customiseForSpecialCase(process):
    process.myProducer.threshold = 2.0
    return process
```
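- You can apply it via `cmsDriver.py`. Assuming, for illustration, that the function lives in a file `MyPackage/MySubPackage/python/myCustomisations.py` (hypothetical name):
```bash
cmsDriver.py step3 ... --customise MyPackage/MySubPackage/myCustomisations.customiseForSpecialCase
```
so that `customiseForSpecialCase` is called on the final `process` at the end of the generated configuration.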
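Because a customisation function simply takes `process` and returns it, several of them can be chained, each one seeing the result of the previous one. The toy below uses a `SimpleNamespace` as a stand-in for `process`; `customiseVerbose` and `apply_customisations` are hypothetical names.

```python
from functools import reduce
from types import SimpleNamespace


def customiseForSpecialCase(process):
    process.myProducer.threshold = 2.0
    return process


def customiseVerbose(process):          # hypothetical second customisation
    process.myProducer.verbose = True
    return process


def apply_customisations(process, functions):
    """Apply each customisation in order, feeding the output of one to the next."""
    return reduce(lambda proc, func: func(proc), functions, process)


process = SimpleNamespace(myProducer=SimpleNamespace(threshold=0.7))
process = apply_customisations(process, [customiseForSpecialCase, customiseVerbose])
print(process.myProducer.threshold, process.myProducer.verbose)  # 2.0 True
```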
---
## 7. Generating and Running Configurations
### 7.1 cmsDriver.py
- Used to generate comprehensive configuration files for simulation, digitization, reconstruction, etc.
- **Example**:
```bash
cmsDriver.py step3 \
--conditions auto:phase2_realistic \
--eventcontent AODSIM \
--datatier AODSIM \
--step RAW2DIGI,RECO \
--filein file:step2.root \
--fileout file:step3.root \
-n 100 \
--nThreads 8 \
--no_exec
```
- Because of `--no_exec`, this only writes the configuration `step3_RAW2DIGI_RECO.py` without running it. You can then run:
```bash
cmsRun step3_RAW2DIGI_RECO.py
```
### 7.2 runTheMatrix.py
- Script that automates many standard workflows.
- **`-l`** selects the workflow ID(s) to run, **`--nEvents`** sets the number of events, **`-n`** lists the available workflows, and **`-w`** selects the set of workflows (e.g. `upgrade`).
- Example:
```bash
runTheMatrix.py -w upgrade -n # list all upgrade workflows
runTheMatrix.py -w upgrade -l 20893.0 -j 0
```
This creates the workflow directories with their configurations (`step1.py`, `step2.py`, etc.) and runs them. `-j N` sets the number of parallel processes (default 4); `-j 0` does not execute anything and only creates the workflows.
### 7.3 cmsRun command
- **`cmsRun`** executes a CMSSW `.py` configuration file:
```bash
cmsRun step3.py
```
- The `.py` file often contains more modules than are actually needed. At runtime:
1. the framework constructs the modules used by the configured Paths, EndPaths and Tasks;
2. in their constructors, the modules declare their inputs (`consumes`) and outputs (`produces`);
3. from these declarations the framework builds the Directed Acyclic Graph (DAG) of module dependencies;
4. the event loop starts: modules on Paths run, modules in Tasks run only if their products are needed, and filters can stop the remaining modules on a path.
- If you want to see a summary of which modules actually ran, put:
```python
process.options.wantSummary = True
```
in the `step*.py` to print CPU usage, how many times each module was run, etc.
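The dependency-graph steps above can be illustrated with Python's standard `graphlib` module (Python 3.9+): from each module's declared inputs and outputs, find which modules a requested one needs, then order them so that every producer comes before its consumers. The module and product names are hypothetical, and the real framework runs independent modules concurrently rather than in one fixed serial order.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

# Hypothetical modules: label -> (products it consumes, products it produces)
MODULES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "digis":    (set(),        {"digis"}),
    "hits":     ({"digis"},    {"hits"}),
    "clusters": ({"hits"},     {"clusters"}),
    "tracks":   ({"clusters"}, {"tracks"}),
    "calo":     ({"digis"},    {"caloTowers"}),  # nobody consumes caloTowers
    "analyzer": ({"tracks"},   set()),
}

# Which module produces each product (built from the `produces` declarations).
PRODUCER_OF = {prod: label for label, (_, outs) in MODULES.items() for prod in outs}


def dependencies(label: str) -> Set[str]:
    """Modules whose products `label` consumes (its edges in the DAG)."""
    consumed, _ = MODULES[label]
    return {PRODUCER_OF[prod] for prod in consumed}


def needed_modules(targets: List[str]) -> Set[str]:
    """The targets plus every module they transitively depend on."""
    needed: Set[str] = set()
    stack = list(targets)
    while stack:
        label = stack.pop()
        if label not in needed:
            needed.add(label)
            stack.extend(dependencies(label))
    return needed


def execution_order(targets: List[str]) -> List[str]:
    """A valid serial order for the needed modules: producers before consumers."""
    graph = {label: dependencies(label) for label in needed_modules(targets)}
    return list(TopologicalSorter(graph).static_order())


print(execution_order(["analyzer"]))
# ['digis', 'hits', 'clusters', 'tracks', 'analyzer']  ("calo" is never run)
```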
### 7.4 MessageLogger
- At runtime, the verbosity of CMSSW code is controlled by a general framework service, the `MessageLogger`. The user can select which severity levels to report (DEBUG, INFO, WARNING, ERROR) and the output destinations for different messages (`cout`, `cerr`, or files of the user's choice).
- To configure:
```python
process.load("FWCore.MessageLogger.MessageLogger_cfi")
process.MessageLogger.cerr.enable = False   # turn off the default cerr destination
process.MessageLogger.cout.enable = True    # and write to cout instead
# kill all messages in the log...
process.MessageLogger.cout.default = cms.untracked.PSet(
    limit = cms.untracked.int32(0)
)
# ...but SimG4CoreApplication category - those unlimited
process.MessageLogger.cout.SimG4CoreApplication = cms.untracked.PSet(
    limit = cms.untracked.int32(-1)
)
```
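The `limit` semantics used above (0 suppresses a category, -1 means unlimited, and categories without their own entry fall back to `default`) can be mimicked in a few lines. This is a hypothetical toy, not the MessageLogger implementation: positive limits are modelled here as a plain cap, and other settings of the real service (thresholds, `reportEvery`, etc.) are ignored.

```python
from collections import Counter
from typing import Dict, List, Tuple


def filter_messages(messages: List[Tuple[str, str]], limits: Dict[str, int]) -> List[str]:
    """Toy per-category limits: 0 suppresses, -1 is unlimited, N > 0 keeps the
    first N messages. Categories without their own entry use limits['default']."""
    seen: Counter = Counter()
    kept = []
    for category, text in messages:
        limit = limits.get(category, limits.get("default", -1))
        seen[category] += 1
        if limit < 0 or seen[category] <= limit:
            kept.append(f"[{category}] {text}")
    return kept


log = [("SimG4CoreApplication", "stepping started"),
       ("TrackProducer", "found 3 tracks"),          # hypothetical category
       ("SimG4CoreApplication", "stepping done")]
print(filter_messages(log, {"default": 0, "SimG4CoreApplication": -1}))
# ['[SimG4CoreApplication] stepping started', '[SimG4CoreApplication] stepping done']
```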
### 7.5 Skipping Events
- If you want to skip the first `N` events in your input file:
```python
process.source.skipEvents = cms.untracked.uint32(N)
```
---
## 8. Git Workflow in CMSSW
### 8.1 Checking Out and Managing Packages
- **Initialize** your local repository (in the `src/` directory):
```bash
git cms-init
```
- **Add** a package:
```bash
git cms-addpkg RecoLocalTracker/SiPixelRecHits
```
- **Check out** every package that contains changes with respect to the base release (e.g. after merging someone else's branch):
```bash
git diff --name-only $CMSSW_VERSION | cut -d/ -f-2 | sort -u | xargs -r git cms-addpkg
```
- **Check dependencies**:
```bash
git cms-checkdeps -p # python modules and their dependencies
git cms-checkdeps -h # header files and their dependencies
git cms-checkdeps -b # BuildFile files and their dependencies
git cms-checkdeps -A # -h, -p and -b together
git cms-checkdeps -a # checkout packages automatically
```
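The `cut -d/ -f-2 | sort -u` part of the `git diff` pipeline above simply reduces each changed file path to its `Subsystem/Package` prefix and removes duplicates. A Python equivalent of just that step, as a hypothetical helper for your own scripts (the example file names are made up), could look like this:

```python
from typing import Iterable, List


def packages_from_paths(paths: Iterable[str]) -> List[str]:
    """Mimic `cut -d/ -f-2 | sort -u`: keep the first two '/'-separated
    fields of each path and return the sorted unique values."""
    packages = set()
    for path in paths:
        path = path.strip()
        if not path:
            continue
        packages.add("/".join(path.split("/")[:2]))
    return sorted(packages)


changed = [
    "RecoLocalTracker/SiPixelRecHits/plugins/BuildFile.xml",
    "RecoLocalTracker/SiPixelRecHits/src/MyHelper.cc",
    "DataFormats/TrackReco/interface/Track.h",
]
print(packages_from_paths(changed))
# ['DataFormats/TrackReco', 'RecoLocalTracker/SiPixelRecHits']
```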
### 8.2 Merging, Rebasing, and Other Common Tasks
- **Push changes** to your fork/branch:
```bash
git push my-cmssw my_branch
```
- **Merge** a topic branch from another user:
```bash
git cms-merge-topic username:branch
```
- **Rebase** another user's topic branch onto your current release and apply it:
```bash
git cms-rebase-topic username:branch
```
- **Add remote** repository:
```bash
git cms-remote add cms-patatrack
git fetch cms-patatrack
```
- **Merge** a branch from that remote:
```bash
git merge cms-patatrack/Alpaka_updates_13.0.x
```
---
## 9. Useful Commands
1. **edmDumpEventContent**:
Lists all collections in a ROOT file:
```bash
edmDumpEventContent step3.root
```
2. **edmPythonConfigToCppValidation**:
Helps generate C++ `fillDescriptions()` from Python config:
```bash
edmPythonConfigToCppValidation file.py
```
3. **scram b runtests**:
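Runs the unit tests of all checked-out packages. The `grep` pipeline below collects the test summaries from the logs; its last `grep` matches every line (the pattern ends in an empty alternative) and only serves to highlight `had ERRORS` when colored output is enabled.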
```bash
scram b runtests
grep -h '^---> test' $CMSSW_BASE/tmp/$SCRAM_ARCH/src/*/*/test/*/testing.log | \
grep 'had ERRORS\|'
```
---
## 10. Examples and References
1. **Minimal config**:
```python
import FWCore.ParameterSet.Config as cms
process = cms.Process("MYPROC")
process.source = cms.Source("PoolSource",
fileNames = cms.untracked.vstring("file:input.root")
)
process.myProducer = cms.EDProducer("MyProducer",
threshold = cms.double(0.7),
inputTag = cms.InputTag("someOtherProducer")
)
process.myAnalyzer = cms.EDAnalyzer("MyAnalyzer")
process.out = cms.OutputModule("PoolOutputModule",
fileName = cms.untracked.string("output.root")
)
process.myPath = cms.Path(process.myProducer * process.myAnalyzer)
process.outPath = cms.EndPath(process.out)
process.options.wantSummary = True
# run it:
# cmsRun myConfig.py
```
2. **Official Documentation**:
- [CMSSW WorkBook Chapter 2.3 (“CMSSW Application Framework”)](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCMSSWFramework)
- [More on the CMS framework](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMoreOnCMSSWFramework)
- [Configuration File Documentation](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookConfigFileIntro)
**Final Tips**:
- Run `cmsenv` in every new shell, from inside the release area (e.g. its `src/` directory), so that environment variables are set correctly.
- The framework builds the processing graph from the modules' declared dependencies: modules on Paths always run (unless a filter earlier on the path rejects the event), while modules in Tasks run only if their products are consumed.
- Use `process.options.wantSummary = True` to see the runtime summary.
- The “software bus” concept (the `edm::Event`) ensures modular development: each module reads only what it needs and can produce new data for subsequent modules.
---
**Happy coding in CMSSW!**
*Original Author: Felice Pantaleo CERN (felice.pantaleo@cern.ch) - January 2025*