# CMSSW Tutorial for Newcomers

## Table of Contents

1. [Overview of CMSSW Releases](#1-Overview-of-CMSSW-Releases)
2. [Setting up a CMSSW Environment](#2-Setting-up-a-CMSSW-Environment)
3. [CMSSW Package Structure](#3-CMSSW-Package-Structure)
    1. [src/include vs. plugins](#31-srcinclude-vs-plugins)
    2. [BuildFile.xml usage](#32-BuildFilexml-usage)
    3. [classes_def.xml for persistent/transient classes](#33-classes_defxml-for-persistenttransient-classes)
    4. [Some typical top-level packages](#34-Some-typical-top-level-packages)
4. [Compiling and Build Tools](#4-Compiling-and-Build-Tools)
5. [CMS Application Framework Basics](#5-CMS-Application-Framework-Basics)
    1. [The Event Data Model (`edm::Event`)](#51-The-Event-Data-Model-edmEvent)
    2. [EventSetup](#52-EventSetup)
    3. [Producers, Analyzers, and Filters](#53-Producers-Analyzers-and-Filters)
    4. [ParameterSets (Configurations)](#54-ParameterSets-Configurations)
6. [Job Configuration: Modules, Tasks, Paths, Sequences and Process](#6-Job-Configuration-Modules-Tasks-Paths-Sequences-and-Process)
7. [Generating and Running Configurations](#7-Generating-and-Running-Configurations)
    1. [cmsDriver.py](#71-cmsDriverpy)
    2. [runTheMatrix.py](#72-runTheMatrixpy)
    3. [cmsRun command](#73-cmsRun-command)
    4. [MessageLogger](#74-MessageLogger)
    5. [Skipping Events](#75-Skipping-Events)
8. [Git Workflow in CMSSW](#8-Git-Workflow-in-CMSSW)
    1. [Checking Out and Managing Packages](#81-Checking-Out-and-Managing-Packages)
    2. [Merging, Rebasing, and Other Common Tasks](#82-Merging-Rebasing-and-Other-Common-Tasks)
9. [Useful Commands](#9-Useful-Commands)
10. [Examples and References](#10-Examples-and-References)

---

## 1. Overview of CMSSW Releases

- **Official Release**: Fully validated, recommended for stable analyses (e.g., `CMSSW_13_3_0`, `CMSSW_14_0_7_patch2`).
- **Pre-release**: A step before the official release (e.g., `CMSSW_13_0_0_pre2`), suitable for development or testing.
- **IB (Integration Build)**: Produced **twice a day** (at 11:00 and 23:00), merges recent PRs, valid for about two weeks (e.g., `CMSSW_14_0_X_2025-01-06-2300`). Cutting-edge but not fully validated.

You can run

```bash
scram list CMSSW
```

to see which releases are available on your machine or via CVMFS.

---

## 2. Setting up a CMSSW Environment

### 2.1 First step for non-CERN machines with cvmfs installed

When working on a non-CERN machine, most of the commonly used commands (like `scram`) are not available by default. If `cvmfs` is installed on the machine, however, it is enough to source the script `cmsset_default.sh`:

```bash
source /cvmfs/cms.cern.ch/cmsset_default.sh
```

This script must be sourced in every new session on the machine, so it is useful to add the line at the bottom of your `.bashrc` file. After this step you can choose a CMSSW release and set up a working area by following the next steps.

### 2.2 Create a working area for a CMSSW release

1. **Choose a release** (e.g., `CMSSW_14_0_0`).
2. Create a CMSSW working area:
   ```bash
   cmsrel CMSSW_14_0_0
   cd CMSSW_14_0_0/src
   cmsenv
   ```
   - `cmsenv` sets up the necessary environment variables (paths, libraries, etc.).
3. (Optional) Initialize Git:
   ```bash
   git cms-init
   ```

---

## 3. CMSSW Package Structure

A **typical CMSSW package** (e.g., `RecoLocalTracker/SiPixelRecHits`) might have:

- **`include/` or `interface/`**: Header files (`.h`) with class definitions and declarations.
- **`src/`**: Source files (`.cc`, `.cpp`) implementing algorithms or helper classes.
- **`plugins/`**: Module code that defines CMSSW modules (EDProducer, EDAnalyzer, EDFilter, etc.).
- **`test/`**: Scripts, tests, config files.
- **`python/`**: Python config fragments (`_cfi.py`, `_cff.py`) for use with `cmsRun`.

You can download a package’s source code with:

```bash
git cms-addpkg RecoLocalTracker/SiPixelRecHits
```

### 3.1 src/include vs. plugins

- **`src` + `include`**: Usually code that can be **used by other packages**. These produce shared libraries.
- **`plugins`**: Code that is typically **only used within** that package as a CMSSW “plugin” (e.g., modules that the framework loads at runtime, registered with `DEFINE_FWK_MODULE(MyProducer)`).

### 3.2 BuildFile.xml usage

A package may contain several `BuildFile.xml` files. They can only be located in the following places:

- `BuildFile.xml` (package root) compiles the general-purpose classes in the `src` folder into a shared library.
- `plugins/BuildFile.xml` produces a plugin library with the EDM modules.
- `test/BuildFile.xml` compiles the test source code in the `test` folder.

Each `BuildFile.xml` defines how SCRAM should build, link, and export that subdirectory’s content.

### 3.3 classes_def.xml for persistent/transient classes

- If you have C++ classes that need to be **persisted** in ROOT files, you must declare them in `classes_def.xml`.
- Example:
  ```xml
  <classes>
    <class name="MyPackage::MyData"/>
    <class name="std::vector<MyPackage::MyData>"/>
    <class name="edm::Wrapper<std::vector<MyPackage::MyData> >"/>
  </classes>
  ```

### 3.4 Some typical top-level packages

- **Reco**\*: Tracking, jets, muons, etc. (reconstruction code).
- **Sim**\*: GEANT-based simulation, random services, etc.
- **DQM**\*: Data Quality Monitoring.
- **DataFormats**\*: Data structures (e.g., `Track`, `Jet`, etc.) used across the framework.

---

## 4. Compiling and Build Tools

After editing or adding packages, compile with:

```bash
scram b -j $(nproc)
```

- `-j N` runs N build jobs in parallel.

To clean up:

```bash
scram b clean
```

---

## 5. CMS Application Framework Basics

### 5.1 The Event Data Model (`edm::Event`)

- `edm::Event` is a container for all RAW and reconstructed data for a single collision/MC event.
- Modules can retrieve or put data products by specifying the product’s type, label, and instance.

### 5.2 EventSetup

- Holds conditions data (geometry, calibration constants, alignment).
- Accessed via `es.get<SomeRecord>().get(token, handle);`. In recent releases the usual pattern is to declare an `edm::ESGetToken` with `esConsumes()` in the module constructor and then call `iSetup.getData(token)`.
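
To make the "type, label, instance" lookup of Section 5.1 concrete, here is a toy model in plain Python. It is not the CMSSW API: `ToyEvent`, `put`, and `get` are hypothetical names chosen for illustration, and the real `edm::Event` is accessed from C++ through tokens. The sketch only shows the idea that a product is addressed by the combination of its type, the label of the module that made it, and an optional instance name.

```python
# Toy illustration only: ToyEvent is a hypothetical stand-in, not a CMSSW class.
class ToyEvent:
    """Stores products keyed by (type name, module label, instance name)."""

    def __init__(self):
        self._products = {}

    def put(self, product, label, instance=""):
        key = (type(product).__name__, label, instance)
        if key in self._products:
            # Assumption of this sketch: a product cannot be replaced once stored.
            raise KeyError(f"product {key} already exists")
        self._products[key] = product

    def get(self, product_type, label, instance=""):
        # Raises KeyError if no product matches all three parts of the key.
        return self._products[(product_type.__name__, label, instance)]


event = ToyEvent()
# Same type and same label, distinguished only by the instance name.
event.put([1.2, 3.4], label="myProducer")
event.put([9.9], label="myProducer", instance="highPt")

all_values = event.get(list, "myProducer")
high_pt_values = event.get(list, "myProducer", "highPt")
```

Two products of the same type from the same module can coexist because the instance name is part of the key; asking for a combination that was never stored fails instead of silently returning something else.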
### 5.3 Producers, Analyzers, and Filters

- **EDProducer**:
  - Reads input from the event, produces new data products, and puts them back into the event.
  - Declares its products with `produces<>()`.
- **EDAnalyzer**:
  - Reads data from the event, but does **not** produce new data or filter events.
  - Typically writes histograms, prints info, etc.
- **EDFilter**:
  - Returns a boolean. If `false`, the event does not pass the filter (removing it from subsequent steps in that path).

### 5.4 ParameterSets (Configurations)

- Each module is configured by a Python parameter set.
- Example in a `.py` config:
  ```python
  process.myProducer = cms.EDProducer("MyProducer",
      threshold = cms.double(0.7),
      inputTag = cms.InputTag("someOtherModule")
  )
  ```
- In the C++ constructor, you read these parameters:
  ```cpp
  MyProducer::MyProducer(edm::ParameterSet const& iConfig) {
    threshold_ = iConfig.getParameter<double>("threshold");
    // ...
  }
  ```

---

## 6. Job Configuration: Modules, Tasks, Paths, Sequences and Process

### Overview

- **`cms.Process("MyProcess")`**
  The top-level Python object defining your job configuration. It collects all modules, paths, tasks, and other settings under a single “Process” namespace.
- **Modules**
  A **module** is a self-contained piece of code that performs a specific event-processing function. Whenever you write:
  ```python
  process.myProducer = cms.EDProducer("MyProducer", ...)
  ```
  you create a module instance named `myProducer` of type `MyProducer`. The same pattern applies to `EDFilter` and `EDAnalyzer`.
  - **Producers** (`EDProducer`) create new data products and place them into the `edm::Event`.
  - **Filters** (`EDFilter`) decide whether an event passes or fails a selection.
  - **Analyzers** (`EDAnalyzer`) read data to analyze or plot but do not affect event flow.

---

### Paths

- A **path** is an ordered list of modules (or sequences) that defines part of the event-processing chain.
- Only **producers** and **filters** directly affect the downstream event flow. Analyzers can also appear on a path, but nothing they do affects the modules that follow.
- **Example**:
  ```python
  process.myPath = cms.Path(process.moduleA * process.moduleB)
  ```
  Here, `moduleA` runs first, then `moduleB`.

  **Important**: The event passes along the path in the specified order. If a filter returns “false,” the path stops processing that event at that filter.

---

### Sequences

- A **sequence** groups modules for clarity or reusability; you can place a sequence on a path instead of listing modules individually.
- **All modules** in a sequence will run **in order** if that sequence is included on a path.
- For example:
  ```python
  process.mySequence = cms.Sequence(process.moduleC * process.moduleD)
  process.myPath = cms.Path(process.mySequence)
  ```
- A sequence **does not** run on its own. You **must** place it on a path, either directly or nested inside another sequence. (A Task holds modules and other Tasks, not sequences.)

---

### Tasks

- A **Task** is a container for modules that are **not** explicitly on a path but still must run to produce certain data products.
- Useful for “helper” producers or calibration modules that feed multiple other modules without needing to appear in the direct path.
- **Example**:
  ```python
  process.myTask = cms.Task(process.helperProducer1, process.helperProducer2)
  ```
  If `myTask` is associated with a path:
  ```python
  process.myPath = cms.Path(process.someFilter, process.myTask)
  ```
  the framework ensures those helper producers get scheduled if their outputs are consumed.

---

### Schedule

- By default, **all** paths you define (`process.pathName = cms.Path(...)`) become part of the schedule.
- If you want to override or explicitly define the execution order of multiple paths:
  ```python
  process.schedule = cms.Schedule(process.myPath, process.anotherPath)
  ```

---

### Fragments and Merging Configurations

- A **configuration fragment** is a smaller piece of Python config (e.g., `myProducer_cfi.py`) that you import into a larger config.
- **Example**:
  ```python
  from MyPackage.MySubPackage.myProducer_cfi import myProducer
  ```
  imports `myProducer` with its default parameters.
- This modular approach keeps configs maintainable and allows re-use across different workflows.

---

### Process Modifiers

- **Process modifiers** let you conditionally alter your configuration for special scenarios (e.g., GPU-enabled, phase-2 geometry, fast timing).
- Declared in Python, for instance:
  ```python
  from Configuration.Eras.Modifier_phase2_common_cff import phase2_common
  phase2_common.toModify(process.myProducer, threshold=1.0)
  ```
  This sets `process.myProducer.threshold = 1.0` **only** when `phase2_common` is active (for example, when `cmsDriver.py` is run with a Phase-2 `--era` such as the `Phase2C` family).

---

### Customisation Functions

- A **customisation function** is a Python function that takes `process` as input, modifies it, then returns it.
- **Example**:
  ```python
  def customiseForSpecialCase(process):
      process.myProducer.threshold = 2.0
      return process
  ```
- You can apply it via `cmsDriver.py`:
  ```bash
  cmsDriver.py step3 ... --customise MyPackage.MySubPackage.customiseForSpecialCase
  ```
  so that `customiseForSpecialCase` is automatically called on the final `process`.

---

## 7. Generating and Running Configurations

### 7.1 cmsDriver.py

- Used to generate comprehensive configuration files for simulation, digitization, reconstruction, etc.
- **Example**:
  ```bash
  cmsDriver.py step3 \
    --conditions auto:phase2_realistic \
    --eventcontent AODSIM \
    --datatier AODSIM \
    --step RAW2DIGI,RECO \
    --filein file:step2.root \
    --fileout file:step3.root \
    -n 100 \
    --nThreads 8 \
    --no_exec
  ```
- Produces `step3_RAW2DIGI_RECO.py`. You can run:
  ```bash
  cmsRun step3_RAW2DIGI_RECO.py
  ```

### 7.2 runTheMatrix.py

- Script that automates many standard workflows.
- **`-l`** specifies which workflow ID(s) to run, **`--nEvents`** sets the number of events, **`-n`** lists the workflows, **`-w`** specifies the set of workflows, etc.
- Example:
  ```bash
  runTheMatrix.py -w upgrade -n   # list all upgrade workflows
  runTheMatrix.py -w upgrade -l 20893.0 -j 0
  ```
  This dumps the configurations (`step1.py`, `step2.py`, etc.) and can run them. `-j` sets the number of processes; according to the tool's help text, `-j 0` uses 4 processes and creates the workflows without executing anything.

### 7.3 cmsRun command

- **`cmsRun`** executes a CMSSW `.py` configuration file:
  ```bash
  cmsRun step3.py
  ```
- The `.py` file often contains more modules than actually needed. At runtime:
  1. The framework builds all the modules in the Python file.
  2. The constructors declare the dependencies (`consumes`, `produces`).
  3. The framework builds the Directed Acyclic Graph (DAG) of the modules to run.
  4. The event loop starts and executes only the modules needed. Filters can additionally decide whether or not the remaining modules run.
- If you want to see a summary of which modules actually ran, put:
  ```python
  process.options.wantSummary = True
  ```
  in the `step*.py` to print CPU usage, how many times each module was run, etc.

### 7.4 MessageLogger

- At runtime, the verbosity of the simulation code can be controlled through a general framework service called `MessageLogger`. The user can select different levels of verbosity (DEBUG, INFO, WARNING, ERROR), as well as the output streams to which different messages are sent (cout, cerr, external files of the user's choice).
- To configure:
  ```python
  process.load("FWCore.MessageLogger.MessageLogger_cfi")
  process.MessageLogger = cms.Service("MessageLogger",
      cout = cms.untracked.PSet(
          default = cms.untracked.PSet(
              limit = cms.untracked.int32(0)  ## kill all messages in the log
          ),
          SimG4CoreApplication = cms.untracked.PSet(
              limit = cms.untracked.int32(-1)  ## but SimG4CoreApplication category - those unlimited
          )
      ),
      categories = cms.untracked.vstring('SimG4CoreApplication'),
      destinations = cms.untracked.vstring('cout')
  )
  ```

### 7.5 Skipping Events

- If you want to skip the first `N` events in your input file:
  ```python
  process.source.skipEvents = cms.untracked.uint32(N)
  ```

---

## 8. Git Workflow in CMSSW

### 8.1 Checking Out and Managing Packages

- **Initialize** your local repository (in the `src/` directory):
  ```bash
  git cms-init
  ```
- **Add** a package:
  ```bash
  git cms-addpkg RecoLocalTracker/SiPixelRecHits
  ```
- **Check out** all modified packages from your working release:
  ```bash
  git diff --name-only $CMSSW_VERSION | cut -d/ -f-2 | sort -u | xargs -r git cms-addpkg
  ```
- **Check dependencies**:
  ```bash
  git cms-checkdeps -p  # python modules and their dependencies
  git cms-checkdeps -h  # header files and their dependencies
  git cms-checkdeps -b  # BuildFile files and their dependencies
  git cms-checkdeps -A  # -h, -p and -b together
  git cms-checkdeps -a  # checkout packages automatically
  ```

### 8.2 Merging, Rebasing, and Other Common Tasks

- **Push changes** to your fork/branch:
  ```bash
  git push my-cmssw my_branch
  ```
- **Merge** a topic branch from another user:
  ```bash
  git cms-merge-topic username:branch
  ```
- **Rebase** onto another branch:
  ```bash
  git cms-rebase-topic username:branch
  ```
- **Add remote** repository:
  ```bash
  git cms-remote add cms-patatrack
  git fetch cms-patatrack
  ```
- **Merge** a branch from that remote:
  ```bash
  git merge cms-patatrack/Alpaka_updates_13.0.x
  ```

---

## 9. Useful Commands

1. **edmDumpEventContent**: Lists all collections in a ROOT file:
   ```bash
   edmDumpEventContent step3.root
   ```
2. **edmPythonConfigToCppValidation**: Helps generate C++ `fillDescriptions()` from a Python config:
   ```bash
   edmPythonConfigToCppValidation file.py
   ```
3. **scram b runtests**: Runs tests in all checked-out packages.
   ```bash
   scram b runtests
   grep -h '^---> test' $CMSSW_BASE/tmp/$SCRAM_ARCH/src/*/*/test/*/testing.log | \
     grep 'had ERRORS\|'
   ```

---

## 10. Examples and References

1. **Minimal config**:
   ```python
   import FWCore.ParameterSet.Config as cms

   process = cms.Process("MYPROC")

   process.source = cms.Source("PoolSource",
       fileNames = cms.untracked.vstring("file:input.root")
   )

   process.myProducer = cms.EDProducer("MyProducer",
       threshold = cms.double(0.7),
       inputTag = cms.InputTag("someOtherProducer")
   )

   process.myAnalyzer = cms.EDAnalyzer("MyAnalyzer")

   process.out = cms.OutputModule("PoolOutputModule",
       fileName = cms.untracked.string("output.root")
   )

   process.myPath = cms.Path(process.myProducer * process.myAnalyzer)
   process.outPath = cms.EndPath(process.out)

   process.options.wantSummary = True

   # run it:
   # cmsRun myConfig.py
   ```
2. **Official Documentation**:
   - [CMSSW WorkBook Chapter 2.3 (“CMSSW Application Framework”)](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCMSSWFramework)
   - [More on the CMS framework](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMoreOnCMSSWFramework)
   - [Configuration File Documentation](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookConfigFileIntro)

**Final Tips**:

- Always run `cmsenv` after entering a CMSSW `src/` directory so that environment variables are set correctly.
- The CMSSW framework automatically constructs the processing graph based on module dependencies: only modules that produce or consume relevant data will run (plus any forced by Paths).
- Use `process.options.wantSummary = True` to see the runtime summary.
- The “software bus” concept (the `edm::Event`) ensures modular development: each module reads only what it needs and can produce new data for subsequent modules.

---

**Happy coding in CMSSW!**

*Original Author: Felice Pantaleo CERN (felice.pantaleo@cern.ch) - January 2025*