# SK4ALL - Moore lesson
Moore with Nicole :)
## Run 3 Computing Model (6)
For reference - [Computing TDR](https://cds.cern.ch/record/2319756/files/LHCB-TDR-018.pdf)
The description of the current Run-3 analysis model - [Slides](https://docs.google.com/presentation/d/1JnuHHxDXS9FfGfd4j8AT5YxuJTsLcHhRNxwUzO3PTYU/edit?usp=sharing)
>>> Slide 1-4
## The Moore application (4)
[Moore](https://gitlab.cern.ch/lhcb/Moore) is the software project that runs the HLT2 and Sprucing steps in the LHCb data processing chain. Moore is built upon [Gaudi](https://gitlab.cern.ch/lhcb/Gaudi) - the HEP event data processing framework.
Moore is part of the LHCb software stack that is built every night in the [nightlies](https://lhcb-nightlies.web.cern.ch/nightly/#lhcb-2025-patches) and [released periodically](https://gitlab.cern.ch/lhcb-core/lhcbstacks/-/merge_requests/335) to `cvmfs`
The software stack looks like this:
```
Gaudi
└── Detector
    └── LHCb
        └── Lbcom
            └── Rec
                └── Allen
                    └── Moore
                        └── DaVinci
```
Online (HLT2) and Offline (Sprucing) selections share the same codebase. This is deliberate and a change from Runs 1&2:
* The same algorithms and tools are shared between HLT2 and Sprucing (and DaVinci) - ThOr-based selection and combinatorial algorithms
* HLT2 and Sprucing selection lines are identical and trivially interchangeable
Advantages: consistency between online and offline selections/builders, one codebase rather than two, more developers...
--> HLT2 and Sprucing jobs look very similar!
## PyConf (10)
PyConf makes Gaudi application configurations safer, cleaner, and simpler to debug.
The [PyConf](https://gitlab.cern.ch/lhcb/LHCb/-/blob/master/PyConf/python/PyConf) code lives in the LHCb project.
Aim of PyConf: Separate the control flow and the data flow in Gaudi applications
* Data flow - the path that data takes through a system, from input to output
* Control flow - the order in which individual functions are executed
**Functional code framework** - "organise and write code that prioritizes functions as the fundamental building blocks"
PyConf achieves this with "wrappers" and the `configurable` decorator (binds)
### Wrappers
There are three core components of a Gaudi application's control and data flow:
1. Algorithm
2. Tool
3. DataHandle (the inputs and outputs of Algorithms and Tools)
PyConf provides wrappers around existing algorithms (e.g. `PrKalmanFilter`) and tools (e.g. `BackgroundCategory`) that transform them into control flow "building blocks". These blocks are multithreading friendly.
To use PyConf, we wrap our algorithm/tool by importing it from `PyConf.Algorithms` or `PyConf.Tools` rather than from `Configurables`.
An example is the algorithm [HltRoutingBitsFilter](https://gitlab.cern.ch/lhcb/LHCb/-/blob/2025-patches/Hlt/HltDAQ/src/component/HltRoutingBitsFilter.cpp)
```python
>>> from Configurables import HltRoutingBitsFilter as HltRoutingBitsFilter_fromConfigurables
>>> from PyConf.Algorithms import HltRoutingBitsFilter as HltRoutingBitsFilter_fromPyConf
>>> HltRoutingBitsFilter_fromConfigurables
<class 'HltDAQ.HltDAQConf.HltRoutingBitsFilter'>
>>> HltRoutingBitsFilter_fromPyConf
<FunctionWrapper at 0x7f75c1d47460 for function at 0x7f75c2ada020>
>>> HltRoutingBitsFilter_fromPyConf.type
<class 'HltDAQ.HltDAQConf.HltRoutingBitsFilter'>
>>> HltRoutingBitsFilter_fromPyConf.<tab>
```
In a script the algorithm would be declared like the following:
```python
from PyConf.application import default_raw_banks
from PyConf.Algorithms import HltRoutingBitsFilter

# Routing-bit mask as three 32-bit words (value illustrative; the real
# mask is defined in the script)
physFilterRequireMask = (0x0, 0x0, 0x80000000)
rb_bank = default_raw_banks("HltRoutingBits")  # datahandle to the raw bank
rb_filter = [
    HltRoutingBitsFilter(
        name="PhysFilter",
        RawBanks=rb_bank,  # input is the raw bank datahandle
        RequireMask=physFilterRequireMask,
        PassOnError=False,
    )
]
```
### Nodes
The control flow of an application is built by linking instances of `PyConf.control_flow.CompositeNode`.
A `CompositeNode` runs its `children` - these can be plain algorithms, like the filter above, or other `CompositeNode`s. Each node has a particular logic, defined by `PyConf.control_flow.NodeLogic`.
A simple example of a Moore job control flow is:
```
MooreNode (LAZY_AND)
*-- HLTLinesNode (NONLAZY_OR)
| +-- Hlt2CharmPhysicsLineNode (LAZY_AND)
| | *-- PVFilter
| | *-- D2HHHCombiner
| +-- Hlt2DiMuonPhysicsLineNode (LAZY_AND)
| | *-- PVFilter
| | *-- MuMuCombiner
| +-- Hlt2LumiLineNode (LAZY_AND)
| | *-- ODINBeamFilter
| | *-- LumiCounter
| +-- Hlt2InclusiveBPhysicsLineNode (LAZY_AND)
| *-- PVFilter
| *-- TwoBodyBCombiner
*-- PersistencyNode (LAZY_AND)
*-- DecReports
*-- TurboWriter
```
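In code, a fragment of such a tree could be assembled like this sketch (`pv_filter` and `d2hhh_combiner` stand in for configured algorithm instances; they are not defined here):
```python
from PyConf.control_flow import CompositeNode, NodeLogic

# Sketch: a line node requires all children to pass, evaluated in order
# and short-circuiting on failure (LAZY_AND); the top-level lines node
# always runs all children and passes if any line fires (NONLAZY_OR).
charm_line_node = CompositeNode(
    "Hlt2CharmPhysicsLineNode",
    children=[pv_filter, d2hhh_combiner],  # illustrative algorithm instances
    combine_logic=NodeLogic.LAZY_AND,
    force_order=True,  # the filter must run before the combiner
)
lines_node = CompositeNode(
    "HLTLinesNode",
    children=[charm_line_node],  # ...plus the other line nodes
    combine_logic=NodeLogic.NONLAZY_OR,
    force_order=False,
)
```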
The application data flow is deduced automatically by the *scheduler*, which uses the datahandles to determine dependencies
* The user only needs to define the *control flow*, as above
* The scheduler is responsible for scheduling and executing algorithms in an order that meets all dependencies
Original PyConf [paper](https://inspirehep.net/files/c12010a20c7a2eb666b3a3841f2e79c7)
### Configurable and Binds (6)
The `configurable` decorator within PyConf lets you control the parameters of a function from a top-level file using `bind` (scoped) and `global_bind` (unscoped). This saves us from passing changed arguments down through the whole call stack.
A very simple example of how `bind` works:
```bash
lb-run Moore/v57r6 python
```
```python
>>> from PyConf import configurable
>>> @configurable
... def print_two(value=2):
... print(value)
>>> print_two()
2
>>> with print_two.bind(value=3):
... print_two()
...
3
>>> print_two()
2
>>> print_two.global_bind(value=7)
>>> print_two()
7
>>> print_two()
7
```
An example use-case for a builder is [here](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Hlt2Conf/python/Hlt2Conf/lines/b_to_charmonia/builders/basic_builder.py#L97)
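As a minimal sketch of the pattern (the cut names match the `make_dzero_kpi` builder used later in this lesson, but the defaults here are invented):
```python
from PyConf import configurable

@configurable
def make_dzero_kpi(pid_k=5, pid_pi=5):
    """Build the D0 -> K pi combiner using the given PID cuts."""
    # A top-level script can override the defaults without touching this
    # code or any intermediate caller:
    #     with make_dzero_kpi.bind(pid_k=10, pid_pi=0):
    #         ...  # everything configured inside uses the new cuts
    ...
```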
We will see how `binds` control the reconstruction etc. later
## Functor Caches (6)
Moore uses Just-in-time (JIT) compilation for functors (`F.math.in_range`, `F.MASS`, `F.PID_K` etc.)
* Functors are compiled during the execution of the job, rather than in a separate step beforehand, i.e. you don't have to run `make` every time you change your functors
* In production the configuration of functors for a HLT2/Sprucing job is frozen
* Use of JIT slows down the application initialisation as the required functors need to be compiled first
* It costs a lot of memory - for offline productions we have a 2GB limit per job at sites
* If you have run Analysis Productions you may have come across the memory limit pipeline. Exceeding this limit is generally due to the functor compilation in DaVinci
So in productions we use a functor cache that we pre-store on `cvmfs`, so that it can be accessed when running Moore in production, for instance on WLCG resources.
* Functor caches are shared object libraries created during the build step of Moore for a [set of options](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/MooreCache/CMakeLists.txt?ref_type=heads#L50) that we run in production
* These caches are released with the stack and so are available on `cvmfs`
```bash
bash-5.1$ ls /cvmfs/lhcb.cern.ch/lib/lhcb/MOORE/MOORE_v57r6/InstallArea/x86_64_v2-el9-gcc13-opt/lib/
cmake/ libMoore_FunctorCache_Hlt2_options_hlt2_pp_thor.so
libMoore_FunctorCache_Hlt1_hlt1_pp_default.so libMoore_FunctorCache_Hlt2_options_sprucing_spruce_production.so
libMoore_FunctorCache_Hlt2_options_hlt2_pp_2025.so Moore.components
```
When you run a new configuration locally, a local cache is created to speed up repeated runs of the same job, as we will see later.
(As we cannot release a functor cache for every Analysis Production - there are 1000s - to `cvmfs`, DaVinci jobs remain memory intensive, particularly if many, many functors are used in a job. This is being worked on though :) )
[More on functors](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/doc/selection/thor_functors.rst?ref_type=heads#id7)
## Decoding Keys and Tables (6)
The LHCb Online system produces and exports data in the MDF file format. These files contain RawEvents consisting of RawBanks.
* "DstData" RawBank - your physics candidate objects
* "HltDecReports" RawBank - HLT{1,2} and Sprucing selection line decisions
* Detector RawBanks - e.g. CALO, MUON, storing the sub-detector response for an event
Locations of physics objects and selection line names can be very long, e.g.
`/Event/Spruce/SpruceSLB_LbToLcTauNu_LcTopKPi_TauToPiPiPiNu/Particles`
and
`Hlt2TrackEff_TurCalVelo2Long_Kshort`
To save disk space, this data is encoded. Instead of saving the long strings, a map is created between the strings and integer values for
* Physics object locations (PackedObjectLocations)
* Selection line decision names (HLT1SelectionID, HLT2SelectionID, SpruceSelectionID)
In the RawBanks, only the integers and their corresponding information are stored, along with a unique hexadecimal key that identifies the integer-to-string map.
In order to read the data, the map - or decoding table - is necessary. Each decoding table, uniquely identified by its hexadecimal key, must be pushed to the GitLab repo [file-content-metadata](https://gitlab.cern.ch/lhcb-conddb/file-content-metadata) such that it is preserved for reading the files later.
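A toy illustration of the idea (not the real LHCb implementation): the table is just an integer-to-string map, and a short hexadecimal key derived from the table contents identifies it uniquely.
```python
import json
import zlib

# Toy decoding table: small integers stand in for long location strings
table = {
    "1": "/Event/Spruce/SpruceSLB_LbToLcTauNu_LcTopKPi_TauToPiPiPiNu/Particles",
    "2": "Hlt2TrackEff_TurCalVelo2Long_Kshort",
}
# Derive a short key from the table contents (the real hash differs)
key = f"{zlib.crc32(json.dumps(table, sort_keys=True).encode()):08x}"
print(key)         # a 'b46c692f'-style key
print(table["1"])  # the RawBank stores only '1'; the table restores the string
```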
The Sprucing outputs ROOT DST files, which again consist of RawEvents consisting of RawBanks, but they also contain a File Summary Record (FSR). To make the Sprucing output files self contained, the decoding tables are written to the FSR and can be read from there in further processing steps.
## Running a HLT2 example (20)
Now that we have gone over PyConf wrappers and binds, functor caches and decoding keys, we will run an example HLT2 job.
Clone the snippets we will need into a directory `SK4ALL/`:
```
git clone ssh://git@gitlab.cern.ch:7999/snippets/3539.git SK4ALL
```
or, over HTTPS:
```
git clone https://gitlab.cern.ch/snippets/3539.git SK4ALL
```
```
bash-5.1$ ls SK4ALL/
hlt2_lbexec.py hlt2.yaml interactive-dst.py spruce_lbexec.py spruce.yaml
```
Instructions to run the scripts are in the doc-strings.
### HLT2 script: `hlt2_lbexec.py`
This script defines two HLT2 lines
* Hlt2Test_D0ToKpPim
* Hlt2Test_D0ToKpKm
with `persistreco=True` and adds them to a stream called `default` that is handed to the `Streams` object.
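Schematically, the line and stream setup in the script looks something like this sketch (`dzero_kpi_algs` and `dzero_kk_algs` stand in for the filter and combiner sequences built by `make_dzero_kpi`; import paths and keyword names may differ slightly between Moore versions):
```python
from Moore.lines import Hlt2Line
from Moore.streams import Stream, Streams

def make_streams():
    lines = [
        Hlt2Line(name="Hlt2Test_D0ToKpPim", algs=dzero_kpi_algs, persistreco=True),
        Hlt2Line(name="Hlt2Test_D0ToKpKm", algs=dzero_kk_algs, persistreco=True),
    ]
    # Both lines are booked into a single stream called "default"
    return Streams(streams=[Stream(name="default", lines=lines)])
```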
You will see the builder `make_dzero_kpi` has a `@configurable` decorator.
There are many binds required for HLT2, mainly to control reconstruction attributes deep in the call stack. The bind you can see explicitly
```
with reconstruction.bind(from_file=False):
```
is key and tells Moore that it needs to run the reconstruction itself rather than take it `from_file`. There are lots of other binds consolidated in [config_pp_2024](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Hlt2Conf/python/Hlt2Conf/settings/hlt2_binds.py#L76) that we import in this script
```
from Hlt2Conf.settings.hlt2_binds import config_pp_2024
with config_pp_2024():
```
On the first run of this HLT2 job...
**The decoding key** is created and stored in the local clone of the `lhcb-metainfo` repo. The log tells us how to push this to GitLab - don't do that for now :)
```
# WARNING: Created new key 911c9121 - to publish the corresponding decoding table, please do `git -C /home/nskidmor/SKtest/SK4ALL/lhcb-metainfo/.git push origin key-911c9121`
# WARNING: Created new key b46c692f - to publish the corresponding decoding table, please do `git -C /home/nskidmor/SKtest/SK4ALL/lhcb-metainfo/.git push origin key-b46c692f`
```
We can inspect the created key by running the following inside the `lhcb-metainfo` directory that has been created
```bash
bash-5.1$ git log
commit b156bf0f5134665a0f0faa1b1ff0dd1974f25844 (HEAD -> master, key-b46c692f)
Author: gitlabCI <gitlab-ci@cern.ch>
Date: Thu Jun 12 19:06:07 2025 +0200
automated commit to define key b46c692f
```
```bash
bash-5.1$ git show b156bf0f5134665a0f0faa1b1ff0dd1974f25844
commit b156bf0f5134665a0f0faa1b1ff0dd1974f25844 (HEAD -> master, key-b46c692f)
Author: gitlabCI <gitlab-ci@cern.ch>
Date: Thu Jun 12 19:06:07 2025 +0200
automated commit to define key b46c692f
diff --git a/ann/json/b4/b46c692f.json b/ann/json/b4/b46c692f.json
new file mode 100644
index 0000000..b46c692
--- /dev/null
+++ b/ann/json/b4/b46c692f.json
@@ -0,0 +1 @@
+{"PackedObjectLocations": {"1": "/Event/HLT2/Hlt2Test_D0ToKpKm/Particles", "2": "/Event/HLT2/Hlt2Test_D0ToKpPim/Particles", "3": "/Event/HLT2/Rec/Calo/Electrons", "4": "/Event/HLT2/Rec/Calo/MergedPi0s", "5": "/Event/HLT2/Rec/Calo/Photons", "6": "/Event/HLT2/Rec/Calo/SplitPhotons", "7": "/Event/HLT2/Rec/ProtoP/Downstream", "8": "/Event/HLT2/Rec/ProtoP/Long", "9": "/Event/HLT2/Rec/ProtoP/Neutrals", "10": "/Event/HLT2/Rec/ProtoP/Upstream", "11": "/Event/HLT2/Rec/Summary", "12": "/Event/HLT2/Rec/Track/BestDownstream", "13": "/Event/HLT2/Rec/Track/BestLong", "14": "/Event/HLT2/Rec/Track/BestUpstream", "15": "/Event/HLT2/Rec/Track/Ttrack", "16": "/Event/HLT2/Rec/Track/Velo", "17": "/Event/HLT2/Rec/Vertex/Primary"}, "version": "0"}
\ No newline at end of file
```
**A local functor cache** is also created
```
FunctorFactory INFO New functor library will be created: "/tmp/nskidmor/FunctorJitLib_0xdfc9c368b5e66660_0xfad471980579d547.so"
FunctorFactory INFO Writing cpp files for 1 functors split in 1 files
FunctorFactory INFO Compilation will use 1 jobs.
FunctorFactory INFO Compilation of functor library took 24 seconds
```
**In the job log** the `HLTControlFlowMgr` is reported. It shows the nodes and algorithms in the control flow, how many times they have been run and how "efficient" they were.
The `HLTControlFlowMgr` is a good way to debug jobs
```
HLTControlFlowMgr INFO StateTree: CFNode #executed #passed
LAZY_AND: moore #=1000 Sum=868 Eff=|( 86.80000 +- 1.07040 )%|
NONLAZY_OR: lines #=1000 Sum=868 Eff=|( 86.80000 +- 1.07040 )%|
NONLAZY_OR: hlt_decision #=1000 Sum=868 Eff=|( 86.80000 +- 1.07040 )%|
LAZY_AND: Hlt2Test_D0ToKpKmDecisionWithOutput #=1000 Sum=644 Eff=|( 64.40000 +- 1.51415 )%|
LAZY_AND: Hlt2Test_D0ToKpKm #=1000 Sum=644 Eff=|( 64.40000 +- 1.51415 )%|
DeterministicPrescaler/Hlt2Test_D0ToKpKm_Prescaler #=1000 Sum=1000 Eff=|( 100.0000 +- 0.00000 )%|
VoidFilter/Default_Hlt1Filter #=1000 Sum=958 Eff=|( 95.80000 +- 0.634319)%|
VoidFilter/require_pvs #=958 Sum=941 Eff=|( 98.22547 +- 0.426551)%|
ParticleRangeFilter/Hlt2Test_D0ToKp_ParticleRangeFilter #=941 Sum=940 Eff=|( 99.89373 +- 0.106213)%|
TwoBodyCombiner/Hlt2Test_D0ToKpKm_D0ToKpKm_Builder #=940 Sum=644 Eff=|( 68.51064 +- 1.51495 )%|
Monitor__ParticleRange/Monitor__Hlt2Test_D0ToKpKm #=644 Sum=644 Eff=|( 100.0000 +- 0.00000 )%|
Monitor__Global/GlobalMonitor__Hlt2Test_D0ToKpKm #=644 Sum=644 Eff=|( 100.0000 +- 0.00000 )%|
NONLAZY_OR: Hlt2Test_D0ToKpKmOutput #=644 Sum=644 Eff=|( 100.0000 +- 0.00000 )%|
CopyParticles/Copy_Event_Hlt2Test_D0ToKpKm_Particles #=644 Sum=644 Eff=|( 100.0000 +- 0.00000 )%|
...
RecSummaryMaker/Hlt2Test_D0ToKp_RecSummaryMaker #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
LAZY_AND: Hlt2Test_D0ToKpPimDecisionWithOutput #=1000 Sum=854 Eff=|( 85.40000 +- 1.11662 )%|
LAZY_AND: Hlt2Test_D0ToKpPim #=1000 Sum=854 Eff=|( 85.40000 +- 1.11662 )%|
DeterministicPrescaler/Hlt2Test_D0ToKpPim_Prescaler #=1000 Sum=1000 Eff=|( 100.0000 +- 0.00000 )%|
VoidFilter/Default_Hlt1Filter #=1000 Sum=958 Eff=|( 95.80000 +- 0.634319)%|
VoidFilter/require_pvs #=958 Sum=941 Eff=|( 98.22547 +- 0.426551)%|
ParticleRangeFilter/Hlt2Test_D0ToKp_ParticleRangeFilter #=941 Sum=940 Eff=|( 99.89373 +- 0.106213)%|
TwoBodyCombiner/Hlt2Test_D0ToKpPim_D0ToKpPim_Builder #=940 Sum=854 Eff=|( 90.85106 +- 0.940343)%|
Monitor__ParticleRange/Monitor__Hlt2Test_D0ToKpPim #=854 Sum=854 Eff=|( 100.0000 +- 0.00000 )%|
Monitor__Global/GlobalMonitor__Hlt2Test_D0ToKpPim
NONLAZY_OR: Hlt2Test_D0ToKpPimOutput #=854 Sum=854 Eff=|( 100.0000 +- 0.00000 )%|
CopyParticles/Copy_Event_Hlt2Test_D0ToKpPim_Particles #=854 Sum=854 Eff=|( 100.0000 +- 0.00000 )%|
...
RecSummaryMaker/Hlt2Test_D0ToKp_RecSummaryMaker #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
LAZY_AND: monitor_decisions #=1000 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
DeterministicPrescaler/HLT2PrescaleDecReportsMonitor #=1000 Sum=100 Eff=|( 10.00000 +- 0.948683)%|
HltDecReportsMonitor/HLT2DecReportsMonitor #=100 Sum=100 Eff=|( 100.0000 +- 0.00000 )%|
DeterministicPrescaler/HLT2PostscaleDecReportsMonitor #=100 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
NONLAZY_OR: report_writers #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
ExecutionReportsWriter/ExecutionReportsWriter_803a91cb #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
HltDecReportsWriter/HltDecReportsWriter_ec677368 #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
NONLAZY_OR: stream_writers #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
LAZY_AND: default_writer #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
HltDecReportsFilter/HltDecReportsFilter_2b19926f #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
NONLAZY_OR: default_line_output_persistence #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
HltPackedBufferWriter/HltPackedBufferWriter_dde0ef5f #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
SelectiveCombineRawBankViewsToRawEvent/SelectiveCombineRawBanks_for_default #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
LHCb::MDFWriter/LHCb__MDFWriter_eeee5b76 #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
```
Now let's try using `bind` on `make_dzero_kpi`...
Apply the bind `make_dzero_kpi.bind(pid_k=10, pid_pi=0)` within the `with` context and re-run.
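In the script this means stacking the new bind onto the existing context, along the lines of this sketch (a fragment: `make_dzero_kpi` is the `@configurable` builder defined in `hlt2_lbexec.py`):
```python
from RecoConf.reconstruction_objects import reconstruction

# Sketch: the extra bind tightens the builder's PID cuts for the whole job,
# without touching the builder itself or any intermediate caller
with reconstruction.bind(from_file=False), make_dzero_kpi.bind(pid_k=10, pid_pi=0):
    ...  # build the Streams object as before
```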
**The encoding keys** already exist locally, so in subsequent runs of the job we see
```
HltANNSvc WARNING key 0xb46c692f has an explicitly configured overrule -- using that...
```
**The functor cache is not rebuilt** - although we have changed a functor "cut value", this does not require a new functor cache and the existing, local one is used.
```
FunctorFactory INFO Reusing functor library: "/tmp/nskidmor/FunctorJitLib_0xdfc9c368b5e66660_0xfad471980579d547.so"
```
Looking now at the `HLTControlFlowMgr` we can see the effect of the changed, tighter PID cuts
```
LAZY_AND: Hlt2Test_D0ToKpPimDecisionWithOutput #=1000 Sum=795 Eff=|( 79.50000 +- 1.27662 )%|
LAZY_AND: Hlt2Test_D0ToKpPim #=1000 Sum=795 Eff=|( 79.50000 +- 1.27662 )%|
DeterministicPrescaler/Hlt2Test_D0ToKpPim_Prescaler #=1000 Sum=1000 Eff=|( 100.0000 +- 0.00000 )%|
VoidFilter/Default_Hlt1Filter #=1000 Sum=958 Eff=|( 95.80000 +- 0.634319)%|
VoidFilter/require_pvs #=958 Sum=941 Eff=|( 98.22547 +- 0.426551)%|
ParticleRangeFilter/Hlt2Test_D0ToKpPim_ParticleRangeFilter_2 #=941 Sum=940 Eff=|( 99.89373 +- 0.106213)%|
TwoBodyCombiner/Hlt2Test_D0ToKpPim_D0ToKpPim_Builder #=940 Sum=795 Eff=|( 84.57447 +- 1.17808 )%|
Monitor__ParticleRange/Monitor__Hlt2Test_D0ToKpPim #=795 Sum=795 Eff=|( 100.0000 +- 0.00000 )%|
```
## Explore TES (5)
We can explore the produced file interactively. If you used Bender in Runs 1&2 you will know about inspecting the TES (Transient Event Store). We use [interactive-dst](https://gitlab.cern.ch/lhcb/Moore/-/blob/master/Hlt/Moore/tests/options/starterkit/first-analysis-steps/interactive-dst.py) with `--input_process Hlt2`
```bash
lb-run Moore/v57r6 python -i interactive-dst.py --input_process Hlt2 --input_file hlt2test.mdf
```
You can see listed all the physics object locations: reconstruction objects, your lines' `Particles`, etc.
We can inspect the HLT2 decreports
```
>>> evt['/Event/Hlt2/DecReports']
```
Or using the helper function:
```
>>> list_fired_triggers()
```
If the event you are looking at fired a selection line, you can inspect its particles
```
>>> evt['/Event/HLT2/Hlt2Test_D0ToKpPim/Particles'][0]
>>> evt['/Event/HLT2/Hlt2Test_D0ToKpPim/Particles'][0].daughters()[1].proto()
>>> evt['/Event/HLT2/Hlt2Test_D0ToKpPim/Particles'][0].daughters()[1].data()
>>> evt['/Event/HLT2/Hlt2Test_D0ToKpPim/Particles'][0].daughters()[1].proto().track()
```
**EXERCISE FOR LATER:** Try calling `.pt()`, `.proto()`, `.proto().track()` on the `Particle` objects
## Gaudirun vs. lbexec (3)
The HLT2 script you ran is written for `lbexec` rather than `gaudirun.py`.
`lbexec` is designed to unify the way we run LHCb applications. There are two object types, `FunctionLoader` and `OptionsLoader`, that we pass to `lbexec`
```python
# FunctionLoader: wrapper class which takes a function spec of the form
# ``module.name:callable``.
parser.add_argument(
    "function",
    type=FunctionLoader,
    help="Function to call with the options that will return the configuration. "
    "Given in the form 'my_module:function_name'.",
)

# OptionsLoader: converts a '+'-separated list of YAML file paths into an
# ``Application.Options`` object.
parser.add_argument(
    "options",
    help="YAML data to populate the Application.Options object with. "
    "Multiple files can be merged using 'file1.yaml+file2.yaml'.",
)
```
Behind the scenes `lbexec` hands these to Gaudi.
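For orientation, the function named in the spec simply receives the parsed options and returns the configuration. A hedged sketch of the signature (the `Options` import path here is an assumption; Moore's options class lives in the `Moore/LbExec.py` file linked below):
```python
from Moore.LbExec import Options  # assumed import path for Moore's options class

def main(options: Options):
    # lbexec loads this function from the "module:main" spec, fills
    # `options` from the YAML file(s), calls it, and hands the returned
    # configuration to Gaudi.
    ...
```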
**EXERCISE FOR LATER:** The HLT2 script we used above is written for execution with lbexec. By looking at a gaudirun example [here](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Hlt2Conf/options/hlt2_2or3bodytopo_realtime.py?ref_type=heads), try converting it for execution with `gaudirun.py` and test it
## Running a Sprucing example (12)
We will now run a Sprucing job over our HLT2 output. We will decompose the Sprucing control flow, writing it explicitly in the script (rather than using functions from [config.py](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Moore/python/Moore/config.py)) to learn more about `CompositeNodes`.
The main components of a HLT2/Sprucing job are:
* Event loop - provided by Gaudi
* Filters (on routing bits/selection line decisions)
* The `Streams` object containing selection lines
* Report writers (HLT and Sprucing decreports)
* Stream writers
### Sprucing script: `spruce_lbexec.py`
This script re-defines the two HLT2 lines we ran above as `SpruceLine`s - you can see how easily `Hlt2Line` and `SpruceLine` objects are interchanged.
We have simplified and decomposed the control flow into two nodes (see the sketch below):
* a `CompositeNode` called `decisions_node`, with `NONLAZY_OR` logic, that runs the two Sprucing lines
* a `CompositeNode` called `moore_control_node`, with `LAZY_AND` logic, that runs a routing bit filter `rb_filter` followed by `decisions_node` - here the order is important, i.e. `force_order=True`
You can see the important bind `with reconstruction.bind(from_file=True, spruce=True)`
Note that in this script we just give Moore a bunch of nodes containing `PyConf.Algorithms`. The scheduler will determine the data flow for the job using the input/output datahandles of each node.
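Put together, the decomposition might look like this sketch (`rb_filter` and the `SpruceLine` objects are defined earlier in `spruce_lbexec.py`; the `line.node` attribute is an assumption about the Moore line interface):
```python
from PyConf.control_flow import CompositeNode, NodeLogic

# Each line exposes its own control flow node; run them all (NONLAZY_OR)
decisions_node = CompositeNode(
    "decisions_node",
    children=[line.node for line in sprucing_lines],
    combine_logic=NodeLogic.NONLAZY_OR,
    force_order=False,
)
# The routing-bit filter must pass before any line runs (LAZY_AND, ordered)
moore_control_node = CompositeNode(
    "moore_control_node",
    children=rb_filter + [decisions_node],
    combine_logic=NodeLogic.LAZY_AND,
    force_order=True,
)
```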
Taking a look at the `HLTControlFlowMgr` we can see
* The effect of the filter (remember, all events in the HLT2 output passed one of our HLT2 lines; this will change when we add a lumi line)
* The number of events is what we expect (the `SpruceLine`s were identical to the `Hlt2Line`s)
* There are fewer algorithms (we do not run the reconstruction)
```
HLTControlFlowMgr INFO StateTree: CFNode #executed #passed
LAZY_AND: physics_sprucing_node #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
HltRoutingBitsFilter/PhysFilter #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
NONLAZY_OR: spruce_decision #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
LAZY_AND: SpruceTest_D0ToKpPimDecisionWithOutput #=868 Sum=854 Eff=|( 98.38710 +- 0.427576)%|
LAZY_AND: SpruceTest_D0ToKpPim #=868 Sum=854 Eff=|( 98.38710 +- 0.427576)%|
DeterministicPrescaler/SpruceTest_D0ToKpPim_Prescaler #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
VoidFilter/require_pvs #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
TwoBodyCombiner/D0ToKpPim_Builder_63fa41be #=868 Sum=854 Eff=|( 98.38710 +- 0.427576)%|
Monitor__ParticleRange/Monitor__SpruceTest_D0ToKpPim #=854 Sum=854 Eff=|( 100.0000 +- 0.00000 )%|
Monitor__Global/GlobalMonitor__SpruceTest_D0ToKpPim #=854 Sum=854 Eff=|( 100.0000 +- 0.00000 )%|
NONLAZY_OR: SpruceTest_D0ToKpPimOutput #=854 Sum=854 Eff=|( 100.0000 +- 0.00000 )%|
CopyParticles/Copy_Event_SpruceTest_D0ToKpPim_Particles #=854 Sum=854 Eff=|( 100.0000 +- 0.00000 )%|
RecVertexUnpacker/Unpack_Event_HLT2_Rec_Vertex_Primary #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
RecSummaryUnpacker/Unpack_Event_HLT2_Rec_Summary #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
LAZY_AND: SpruceTest_D0ToKpKmDecisionWithOutput #=868 Sum=644 Eff=|( 74.19355 +- 1.48521 )%|
LAZY_AND: SpruceTest_D0ToKpKm #=868 Sum=644 Eff=|( 74.19355 +- 1.48521 )%|
DeterministicPrescaler/SpruceTest_D0ToKpKm_Prescaler #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
VoidFilter/require_pvs #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
TwoBodyCombiner/D0ToKpKm_Builder_5f050f02 #=868 Sum=644 Eff=|( 74.19355 +- 1.48521 )%|
Monitor__ParticleRange/Monitor__SpruceTest_D0ToKpKm #=644 Sum=644 Eff=|( 100.0000 +- 0.00000 )%|
Monitor__Global/GlobalMonitor__SpruceTest_D0ToKpKm #=644 Sum=644 Eff=|( 100.0000 +- 0.00000 )%|
NONLAZY_OR: SpruceTest_D0ToKpKmOutput #=644 Sum=644 Eff=|( 100.0000 +- 0.00000 )%|
CopyParticles/Copy_Event_SpruceTest_D0ToKpKm_Particles #=644 Sum=644 Eff=|( 100.0000 +- 0.00000 )%|
RecVertexUnpacker/Unpack_Event_HLT2_Rec_Vertex_Primary #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
RecSummaryUnpacker/Unpack_Event_HLT2_Rec_Summary #=868 Sum=868 Eff=|( 100.0000 +- 0.00000 )%|
```
### Binds in Sprucing
In the above example we still had to worry about binds. But the Sprucing was designed to run using `lbexec` and takes advantage of the built-in [`apply_binds`](https://gitlab.cern.ch/lhcb/Moore/-/blob/master/Hlt/Moore/python/Moore/LbExec.py#L41) functionality. It binds based on the job `options`
```python
if self.input_process:
reconstruction_reading.global_bind(input_process=self.input_process)
tes_root.global_bind(input_process=self.input_process)
if (
self.process == ProcessTypes.Spruce
or self.process == ProcessTypes.TurboSpruce
or self.process == ProcessTypes.TurboPass
):
reconstruction.global_bind(spruce=True, from_file=True)
```
So for normal Sprucing jobs, where we give `input_process` and `process`, we do not need to declare binds in our scripts. We will see this in the exercises.
**EXERCISE FOR LATER:** Set the `process` option in your yaml, remove all binds from the script and verify the job behaves the same
## Inspecting an FSR (10)
Sprucing output files are self-contained, meaning the FSR contains all the information we need to read the data back - the decoding keys.
We can inspect the FSR of a Sprucing output from production (note the Sprucing script we ran above did not save output - this is an exercise for later).
In Python, in the Moore environment:
```
>>> import ROOT, json
>>> fname = "/eos/lhcb/wg/dpa/wp1/data/sk4all_fsrexample_00293364_00045502_1.b2oc.dst"
>>> f = ROOT.TFile.Open(fname)
>>> fsr = json.loads(str(f.FileSummaryRecord))  # the FSR is stored as JSON
>>> fsr.keys()
>>> fsr['ann'].keys()  # the decoding tables, indexed by their hexadecimal keys
>>> fsr['ann']['0x057934c0']
>>> for input_file in fsr["inputs"]:  # luminosity bookkeeping per input file
...     print(input_file["LumiCounter.eventsByRun"])
```
## Exercises
In `spruce_lbexec.py` you have a very simple control flow in which a filter and two selection lines run
1. Take a look at the stream setup for [HLT2 in 2025](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Hlt2Conf/python/Hlt2Conf/settings/hlt2_pp_2025.py). Every stream has a `lumi_nanofy_line` HLT2 line that is used for luminosity determination offline. Add this line to your HLT2 stream in `hlt2_lbexec.py`. Run again and use the log to see how many events are lumi events
2. Sprucing jobs remove the luminosity events, instead counting them and adding the total to the FSR. The Sprucing does this using a filter. Taking inspiration from the `PhysFilter` that we used above, try adding a `LumiFilter` to `spruce_lbexec.py`. The routing bit "mask" you need is `lumiFilterRequireMask = (0x0, 0x0, 0x40000000)`. Consider the node logic you use
3. There is a script `spruce_lbexec_writeoutput.py` that rewrites the Sprucing script using the normal Moore methods from [config.py](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Moore/python/Moore/config.py) including writing output. There is a Sprucing Line we use for tests called `Test_extraoutputs_sprucing_line`. This line saves the [extra_outputs LongTracks](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Hlt2Conf/python/Hlt2Conf/lines/test/spruce_test.py#L305). Modify the existing `SpruceTest_D0ToKpKm` line in `spruce_lbexec_writeoutput.py` to also do this and run it
4. Use the [interactive-dst](https://gitlab.cern.ch/lhcb/Moore/-/blob/master/Hlt/Moore/tests/options/starterkit/first-analysis-steps/interactive-dst.py) script to determine how many `LongTracks` objects your line saves for the first event it selects in the file. You will need `--input_process Spruce` and `--input_stream default`
5. Inspect the FSR in your Sprucing output file, locating the `PackedObjectLocations` for HLT2 and Sprucing and the `HLT2SelectionID` and `SpruceSelectionID` keys
### Answers
For answers to the exercises do
```
git clone ssh://git@gitlab.cern.ch:7999/snippets/3545.git SK4ALL_answers
```
or, over HTTPS:
```
git clone https://gitlab.cern.ch/snippets/3545.git SK4ALL_answers
```
## Extra
### Inspecting runtime objects
The 2025 HLT2 configuration is defined as follows:
* The file [options/hlt2_pp_2025.py](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Hlt2Conf/options/hlt2_pp_2025.py) is the top-level options file that you would call `gaudirun.py` on. It's effectively what runs in the pit.
* This file calls [settings/hlt2_pp_2025.py](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Hlt2Conf/python/Hlt2Conf/settings/hlt2_pp_2025.py), which defines the `Streams` object, i.e. what lines run and which stream they belong to. You can see this invoked [here](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Hlt2Conf/options/hlt2_pp_2025.py#L39).
* The binds are abstracted away from the user by defining them in [settings/hlt2_binds.py](https://gitlab.cern.ch/lhcb/Moore/-/blob/2025-patches/Hlt/Hlt2Conf/python/Hlt2Conf/settings/hlt2_binds.py).
What if we want to inspect the `Streams` object, for debugging for instance? This is not straightforward, as the object only exists at run time, once the job is configured. We can use `simple_moore.py`, which you can get in the answers folder. With it we can inspect all the streams and lines booked to be run. Instructions are in the doc-string.
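Once the `Streams` object exists, the inspection itself can be as simple as walking it (a sketch; the `streams`/`lines`/`name` attribute names are assumptions about the Moore interface):
```python
# Sketch: list every stream and the lines booked in it, given a configured
# Streams object (here called `streams`, e.g. built by simple_moore.py)
for stream in streams.streams:
    print(stream.name)
    for line in stream.lines:
        print("   ", line.name)
```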
### Tags
`DetDesc` and `DD4HEP` serve the same purpose: to define the full detector description (geometry, materials, conditions, ...). `DD4HEP` is the replacement for `DetDesc`.
**For MC:** You need to use a `detdesc` platform, identified by having `detdesc` in the platform name. You can see the platforms available for a Moore version with
```bash
[nskidmor@lxplus991 ~]$ ls /cvmfs/lhcb.cern.ch/lib/lhcb/MOORE/MOORE_v57r6/InstallArea/
armv8.1_a-el9-gcc13-dbg x86_64_v2-el9-gcc13-dbg x86_64_v2-el9-gcc13+detdesc-opt x86_64_v3-el9-gcc13+cuda12_4-opt+g x86_64_v3-el9-gcc13-opt+g
x86_64_v2-el9-clang16-opt x86_64_v2-el9-gcc13+detdesc-dbg x86_64_v2-el9-gcc13-opt x86_64_v3-el9-gcc13+detdesc-opt+g
```
or take a look at the [nightly builds](https://lhcb-nightlies.web.cern.ch/nightly/#lhcb-2025-patches) page.
You can specify a platform with the `-c` option
```bash
lb-run -c x86_64_v3-el9-gcc13-opt+g Moore/v58r1 lbexec script:job options.yaml
```
You need to define the `options`, e.g.
```yaml
dddb_tag: "dddb-20180815"
conddb_tag: "sim-20180530-vc-md100"
```
Sim11 onwards will use `DD4HEP`.
**For Data:** You need a platform that is not `detdesc`. Platforms without `detdesc` in the name use `DD4HEP`.
Here you will need the `options`, e.g.
```yaml
geometry_version: "run3/2024.Q1.2-v00.00"
conditions_version: "master"
```
### Possible pitfalls
If you get errors of the form
```
ValueError: No options have been passed, so RootIOAlg cannot be created. Pass options or call create_or_reuse_rootIOAlg directly
```
or
```
TypeError: default_raw_event() missing 2 required positional arguments: 'raw_event_format' and 'maker'
```
it's usually because the lines are being evaluated before the complete job has been configured. Have a look at where you call ``()`` on the lines.