# VecGeom restructuring and development plan for 2026 (may span into next year)

These items are not in a strict order; a list of prioritized items is given at the end.

---

## 1. Re-design the global navigation layer

**Problem:** *The navigation system currently provides backend-dependent implementations, with too many flavors to maintain. We have a legacy VNavigator-based system used in VecGeom 1.x, providing virtual interfaces implemented by specialized navigators working for scalar/SIMD back-ends and path navigation states. We have GPU-friendlier navigators (BVH/LoopNavigator) that use different navigation states. The general interfaces provide "contracts" whose return values are in some cases ambiguous. Given the deep VecGeom refactoring needed for portability, and the fact that the global navigation is not exposed to Geant4, we can take the opportunity to redesign the global navigation layer.*

Detailed description:

**Phase 1: Define the new navigation contract (GPU only)**

* Specify a single, unambiguous API for locate/step/safety: inputs (point, dir, step_limit, in/out NavStateIndex, in/out boundary_flag, in/out last_exited_id) and outputs {hit_volume_id, step}; no implicit side effects.
* Freeze the state to a pure index/tuple (NavStateIndex/Tuple-style): stack of volume indices + depth only. Boundary and last-exited remain separate arguments; no aliases bundling them into the state.
* Document semantics: boundary tolerance, meaning of "last exited", and how boundary flags are set/cleared.

**Phase 2: Implement a reference navigator on CPU using the new API**

* Adapt BVH/LoopNavigator logic to the new signature and index-only state on CPU first (scalar/SIMD). Keep shape calls as they are today; don't introduce tag/switch dispatch yet.
* Introduce a new navigator entry point (e.g., Navigator2 in a new header).
* Add unit/functional tests that validate locate/step/safety and boundary semantics against current behavior on CPU.
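The Phase 1 contract could be sketched as follows. This is a minimal illustration only: `Navigator2` and `NavStateIndex` are the names used in this plan, but the member layout and the exact signatures beyond the inputs/outputs listed above are assumptions.

```cpp
#include <cassert>
#include <cstdint>

struct Vec3 { double x, y, z; };  // stand-in for VecGeom's Vector3D<double>

// Pure index/tuple navigation state: a stack of volume indices plus the depth,
// nothing else. Boundary and last-exited travel as separate arguments, never
// bundled into the state.
struct NavStateIndex {
  static constexpr int kMaxDepth = 32;
  std::uint32_t stack[kMaxDepth] = {};
  int depth = 0;
  void Push(std::uint32_t id) { stack[depth++] = id; }
  void Pop() { --depth; }
  std::uint32_t Top() const { return stack[depth - 1]; }
};

struct StepResult {
  int hit_volume_id;  // id of the volume entered, or -1 if none within step_limit
  double step;        // distance actually allowed along the direction
};

// Single, unambiguous entry points: all inputs explicit, no implicit side effects.
struct Navigator2 {
  static int Locate(Vec3 point, NavStateIndex &out_state);
  static StepResult ComputeStep(Vec3 point, Vec3 dir, double step_limit,
                                NavStateIndex const &in_state, NavStateIndex &out_state,
                                bool &boundary_flag, int &last_exited_id);
  static double Safety(Vec3 point, NavStateIndex const &state);
};
```

The key design point is that the state is trivially copyable and index-based, so the same type can be used unchanged on host and device.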
**Phase 3: Bridge legacy callers**

* Provide thin adapters that map the BVHNavigator-style calls onto the new API by packing/unpacking boundary/last-exited into/out of the separate arguments.
* Update internal callers incrementally to use the new API; mark legacy interfaces deprecated but functional.

**Phase 4: Bring the GPU path onto the new contract (still CUDA-only)**

* Reuse the CPU reference navigator logic on the GPU but keep existing shape access; the goal is only to align the API/state and semantics across host/device.
* Ensure the GPU version consumes the pure index state and separate boundary/last-exited parameters; avoid changing dispatch mechanics yet.

**Phase 5: Stabilize and validate**

* Run consistency tests across CPU scalar/SIMD and CUDA implementations with the new API/state, locking down boundary and last-exited behavior.

**Phase 6: Decide what to keep from the CPU-only VNavigator-based system**

* In particular, allow a SIMD path to still work for the Hybrid navigator (internal loop acceleration).
* `navigation/VSafetyEstimator.h`, `navigation/HybridSafetyEstimator.h`, `management/ABBoxManager.h`, `management/HybridManager2.h`.

**Later (after the navigation layer is stable):**

* Introduce the tag/switch dispatch and POD geometry for HIP/CUDA parity and to remove device virtuals.
* Apply the portability work (HIP build, device API macros) on top of the stabilized navigation API.

This sequencing gets the navigation API/semantics cleaned up first, with minimal disruption, and leaves the portability/switch-dispatch work as a follow-up once the new layer is solid. See also [this note](https://codimd.web.cern.ch/tX4AP270QduQCD7gCKXWFw#) for complementary information.

---

## 2. Changing the vecgeom::Precision system towards mixed-precision

**Problem:** *`vecgeom::Precision` is currently a global typedef controlled by `VECGEOM_SINGLE_PRECISION`.
This global switch is not viable: many solids are not robust in single precision, while some are efficient and safe in float. Meanwhile the BVH already has **its own independent precision policy** (`VECGEOM_BVH_SINGLE`), used *only* for AABB tests.*

We want to introduce a **fully decoupled mixed-precision architecture**:

- **Global API stays in double precision** (`vecgeom::Precision = double`).
- **BVH precision remains independent**, controlled *only* by `VECGEOM_BVH_SINGLE`.
- **Solids gain a full precision policy with four independent numeric types**:
  - **StorageReal** — how the unplaced volume parameters are stored internally.
  - **DistanceReal** — numeric type used for *DistanceToIn/Out* kernels.
  - **SafetyReal** — numeric type used for *SafetyToIn/Out* kernels.
  - **InsideReal** — numeric type used for *Inside/Contains*.
- The BVH, after selecting candidate primitives, calls each solid with **its own precision policy**, not globally `double`.

This enables fine-grained optimisation, for example:

- Box → all float
- Tube → DistanceReal=double, InsideReal=float, SafetyReal=float
- Polycone → all double
- Others configurable per setup.

The navigation front-ends do **not** become templated; precision is localised to BVH internals and solids.

### 2.1 Freeze global Precision and clarify layering

1. **Fix `vecgeom::Precision` to `double`**
   - In `VecGeom/base/Math.h`:

     ```cpp
     using Precision = double;
     ```

   - Remove/deprecate the global `VECGEOM_SINGLE_PRECISION` option.
   - Keep templated tolerance constants:

     ```cpp
     template <typename T>
     constexpr T kToleranceDist = ...;
     ```

2. **Clarify layering**
   - **User API (navigation, placed volumes)** always uses `double`.
   - **BVH** uses its own scalar type `BVHReal_t` for AABB tests.
   - **Solids** use their own four precision types: StorageReal, DistanceReal, SafetyReal, InsideReal.

### 2.2 BVH precision (unchanged)

3.
**Respect existing BVH configuration**

   ```cpp
   using BVHReal_t = std::conditional_t<VECGEOM_BVH_SINGLE, float, double>;
   ```

   - The BVH receives inputs as double.
   - It converts point/dir to `BVHReal_t` **only for bbox intersection tests**.
   - BVH data structures use `BVHReal_t`.

4. **BVH → solid precision boundary**
   - The BVH selects candidate volumes using bbox tests.
   - For each candidate, the BVH forwards the *double* point/dir to the solid's policy-aware wrapper:

     ```cpp
     double step = vol->DistanceToInPolicy(point, dir, stepMax);
     double saf  = vol->SafetyToInPolicy(point);
     bool   in   = vol->InsidePolicy(point);
     ```

   - Each solid converts double → its own Real types as needed.

### 2.3 Per-solid precision policies (full flexibility)

5. **Define a precision policy struct with four Real types per solid**

   Example default policy:

   ```cpp
   struct DefaultSolidPrecisionPolicy {
     // Storage precision for unplaced parameters
     using BoxStorageReal  = float;
     using TubeStorageReal = double;
     // Distance kernels
     using BoxDistanceReal  = float;
     using TubeDistanceReal = double;
     // Safety kernels
     using BoxSafetyReal  = float;
     using TubeSafetyReal = float;
     // Inside kernels
     using BoxInsideReal  = float;
     using TubeInsideReal = float;
     // ... similarly for Cone, Sphere, Polycone, etc.
   };
   ```

   This allows:
   - Box stored in float, all queries float.
   - Tube stored in double, Distance in double, Inside/Safety in float.
   - Polycone stored & computed in double, etc.

6.
**CMake configuration for per-solid overrides**

   For each solid and for each Real type:

   - `VECGEOM_SOLID_BOX_STORAGE=single/double`
   - `VECGEOM_SOLID_BOX_DISTANCE=single/double`
   - `VECGEOM_SOLID_BOX_SAFETY=single/double`
   - `VECGEOM_SOLID_BOX_INSIDE=single/double`

   The configuration step generates:

   ```cpp
   // SolidPrecisionConfig.h (auto-generated)
   using BoxStorageReal   = float;  // from CMake
   using BoxDistanceReal  = float;
   using BoxSafetyReal    = float;
   using BoxInsideReal    = float;

   using TubeStorageReal  = double;
   using TubeDistanceReal = double;
   using TubeSafetyReal   = float;
   using TubeInsideReal   = float;
   ```

7. **Tie unplaced volume implementations to their policy**

   In each solid:

   ```cpp
   using StorageReal  = BoxStorageReal;
   using DistanceReal = BoxDistanceReal;
   using SafetyReal   = BoxSafetyReal;
   using InsideReal   = BoxInsideReal;
   ```

   Parameter storage becomes:

   ```cpp
   StorageReal fParams[...];
   ```

   Kernels remain templated:

   ```cpp
   template <typename Real_t> Real_t DistanceToInKernel(...);
   template <typename Real_t> Real_t SafetyToInKernel(...);
   template <typename Real_t> bool InsideKernel(...);
   ```

   Policy wrappers:

   ```cpp
   double DistanceToInPolicy(Vector3D<double> const &p, Vector3D<double> const &d,
                             double stepMax) const
   {
     Vector3D<DistanceReal> pp(p);
     Vector3D<DistanceReal> dd(d);
     DistanceReal r = DistanceToInKernel<DistanceReal>(pp, dd, ...);
     return static_cast<double>(r);
   }

   double SafetyToInPolicy(Vector3D<double> const &p) const
   {
     Vector3D<SafetyReal> pp(p);
     SafetyReal s = SafetyToInKernel<SafetyReal>(pp, ...);
     return static_cast<double>(s);
   }

   bool InsidePolicy(Vector3D<double> const &p) const
   {
     Vector3D<InsideReal> pp(p);
     return InsideKernel<InsideReal>(pp);
   }
   ```

### 2.4 Integrating the solid policies into the BVH

8. **BVH uses policy-aware wrappers**
   - The BVH is unchanged except for replacing direct solid calls with:

     ```cpp
     vol->DistanceToInPolicy(...)
     vol->DistanceToOutPolicy(...)
     vol->SafetyToInPolicy(...)
     vol->InsidePolicy(...)
     ```

   - The BVH does not know the solid precision types.

9.
**Navigation front-ends remain double-only**
   - The public API remains:

     ```cpp
     double ComputeStep(Vector3D<double> const&, ...);
     ```

   - All precision specialisation lives **below** the navigation front-ends.

### 2.5 Transition plan

10. **Implementation steps**
    1. Freeze `Precision = double`.
    2. Introduce the SolidPrecisionPolicy infrastructure.
    3. Add StorageReal/DistanceReal/SafetyReal/InsideReal to the solids (defaults all double).
    4. Add policy wrappers and route BVH → solid calls through them.
    5. Enable selective float defaults (Box, Safety, Inside).
    6. Add CMake overrides for all solids and Real types.
    7. Validate mixed-precision correctness against a full-double build.

---

## 3. Cleanup of leftover vector interfaces and implementations

**Problem:** *Although vector interfaces were discontinued at the (Un)PlacedVolume level, most kernels still call VecCore vector interfaces based on masks, avoiding early returns and hindering performance on scalar paths. We need a deep cleanup down to the kernels.*

Top-level interfaces and helpers to consider:

* `VecGeom/base/Global.h` defines `VectorBackend` (VcVectorT when VECGEOM_VC) and the `vecCore__MaskedAssignFunc` helper; anything including this can pick up `VectorBackend::Real_v`.
* Tessellated/planar utilities hard-code `VectorBackend::Real_v`: `volumes/TessellatedSection.h`, `volumes/TessellatedStruct.h`, `volumes/TessellatedCluster.h`, `volumes/PlanarPolygon.h`, and `volumes/kernel/TessellatedImplementation.h`. Cleaning this up requires re-implementing the acceleration structure, so basically a re-implementation of the full solid, as per [VECGEOM-455](https://its.cern.ch/jira/browse/VECGEOM-455) and [VECGEOM-557](https://its.cern.ch/jira/browse/VECGEOM-557).
* Angular helpers that are backend-templated and can be instantiated with vector backends: `volumes/Wedge.h`, `volumes/ThetaCone.h` (and code that uses them, e.g., sphere/phi/theta-cone helpers).
* Any leftover VectorBackend references via kVecSize/Float_v in these files trigger SIMD instantiation of the solid kernels.

Step-by-step cleanup plan (keep kernels templated on Real_t, enforce scalar at the high level):

1. Gate scalar types at the top interface: add `static_assert(vecCore::IsScalar<Real_t>::value, ...)` in the templated bridges exposed to users—the `VPlacedVolume/VUnplacedVolume` templated entry points (or in the few templated helper methods such as the `UnplacedVolumeImplHelper` dispatch functions)—to ensure only scalar `float/double` reach the kernels, while kernels stay `template <class Real_t>`.
2. ~~Collapse the global vector backend alias: in `base/Global.h` retire `VectorBackend` (or alias it to `ScalarBackend`) so downstream code no longer pulls `VectorBackend::Real_v`; this prevents new SIMD instantiations from being selected.~~ We will actually still need `VectorBackend` for the host-side navigators using it, such as `HybridNavigator2`.
3. Sweep the kernels' callers for Backend template instantiations that explicitly use `VectorBackend` (e.g., tessellated and phi/theta wedge helpers). Change those instantiations to scalar (or remove the vector instantiation path), relying on the kernel's Real_t template only via scalar types.
4. Clean up leftover vector-only utilities: delete or neuter vector-specific typedefs/macros (the `vecCore__MaskedAssignFunc` vector branch) and includes; ensure navigation/build code defaults to scalar backends.
5. Build/test with the vector backend disabled (or aliased to scalar) to catch any lingering SIMD instantiations; remove dead code guarded by `VECGEOM_VC` as it becomes unused.
6. Clean up the vector-aware solid kernels themselves (large task).

Vector-aware kernels that need to be cleaned up:

* `BoxImplementation.h`: `Mask_v` types in `NormalKernel`, generic contains/inside helpers; masked assigns.
* `ConeImplementation.h` and `CoaxialConesImplementation.h`: multiple `Mask_v` helpers (`IsOnRing`, inside kernels) and masked control flow.
* `CutTubeImplementation.h` / `TubeImplementation.h`: extensive `Mask_v` usage for contains/inside/distance/safety.
* `TrdImplementation.h`, `TrapezoidImplementation.h`, `ParallelepipedImplementation.h`, `GenTrapImplementation.h`: `Mask_v` for inside/contains and distance.
* `SphereImplementation.h`, `OrbImplementation.h`, `EllipsoidImplementation.h`, `EllipticalConeImplementation.h`, `EllipticalTubeImplementation.h`, `ParaboloidImplementation.h`, `HypeImplementation.h`: `Mask_v/InsideBool_v` and vector-only branches throughout the geometry queries.
* `PolyconeImplementation.h` and `GenericPolyconeImplementation.h`: section inside/outside masks; vectorized phi/section logic.
* `PolyhedronImplementation.h`: `Mask_v` in phi wedge tests and distance logic.
* `TorusImplementation2.h`: `Mask_v` used in contains/inside/distance.
* `SExtruImplementation.h`, `ExtrudedImplementation.h`: `Mask_v` in contains/inside/normal.
* `ScaledShapeImplementation.h`: passes `Mask_v` validity flags to the child.
* `TessellatedImplementation.h`: uses `VectorBackend::Real_v` and `Mask_v` in facet tests.
* `TetImplementation.h`: `Mask_v` in inside/normal.
* `MultiUnionImplementation.h`: normal uses `Mask_v`.
* `GenericKernels.h`: helper functions expose `Mask_v` in utilities (`LineDistanceToLine`, etc.) and the masked-assign macro pattern feeding the kernels.

---

## 4. Eliminate recursions

**Problem:** *Unpredictable-depth recursion is bad for the GPU. The compiler needs to allocate the maximum expected stack depth from scarce resources, and often has to be conservative, limiting the maximum number of warps in flight and therefore the device occupancy.
There are two main sources of recursion in VecGeom: Locate/Relocate and navigation in Boolean volumes.*

**Removal of recursions in Locate/Relocate:**

= TO BE DONE =

**Removal of recursions in Boolean navigation (ongoing)**

* The existing Boolean navigation in VecGeom was based on a binary-tree (`BooleanStruct`) representation. It is recursive and not suitable for GPU backends (no recursion, limited stack, no dynamic allocation).
* Requirements for the "new" implementation:
  * Use the `LogicExpression` representation as the single source of truth for Boolean solids, including on the GPU.
  * Provide `Inside`, `DistanceToIn`, `DistanceToOut`, `SafetyToIn`, and `SafetyToOut` with:
    * **Exact Boolean semantics** (including negation and surface handling), compatible with the existing CPU/binary-tree behaviour.
    * **No recursion**, using only small, fixed-size, GPU-friendly stacks.
    * Room for **short-circuit** evaluation and fewer primitive calls than the naive interval approach.

Implementation (high-level idea):

1. `LogicExpression` as a flat Boolean program
   * A Boolean solid is represented by a `LogicExpression`: a flat array of tokens:
     * component indices (primitives),
     * operators `& (AND)`, `| (OR)`, `! (NOT)`,
     * parentheses `(`, `)` encoded as `lplus`, `lminus`,
     * optional **jump offsets** for short-circuiting (`A | B` can jump over `B` if `A` is already true, etc.).
   * This expression is scope-based: each parenthesised subexpression defines a scope with a uniform operator (& or |) inside, which is crucial for efficient evaluation.
2. `Inside`: non-recursive, scope-based evaluation
   * `Inside` is implemented as an iterative **scope interpreter**:
     * It uses a small explicit **stack of "scope frames"** (start/end indices, current position, accumulated tri-state, pending operator, pending negation).
     * For each scope:
       * Walk the tokens from `start` to `end`.
       * For primitives: call their `Inside` in local coordinates, apply `!` when present, and fold with the current accumulated tri-state using the correct Boolean rules:
         * `AND`: out dominates, then surface, then in.
         * `OR`: in dominates, then surface, then out.
       * For sub-scopes `(...)`: **push a new** frame instead of recursing, evaluate it, and fold its result into the parent.
     * `!` is handled as a flag applied to the next primitive or sub-scope (only single-element scopes are complemented).
     * Jump offsets for `&` and `|` implement short-circuiting:
       * in a union, if the left side is already "inside", skip the rest of the branch;
       * in an intersection, if the left side is already "outside", skip the rest.
   * The result is a tri-state (`kInside`, `kOutside`, `kSurface`) for the full Boolean solid at the query point, with no recursion and good short-circuit behaviour.
3. Effective primitives (negation-aware DIN/DOUT kernel)
   * For distance queries we work with **effective primitives**: a base component plus a possible `!` in front of it.
   * We define two helpers:
     * `InsideEff(component, neg, p)` → inside state of the **effective** solid (primitive or its complement) at point `p`.
     * `DistEffPrimitive<FORDIN>(component, neg, p, dir, stepMax, sEff)` → distance to **enter** or **exit** the effective solid, depending on `FORDIN`:
       * For `FORDIN == true` (`DistanceToIn`):
         * If already inside/on the surface of the effective solid: distance 0.
         * If outside:
           * `A`: use `A.DistanceToIn`.
           * `!A`: use `A.DistanceToOut` (entering `!A` = leaving `A`).
       * For `FORDIN == false` (`DistanceToOut`):
         * If outside the effective solid: return "wrong side" (e.g. -1).
         * If inside or on the surface:
           * `A`: use `A.DistanceToOut`.
           * `!A`: use `A.DistanceToIn` (exiting `!A` = entering `A`).
   * These two helpers encapsulate all the **negation logic**; everything higher-level only deals with effective states and distances.
4.
DIN/DOUT via repeated scope sweeps + re-evaluation
   * Instead of computing full entry/exit intervals for each primitive, we use **"one-step sweeps"** over the Boolean expression and **hop from boundary to boundary**, re-evaluating the full Boolean state after each hop, similar in spirit to the binary-tree algorithms.
   * For a given origin `p`, direction `dir`, and scope `S`:
     * A single scope sweep (`ScopeStepDistance<FORDIN>`) does the following:
       * Iterate over all terms in `S` (primitives or sub-scopes), using an explicit scope stack (no recursion).
       * For each term:
         * Compute the effective inside state at `p`.
         * Compute the effective distance `d` using `DistEffPrimitive<FORDIN>`, or (for sub-scopes) the same logic applied to that sub-scope.
       * For `AND` scopes: pick the **maximum** distance among valid hits (the earliest place where all components can become inside/outside).
       * For `OR` scopes: pick the **minimum** distance among valid hits (the earliest place where any component can make the scope inside/outside).
       * Optionally short-circuit:
         * In an `AND`, if some component has no hit at all in the current step, the whole scope cannot trigger a change from this origin.
         * In an `OR`, if a term hits very early and we know later terms can't improve (due to step limits), we can stop.
     * The sweep returns:
       * A candidate distance `d_best` to the next relevant Boolean boundary of the scope.
       * The term (primitive / sub-scope) that caused it (for debugging or future optimizations).
   * The top-level `DistanceToIn/Out` (`EvaluateDistance<FORDIN>`) then does:
     1. An initial side check with `EvaluateInside`:
        * For DIN: if we start inside, return -1 (wrong side).
        * For DOUT: if we start outside, return -1.
     2. A **loop** along the ray:
        * Call `ScopeStepDistance<FORDIN>` on the top-level scope from the current point `p_curr` with the remaining `stepMax - t_acc`. If there is no valid hit, return `kInfLength` (no entry/exit within `stepMax`).
        * Advance by that distance `d`, plus a small **numerical push** (`kPushTolerance`) along the direction; update `t_acc` and `p_curr`.
        * Re-evaluate **the full Boolean inside state** at the new point using `EvaluateInside`.
        * Stopping condition:
          * For DIN: stop when the solid becomes `kInside` or `kSurface`.
          * For DOUT: stop when the solid becomes `kOutside` or `kSurface`.
        * Otherwise, repeat: from the new origin we perform another sweep to find the next candidate boundary.
   * This gives the same qualitative behaviour as the recursive binary-tree implementation:
     * **Multiple internal propagations** are naturally handled by the outer loop.
     * We never need recursion, only an explicit small stack for scopes.
     * We avoid computing full intervals for all primitives; we only compute the distances that are actually needed for the current step.
5. SafetyToIn / SafetyToOut
   * Safety functions are implemented using the **same Boolean algebra** and the same `LogicExpression` interpreter:
     * For primitives, we use their `SafetyToIn` / `SafetyToOut` (or equivalent).
     * For complex expressions:
       * `SafetyToIn` of an **intersection** behaves like the maximum of the component safeties (you must move far enough to be safe with respect to all components).
       * `SafetyToIn` of a **union** behaves like the minimum of the component safeties (you are unsafe as soon as you are too close to any member).
     * Complemented components reuse the same Inside/complement rules as for DIN/DOUT.
   * Evaluation is done by a **non-recursive** scope stack, analogously to `EvaluateInside`, folding primitive safeties with the correct Boolean rules.
   * The result is conservative by construction (it never overestimates the safety), and it reuses the same logic infrastructure as Inside and DIN/DOUT, making it consistent and GPU-friendly.

---

## 5. Implementation of a more efficient GPU-friendly navigation accelerator

**Problem:** *Our current optimizer uses a BVH, which offers no locality **along** the track, so it often has to check many more candidates than needed. We want to implement a more efficient, stack-friendly, voxel-based optimizer that provides this locality along the voxel "trace" of the particle.*

**Implementation choice:** Build a per-logical-volume, adaptive fixed-depth voxel grid (depth `D_fixed` chosen from the daughter count and bbox) that is dense at that depth—so `Locate(p)` is `O(1)` to the cell—while any "heavy" cell owns a tiny local stackless octree (1–2 extra levels) to keep its candidate list small. Store everything in a flat, POD format (floats) usable on host & device. Navigation does a 3D DDA to march voxels in strict ray order for `ComputeStep`, and uses a small per-query marking cache to avoid re-testing primitives that appear in multiple cells. Primitives are assumed non-overlapping, so `Locate` returns on the first `Inside == true`. If no candidate is found within the requested range, return that range unchanged.

1. **Define flat, shared data types (host/device)**

   ```cpp
   struct VoxelNode {
     uint32_t firstChild, primStart, skip;
     uint16_t primCount;
     uint8_t childMask, level;
   };

   struct QueryMarker {
     uint32_t gen, ids[16], genFor[16], counter;
     void begin(uint32_t g);
     bool seen(uint32_t pid) const;
     void mark(uint32_t pid);
   };
   ```

2. **Specify the LogicalVolume interfaces**
   * Builder-time (host): `GetBoundingBox(min,max)`, `GetDaughterCount()`, `GetDaughterAABB(i,min,max)` (all in the LV frame).
   * Navigation-time (host/device accessor): `Inside(pid,pLocal)`, `DistanceToIn(pid,pLocal,dLocal,t)`, `Safety(pid,pLocal)`.
3. **Write the host-only builder: `OctreeVoxelBuilder`**
   * Read the LV bbox; downcast to `float3` (`lvMin`, `lvMax`).
   * Gather all daughters' local AABBs; downcast to `float`.
   * **Choose** `dimLog2` **adaptively**: target `~K` prims per cell (e.g., `K=8`), with `D = clamp(ceil(log₈(n/K)), 0..7)`.
   * Allocate a dense `cellRoot` of size `(1<<D)^3`, initialized to `INVALID`.
   * Scatter each primitive's AABB to the overlapped cell indices (compute integer `[x0..x1]×[y0..y1]×[z0..z1]`).
   * For each cell:
     * If empty: leave `INVALID`.
     * If `count ≤ leafThreshold (8–12)`: append a leaf node (write `primStart/primCount` into `primIndices`) and set `cellRoot[cell]=nodeIdx`.
     * Else: build a local octree inside that cell's AABB (max local depth 1–2), splitting `2×2×2`, distributing the cell's list to children; leaves append candidate ranges; internal nodes get `firstChild/childMask`. Nodes are pushed into a single linear array.
   * Add skip pointers to all nodes via a DFS post-process to enable stackless walks.
   * (Optional) Detect "big primitives" (appearing in more than M cells, e.g., 32): remove them from the per-cell lists and store them once in `bigPrims`, to be tested once per query.
4. **Flatten to device-ready buffers**
   * From the builder vectors, allocate device (or shared) arrays and memcpy: `cellRoot`, `nodes`, `primIndices`, (`bigPrims`).
   * Fill an `OctreeVoxelData` descriptor with the float bbox, `dimLog2`, offsets, and counts.
5. **Utility math shared by host/device**
   * `world_to_cell(p, lvMin, lvMax, dimLog2)` → `(ix,iy,iz)` with clamping.
   * `idx3D(ix,iy,iz,N)` for dense table indexing.
   * DDA helpers: `setupDDA(point, dir, vox)`, `stepDDA(state, vox)`; compute `tNext{X,Y,Z}`, `tDelta{X,Y,Z}`, maintain `tNext = min(...)`.
6. **Implement `Locate(Vector3D<double> const& point)`**
   * Convert the input to `float3`.
   * Map to the cell in `O(1)`; fetch `root = cellRoot[ci]`. If `INVALID`, return "not found".
   * Walk the local tree **stacklessly** starting at `root`:
     * At a leaf: for the `primCount` candidates: if not `seen(pid)`, `mark(pid)`, then `Inside(pid,pLocal)`; on the first true, return that `pid`.
     * Else: `idx = firstChild`; when done with a leaf, `idx = skip`.
   * If nothing is found, return "not found".
7. **Implement `ComputeStep(point, direction, range_to_search)`**
   * Convert the inputs to `float3`; keep the direction magnitude (no need to normalize for DDA).
   * Initialize the DDA from `point` and `direction`. Set `bestT = range`, `hit = false`.
   * While inside and `dda.t < range`:
     * `ci = idx3D(cell)`; `root = cellRoot[ci]`.
     * If `root != INVALID`: walk the local tree (as above). For each candidate not yet seen:
       * If `DistanceToIn(pid, p, d, t)` and `t < bestT` and `t >= dda.t`: set `bestT = t`, `hit = true`.
     * Early-out if `hit && bestT <= dda.tNext` (no need to enter the next voxel).
     * `stepDDA()`.
   * Return `bestT` (equals `range_to_search` if no candidate within range).
8. **Implement `Safety(point, range_to_search)`**
   * Convert to `float3`.
   * Map to the current cell; if empty → return `range`.
   * Walk the local tree once; for unseen candidates compute `s = Safety(pid, p)` and track `best = min(best, s)`.
   * Return `min(best, range)`.
   * (Optional) For robust safety near boundaries, also scan the 26 neighbor cells if `best` is larger than the distance to the nearest cell face.
9. **Per-query marking to avoid duplicate tests**
   * Use `QueryMarker.begin(gen++)` per query.
   * Before testing a candidate, check `seen(pid)`; if `false`, `mark(pid)` and test.
   * (Optional) Combine with the big-primitive list, tested once per query at the beginning.
10. **Memory + performance knobs**
    * Tune `leafThreshold` (8–16) and the max local depth (1–2).
    * Cap `dimLog2` (e.g., 0..7) to bound memory; the heuristic above already adapts to `nPrims`.
    * Store `cellRoot` in Morton (Z-order) to improve spatial locality with the DDA.
    * Keep all arrays SoA/contiguous; prefer 32-bit indices; floats for bounds only.
    * If memory is tight: quantize `lvMin/lvMax` to 16-bit and reconstruct per axis.
11. **Testing & validation (in order)**
    * Unit-test the mapping: `world_to_cell` and `idx3D` against edge cases.
    * Build small LVs (N=0, N=1, N≈K, N≫K) and inspect the per-cell counts.
    * Check `Locate` correctness against a brute-force ground truth.
    * Check `ComputeStep` vs the brute-force closest `DistanceToIn` on randomly generated rays; verify the early-out logic.
    * Check `Safety` vs the brute-force minimum distance; add the neighbor scan if needed.
    * Stress-test duplicates: place a large primitive spanning many cells; verify that the marking cache prevents re-tests on a DDA walk.
12. **Integration**
    * Wrap `OctreeVoxelData` in a VecGeom-friendly view.
    * Store an array of voxelizers indexed by LV id.
    * In navigation, pick the current LV's voxelizer, then call `Locate` / `ComputeStep` / `Safety` with `double` inputs (they internally convert to `float`).

---

## 6. Portability for VecGeom

**Problem:** *VecGeom was designed CPU/CUDA only, without a portability layer. Even without adopting a performance-portability framework, we want to be at least CPU/CUDA/HIP capable. This can be done in stages: stages 1–3 to get a single-source CUDA/HIP build compiling, and stages 4–6 to redesign DevicePtr/DeviceManager, introduce POD data structs, and update navigation/volume code to consume them. This involves de-virtualizing the device-facing navigation paths so that BVHNavigator and friends never hit a vtable. The plan to do this is shown after these stages.*

### Stage 1: Add HIP build/config switches and one-device vocabulary

Extend `VecGeom/base/Config.h.in` with `VECGEOM_ENABLE_HIP`; in `VecGeom/CMakeLists.txt` detect `hipcc`, set device flags/architectures, and emit a single `VECGEOM_DEVICE_BACKEND` / `VECGEOM_DEVICE_SOURCE` define used everywhere (stop using `__CUDACC__` directly). Keep the CUDA path working while wiring the HIP build targets and install components.

### Stage 2: Centralize device namespace/qualifier macros

Refactor `VecGeom/base/Cuda.h` into a backend-agnostic `Device.h` that sets:

```cpp
inline namespace VECGEOM_DEVICE_NAMESPACE { /* cuda or hip */ }
```

Also define host/device forward-declare macros and `VECCORE_DEVICE_COMPILATION` aliases. Replace `inline namespace cuda/cxx` with `inline namespace VECGEOM_DEVICE_NAMESPACE`, updating the places using `VECGEOM_DEVICE_DECLARE_*` and the code including `VecGeom/backend/cuda/Backend.h`.
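One possible shape for such a `Device.h` is sketched below. The `VECGEOM_ENABLE_*` switches and the `VECGEOM_DEVICE_NAMESPACE` macro follow the vocabulary of Stages 1–2 above; the exact spellings and the fallback behavior for host-only builds are assumptions.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical sketch of a backend-agnostic Device.h: one place selects the
// inline-namespace name from the build's backend define, instead of testing
// __CUDACC__ throughout the code base.
#if defined(VECGEOM_ENABLE_HIP)
  #define VECGEOM_DEVICE_NAMESPACE hip
  #define VECGEOM_DEVICE_BACKEND_NAME "hip"
#elif defined(VECGEOM_ENABLE_CUDA)
  #define VECGEOM_DEVICE_NAMESPACE cuda
  #define VECGEOM_DEVICE_BACKEND_NAME "cuda"
#else
  #define VECGEOM_DEVICE_NAMESPACE cxx  // host-only build keeps the legacy name
  #define VECGEOM_DEVICE_BACKEND_NAME "cxx"
#endif

namespace vecgeom {
// The inline namespace keeps backend symbol sets distinct while client code
// still writes vecgeom::Foo regardless of the active backend.
inline namespace VECGEOM_DEVICE_NAMESPACE {
inline const char *BackendName() { return VECGEOM_DEVICE_BACKEND_NAME; }
} // namespace VECGEOM_DEVICE_NAMESPACE
} // namespace vecgeom
```

In a host-only compilation (neither backend define set), `vecgeom::BackendName()` resolves through the `cxx` inline namespace.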
### Stage 3: Wrap the runtime API behind macros

Create `backend/device/Api.h` mapping:

* Types: `DeviceError_t`, `DeviceStream_t`, `DeviceMemcpyKind`
* Functions: `Malloc`, `Free`, `Memcpy`, `DeviceSynchronize`

Using:

- `VECGEOM_DEVICE_API_SYMBOL` to declare the functions/types in a backend-agnostic way.
- `VECGEOM_DEVICE_API_CALL` (from `VecGeom/base/Assert.h`) to wrap calls with error checking.

Replace direct CUDA calls in `VecGeom/backend/cuda/Interface.h` and its callers with these wrappers, so the rest of the code only depends on the abstracted API, not directly on `<cuda_runtime_api.h>` or `<hip/hip_runtime_api.h>`.

### Stage 4: Backend-agnostic device pointers and manager

```cpp
// Generalize VecGeom/backend/cuda/Interface.h → backend/device/Interface.h
template <typename Backend>
struct DevicePtr {
  typename Backend::Pointer raw;
};
```

- Make `DevicePtr` templated on the backend (CUDA/HIP).
- Remove the hard-coded `cudaStream_t`; keep copies/memsets behind backend macros.
- Rename `VecGeom/management/CudaManager.*` → `DeviceManager.*`.
- Parameterize on the backend and remove the `#ifdef VECCORE_CUDA` logic.
- All host–device transfers go through a uniform API wrapper.

---

### Stage 5: Eliminate device-side constructors by serializing POD

- Replace `ConstructOnGpu`, `ConstructArrayOnGpu`, `ConstructManyOnGpu` in `backend/device/Interface.h` with **host-side serialization**.
- For each class with `CopyToGpu` today (`Unplaced*`, `PlacedVolume`, `LogicalVolume`, `SpecializedPlacedVolImplHelper`, `UnplacedVolume`):
  - Introduce `ToDeviceData()` producing a trivially-copyable `*Data` struct:
    - indices instead of pointers
    - POD transforms
    - POD bounds (AABBs)
- In `DeviceManager.cpp`:
  - Stage these structs into contiguous host buffers
  - Bulk-copy them to the device
  - **Remove all device-side `new` and vector/set allocations**

Example POD sketch:

```cpp
struct BoxData {
  float dx, dy, dz;
};

struct LogicalVolumeData {
  int unplaced_id;
  Span<int> daughters;
};

struct PlacedVolumeData {
  int logical_id;
  TransformData trafo;
};
```

---

### Stage 6: Replace device-only containers/views and patch call sites

- Replace `VecGeom/base/Vector.h` on the device with a **Span-like** view (`pointer + size`) stored in POD.
- Ensure navigation (`LoopNavigator`, BVH device paths) consumes spans of indices, not owning containers.
- Boolean logic (`BooleanComponent`, `SolidLogicTableAccess`) must use the same spans.
- Update `backend/cuda/Backend.h` → a backend-neutral launcher using the unified API.
- Re-enable device asserts/error handling for HIP (`__HIP_DEVICE_COMPILE__`).

Example span:

```cpp
template <typename T>
struct Span {
  T* ptr;
  int size;
  __device__ T& operator[](int i) const { return ptr[i]; }
};
```

---

### Plan for de-virtualizing navigation

#### Identify device-side virtual calls

`BVHNavigator` and `LoopNavigator` currently call (virtually):

- `Inside`
- `DistanceToIn`
- `DistanceToOut`
- `SafetyToIn`
- `SafetyToOut`
- `Normal`
- `GetTransformation`
- `GetLogicalVolume`
- `GetUnplacedVolume`

We must remove all device virtual dispatch.

---

#### Define POD device views

```cpp
struct DevicePlacedVolume {
  ESolidType type;
  int id;
  BBox bbox;
  TransformData trafo;
  Span<int> daughters;
  int logical_id;
};
```

Per-shape POD:

```cpp
struct BoxData {
  float dx, dy, dz;
};

struct SphereData {
  float rmin, rmax;
  float sphi, dphi;
  float stheta, dtheta;
};
// etc.
```

Logical volume POD:

```cpp
struct DeviceLogicalVolume {
  Span<int> daughters;
  int unplaced_id;
};
```

Spans always point into flat arrays. No virtual functions exist on the device.

---

#### Shape dispatch via `switch(type)`

- Implement backend-neutral helpers:

```cpp
__device__ InsideResult ShapeDispatch_Inside(
    ESolidType type, const void* shape_data,
    const Vector3D<float>& p_local);

__device__ float ShapeDispatch_DistanceToIn(...);
__device__ float ShapeDispatch_DistanceToOut(...);
__device__ float ShapeDispatch_SafetyToIn(...);
__device__ float ShapeDispatch_SafetyToOut(...);
__device__ Vector3D<float> ShapeDispatch_Normal(...);
```

- Dispatch table via `switch(type)` calling each shape's implementation.
- Lives in `backend/device/ShapeDispatch.h`.
- Usable by CUDA and HIP builds.

---

#### Navigator refactor

- Define:

```cpp
using DeviceNavVolume = DevicePlacedVolume;
```

- Replace virtual calls inside the **device path** of BVHNavigator:

Before:

```cpp
vol->DistanceToIn(p, dir, maxStep);
```

After:

```cpp
ShapeDispatch_DistanceToIn(
    vol.type, vol.shape_ptr, p_local, dir_local, maxStep);
```

- Host path keeps virtuals, guarded:

```cpp
#if !VECGEOM_DEVICE_COMPILE
return vol->DistanceToIn(...);
#endif
```

---

#### Serialization / staging

The manager produces flat device arrays:

- `device_shapes[ESolidType]` → contiguous regions of shape POD
- `device_placed[]` → array of `DevicePlacedVolume`
- `device_daughters[]` → flat list of daughter indices
- `device_logical[]` → POD logical volumes

The CPU-side builder packs the structs using `switch(type)`. Bulk memcpy to device; **no device constructors**.

---

#### Rollout steps

1. Introduce `ESolidType` tags on all host `VPlacedVolume`.
2. Add initial POD structs + `ShapeDispatch` for 3 shapes (Box/Sphere/Tube).
3. Convert BVHNavigator device code to use POD + dispatch.
4. Implement the serialization path producing `device_placed`, `device_shapes`, `device_daughters`.
5. Incrementally add the remaining shapes to dispatch + serialization.
6.
Remove any remaining device virtual calls.
7. Keep the host virtual interface intact; the device always runs tag+switch.

This produces a **HIP-compatible, no-virtual, POD-based device navigation**.

---

## 7. Documentation

**Goal:** Provide complete documentation for user interfaces (geometry construction and navigation) and for the underlying algorithms.

Documentation of algorithms will be written **in parallel with the code restructuring** (solids cleanup, navigation contract redesign, POD transition).

---

### 7.1 Geometry Construction Interfaces

- Document user-facing Geometry APIs:
  - `VUnplacedVolume`, `LogicalVolume`, `VPlacedVolume`, `GeoManager`
  - Primitive solids, Boolean solids, scaled solids, multi-unions
  - Geometry hierarchy patterns (world, daughters, transforms)
- Create a *Geometry Construction Manual*:
  - Getting-started examples
  - Construction patterns (layered detectors, booleans, scaling)
  - Coordinate frames and tolerance conventions
- Document **VGDML as an exchange format**, not user-writable:
  - Supported conversion behaviour
  - Limitations
- Add concise Doxygen to all public constructors/methods.
- Deliverable: `docs/geometry-construction.md`.

---

### 7.2 Navigation Interfaces & Navigation Contract

- Define the **new navigation contract**:
  - Semantics of `Locate`, `ComputeStep`, `ComputeSafety`
  - Inputs/outputs, boundary rules, tolerances
  - Handling of boundary flags and `last exited`
- Produce a Navigation User Guide:
  - Overview of navigators (BVH, Loop, GPU)
  - Typical call patterns
  - Examples with GDML geometry
- **Algorithm documentation depends on the new navigation contract implementation**.

Deliverables:
- `docs/navigation-contract.md`
- `docs/navigation-user-guide.md`.
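The contract described in 7.2 (and in Phase 1 above) can be sketched as a plain C++ interface. This is an illustration only: `Navigator2` and `NavStateIndex` are the names proposed in this plan, while `StepResult`, `Vec3`, the box "world", and all method bodies are invented here to exercise the signatures, and are not VecGeom API.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

// Pure index state: a stack of volume indices plus depth, nothing else.
struct NavStateIndex {
  std::vector<int> stack;  // world → ... → current volume
  int Depth() const { return static_cast<int>(stack.size()); }
  int Top() const { return stack.empty() ? -1 : stack.back(); }
};

struct StepResult {
  int hit_volume_id;  // volume entered after the step (-1 = exited world)
  double step;        // geometric step, clamped to step_limit
};

// Contract-style signatures: the boundary flag and last-exited id stay
// separate in/out arguments and are never folded into the navigation state.
class Navigator2 {
public:
  virtual ~Navigator2() = default;
  virtual int Locate(const Vec3 &point, NavStateIndex &state) const = 0;
  virtual StepResult ComputeStep(const Vec3 &point, const Vec3 &dir,
                                 double step_limit, const NavStateIndex &in,
                                 NavStateIndex &out, bool &boundary_flag,
                                 int &last_exited_id) const = 0;
  virtual double ComputeSafety(const Vec3 &point,
                               const NavStateIndex &state) const = 0;
};

// Minimal implementation: a single cubic world volume (id 0), handling only
// rays along +x, purely to show how the contract is meant to be used.
class BoxWorldNavigator final : public Navigator2 {
  double half_;  // half-extent of the cube, centered at the origin
public:
  explicit BoxWorldNavigator(double half) : half_(half) {}

  int Locate(const Vec3 &p, NavStateIndex &state) const override {
    state.stack.clear();
    if (std::abs(p.x) < half_ && std::abs(p.y) < half_ && std::abs(p.z) < half_)
      state.stack.push_back(0);
    return state.Top();
  }

  StepResult ComputeStep(const Vec3 &p, const Vec3 & /*dir = +x*/,
                         double step_limit, const NavStateIndex &in,
                         NavStateIndex &out, bool &boundary_flag,
                         int &last_exited_id) const override {
    const double dist = half_ - p.x;  // distance to the +x face
    out = in;
    if (dist <= step_limit) {         // boundary reached: leave the world
      boundary_flag = true;
      last_exited_id = in.Top();
      out.stack.clear();
      return {-1, dist};
    }
    boundary_flag = false;            // step limited by the caller
    return {in.Top(), step_limit};
  }

  double ComputeSafety(const Vec3 &p, const NavStateIndex &) const override {
    return half_ - std::max({std::abs(p.x), std::abs(p.y), std::abs(p.z)});
  }
};
```

The point of the sketch is that every output (`hit_volume_id`, `step`, boundary flag, last-exited id, output state) is explicit in the signature, so there are no implicit side effects for the documentation in 7.2 to disambiguate.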
---

### 7.3 Algorithm Documentation for Solids

- For each solid kernel, add a top-of-file note describing:
  - Purpose / geometric model
  - Interface contract
  - Algorithm outline
  - Numerical stability and tolerances
  - Special/corner cases
  - Performance notes
- Priority solids: Box, Tube, Cone, Sphere, Torus, Polycone, Polyhedron, Extruded, GenTrap, Tessellated.

**Developed in parallel** with the solids cleanup and the removal of vector remnants.

Deliverable: updated kernel files with algorithm notes.

---

### 7.4 Navigation Algorithms (BVH, Loop, Safety)

- Document:
  - Navigation state transitions
  - Locate/step/safety conceptual model
  - Grazing-ray and boundary handling
  - BVH structure and traversal logic
  - ULP tolerances and false-hit rejection
- Add algorithm notes to navigator files **after** the new navigation contract is implemented.

Deliverable: navigator algorithm notes.

---

### 7.5 Surface Model ("Frames")

- Write a conceptual overview:
  - Half-space primitives, frame types
  - Mixed precision
  - BVH traversal in the surface model
  - Ray–surface intersection patterns
- Add top-of-file algorithm notes to implementation files.

Deliverables:
- `docs/surface-model.md`
- Annotated implementation files.

---

### 7.6 Workflow & Deliverables

- Maintain a tracking checklist per section (7.1–7.5).
- MR templates requiring documentation updates.
- End-of-2026 deliverables:
  - Geometry Construction Manual
  - Navigation Contract + User Guide
  - Algorithm documentation (solids, navigation, surface model)
  - Updated Doxygen/API references.

---

# VecGeom 2026 — Prioritized Items (Updated)

## **Priority 1 — Improved navigation performance**

1. **More efficient GPU navigation accelerator (voxel grid + local octrees)**
2. **Remove recursion from Boolean navigation and locate/relocate**

---

## **Priority 2 — Navigation API**

3.
**Finalize & implement the new navigation contract**
   - Unified API for Locate/Step/Safety
   - Pure NavStateIndex/Tuple
   - Boundary semantics and last-exited behaviour
   - CPU reference navigator
4. **Bridge GPU (CUDA first) to the new contract**
   - Make BVHNavigator use the new API
   - Unify semantics CPU ↔ GPU
   - Ensure AdePT integration

---

## **Priority 3 — Portability**

5. **Portability layer (CUDA/HIP single-source)**
   - Stages 1–3 (build, namespaces, runtime API)
   - Stages 4–6 (DeviceManager, POD structs, switch-dispatch)
6. **De-virtualize the device navigation path**
   - ShapeDispatch
   - DevicePlacedVolume / DeviceShapeData
   - Remove device virtual calls

---

## **Priority 4 — Solids cleanup and documentation**

7. **Solids cleanup: remove vector remnants**
   - Prepare kernels for clearer algorithms
   - Enable mixed precision later
8. **Algorithm documentation (in parallel with cleanup)**
   - Solids documented with GenTrap-style notes
   - Navigation algorithms documented after the new contract
9. **Documentation for users (geometry construction, navigation guide)**
   - High-level user docs proceed independently
   - Algorithm notes depend on the cleanup

---

# Longer-term items

These items align with the final goal that VecGeom eventually becomes the single point of maintenance for geometry navigation algorithms (the `USolids` library mission), going beyond the solids level.

## Preliminary steps for ROOT/Geant4 compatibility

Implement the missing functionality needed for full compatibility with ROOT/Geant4 navigation, e.g. missing solids (half-space, displaced solid, ...), parallel world navigation, ...

## Writing a TGeoNavigator-derived interface

This will provide the full ROOT navigation functionality, dispatching to VecGeom navigators.

## Writing a Geant4 VNavigator-derived interface

This will provide the full Geant4 navigation functionality, dispatching to VecGeom navigators.
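Both bridge items follow the same adapter pattern: derive from the host toolkit's navigator interface and forward each query to a VecGeom navigator after translating states and types. The sketch below shows only the pattern; every name in it (`HostNavigatorInterface`, `VecGeomStyleNavigator`, `FindNextBoundary`, and the cubic world) is a stand-in invented for illustration, not the real TGeoNavigator or Geant4 API.

```cpp
// Stand-in for the host toolkit's navigator base class (TGeoNavigator or the
// Geant4 navigator in the real bridges; this interface is invented here).
struct HostPoint { double x, y, z; };

class HostNavigatorInterface {
public:
  virtual ~HostNavigatorInterface() = default;
  // Return the step to the next boundary, limited by proposed_step.
  virtual double FindNextBoundary(const HostPoint &p, const HostPoint &d,
                                  double proposed_step) = 0;
};

// Stand-in for a VecGeom navigator with its own vector type.
struct VgVec3 { double x, y, z; };

class VecGeomStyleNavigator {
  double half_ = 10.0;  // cubic world half-extent, illustration only
public:
  double ComputeStep(const VgVec3 &p, const VgVec3 & /*d = +x*/,
                     double limit) const {
    const double dist = half_ - p.x;  // distance to the +x face
    return dist < limit ? dist : limit;
  }
};

// The bridge: derive from the host interface, translate types, forward.
class VecGeomBridgeNavigator final : public HostNavigatorInterface {
  VecGeomStyleNavigator impl_;
public:
  double FindNextBoundary(const HostPoint &p, const HostPoint &d,
                          double proposed_step) override {
    const VgVec3 vp{p.x, p.y, p.z};  // type (and, in reality, unit and
    const VgVec3 vd{d.x, d.y, d.z};  // state) translation happens here
    return impl_.ComputeStep(vp, vd, proposed_step);
  }
};
```

The host toolkit keeps calling its own interface unchanged; only the derived class knows about VecGeom, which is what makes these bridges incremental and optional.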