Fermilab Computing Division

CS Document 5676-v1

Kalman Filter Tracking on Parallel Architectures

Submitted by:
Oliver Gutsche
Updated by:
Oliver Gutsche
Document Created:
14 Jan 2016, 08:36
Contents Revised:
14 Jan 2016, 08:36
Metadata Revised:
14 Jan 2016, 08:36
Viewable by:
  • Public document

Abstract: Limits on power dissipation have pushed CPUs to grow in parallel processing capabilities rather than clock rate, leading to the rise of "manycore" or GPU-like processors. To achieve the best performance, applications must be able to take full advantage of vector units across multiple cores, or some analogous arrangement on an accelerator card. Manycore techniques will certainly be important if real-time processing is to keep up with detector data rates at CERN's Large Hadron Collider, for example, where planned upgrades will soon cause data to be produced at an unprecedented pace. For the High Luminosity LHC, the most computationally demanding task is expected to be track finding and fitting, which is projected to become by far the dominant problem in event reconstruction. Most of the common software for tackling this problem is based on Kalman filtering; these methods are known to produce robust physics results on real tracking detector systems, both in the trigger and offline. But Kalman filtering involves repetitious small-matrix operations that lack a natural SIMD formulation. The challenge for us is to recast the existing software so it can readily generate hundreds of shared-memory threads that exploit the underlying SIMD (or SIMT) instruction set of modern processor architectures. On their own, compilers are often unable to produce optimal code of this type, even when given parallelization directives via OpenMP and other pragmas. However, the source code itself may be written in a way that assists the compiler in creating SIMD instructions and orchestrating the data movement appropriately. In our case, an abundance of small parallel tasks is available: we show how the data and associated tasks can be grouped in a way that is conducive to both multithreading and vectorization. We demonstrate very good vector performance and scalability in key portions of the code. We also identify issues that may currently inhibit the full application from scaling up to large numbers of threads.
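
To illustrate the kind of grouping the abstract refers to, the following is a minimal sketch only, not the authors' implementation. It assumes a "structure of arrays" layout in which the same element (i, j) of many small matrices is stored contiguously, so that the loop over matrices (one per track candidate, say) becomes the vectorizable inner loop, while independent batches are handed to separate threads. All names here (`MatrixBatch`, `multiply`, `multiplyAll`, `kDim`) are hypothetical, and the 3x3 dimension is chosen purely for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names and sizes; an illustrative layout, not the authors' code.
constexpr std::size_t kDim = 3;  // small-matrix dimension (assumed for brevity)

// A batch of n small matrices in structure-of-arrays form:
// e[i * kDim + j][t] holds element (i, j) of matrix t, so a fixed element
// of all matrices in the batch is contiguous in memory.
struct MatrixBatch {
    std::size_t n;
    std::array<std::vector<float>, kDim * kDim> e;

    explicit MatrixBatch(std::size_t count) : n(count) {
        for (auto& column : e) column.assign(count, 0.0f);
    }
    float& at(std::size_t i, std::size_t j, std::size_t t) { return e[i * kDim + j][t]; }
    float at(std::size_t i, std::size_t j, std::size_t t) const { return e[i * kDim + j][t]; }
};

// Computes c = a * b for every matrix in the batch. The innermost loop runs
// over the batch index t with unit stride, which is the SIMD-friendly axis.
// The output batch must not alias either input.
void multiply(const MatrixBatch& a, const MatrixBatch& b, MatrixBatch& c) {
    assert(a.n == b.n && b.n == c.n);
    assert(&c != &a && &c != &b);
    const std::size_t n = c.n;
    for (std::size_t i = 0; i < kDim; ++i) {
        for (std::size_t j = 0; j < kDim; ++j) {
            float* out = c.e[i * kDim + j].data();
            for (std::size_t t = 0; t < n; ++t) out[t] = 0.0f;
            for (std::size_t k = 0; k < kDim; ++k) {
                const float* lhs = a.e[i * kDim + k].data();
                const float* rhs = b.e[k * kDim + j].data();
                // Hint to the compiler; ignored if OpenMP is not enabled.
                #pragma omp simd
                for (std::size_t t = 0; t < n; ++t) out[t] += lhs[t] * rhs[t];
            }
        }
    }
}

// Processes independent batches in parallel: one batch per loop iteration,
// distributed over threads when compiled with OpenMP support.
void multiplyAll(const std::vector<MatrixBatch>& a,
                 const std::vector<MatrixBatch>& b,
                 std::vector<MatrixBatch>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    const std::ptrdiff_t groups = static_cast<std::ptrdiff_t>(a.size());
    #pragma omp parallel for
    for (std::ptrdiff_t g = 0; g < groups; ++g) {
        multiply(a[g], b[g], c[g]);
    }
}
```

In this sketch the batch size would typically be chosen as a multiple of the hardware vector width, and the number of batches would be large compared with the number of threads. Both choices are assumptions of the illustration rather than statements about the presented work.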
Associated with Events:
Kalman Filter Tracking on Parallel Architectures held on 13 Jan 2016 in WH1W