Register for the Node-level Performance Engineering course at CSCS
CSCS is pleased to announce the following course:
Node-level Performance Engineering
to be held at CSCS in Lugano on May 15-16, 2014.
The instructors will be Prof. Gerhard Wellein and Dr. Georg Hager from RRZE, Germany.
This two-day training course teaches performance engineering approaches at the compute-node level.
“Performance engineering” here means developing a thorough understanding of the interactions between
software and hardware.
Course agenda and registration (registration is open until May 8, 2014)
Agenda
Introduction
- Intel and AMD x86 architectures
- ccNUMA
- Performance modeling & engineering approaches
- Our approach
Practical performance analysis
- The LIKWID tools
- Typical performance patterns
Microbenchmarks and the memory hierarchy
- Understanding the memory hierarchy
- Data transfer between memory levels
- Write allocate vs. NT stores
- Modeling of cache hierarchies
- Contention
- NUMA effects – anisotropy and asymmetry
Typical node-level software overheads
- Cost of synchronization
- Work distribution
Example Problem: The 3D Jacobi solver
- Core-level optimizations
- Blocking
- Non-temporal stores
- SIMD vectorization (SSE, AVX)
- Multithreading – contention at different levels of the memory hierarchy
- Temporal blocking
Example Problem: The Lattice-Boltzmann Method (LBM)
- Introduction
- Roofline model
- Data layout
- Non-temporal stores
- Model for in-cache data & multicore scaling
- Sparse representation and options for propagation
Example Problem: Sparse Matrix-Vector Multiplication
- Data layouts
- Performance model – CPU vs. GPU
- Bandwidth reduction
Example Problem: A backprojection algorithm for CT reconstruction
- The algorithm
- Naïve analysis
- Detailed analysis and performance model
- Optimizations
Energy & Parallel Scalability
- Energy consumption of modern processors
- The energy-to-solution metric
- Performance engineering == power engineering
- Case studies