Sat 17 Jun 2023 09:05 - 09:45 at Magnolia 1-3 - CSC: Open Session 1

The main drivers of HPC and AI workloads are the continuous growth of data and the need for efficient computing and scalability. However, without correctness, efficiency and scalability have little value. Testing for correctness in HPC and scientific computing is undoubtedly very hard: numerical errors can propagate across multiple layers, the lack of oracles limits testing, transient bit flips can corrupt data, and data races can remain hidden until particular inputs expose them. Although correctness issues can hurt programming productivity, HPC and scientific computing programmers still find it very challenging to adopt correctness-checking practices and integrate them into their workflows. In this talk, I will present my opinion on some of the challenges and opportunities in checking and testing for correctness in HPC and scientific computing, based on more than ten years of experience at a large HPC center. I will give examples of high productivity losses and other consequences caused by correctness issues. I aim to answer the question: How can we help HPC programmers focus on correctness first? Ultimately, minimizing the time spent debugging correctness issues will allow higher scientific productivity.

Sat 17 Jun

Displayed time zone: Eastern Time (US & Canada)

09:00 - 11:00
09:00
5m
Day opening
Introduction
CSC

09:05
40m
Keynote
Letting HPC Programmers Focus On Correctness First, Then On Performance
Invited Talk
CSC
Ignacio Laguna (Lawrence Livermore National Laboratory)
09:50
70m
Talk
Lightning Talks
CSC
Andrew Siegel, Sreepathi Pai (University of Rochester), Harshitha Menon (Lawrence Livermore National Lab), Piotr Luszczek, Alyson Fox, Vivek Sarkar (Rice University, USA), Andrew W. Appel (Princeton University)