Dagstuhl Seminar 13142

Correct and Efficient Accelerator Programming

( Apr 01 – Apr 04, 2013 )

Permalink
Please use the following short url to reference this page: https://www.dagstuhl.de/13142


Motivation

In recent years, massively parallel accelerator processors, primarily GPUs, have become widely available to end-users. They overcome the memory inefficiency of multi-core CPUs by equipping each processor element (PE) with a small amount of on-chip private memory, and by providing local memory that is shared among groups of PEs. As a result, a PE can access private and local memory with little or no contention. An accelerator typically has a special-purpose instruction set architecture and organization, geared towards the application domain that it has been designed to accelerate. Thus, if the right accelerator is applied to the right application, special-purpose hardware support can lead to very high performance. Finally, in massively parallel accelerators it is common to reduce the clock frequency of individual PEs, allowing more PEs to be accommodated. To summarise, as Garland and Kirk stated in their 2010 CACM paper:

"For workloads with abundant parallelism, GPUs deliver higher peak computational throughput than latency-oriented CPUs"

Using accelerators, tasks such as media processing, simulation and eye-tracking can be accelerated to beat CPU performance by orders of magnitude. Performance is gained in energy efficiency and execution speed, allowing intensive media processing software to run in low-power consumer devices.

Accelerators, however, present a serious challenge for software developers. A system may contain one or more of the plethora of accelerators on the market, with many more products anticipated in the immediate future. Software for accelerators is currently written in low-level languages, such as OpenCL, CUDA, or architecture-specific assembly code. This leads to increased development costs and makes it complex to maintain software across multiple platforms. In addition, performance problems occur because applications optimised for one platform may not perform well on others. Moreover, due to the increased usage of GPUs in safety-critical domains (such as medical image processing), correctness of accelerator programs is of vital importance. Applications must thus exhibit portable correctness, operating correctly on any configuration of accelerators, and have portable performance, exploiting the processing power and energy efficiency offered by a wide range of devices.

The aim of this Dagstuhl Seminar is to bring together researchers from various sub-disciplines of computer science, such as programming languages for multi-core systems and their compilation, and the verification of multi-core programs and their data structures. Together they will brainstorm and discuss design techniques and tools for correct and efficient accelerator programming:

  • Novel and attractive methods for constructing system-independent accelerator programs;
  • Advanced code generation techniques to produce highly optimised system-specific code from system-independent programs;
  • Scalable static techniques for analysing system-independent and system-specific accelerator programs both qualitatively and quantitatively.

The central topics of the seminar are portable performance and portable correctness. Software exhibits portable performance (in both time and energy) if it performs acceptably well across accelerator devices in general, and near-optimally on specific, widely used platforms. Portable correctness, with respect to a programming language specification, is achieved when correctness can be established in a device-independent manner.


Summary

In recent years, massively parallel accelerator processors, primarily GPUs, have become widely available to end-users. Accelerators offer tremendous compute power at a low cost, and tasks such as media processing, simulation and eye-tracking can be accelerated to beat CPU performance by orders of magnitude. Performance is gained in energy efficiency and execution speed, allowing intensive media processing software to run in low-power consumer devices. Accelerators present a serious challenge for software developers. A system may contain one or more of the plethora of accelerators on the market, with many more products anticipated in the immediate future. Applications must exhibit portable correctness, operating correctly on any configuration of accelerators, and portable performance, exploiting processing power and energy efficiency offered by a wide range of devices.

The seminar focussed on the following areas:

  • Novel and attractive methods for constructing system-independent accelerator programs;
  • Advanced code generation techniques to produce highly optimised system-specific code from system-independent programs;
  • Scalable static techniques for analysing system-independent and system-specific accelerator programs both qualitatively and quantitatively.

The seminar featured five tutorials providing an overview of the landscape of accelerator programming:

  • Architecture -- Anton Lokhmotov, ARM
  • Programming models -- Lee Howes, AMD
  • Compilation techniques -- Sebastian Hack, Saarland University
  • Verification -- Ganesh Gopalakrishnan, University of Utah
  • Memory models -- Jade Alglave, University College London

In addition, there were short presentations from 12 participants describing recent results or work-in-progress in these areas, and two discussion sessions:

  • Domain specific languages for accelerators;
  • Verification techniques for GPU-accelerated software.

Due to the "correctness" aspect of this seminar, there was significant overlap of interest with a full-week seminar on Formal Verification of Distributed Algorithms running in parallel. To take advantage of this overlap, a joint session was organised. It featured a talk on verification of GPU kernels by Alastair Donaldson, Imperial College London (on behalf of the Correct and Efficient Accelerator Programming seminar), and a talk on GPU-accelerated runtime verification by Borzoo Bonakdarpour, University of Waterloo (on behalf of the Formal Verification of Distributed Algorithms seminar).

Copyright Albert Cohen, Alastair F. Donaldson, Marieke Huisman, and Joost-Pieter Katoen

Participants
  • Jade Alglave (University College London, GB) [dblp]
  • Adam Betts (Imperial College London, GB) [dblp]
  • Albert Cohen (ENS - Paris, FR) [dblp]
  • Christian Dehnert (RWTH Aachen, DE) [dblp]
  • Dino Distefano (Queen Mary University of London, GB) [dblp]
  • Alastair F. Donaldson (Imperial College London, GB) [dblp]
  • Jeremy Dubreil (Monoidics Ltd. - London, GB) [dblp]
  • Benoit Dupont de Dinechin (Kalray - Orsay, FR) [dblp]
  • Ganesh L. Gopalakrishnan (University of Utah - Salt Lake City, US) [dblp]
  • Sebastian Hack (Universität des Saarlandes, DE) [dblp]
  • Lee Howes (AMD - Sunnyvale, US) [dblp]
  • Marieke Huisman (University of Twente, NL) [dblp]
  • Christina Jansen (RWTH Aachen, DE) [dblp]
  • Joost-Pieter Katoen (RWTH Aachen, DE) [dblp]
  • Jeroen Ketema (Imperial College London, GB) [dblp]
  • Alexander Knapp (Universität Augsburg, DE) [dblp]
  • Georgia Kouveli (ARM Ltd. - Cambridge, GB) [dblp]
  • Alexey Kravets (ARM Ltd. - Cambridge, GB) [dblp]
  • Anton Lokhmotov (ARM Ltd. - Cambridge, GB) [dblp]
  • Roland Meyer (TU Kaiserslautern, DE) [dblp]
  • Cedric Nugteren (TU Eindhoven, NL) [dblp]
  • Zvonimir Rakamaric (University of Utah - Salt Lake City, US) [dblp]
  • Oliver Reiche (Universität Erlangen-Nürnberg, DE) [dblp]
  • Philipp Rümmer (Uppsala University, SE) [dblp]
  • Ana Lucia Varbanescu (TU Delft, NL) [dblp]
  • Sven Verdoolaege (INRIA - Le Chesnay, FR) [dblp]
  • Jules Villard (University College London, GB) [dblp]
  • Heike Wehrheim (Universität Paderborn, DE) [dblp]
  • Anton Wijs (TU Eindhoven, NL) [dblp]
  • Marina Zaharieva-Stojanovski (University of Twente, NL) [dblp]
  • Dong Ping Zhang (AMD - Sunnyvale, US) [dblp]

Classification
  • hardware
  • programming languages / compiler
  • semantics / formal methods

Keywords
  • intermediate languages
  • multi-core programming
  • polyhedral compilation
  • portability
  • formal verification
  • correctness
  • efficiency