Dagstuhl Seminar 15421
Rack-scale Computing
(Oct 11 – Oct 16, 2015)
Organizers
- Babak Falsafi (EPFL - Lausanne, CH)
- Tim Harris (Oracle Labs - Cambridge, GB)
- Dushyanth Narayanan (Microsoft Research UK - Cambridge, GB)
- David A. Patterson (University of California - Berkeley, US)
Contact
- Andreas Dolzmann (for scientific matters)
- Annette Beyer (for administrative matters)
Rack-scale computing is the emerging research area of how we design and program the machines used in data centers. “Traditional” data center racks each contain dozens of discrete machines connected by Ethernet or InfiniBand. Over the last few years researchers and industry have been moving away from this model to rack-level integrated design, driven by the need to increase density and connectivity between servers, while lowering cost and power consumption.
In the near future we expect to see rack-scale computers with 10,000 to 100,000 cores, petabytes of solid-state memory, and high-bandwidth / low-latency internal fabrics. This raises interesting research questions. How should the fabric be organized, and how should CPUs, DRAM, and storage be placed in it? Are different rack-scale designs required for different applications? How should we integrate rack-scale computers into data center networks and warehouse-scale computers (WSCs)? Should rack-scale machines be programmed like large shared-memory NUMA servers, like traditional distributed systems, or a combination of the two? What are the best communication primitives to let applications benefit from low-latency interconnects? What are the likely failure modes and how do we achieve fault tolerance? How can researchers effectively prototype and test novel ideas?
We wish to bring together researchers and practitioners working on:
- Physical design. High resource density under cost, power, and cooling constraints can require physical redesign of the rack. High utilization requires us to balance processing, bandwidth, and storage resources. Physical designs such as the Pelican cold storage rack achieve very high density with commodity components for specialized applications. The physical design space ranging from general to specialized needs further exploration.
- Systems-on-Chip. SoCs are used both in industrial and research rack-scale systems, motivated by a drive for high density and high performance-per-Watt. For instance, the UC Berkeley FireBox is a 50kW WSC building block containing a thousand compute sockets, each containing a SoC with around 100 cores connected to high-bandwidth on-package DRAM.
- Interconnects. Rack-scale computers could support workloads with finer-grained communication than traditional racks support. Research in rack-scale interconnects ranges from the physical level (such as silicon photonics) to new hardware-software interfaces. For instance, Scale-Out NUMA exposes the interconnect via a remote memory controller that is mapped into a node’s local cache-coherent address space.
- Storage systems. Emerging non-volatile random-access memory (NV-RAM) technologies promise high-capacity non-volatile storage with read performance comparable to that of DRAM. At the same time, other researchers are exploring 3D stacking of Flash and DRAM. These technologies can have a huge impact on the capabilities of rack-scale computers.
- Systems software and language runtime systems. How should operating systems manage and schedule applications on rack-scale systems? Should a single OS instance cover the whole rack, or should separate instances run on each core or socket? How do we expose locality, failures, and tail latencies to applications? What is the role of virtualization? What problems are best solved in hardware versus software? Experience with multi-core research operating systems, such as Barrelfish, fos, and Tessellation, is relevant here.
The goal of this Dagstuhl Seminar is to bring together leading international researchers from both academia and industry working on different aspects of rack-scale systems. Effective solutions will require renewed collaboration across architecture, systems, and programming language communities. In addition, we want to involve participants with practical experience of workloads, and of running industrial warehouse-scale systems.
Rack-scale computing is an emerging research area concerned with how we design and program the machines used in data centers. Typically, these data centers are built from racks of equipment, with each rack containing dozens of discrete machines connected by Ethernet or by InfiniBand. Over the last few years researchers have started to weaken the boundaries between these individual machines, leading to new "rack-scale" systems. These architectures are being driven by the need to increase density and connectivity between servers, while lowering cost and power consumption.
Initial commercial systems provide high-density processor nodes connected through an in-machine interconnect to storage devices or to external network interfaces (e.g., HPE Moonshot, or SeaMicro Fabric Compute). Many ideas are now being explored in research projects -- e.g., the use of custom system-on-chip processors in place of commodity chips, the use of emerging non-volatile-memory technologies or stacked Flash in place of disks, and the use of silicon photonics and wireless links for communication within or between rack-scale systems. In addition, researchers are exploring how systems software, language runtime systems, and programming models can evolve for these new architectures.
This seminar sought to bring together researchers working on different parts of these problems. We structured the seminar around a small number of invited introductory talks (Section 4) accompanied by break-out sessions (Section 5) and a series of four poster sessions. The poster sessions gave everyone who wished to an opportunity to present their own work, and allowed many discussions to proceed in parallel around different posters.
Participants
- Gustavo Alonso (ETH Zürich, CH)
- Yungang Bao (Chinese Academy of Sciences - Beijing, CN)
- Angelos Bilas (FORTH - Heraklion, GR)
- Peter Corbett (NetApp - Sunnyvale, US)
- Paolo Costa (Microsoft Research UK - Cambridge, GB)
- Christina Delimitrou (Stanford University, US)
- Felix Eberhardt (Hasso-Plattner-Institut - Potsdam, DE)
- Lars Eggert (NetApp Deutschland GmbH - Kirchheim, DE)
- Babak Falsafi (EPFL - Lausanne, CH)
- Paolo Faraboschi (HP Labs - Palo Alto, US)
- Christof Fetzer (TU Dresden, DE)
- Steve Furber (University of Manchester, GB)
- Jana Giceva (ETH Zürich, CH)
- Matthew P. Grosvenor (University of Cambridge, GB)
- Boris Grot (University of Edinburgh, GB)
- Tim Harris (Oracle Labs - Cambridge, GB)
- Hermann Härtig (TU Dresden, DE)
- Maurice Herlihy (Brown University - Providence, US)
- Matthias Hille (TU Dresden, DE)
- Torsten Hoefler (ETH Zürich, CH)
- Konstantinos Katrinis (IBM Research - Dublin, IE)
- Kimberly Keeton (HP Labs - Palo Alto, US)
- John Kim (KAIST - Daejeon, KR)
- Christoph M. Kirsch (Universität Salzburg, AT)
- Sergey Legtchenko (Microsoft Research UK - Cambridge, GB)
- Martin Maas (University of California - Berkeley, US)
- Sue Moon (KAIST - Daejeon, KR)
- Andrew W. Moore (University of Cambridge, GB)
- Dushyanth Narayanan (Microsoft Research UK - Cambridge, GB)
- Jörg Nolte (BTU Cottbus, DE)
- Mark H. Oskin (University of Washington - Seattle, US)
- Simon Peter (University of Washington - Seattle, US)
- Andreas Polze (Hasso-Plattner-Institut - Potsdam, DE)
- Danica Porobic (EPFL - Lausanne, CH)
- Zoran Radovic (Oracle - Stockholm, SE)
- Kaveh Razavi (VU University Amsterdam, NL)
- Randolf Rotta (BTU Cottbus, DE)
- Ant Rowstron (Microsoft Research UK - Cambridge, GB)
- Stefan Schmid (TU Berlin, DE)
- Bernhard Schräder (Fujitsu Technology Solutions GmbH - Paderborn, DE)
- Malte Schwarzkopf (MIT - Cambridge, US)
- Liuba Shrira (Brandeis University - Waltham, US)
- Jens Teubner (TU Dortmund, DE)
- Gael Thomas (Télécom & Management SudParis - Evry, FR)
- Jana Traue (BTU Cottbus, DE)
- Leendert van Doorn (Microsoft Corporation - Redmond, US)
- Haris Volos (HP Labs - Palo Alto, US)
- Bernard Wong (University of Waterloo, CA)
- Noa Zilberman (University of Cambridge, GB)
- Ferad Zyulkyarov (Barcelona Supercomputing Center, ES)
Classification
- hardware
- operating systems
Keywords
- Rack-scale computing
- Systems-on-Chip (SoC)
- Interconnect networks
- Operating systems
- Language runtime systems