Dagstuhl Seminar 17461
Connecting Visualization and Data Management Research
(Nov 12 – Nov 17, 2017)
Organizers
- Remco Chang (Tufts University - Medford, US)
- Jean-Daniel Fekete (INRIA Saclay - Orsay, FR)
- Juliana Freire (New York University, US)
- Carlos E. Scheidegger (University of Arizona - Tucson, US)
Contact
- Michael Gerke (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Impacts
- Evaluating Visual Data Analysis Systems : A Discussion Report - Battle, Leilani; Angelini, Marco; Eichmann, Philipp; Sedlmair, Michael; Willett, Wesley; Santucci, Giuseppe; Fekete, Jean-Daniel; Catarci, Tiziana; Binnig, Carsten - HAL Inria, 2018.
- Evaluating Visual Data Analysis Systems : A Discussion Report : article in HILDA'18 Proceedings of the Workshop on Human-In-the-Loop Data Analytics - Battle, Leilani; Angelini, Marco; Binnig, Carsten; Catarci, Tiziana; Eichmann, Philipp; Willett, Wesley; Sedlmair, Michael; Santucci, Giuseppe; Fekete, Jean-Daniel - New York : ACM, 2018. - 6 pp.
What prevents analysts from acquiring wisdom from data sources? To use data to better understand the world and act upon it, we need to understand both the computational and the human-centric aspects of data-intensive work. In this Dagstuhl Seminar, we will establish the foundations for the next generation of data management and visualization systems by bringing together these two largely independent communities. While exploratory data analysis (EDA) has been a pillar of data science for decades, maintaining interactivity during EDA has become difficult as data size and complexity continue to grow. Modern statistical systems assume that all data must fit into memory in order to support interactivity. However, when faced with a large amount of data, few techniques can support EDA fluidly. During EDA, interactivity is critical: if each operation takes hours or even minutes to finish, analysts lose track of their thought process. Bad analyses cause bad interpretations, bad actions and bad policies.
As data scale and complexity increase, the novel solutions that will ultimately enable interactive, large-scale EDA will have to come from truly interdisciplinary and international work. Today, database researchers can store and query massive amounts of data, using methods for distributed, streaming and approximate computation. Data mining techniques provide ways to discover unexpected patterns and to automate and scale well-defined analysis procedures. Recent systems research has looked at how to develop novel database system architectures to support the iterative, optimization-oriented workloads of data-intensive algorithms. Of course, both the inputs and outputs of these systems are ultimately driven by people, in support of analysis tasks. The life-cycle of data involves an iterative, interactive process of determining which questions to ask, which data to analyze, and which features and models are appropriate, and then interpreting the results. In order to achieve better analysis outcomes, data processing systems require improved interfaces that account for the strengths and limitations of human perception and cognition. Meanwhile, to keep up with the rising tide of data, interactive visualization tools need to integrate more techniques from databases and machine learning.
By bringing together the two disparate communities, we will lay the foundations for the next generation of data (management, mining, retrieval) and interactive visualization systems. In isolation, computational breakthroughs will forever remain locked behind inadequate interfaces, while improvements in how users experience data analysis will never scale to the volume of present-day datasets. Together, these two communities will both realize their vision of empowering people to use data to understand and improve the world. The main goal of this seminar is to bring together researchers from the data management community and the interactive visualization community to address the challenge of envisioning and developing the next generation of data systems that can support the cognitive, perceptual, and analytical needs of the human. Few existing systems can truly do so at scale, and with the explosive growth in data size and complexity it is more important than ever to gather researchers from the different disciplines to design a research agenda that can meet the demands of the future. Specifically, we aim to:
- Formulate a research agenda around the challenge of reducing latency in interactive data systems. For example, develop novel pre-aggregation strategies that take into account the particular constraints and strengths of human perceptual systems; this will enable at-scale human-centric database indices, human-centric statistical analysis environments, and so on.
- Focus on specific theoretical and practical problems that need to be solved in order to enable human-centric, large-scale data exploration.
- Run special issues in leading journals such as IEEE CG&A and ACM TiiS to disseminate the developed research agenda and the research outcomes from this community.
What prevents analysts from acquiring wisdom from data sources? To use data to better understand the world and act upon it, we need to understand both the computational and the human-centric aspects of data-intensive work. In this Dagstuhl Seminar, we sought to establish the foundations for the next generation of data management and visualization systems by bringing together these two largely independent communities. While exploratory data analysis (EDA) has been a pillar of data science for decades, maintaining interactivity during EDA has become difficult as data size and complexity continue to grow. Modern statistical systems often assume that all data must fit into memory in order to support interactivity. However, when faced with a large amount of data, few techniques can support EDA fluidly. During EDA, interactivity is critical: if each operation takes hours or even minutes to finish, analysts lose track of their thought process. Bad analyses cause bad interpretations, bad actions and bad policies.
As data scale and complexity increase, the novel solutions that will ultimately enable interactive, large-scale EDA will have to come from truly interdisciplinary and international work. Today, database systems can store and query massive amounts of data, using methods for distributed, streaming and approximate computation. Data mining techniques provide ways to discover unexpected patterns and to automate and scale well-defined analysis procedures. Recent systems research has looked at how to develop novel database system architectures to support the iterative, optimization-oriented workloads of data-intensive algorithms. Of course, both the inputs and outputs of these systems are ultimately driven by people, in support of analysis tasks. The life-cycle of data involves an iterative, interactive process of determining which questions to ask, which data to analyze, and which features and models are appropriate, and then interpreting the results. In order to achieve better analysis outcomes, data processing systems require improved interfaces that account for the strengths and limitations of human perception and cognition. Meanwhile, to keep up with the rising tide of data, interactive visualization tools need to integrate more techniques from databases and machine learning.
This Dagstuhl Seminar brought together researchers from the two communities (visualization and databases) to establish a research agenda for the development of next-generation data management and interactive visualization systems. In a short amount of time, the two communities learned from each other, identified the strengths and weaknesses of the latest techniques from both fields, and together developed a "state of the art" report on the open challenges that require the collaboration of the two communities. This report documents the outcome of this collaborative effort by all the participants.
Participants
- Sihem Amer-Yahia (CNRS - St. Martin-d'Hères, FR) [dblp]
- Leilani Battle (University of Washington - Seattle, US) [dblp]
- Carsten Binnig (TU Darmstadt, DE) [dblp]
- Tiziana Catarci (Sapienza University of Rome, IT) [dblp]
- Remco Chang (Tufts University - Medford, US) [dblp]
- Surajit Chaudhuri (Microsoft Research - Redmond, US) [dblp]
- Stephan Diehl (Universität Trier, DE) [dblp]
- Harish Doraiswamy (New York University, US) [dblp]
- Steven M. Drucker (Microsoft Research - Redmond, US) [dblp]
- Jason Dykes (City - University of London, GB) [dblp]
- Jean-Daniel Fekete (INRIA Saclay - Orsay, FR) [dblp]
- Danyel Fisher (Microsoft Research - Redmond, US) [dblp]
- Juliana Freire (New York University, US) [dblp]
- Michael Gleicher (University of Wisconsin - Madison, US) [dblp]
- Hans Hagen (TU Kaiserslautern, DE) [dblp]
- Gerhard Heyer (Universität Leipzig, DE) [dblp]
- Heike Hofmann (Iowa State University - Ames, US) [dblp]
- Daniel A. Keim (Universität Konstanz, DE) [dblp]
- Tim Kraska (Brown University - Providence, US) [dblp]
- Heike Leitte (TU Kaiserslautern, DE) [dblp]
- Zhicheng Liu (Adobe Systems Inc. - Seattle, US) [dblp]
- Volker Markl (TU Berlin, DE) [dblp]
- Alexandra Meliou (University of Massachusetts - Amherst, US) [dblp]
- Torsten Möller (Universität Wien, AT) [dblp]
- Dominik Moritz (University of Washington - Seattle, US) [dblp]
- Hannes Mühleisen (CWI - Amsterdam, NL) [dblp]
- Arnab Nandi (Ohio State University - Columbus, US) [dblp]
- Behrooz Omidvar-Tehrani (LIG - Grenoble, FR) [dblp]
- Themis Palpanas (Paris Descartes University, FR) [dblp]
- Carlos E. Scheidegger (University of Arizona - Tucson, US) [dblp]
- Gerik Scheuermann (Universität Leipzig, DE) [dblp]
- Michael Sedlmair (Universität Wien, AT) [dblp]
- Thibault Sellam (Columbia University - New York, US) [dblp]
- Juan Soto (Technische Universität Berlin, DE) [dblp]
- Richard Wesley (Tableau Software - Seattle, US) [dblp]
- Wesley J. Willett (University of Calgary, CA) [dblp]
- Eugene Wu (Columbia University - New York, US) [dblp]
- Yifan Wu (University of California - Berkeley, US) [dblp]
Classification
- computer graphics / computer vision
- data bases / information retrieval
- society / human-computer interaction
Keywords
- Information Visualization
- Database Management Systems
- Interactive Data Analysis
- Human-Centric Computing
- Big Data