Dagstuhl Seminar 15101
Bridging Information Visualization with Machine Learning
(March 1–6, 2015)
Organizers
- Daniel A. Keim (Universität Konstanz, DE)
- Tamara Munzner (University of British Columbia - Vancouver, CA)
- Fabrice Rossi (University of Paris I, FR)
- Michel Verleysen (University of Louvain, BE)
Contact
- Dagmar Glaser (for administrative questions)
Impacts
- The State of the Art in Integrating Machine Learning into Visual Analytics : article - Endert, Alex; Ribarsky, William; Turkay, Cagatay; William Wong, B. L.; Diaz Blanco, Ignacio; Rossi, Fabrice; Nabney, Ian T. - Chichester : Wiley, 2017. - pp. 458-486 - (Computer graphics forum ; 36. 2017, 8).
- Visual Interaction with Dimensionality Reduction: A Structured Literature Analysis : article - Sacha, Dominik; Zhang, Leishi; Sedlmair, Michael; Lee, John A.; Peltonen, Jaakko; Keim, Daniel A.; North, Stephen A.; Weiskopf, Daniel - Los Alamitos : IEEE, 2017. - pp. 241-250 - (IEEE transactions on visualization and computer graphics ; 23. 2017, 1).
Program
This Dagstuhl Seminar aims at bringing the visualization and machine learning communities together.
Information visualization and visual data mining leverage the human visual system to provide insight into and understanding of unorganized data. Visualizing data in a way that suits the user's needs proves essential in a number of situations: gaining insight into data before a more quantitative analysis (e.g., for expert selection of the number of clusters in a data set), presenting data to a user through well-chosen tables, graphs or other structured representations, relying on human cognitive skills to convey extended information in a compact way, etc.
The scalability of visualization methods is an issue: human vision is intrinsically limited to between two and three dimensions, and the human preattentive system cannot handle more than a few combined features. In addition, the computational burden of many visualization methods is too large for real-time interactive use with large datasets. To address these scalability issues and to enable visual data mining of massive sets of high-dimensional data (so-called "big data"), simplification methods are needed to select and/or summarize important dimensions and/or objects.
Traditionally, two scientific communities developed tools to address these problems: the machine learning (ML) and information visualization (IV) communities. On the one hand, ML provides a collection of automated data summarizing/compression solutions. Clustering algorithms summarize a set of objects with a smaller set of prototypes, while projection algorithms reduce the dimensionality of objects described by high-dimensional vectors. On the other hand, the IV community has developed user-centric and interactive methods to handle the human vision scalability issue.
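To make the two kinds of ML summarization concrete, here is a minimal sketch in plain Python. It uses only the standard library, so it is slow and suited to small data. It assumes Lloyd's k-means as the example of a prototype-based clustering algorithm, and principal component analysis (computed by power iteration with deflation) as the example of a projection algorithm. The seminar text does not name specific methods, and the function names `kmeans` and `pca_project` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means: summarize `points` by `k` prototypes (cluster means).

    Hypothetical illustrative helper, not a method from the seminar.
    """
    rng = random.Random(seed)
    protos = [list(p) for p in rng.sample(points, k)]  # initial prototypes
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point is attached to its nearest prototype.
        labels = [min(range(k), key=lambda j: math.dist(p, protos[j])) for p in points]
        # Update step: each prototype moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous prototype
                protos[j] = [sum(c) / len(members) for c in zip(*members)]
    return protos, labels


def pca_project(points, n_components=2, iters=500, seed=0):
    """Project centered points onto their top principal directions.

    Hypothetical illustrative helper: eigenvectors of the sample covariance
    matrix are found one at a time by power iteration, and each found
    direction is removed (deflation) before searching for the next one.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    mean = [sum(col) / n for col in zip(*points)]
    X = [[x - m for x, m in zip(p, mean)] for p in points]
    # Sample covariance matrix C = X^T X / (n - 1).
    C = [[sum(r[a] * r[b] for r in X) / (n - 1) for b in range(d)] for a in range(d)]
    comps = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]  # random unit starting vector
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        # Deflation: subtract lam * v v^T so the next pass finds the next direction.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
        comps.append(v)
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in X]
    return projected, comps
```

On two well-separated groups of points, `kmeans(pts, 2)` should return one prototype at the mean of each group. `pca_project(pts, n_components=2)` returns 2-D coordinates suitable for a scatter plot. Power iteration keeps the example dependency-free; in practice, a library eigensolver would be the usual choice.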
This Dagstuhl Seminar follows Seminar 12081, which gave participants from the IV and ML communities common ground for understanding each other. This new seminar will build on that foundation and address key challenges such as interactivity, quality assessment, platforms and software, and others. The seminar will be organized interactively. Discussions are intended to start with short presentations focused on open questions. As far as possible, these opening presentations should be given by pairs of researchers from the ML and IV fields and prepared before the seminar. Most of the time slots will then be devoted to discussions. Short structuring talks, prepared on the fly by "discussion leaders", will be inserted between discussions. As a conclusion of the seminar, short- and long-term goals for joint work by the ML and IV fields will be identified.
Motivations and context of the seminar
Following the success of Dagstuhl Seminar 12081 "Information Visualization, Visual Data Mining and Machine Learning" [1, 2], which gave participants from the IV and ML communities common ground for understanding each other, this Dagstuhl Seminar aimed to bring the visualization and machine learning communities together once again.
Building upon seminar 12081, the present seminar aimed at understanding key challenges such as interactivity, quality assessment, platforms and software, and others.
Organization
The seminar was organized to maximize discussion time and to avoid a conference-like program of scheduled talks. After a brief lightning introduction by each participant, the seminar began with two tutorial talks: one about machine learning (focused on visualization-related topics), followed by one about information visualization. While some attendees of the present seminar had participated in Seminar 12081, most had not. The tutorials helped establish a common vocabulary and gave an overview of ongoing research in ML and IV.
After those talks, the seminar was organized into parallel working groups with periodic plenary meetings and discussions, as described below.
Topics and groups
After the two tutorials, the participants spent some time identifying topics they would like to discuss during the seminar. Twenty-one topics emerged:
- Definition and analysis of quantitative evaluation measures for dimensionality reduction (DR) methods (and for other methods);
- In the context of dimensionality reduction: visualization of quality measures and of the sensitivity of some results to user inputs;
- What IV tasks (in addition to DR related tasks) could benefit from ML? What ML tasks could benefit from IV?
- Reproducible/stable methods and the link of those aspects to sensitivity and consensus results;
- Understanding the role of the user in mixed systems (which include both an ML and an IV component);
- Interactive steerable ML methods (relation to intermediate results);
- Methods from both fields for dynamic multivariate networks;
- ML methods that can scale up to IV demands (especially in terms of interactivity);
- Interpretable/transparent decisions;
- Uncertainty;
- Matching vocabularies/taxonomies between ML and IV;
- Limits to ML;
- Causality;
- User guidance: precalculating results, understanding user intentions;
- Mixing user and data driven evaluation (leveraging a ROC curve, for instance);
- Privacy;
- Applications and use cases;
- Prior knowledge integration;
- Formalizing task definition;
- Usability;
- Larger scope ML.
After clustering and voting, these topics were merged into the six most popular broader subjects, which were discussed in working groups for the rest of the week:
- Dynamic networks
- Quality
- Emerging tasks
- Role of the user
- Reproducibility and interpretability
- New techniques for Big Data
The rest of the seminar was organized as a series of working group meetings interleaved with plenary meetings, which allowed the working groups to report on their joint work, to steer the overall process, etc.
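Quantitative evaluation measures for dimensionality reduction were among the topics listed above, and "Quality" was one of the working groups. The following is a hedged illustration of what such a measure can look like; it is not a measure defined at the seminar. One simple and widely used family scores an embedding by how well it preserves local neighborhoods. The sketch below computes the average fraction of each point's k nearest neighbors in the original space that remain among its k nearest neighbors in the low-dimensional embedding. The function names are hypothetical. More refined measures, such as trustworthiness and continuity, also weight how far each intruding or missing neighbor is in rank.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself.

    Hypothetical helper. Ties are broken by index (Python's sort is stable).
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def knn_preservation(high, low, k):
    """Mean fraction of each point's k high-dimensional neighbors kept in `low`.

    Hypothetical helper: `high` and `low` list the same n points, in the same
    order, in the original space and in the embedding. A score of 1.0 means
    every k-neighborhood is preserved; a random embedding is expected to
    score about k / (n - 1).
    """
    n = len(high)
    kept = sum(len(knn_indices(high, i, k) & knn_indices(low, i, k)) for i in range(n))
    return kept / (n * k)
```

For example, with 5 points and k = 1 the random baseline is 1/4. A score of 0.0 would flag an embedding that destroyed every nearest-neighbor relation, while 1.0 means the embedding kept all of them.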
Conclusion
As reported in the rest of this document, the working groups, and the week as a whole, were very productive. In particular, the participants identified a number of issues that mostly revolve around the complex systems being built for visual analytics. Those systems need to be scalable and to support rich interaction, steering, objective evaluation, etc. Their results must be stable and interpretable, but the systems must also be able to incorporate uncertainty into the process (in addition to prior knowledge). Position papers and roadmaps have been written as a concrete output of the discussions on these complex visual analytics systems.
The productivity of the week confirmed that researchers from information visualization and from machine learning share some common medium- to long-term research goals. It also became clear that there is still a strong need for better understanding between the two communities. It was therefore decided to work on joint tutorial proposals for upcoming IV and ML conferences. To facilitate exchange between the communities outside the ideal conditions provided by Dagstuhl, the blog "Visualization meets Machine Learning" was started.
Finally, as reported in the participant survey, the seminar was much appreciated. Because of the way the seminar was organized, participants did not know each other's fields very well, and it might have been better to allow slightly more time for personal introductions. Some open research questions from each field that seem interesting to the other could also have been presented. Nevertheless, the positive consequences of avoiding a conference-like schedule were much appreciated. The participants were pleased with the ample time for discussion, the balance between the two communities, and the quality of the discussions. Those aspects are quite unique to Dagstuhl.
References
- Daniel A. Keim, Fabrice Rossi, Thomas Seidl, Michel Verleysen, and Stefan Wrobel. Dagstuhl Manifesto: Information Visualization, Visual Data Mining and Machine Learning (Dagstuhl Seminar 12081). Informatik-Spektrum, 35:58–83, August 2012.
- Daniel A. Keim, Fabrice Rossi, Thomas Seidl, Michel Verleysen, and Stefan Wrobel (editors). Information Visualization, Visual Data Mining and Machine Learning (Dagstuhl Seminar 12081), Dagstuhl Reports, 2(2):58–83, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2012. http://dx.doi.org/10.4230/DagRep.2.2.58
Participants
- Daniel Archambault (Swansea University, GB)
- Francois Blayo (Ipseite SA - Lausanne, CH)
- Kerstin Bunte (UC Louvain-la-Neuve, BE)
- Miguel Á. Carreira-Perpiñán (University of California - Merced, US)
- Ignacio Díaz Blanco (University of Oviedo, ES)
- David S. Ebert (Purdue University - West Lafayette, US)
- Alex Endert (Georgia Institute of Technology, US)
- Thomas Ertl (Universität Stuttgart, DE)
- Barbara Hammer (Universität Bielefeld, DE)
- Helwig Hauser (University of Bergen, NO)
- Stephen Ingram (University of British Columbia - Vancouver, CA)
- Samuel Kaski (Aalto University, FI)
- Daniel A. Keim (Universität Konstanz, DE)
- Bongshin Lee (Microsoft Research - Redmond, US)
- John Aldo Lee (UC Louvain-la-Neuve, BE)
- Bassam Mokbel (Universität Bielefeld, DE)
- Torsten Möller (Universität Wien, AT)
- Tamara Munzner (University of British Columbia - Vancouver, CA)
- Ian Nabney (Aston University - Birmingham, GB)
- Stephen North (Infovisible - Oldwick, US)
- Eli Parviainen (Aalto University, FI)
- Fernando Paulovich (University of Sao Paulo, BR)
- Jaakko Peltonen (Aalto University, FI & University of Tampere, FI)
- William Ribarsky (University of North Carolina - Charlotte, US)
- Fabrice Rossi (University of Paris I, FR)
- Frank-Michael Schleif (University of Birmingham, GB)
- Michael Sedlmair (Universität Wien, AT)
- Cagatay Turkay (City University - London, GB)
- Jarke J. van Wijk (TU Eindhoven, NL)
- Michel Verleysen (University of Louvain, BE)
- Thomas Villmann (Hochschule Mittweida, DE)
- Daniel Weiskopf (Universität Stuttgart, DE)
- William Wong (Middlesex University, GB)
- Jing Yang (University of North Carolina - Charlotte, US)
- Leishi Zhang (Middlesex University, GB)
- Blaz Zupan (University of Ljubljana, SI)
Related Seminars
- Dagstuhl Seminar 12081: Information Visualization, Visual Data Mining and Machine Learning (2012-02-19 - 2012-02-24)
Classification
- computer graphics / computer vision
- data bases / information retrieval
- soft computing / evolutionary algorithms
Keywords
- Information visualization
- machine learning