Dagstuhl Seminar 22191
Visual Text Analytics
(May 08 – May 13, 2022)
Organizers
- Christopher Collins (Ontario Tech - Oshawa, CA)
- Antske Fokkens (Free University Amsterdam, NL)
- Andreas Kerren (Linköping University, SE)
- Chris Weaver (University of Oklahoma - Norman, US)
Contact
- Marsha Kleinbauer (for scientific matters)
- Jutka Gasiorowski (for administrative matters)
Impacts
- An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics : Position Paper : article to appear in Proceedings of the 2022 IEEE Workshop on Evaluation and Beyond - Methodological Approaches to Visualization (BELIV '22) - Kucher, Kostiantyn; Sultanum, Nicole; Daza, Angel; Simaki, Vasiliki; Skeppstedt, Maria; Mahyar, Narges; Fekete, Jean-Daniel; Plank, Barbara - HAL Inria, 2022. - 11 pp.
- An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics : Position Paper 2022 : IEEE Evaluation and Beyond - Methodological Approaches for Visualization (BELIV) - Kucher, Kostiantyn; Sultanum, Nicole; Daza, Angel; Simaki, Vasiliki; Skeppstedt, Maria; Mahyar, Narges; Fekete, Jean-Daniel; Plank, Barbara - Los Alamitos : IEEE, 2022. - pp. 28-37.
- The Role of Interactive Visualization in Explaining (Large) NLP Models : from Data to Inference - Brath, Richard; Keim, Daniel A.; Knittel, Johannes; Pan, Shimei; Sommerauer, Pia; Strobelt, Hendrik - Cornell University : arXiv.org, 2023. - 12 pp.
- Characterizing Uncertainty in the Visual Text Analysis Pipeline : article in 2022 IEEE 7th Workshop on Visualization for the Digital Humanities (VIS4DH) - Haghighatkhah, Pantea; El-Assady, Mennatallah; Fekete, Jean-Daniel; Mahyar, Narges; Paradis, Carita; Speckmann, Bettina; Simaki, Vasiliki - Los Alamitos : IEEE, 2022. - 6 pp.
- From word clouds to Word Rain : Revisiting the classic word cloud to visualize climate change texts - Skeppstedt, Maria; Ahltorp, Magnus; Kucher, Kostiantyn; Lindström, Matts - Thousand Oaks : Sage Science Press, 2024. - 22 pp. - (Information Visualization ; 2024).
Introduction
Visualizing textual information is a particularly challenging area of information visualization and visual analytics research. The types of data processing and analytic algorithms differ greatly from those used for tabular or geospatial data, and the visualization techniques have additional constraints to consider, including providing context for text fragments of similar or different size and structure, depicting embeddings and high-dimensional representations, and ensuring legibility of text incorporated into visualizations. The wide variation in the data is accompanied by difficulties in inferring the semantic meaning of ambiguous terms and in resolving references between subsequent statements.
This Dagstuhl Seminar succeeded in bringing together researchers from the visualization, natural language processing (NLP), and machine learning communities, with domain experts from several text-related research areas, to identify the most pressing and promising open problems for collaborative research. This truly interdisciplinary approach offered new opportunities to capitalize on existing knowledge and recent developments across all involved disciplines. Discussions in the seminar were comprehensive, focusing on visual text analytics with the goal of providing an application-oriented research agenda.
The seminar coalesced an international community of experts from different disciplines around a research roadmap for the next 5–10 years, as documented through working group reports. The seminar generated a series of research questions which serve as a call to action to the wider community. The unique and contained setting of Schloss Dagstuhl facilitated new cross-disciplinary collaborations and allowed us to lay the groundwork for productive future collaborations, including a planned special issue of the Information Visualization journal.
Seminar Themes
The following high-level themes were discussed during the seminar. The seminar allowed attendees to critically reflect on current research efforts, the state of the field, and key research challenges today. Participants were also encouraged to demonstrate their system prototypes and tools relevant to the seminar topics. As a result of the first working groups, as well as impromptu demonstrations and discussions, the actual seminar discussion topics evolved, and we established a second set of working groups halfway through the week (cf. Sect. 6).
- Data Sources and Diversity What is the current landscape of the application fields and data domains? What are the data gaps? Can existing approaches be generalized?
- Model Explainability and Interpretability Can we provide more sophisticated visualizations to study how language models learn or what information they represent?
- Evaluation and Experimental Designs Which experimental methods best support the evaluation of techniques and processes for visualizing text information?
- Interaction Design What design opportunities are unique to, or more pressing for, text data? How can interaction principles be applied to any underlying NLP as well?
- Toolkits and Standards What success stories regarding existing text visualization approaches and systems can we learn from? What is needed?
- TextVis Literacy Visual text analytics can be applied across a wide variety of domains. How do we make techniques easy to learn and to interpret correctly?
Outcomes
The Dagstuhl team performed an evaluation at the end of the seminar week. The results of this survey (scientific quality, inspiration for new ideas/projects/research/papers, insights from neighboring fields, ...) were universally very good to excellent. Participants proposed only a few improvements, for example, having longer breaks and interleaving the demo presentations with the other parts of the schedule. Another suggestion was to skip the intermediate group report session because it interrupted the group work.
At the end of the week the organizers agreed to arrange a special issue of the journal Information Visualization, which will have an open call but with the intent to include any extended works resulting from the seminar. In addition, several working groups with more "position paper" style reports planned to submit these to well-read venues that accept editorial works which motivate the research community.
Remaining Challenges in Visual Text Analytics
Not all topics identified during the seminar could be addressed in the working groups; some might be left for a future Dagstuhl seminar on a similar subject area. In the following, we briefly list those topics and open problems (others surely exist that are not mentioned here):
- Interaction Design: Interaction methodologies as part of any visual text analytics approach were the focus of several working groups. A more systematic classification and evaluation of interaction techniques that are unique to text data would be useful for future developments.
- Toolkits and Standards: Although many toolkits and existing standards were discussed in the seminar, a comprehensive analysis of them, which would benefit users and developers of visual text analytics systems, is still missing.
- TextVis Literacy: This topic is important for broadening the use of visual text analytics techniques in general and should be studied in more depth in the future.
- Focus on Text Data Aspects: The consideration of data diversity, data fusion, and data organization in the context of visual text analytics might be an interesting topic for further discussion.
- Focus on Specific NLP and ML Methods: The increasing number of specific/novel analytical methods (such as transfer learning or others) raises the need for specific answers from the visual text analytics community.
Acknowledgments
We would like to thank all participants of the seminar for the lively discussions and contributions during the seminar, as well as the scientific directorate of Dagstuhl Castle for giving us the opportunity to organize this event. Angelos Chatzimparmpas gathered the abstracts for the overview of the invited talks, the tool demos, and the working groups in Sect. 4, Sect. 5, and Sect. 6, respectively. Once more, we are thankful to all the attendees for agreeing to compose the abstract texts and for providing them to us in a timely manner so that we could write this executive summary. Last but not least, the seminar would not have been possible without the great help of the staff at Dagstuhl Castle. We acknowledge all of them and their assistance.
Motivation
Visualizing textual information is a particularly challenging area of information visualization and visual analytics research. The types of data processing and analytic algorithms differ greatly from those used for tabular or geospatial data, and the visualization techniques have additional constraints to consider, including providing context for text fragments of similar or different size and structure, depicting embeddings and high-dimensional representations, and ensuring legibility of text incorporated into visualizations. The wide variation in data is accompanied by difficulties in inferring the semantic meaning of ambiguous terms and in resolving references between subsequent statements.
This Dagstuhl Seminar aims to bring together researchers from the visualization, natural language processing (NLP), and machine learning communities, with domain experts from several text-related research areas, to identify the most pressing and promising open problems for collaborative research. A truly interdisciplinary approach may offer new opportunities to capitalize on existing knowledge and recent developments across all involved disciplines. We will focus on a comprehensive discussion of visual text analytics, with the goal of providing an application-oriented research agenda. The main themes for the seminar cover theory, methodology, and application:
- Data Sources and Diversity What is the current landscape of the application fields and data domains? What are the data gaps? Can existing approaches be generalized?
- Model Explainability and Interpretability Can we provide more sophisticated visualizations to study how language models learn or what information they represent?
- Evaluation and Experimental Designs Which experimental methods best support the evaluation of techniques and processes for visualizing text information?
- Interaction Design What design opportunities are unique to, or more pressing for, text data? How can interaction principles be applied to any underlying NLP as well?
- Toolkits and Standards What success stories regarding existing text visualization approaches and systems can we learn from? What is needed?
- TextVis Literacy Visual text analytics can be applied across a wide variety of domains. How do we make techniques easy to learn and to interpret correctly?
The seminar will coalesce the community of experts from different disciplines around a research roadmap for the next 5–10 years. We aim to generate a series of research questions as a call to action by the wider community. The research discussed at the seminar will provide a deeper and more holistic understanding of challenges and opportunities in visual text analytics. The unique and contained setting of Schloss Dagstuhl will facilitate new collaborations and allow us to lay the groundwork for productive future collaborations.
Participants
- Richard Brath (Uncharted Software - Toronto, CA)
- Angelos Chatzimparmpas (Linnaeus University - Växjö, SE)
- Christopher Collins (Ontario Tech - Oshawa, CA)
- José Angel Daza Arévalo (Free University Amsterdam, NL)
- Mennatallah El-Assady (ETH Zürich, CH)
- Alex Endert (Georgia Institute of Technology - Atlanta, US)
- Jean-Daniel Fekete (INRIA Saclay - Orsay, FR)
- Antske Fokkens (Free University Amsterdam, NL)
- Yoav Goldberg (Bar-Ilan University - Ramat Gan, IL)
- Pantea Haghighatkhah (TU Eindhoven, NL)
- Daniel A. Keim (Universität Konstanz, DE)
- Andreas Kerren (Linköping University, SE)
- Johannes Knittel (Universität Stuttgart, DE)
- Kostiantyn Kucher (Linnaeus University - Växjö, SE)
- Ross Maciejewski (Arizona State University - Tempe, US)
- Narges Mahyar (University of Massachusetts - Amherst, US)
- Christofer Meinecke (Universität Leipzig, DE)
- Shimei Pan (University of Maryland - Baltimore County, US)
- Carita Paradis (Lund University, SE)
- Barbara Plank (IT University of Copenhagen, DK)
- Vasiliki Simaki (Lund University, SE)
- Maria Skeppstedt (SE)
- Pia Sommerauer (Free University Amsterdam, NL)
- Bettina Speckmann (TU Eindhoven, NL)
- Hendrik Strobelt (MIT-IBM Watson AI Lab - Cambridge, US)
- Nicole Sultanum (University of Toronto, CA)
- Tatiana von Landesberger (Universität Köln, DE)
- Chris Weaver (University of Oklahoma - Norman, US)
Classification
- Computation and Language
- Graphics
- Human-Computer Interaction
Keywords
- Information Visualization
- Visual Text Analytics
- Visual Analytics
- Text Visualization
- Explainable ML for Text Analytics
- Language Models
- Text Mining
- Natural Language Processing