Dagstuhl Seminar 22381
Rational Design of RiboNucleic Acids
( Sep 18 – Sep 23, 2022 )
Permalink
Organizers
- Sven Findeiß (Universität Leipzig, DE)
- Christoph Flamm (Universität Wien, AT)
- Yann Ponty (Ecole Polytechnique - Palaiseau, FR)
Contact
- Andreas Dolzmann (for scientific matters)
- Susanne Bach-Bernhard (for administrative matters)
Schedule
Context and selected takeaways
RiboNucleic Acids (RNAs) are ubiquitous macromolecules within biological systems, capable of performing a wide range of regulatory and catalytic functions. This versatility can be harnessed, and RNAs are increasingly utilized to accurately monitor and control biological processes [19], leading to RNA being found at the core of modern therapeutics [18]. It is therefore not surprising that the RNA-guided CRISPR-Cas9 editing [10], rewarded by the 2020 Nobel Prize in Chemistry, and mRNA-based vaccines [12], are at the forefront of modern biotechnology. For many functional RNA families [11], decades of research have produced a deep understanding of the sequence and structural basis underlying their biological function(s). Such studies, coupled with mature computational methods for structure prediction [23], have paved the way for a rational design of RNAs targeting a wide diversity of biological function [8, 2, 13].
Accordingly, RNA design has emerged as an exciting open computational problems in molecular biology. Owing to the discrete nature of RNA sequence and popular structural representations (e.g. secondary structure), RNA design has inspired the contribution of a large number of diverse algorithms [9, 20, 14, 4] for the inverse folding problem, i.e. the design of an RNA sequence which preferentially and effectively folds into a predefined (secondary) structure. Given the, recently established, NP-Hardness of the problem, even for minimal energy models [1], many of those algorithmic predictions are either heuristics, exponential-time or based on a variety of machine learning techniques.
More generally, RNA design addresses the generation of sequences of nucleotides targeting a given biological function. A non-exhaustive list of classic design objectives includes:
- Preferential adoption of one or multiple given structures (inverse folding);
- Sequence specific constraints such as an overall (di-)nucleotide composition [21], encoding of an amino-acid sequence (mRNA design), presence/absence of motifs [22];
- Adoption of different conformations upon presence of ligand (RNA switches and sensors) [3];
- Effective and specific interactions with targeted partners (RNAs, proteins) cascading into system-level regulatory effects [15, 16];
- Self-assembly into large scale architectures, ultimately adopting a predefined 3D shape (RNA origami) [6];
- Exploit co-transcriptional folding, and more general out-of-equilibrium regimes to perform computations (strand displacement systems, oritatami) [5]
Typical applications of design include novel therapeutic strategies, control principles for existing biological systems, or sensors for the presence of small molecules [3], but designed sequences can also provide an objective experimental assessment of functional hypotheses, where designs are synthesized and their effect on the cellular context can be tested in vitro and, in turn, in vivo.
Over the course of the seminar, we witnessed a substantial recent expansion of the scope of applications. Beyond classic but still challenging objectives of design, including riboswitches addressed by Talk 5.8, 5.9 and 5.21, messenger RNAs towards vaccine objectives mentioned by Talk 5.27, and CRISPR gRNAs mentioned by Talk 5.11, novel applications of RNA design emerged during the seminar. Talk 5.11 introduced SARS CoV 2 sensors based on strand displacement, Talk 5.15 addressed self-replicating ribozymes connected with origin-or-life questions, and Talk 5.6 explored rational design principles for repetitive RNAs inducing the formation of cellular droplets through liquid-liquid phase separation.
RNA Design as a discrete (inverse) optimization problem
The inverse folding problem, one of the central elements of RNA design, is a hard computational problem [1]. Although attracting a wide interest from the community, it is also one of the very few problems in computational biology whose complexity status has remained open for a long time (about three decades). This difficulty can be attributed to a lack of a suitable conceptual framework for inverse combinatorial problems. Indeed, inverse folding can be viewed as the search of a pre-image, in a function that maps each RNA sequence to its most stable conformation, the latter being computed using a polynomial, yet non-trivial, dynamic programming algorithm [23]. Natural generalizations virtually include any instance of inverse optimization problems, and could be of general interest to the Computer Science research field. Prior works in this direction have led to characterization of designable structures based on formal languages and graph theory [7], revealing strong connections to many subfields of computer science (for instance, between positive design and graph coloring).
In Talk 5.18 it was discussed that a flexible inverse folding approach, e.g. by allowing the extension of helices by at most one base pair, seems to be easier than keeping the problem strict. Such a flexibility in the structural objective of design was also emphasized as desirable by Talk 5.5. The problem of classical inverse folding can be extended from one to multiple target structures, and Talk 5.27 showed that this can be solved by an elegant dynamic programming approach that is fixed parameter tractable. The resulting framework was further generalized, and is not only applicable to RNA design, but also to apparently more distant problems such as the alignment of RNAs with pseudoknots. In silico designs and analysis depend on the accuracy of the applied energy model. In Talk 5.12 it has been underpinned that a systematic perturbation of parameters can be used to define a notion of robustness of individual parameters of an energy model, and help to improve prediction accuracy. Talk 5.28 revisited the inverse folding problem as an inverse optimization problem, and showed that many local structural motifs do not admit a design, with consequences to the space of designable structures, but raised fundamental questions on a relatively new flavor of optimization.
RNA Design in Structural Bioinformatics
Inverse folding also represents the ultimate test of our understanding of the mechanisms governing the folding of macromolecules. Given a set of folding rules (typically, an energy model), a synthesis of in silico designed sequences combined with high-throughput experiments (e.g., structure probing) enables an assessment of the compatibility of the determined structure with the initial target. Observed discrepancies can then be used to assess the quality of predictive models, especially those based on statistical potentials which may be prone to overfitting. Systematic local imprecisions can also be used to refine energy models, enabling the generation of better designs, whose iteration represents a virtuous circle, ultimately contributing to a better understanding of folding principles.
A nascent RNA molecule typically folds during its transcription. Frameworks to simulate this kinetically driven process can help to interpret experimental results (Talk 5.4) but as neither the simulation nor the experiment is perfect, quality assurance (Talk 5.14) of the in silico investigations is essential and results have to be interpret with caution, as for instance the mapping of time scales is a non trivial task. Finally, complex RNA hybridization networks are designed in silico to perform regulatory functions with complex temporal dynamics. A simplified kinetic model, introduced in Talk 5.26, for RNA/RNA hybridization represents an attractive evaluation model for the design of interactions.
At a much more detailed 3D level, Talk 5.17 showed that high-resolution experimental techniques can be used to observe dynamic behaviors, sometimes triggered by the binding of a ligand, and could inform future objective functions. Talk 5.24 and 5.16 described coarsegrained models amenable to molecular dynamics. Interestingly, the latter can be leveraged in order to study kinetics behaviors at the 3D level.
RNA Design in Synthetic Biology and Natural Computing
This line of research applies various engineering principles to the design, and construction of artificial biological devices. While initially focused on hijacking naturally-occurring regulatory functions through a copy/paste of evolved genetic material [17], the need for a precise control and for a modularity/orthogonality of constructs, has increasingly led to a de novo design based on nucleic acids. Recently, RNA has been successfully used as a material for the design of whole regulatory circuits, or for the construction of complex programmable shapes (RNA origamis [6]), with promising applications as biomaterial.
Software frameworks, like the ones presented in Talk 5.10 and Talk 5.22, make the construction of large DNA and RNA nano-structures possible. Those designs are not only adopting the right structure in silico according to combinatorial folding algorithms, but can also be validated by simulations (Talk 5.24) and microscopy (Talk 5.1). This observation suggests that the difficulty of design could stem from the compactness of targeted ncRNAs, while larger (but more regular) RNAs may be easier to design, an element that could inspire future theoretical studies.
In the course of the seminar it became evident that information from the 2D and 3D level need to be mapped onto each other. Design could therefore benefit from multiscale approaches: selecting candidates with 2D objectives, use coarse grained 3D analysis (Talk 5.16) and go to a full atom final validation for critical sub-regions. The curation of refined and non-redundant 3D RNA structures (Talk 5.2) and the systematic extraction of information from such a data set can help to investigate for instance structural features of modified bases or to propose isosteric structural mutations (Talk 5.19) in order to generalize the design from 2D to compact 3D architectures.
Programmable RNA folding can also be used as a computational model, allowing for the computation of complex programs based on cotranscriptional folding phenomena. RNA regulatory circuits can be used to emulate Boolean functions, allowing a precise and expressive control of regulatory networks at an early stage of the gene expression process. Talk 5.23 introduced RNA oritatami, a Turing-complete model of computation based on cotranscriptional folding inspired by cellular automata. Talk 5.7 described exciting applications of design to generate easily-checkable QR codes that reveal contamination in a closed environment. However an application-agnostic implementation of the strand displacement systems underlying some of those applications still represent major challenges in RNA, as shown and discussed during Talk 5.3. Those include intra-molecular base pairs and an overall wasteful behavior that motivates efficient recycling strategies.
RNA evolution and Machine Learning
The analysis of new RNA families, such as the pervasive and poorly understood lncRNAs or the numerous viral/bacterial non-coding RNAs observed in metagenomics experiments, relies critically on the identification of an evolutionary pressure, allowing to hypothesize new functions. Given a family of homologous RNAs sharing established functional traits, it is classic to asks whether an observed property, such as the occurrence of a common motif or a given covariation pattern, is likely to reveal a yet-unknown selective pressure or, conversely, is merely the consequence of established functional traits. Classic bioinformatics methods rephrase the problem in a hypothesis-testing framework, and compute the probability that a sequence, generated at random in a model that captures existing constraints, features the observed property. Ideally, such sequences should represent solutions to an instance of the design problem, target established functions, while respecting a distribution that can either be derived from the targeted function, or learned from data.
Talk 5.5 presented a context where rational design methodologies were utilized to capture remote homologs of a suspected, but scarcely-populated, functional family of ncRNAs. Generative models can also be used for design, in cases where the underlying model of function is partially understood, and should be learned from the data. Talk 5.8 used Restricted Boltzmann Machines (RBM), an unsupervised learning approach, to pickup the intricate probability distribution of the statistical features of a naturally-occurring riboswitch. The RBM was then used to generate novel instances with the same distribution of features, resulting in an enrichment of functional designs as revealed by experimental validation. Direct Coupling Analysis was also used in Talk 5.15 to generate self-replicating ribozymes, using a complex definition of function that may require some element of learning. Interestingly, the efficacy of designs was ultimately shown to benefit from further refinements using classic combinatorial methods for inverse folding, suggesting future hybrid ML/combinatorial methodologies.
While RNA design is an increasingly important computational task in molecular biology, nanotechnology and medicine, methods for computational rational design are still lacking for many applications. Moreover, many design tasks are currently addressed using algorithmic techniques (e.g. Markov chain Monte Carlo) that are clearly superseded by the state-of-theart in algorithmic research. Conversely, computer scientists considering design tasks usually limit themselves to inverse folding, overlooking a rich bestiary of computational problems whose consideration would, in turn, undoubtedly lead to the emergence of new algorithmic paradigms.
Talk 5.20 showed that basic ML architectures can be learned in the context of reinforcement learning and can be successful for basic inverse folding of RNA. Talk 5.25 presented a complete design story, describing a methodology to advance our understanding of tRNAs. In particular, a mechanistic understanding of the target function can be gained by masking constraints during redesign, and the differentiability of the design problem can lead to great speedups of the computation. However, ML approaches may not always represent a silver bullet in RNA bioinformatics, and Talk 5.13 dramatically illustrated this in the context of RNA folding, a context where the quality and biases in the data strongly impacts, and probably hinders for intrinsic reasons, the predictive capabilities of deep learning-based methods. As a consequence, synthetic data can and should be used to test the capacity of learning architectures on simplified problems before embarking into “real life” learning.
References
- Édouard Bonnet, Pawel Rzazewski, and Florian Sikora. Designing RNA secondary structures is hard. In Benjamin J. Raphael, editor, Research in Computational Molecular Biology - 22nd Annual International Conference, RECOMB 2018, volume 10812 of Lecture Notes in Computer Science, pages 248–250, Paris, 2018. Springer.
- Matthew G. Costales, Jessica L. Childs-Disney, Hafeez S. Haniff, and Matthew D. Disney. How we think about targeting RNA with small molecules. Journal of Medicinal Chemistry, 63(17):8880–8900, 2020. PMID: 32212706.
- Sven Findeiß, Maja Etzel, Sebastian Will, Mario Mörl, and Peter F Stadler. Design of artificial riboswitches as biosensors. Sensors (Basel, Switzerland), 17(9):E1990, August 2017.
- Juan Antonio Garcia-Martin, Peter Clote, and Ivan Dotu. RNAiFOLD: a constraint programming algorithm for RNA inverse folding and molecular design. Journal of Bioinformatics and Computational Biology, 11(2):1350001, 2013.
- Cody Geary, Pierre-Étienne Meunier, Nicolas Schabanel, and Shinnosuke Seki. Oritatami: A Computational Model for Molecular Co-Transcriptional Folding. International Journal of Molecular Sciences, 20(9):2259, May 2019.
- Cody Geary, Paul W. K. Rothemund, and Ebbe S. Andersen. A single-stranded architecture for cotranscriptional folding of RNA nanostructures. Science, 345(6198):799–804, 2014.
- Jozef Hales, Alice Héliou, Ján Manuch, Yann Ponty, and Ladislav Stacho. Combinatorial RNA Design: Designability and Structure-Approximating Algorithm in Watson-Crick and Nussinov-Jacobson Energy Models. Algorithmica, 79(3):835–856, Nov 2017.
- Stefan Hammer, Christian Günzel, Mario Mörl, and Sven Findeiß. Evolving methods for rational de novo design of functional RNA molecules. Methods, 161:54–63, may 2019.
- I.L. Hofacker, W. Fontana, P.F. Stadler, L.S. Bonhoeffer, M. Tacker, and P. Schuster. Fast folding and comparison of RNA secondary structures. Monatsch. Chem., 125:167–188, 1994.
- Martin Jinek, Krzysztof Chylinski, Ines Fonfara, Michael Hauer, Jennifer A. Doudna, and Emmanuelle Charpentier. A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science, 337(6096):816–821, 2012.
- Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, and Anton I Petrov. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic acids research, 46:D335–D342, January 2018.
- David M. Mauger, B. Joseph Cabral, Vladimir Presnyak, Stephen V. Su, David W. Reid, Brooke Goodman, Kristian Link, Nikhil Khatwani, John Reynders, Melissa J. Moore, and Iain J. McFadyen. mRNA structure regulates protein expression through changes in functional half-life. Proceedings of the National Academy of Sciences, 116(48):24075–24083, 2019.
- Na Qu, Yachen Ying, Jinshan Qin, and Antony K Chen. Rational design of self-assembled RNA nanostructures for HIV-1 virus assembly blockade. Nucleic Acids Research, 50(8):e44– e44, 12 2021.
- Vladimir Reinharz, Yann Ponty, and Jérôme Waldispühl. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution. Bioinformatics, 29(13):i308–i315, 2013.
- Guillermo Rodrigo, Thomas E Landrain, and Alfonso Jaramillo. De novo automated design of small RNA circuits for engineering synthetic riboregulation in living cells. Proceedings of the National Academy of Sciences U S A, 109(38):15271–6, 2012.
- Guillermo Rodrigo, Thomas E. Landrain, Eszter Majer, José-Antonio Daròs, and Alfonso Jaramillo. Full design automation of multi-state RNA devices to program gene expression using energy-based optimization. PLoS Computational Biology, 9(8):e1003172, 2013.
- E Westhof, B Masquida, and L Jaeger. RNA tectonics: towards RNA design. Folding & design, 1(4):R78—88, 1996.
- Melanie Winkle, Sherien M. El-Daly, Muller Fabbri, and George A. Calin. Noncoding RNA therapeutics – challenges and potential solutions. Nat Rev Drug Discov, 20:629–651, 2021.
- Sherry Y. Wu, Gabriel Lopez-Berestein, George A. Calin, and Anil K. Sood. RNAi therapies: Drugging the undruggable. Science Translational Medicine, 6(240):240ps7, 2014.
- Joseph N Zadeh, Brian R Wolfe, and Niles A Pierce. Nucleic acid sequence design via efficient ensemble defect optimization. Journal of Computational Chemistry, 32(3):439–52, 2011.
- Yang Zhang, Yann Ponty, Mathieu Blanchette, Eric Lecuyer, and Jérôme Waldispühl. SPARCS: a web server to analyze (un)structured regions in coding RNA sequences. Nucleic Acids Research, 41(Web Server issue):W480–5, July 2013.
- Yu Zhou, Yann Ponty, Stéphane Vialette, Jérome Waldispühl, Yi Zhang, and Alain Denise. Flexible RNA design under structure and sequence constraints using formal languages. In Jing Gao, editor, ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics. ACM-BCB 2013, Washington, DC, USA, September 22-25, 2013, page 229. ACM, 2013.
- M. Zuker and P. Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research, 9:133–148, 1981.
The Dagstuhl Seminar aims to bring together Computer Scientists, Biologists, Biochemists, and Mathematicians with strong interest, and recent research activities, in the process and outcome of the rational design of RiboNucleic Acids (RNAs). Rational design is here to be understood in its broadest possible definition, and extends beyond the well-studied preferential adoption of a specified (secondary) structure. It particularly includes the design of mutants, multi-stable RNAs, protein-encoding mRNAs, and circuits of interacting RNAs.
Accordingly, design encompasses a variety of computational and experimental tasks that are essential to many areas of molecular biology, system biology, structural biology, synthetic biology and modern therapeutics. Computationally, design tasks represent a relatively unexplored instance of inverse combinatorial optimization, subject to objective functions that urgently need to capture complex underlying realities.
To address the future needs and challenges in those areas a fruitful crosstalk between experimentalists, modelers, and providers of computational methods is required. A non-exhaustive list of key questions to be addressed by the seminar includes:
- Which constructs can be designed for validating/refining functional models and to disentangle errors?
- How to capture complex structural aspects (pseudoknots, kinetics, 3D, interactions...) and how to experimentally validate structural features of designed constructs?
- Can modern techniques in machine learning be used to design RNAs?
- Are the capacities of predictive computational methods sufficient to define suitable objective functions?
- Should rational design be statistical, e.g., to be used as a background model for comparative studies?
- What are the computational and combinatorial properties of design related problems?
The overarching goal of this seminar is to establish a comprehensive repertoire of existing computational methods, and to identify novel computational needs to fuel the development of various areas of biology.
In addition to presentations of recent research, the Dagstuhl Seminar will include expository talks presenting existing available methodologies, and surveys of key biological questions where design tasks are central. Opportunities will be provided for the discussion of open problems and other topics of shared interest.
- Ebbe Sloth Andersen (Aarhus University, DK) [dblp]
- Maciej Antczak (Poznan University of Technology, PL)
- Stefan Badelt (Universität Wien, AT)
- Danny Barash (Ben Gurion University - Beer Sheva, IL) [dblp]
- Sarah Berkemer (Ecole Polytechnique - Palaiseau, FR) [dblp]
- Anne Condon (University of British Columbia - Vancouver, CA) [dblp]
- Harold Fellermann (Newcastle University, GB) [dblp]
- Jorge Fernández de Cossío Díaz (ENS - Paris, FR)
- Sven Findeiß (Universität Leipzig, DE) [dblp]
- Christoph Flamm (Universität Wien, AT) [dblp]
- Cody Geary (Aarhus University, DK)
- Jan Gorodkin (University of Copenhagen, DK) [dblp]
- Christine Heitsch (Georgia Institute of Technology - Atlanta, US) [dblp]
- Ivo Hofacker (Universität Wien, AT) [dblp]
- Felix Kühnl (Universität Leipzig, DE)
- István Miklós (ELKH - Budapest, HU) [dblp]
- Philippe Nghe (ESPCI - Paris, FR) [dblp]
- Cyrille Merleau Nono Saha (MPI für Mathematik in den Naturwissen. - Leipzig, DE)
- Samuela Pasquali (University Paris-Diderot, FR)
- Katja Petzold (Karolinska Institute - Stockholm, SE)
- Yann Ponty (Ecole Polytechnique - Palaiseau, FR) [dblp]
- Vladimir Reinharz (University of Montreal, CA) [dblp]
- Lorenz Ronny (Universität Wien, AT) [dblp]
- Frederic Runge (Universität Freiburg, DE) [dblp]
- Bruno Sargueil (Paris Descartes University, FR)
- Nicolas Schabanel (ENS - Lyon, FR)
- Shinnosuke Seki (The University of Electro-Communications - Tokyo, JP) [dblp]
- Petr Sulc (Arizona State University - Tempe, US) [dblp]
- Marta Szachniuk (Poznan University of Technology, PL) [dblp]
- Andrew Torda (Universität Hamburg, DE)
- Maria Waldl (Universität Wien, AT)
- Sebastian Will (Ecole Polytechnique - Palaiseau, FR) [dblp]
- Hua-Ting Yao (Universität Wien, AT) [dblp]
Classification
- Data Structures and Algorithms
- Other Computer Science
Keywords
- RiboNucleic Acids
- Stuctural Design
- Thermodynamics