Bioinformatics
Institute of Computer Science
University Freiburg

Publications of the Group

Important notice:

Our publication list is currently being updated. To view the most recent list of our publications, please visit Google Scholar.

Unraveling the small proteome of the plant symbiont Sinorhizobium meliloti by ribosome profiling and proteogenomics

Lydia Hadjeras, Benjamin Heiniger, Sandra Maass, Robina Scheuer, Rick Gelhausen, Saina Azarderakhsh, Susanne Barth-Weber, Rolf Backofen, Dorte Becher, Christian H. Ahrens, Cynthia M. Sharma, Elena Evguenieva-Hackenberg

In: Microlife, 2023, 4, uqad012

The soil-dwelling plant symbiont Sinorhizobium meliloti is a major model organism of Alphaproteobacteria. Despite numerous detailed OMICS studies, information about small open reading frame (sORF)-encoded proteins (SEPs) is largely missing, because sORFs are poorly annotated and SEPs are hard to detect experimentally. However, given that SEPs can fulfill important functions, identification of translated sORFs is critical for analyzing their roles in bacterial physiology. Ribosome profiling (Ribo-seq) can detect translated sORFs with high sensitivity, but is not yet routinely applied to bacteria because it must be adapted for each species. Here, we established a Ribo-seq procedure for S. meliloti 2011 based on RNase I digestion and detected translation for 60% of the annotated coding sequences during growth in minimal medium. Using ORF prediction tools based on Ribo-seq data, subsequent filtering, and manual curation, the translation of 37 non-annotated sORFs with

Available:
pdf (6169 KB)   doi:10.1093/femsml/uqad012   pmid:37223733   BibTeX Entry ( Hadjeras_Heiniger_Maass-Unrav_the_small-2023 )

Revealing the small proteome of Haloferax volcanii by combining ribosome profiling and small-protein optimized mass spectrometry

Lydia Hadjeras, Jurgen Bartel, Lisa-Katharina Maier, Sandra Maass, Verena Vogel, Sarah L. Svensson, Florian Eggenhofer, Rick Gelhausen, Teresa Muller, Omer S. Alkhnbashi, Rolf Backofen, Dorte Becher, Cynthia M. Sharma, Anita Marchfelder

In: Microlife, 2023, 4, uqad001

In contrast to extensively studied prokaryotic 'small' transcriptomes (encompassing all small noncoding RNAs), small proteomes (here defined as including proteins

Available:
pdf (0 KB)   doi:10.1093/femsml/uqad001   pmid:37223747   BibTeX Entry ( Hadjeras_Bartel_Maier-Revea_the_small-2023 )

The Planemo toolkit for developing, deploying, and executing scientific data analyses in Galaxy and beyond

Simon Bray, John Chilton, Matthias Bernt, Nicola Soranzo, Marius van den Beek, Berenice Batut, Helena Rasche, Martin Cech, Peter J. A. Cock, Bjorn Gruning, Anton Nekrutenko

In: Genome Res, 2023, 33(2), 261-268

There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo's implementation and describe its broad range of functionality for designing, testing, and executing Galaxy tools, workflows, and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers.

Available:
pdf (2077 KB)   doi:10.1101/gr.276963.122   pmid:36828587   BibTeX Entry ( Bray_Chilton_Bernt-The_Plane_toolk-2023 )

Peakhood: individual site context extraction for CLIP-seq peak regions

Michael Uhl, Dominik Rabsch, Florian Eggenhofer, Rolf Backofen

In: Bioinformatics, 2022, 38(4), 1139-1140

MOTIVATION: CLIP-seq is by far the most widely used method to determine transcriptome-wide binding sites of RNA-binding proteins (RBPs). The binding site locations are identified from CLIP-seq read data by tools termed peak callers. Many RBPs bind to a spliced RNA (i.e. transcript) context, but all currently available peak callers only consider and report the genomic context. To accurately model protein binding behavior, a tool is needed for the individual context assignment to CLIP-seq peak regions. RESULTS: Here we present Peakhood, the first tool that utilizes CLIP-seq peak regions identified by peak callers, in tandem with CLIP-seq read information and genomic annotations, to determine which context applies, individually for each peak region. For sites assigned to transcript context, it further determines the most likely splice variant, and merges results for any number of datasets to obtain a comprehensive collection of transcript context binding sites. AVAILABILITY AND IMPLEMENTATION: Peakhood is freely available under MIT license at: https://github.com/BackofenLab/Peakhood. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
pdf (433 KB)   doi:10.1093/bioinformatics/btab755   pmid:34734974   BibTeX Entry ( Uhl_Rabsch_Eggenhofer-Peakh_indiv_site-2022 )

The long noncoding RNA mimi scaffolds neuronal granules to maintain nervous system maturity

Dominika Grzejda, Jana Mach, Johanna Aurelia Schweizer, Barbara Hummel, Andrew Mischa Rezansoff, Florian Eggenhofer, Amol Panhale, Maria-Eleni Lalioti, Nina Cabezas Wallscheid, Rolf Backofen, Johannes Felsenberg, Valerie Hilgers

In: Sci Adv, 2022, 8(39), eabo5578

RNA binding proteins and messenger RNAs (mRNAs) assemble into ribonucleoprotein granules that regulate mRNA trafficking, local translation, and turnover. The dysregulation of RNA-protein condensation disturbs synaptic plasticity and neuron survival and has been widely associated with human neurological disease. Neuronal granules are thought to condense around particular proteins that dictate the identity and composition of each granule type. Here, we show in Drosophila that a previously uncharacterized long noncoding RNA, mimi, is required to scaffold large neuronal granules in the adult nervous system. Neuronal ELAV-like proteins directly bind mimi and mediate granule assembly, while Staufen maintains condensate integrity. mimi granules contain mRNAs and proteins involved in synaptic processes; granule loss in mimi mutant flies impairs nervous system maturity and neuropeptide-mediated signaling and causes phenotypes of neurodegeneration. Our work reports an architectural RNA for a neuronal granule and provides a handle to interrogate functions of a condensate independently of those of its constituent proteins.

Available:
pdf (2901 KB)   doi:10.1126/sciadv.abo5578   pmid:36170367   BibTeX Entry ( Grzejda_Mach_Schweizer-The_long_nonco-2022 )

High detection rate for disease-causing variants in a cohort of 30 Iranian pediatric steroid resistant nephrotic syndrome cases

Maryam Najafi, Korbinian M. Riedhammer, Aboulfazl Rad, Paria Najarzadeh Torbati, Riccardo Berutti, Isabel Schule, Sophie Schroda, Thomas Meitinger, Jasmina Comic, Simin Sadeghi Bojd, Tayebeh Baranzehi, Azadeh Shojaei, Anoush Azarfar, Mahmood Reza Khazaei, Anna Kottgen, Rolf Backofen, Ehsan Ghayoor Karimiani, Julia Hoefele, Miriam Schmidts

In: Front Pediatr, 2022, 10, 974840

BACKGROUND: Steroid resistant nephrotic syndrome (SRNS) represents a significant renal disease burden in childhood and adolescence. In contrast to steroid sensitive nephrotic syndrome (SSNS), renal outcomes are significantly poorer in SRNS. Over the past decade, extensive genetic heterogeneity has become evident while disease-causing variants are still only identified in 30% of cases in previously reported studies with proportion and type of variants identified differing depending on the age of onset and ethnical background of probands. A genetic diagnosis however can have implications regarding clinical management, including kidney transplantation, extrarenal disease manifestations, and, in some cases, even causal therapy. Genetic diagnostics therefore play an important role for the clinical care of SRNS affected individuals. METHODOLOGY AND RESULTS: Here, we performed NPHS2 Sanger sequencing and subsequent exome sequencing in 30 consanguineous Iranian families with a child affected by SRNS with a mean age of onset of 16 months. We identified disease-causing variants and one variant of uncertain significance in 22 families (73%), including variants in NPHS1 (30%), followed by NPHS2 (20%), WT1 (7%) as well as in NUP205, COQ6, ARHGDIA, SGPL1, and NPHP1 in single cases. Eight of these variants have not previously been reported as disease-causing, including four NPHS1 variants and one variant in NPHS2, ARHGDIA, SGPL1, and NPHP1 each. CONCLUSION: In line with previous studies in non-Iranian subjects, we most frequently identified disease-causing variants in NPHS1 and NPHS2. While Sanger sequencing of NPHS2 can be considered as first diagnostic step in non-congenital cases, the genetic heterogeneity underlying SRNS renders next-generation sequencing based diagnostics as the most efficient genetic screening method. In accordance with the mainly autosomal recessive inheritance pattern, diagnostic yield can be significantly higher in consanguineous than in outbred populations.

Available:
pdf (965 KB)   doi:10.3389/fped.2022.974840   pmid:36245711   BibTeX Entry ( Najafi_Riedhammer_Rad-High_detec_rate-2022 )

Spacer prioritization in CRISPR-Cas9 immunity is enabled by the leader RNA

Chunyu Liao, Sahil Sharma, Sarah L. Svensson, Anuja Kibe, Zasha Weinberg, Omer S. Alkhnbashi, Thorsten Bischler, Rolf Backofen, Neva Caliskan, Cynthia M. Sharma, Chase L. Beisel

In: Nat Microbiol, 2022, 7(4), 530-541

CRISPR-Cas systems store fragments of foreign DNA, called spacers, as immunological recordings used to combat future infections. Of the many spacers stored in a CRISPR array, the most recent are known to be prioritized for immune defence. However, the underlying mechanism remains unclear. Here we show that the leader region upstream of CRISPR arrays in CRISPR-Cas9 systems enhances CRISPR RNA (crRNA) processing from the newest spacer, prioritizing defence against the matching invader. Using the CRISPR-Cas9 system from Streptococcus pyogenes as a model, we found that the transcribed leader interacts with the conserved repeats bordering the newest spacer. The resulting interaction promotes transactivating crRNA (tracrRNA) hybridization with the second of the two repeats, accelerating crRNA processing. Accordingly, disruption of this structure reduces the abundance of the associated crRNA and immune defence against targeted plasmids and bacteriophages. Beyond the S. pyogenes system, bioinformatics analyses revealed that leader-repeat structures appear across CRISPR-Cas9 systems. CRISPR-Cas systems thus possess an RNA-based mechanism to prioritize defence against the most recently encountered invaders.

Available:
pdf (11965 KB)   doi:10.1038/s41564-022-01074-3   pmid:35314780   BibTeX Entry ( Liao_Sharma_Svensson-Space_prior_CRISP-2022 )

Spectrum of Genetic Variants in a Cohort of 37 Laterality Defect Cases

Dinu Antony, Elif Gulec Yilmaz, Alper Gezdirici, Lennart Slagter, Zeineb Bakey, Helen Bornaun, Ibrahim Cansaran Tanidir, Tran Van Dinh, Han G. Brunner, Peter Walentek, Sebastian J. Arnold, Rolf Backofen, Miriam Schmidts

In: Front Genet, 2022, 13, 861236

Laterality defects are defined by the perturbed left-right arrangement of organs in the body, occurring in a syndromal or isolated fashion. In humans, primary ciliary dyskinesia (PCD) is a frequent underlying condition of defective left-right patterning, where ciliary motility defects also result in reduced airway clearance, frequent respiratory infections, and infertility. Non-motile cilia dysfunction and dysfunction of non-ciliary genes can also result in disturbances of the left-right body axis. Despite long-lasting genetic research, identification of gene mutations responsible for left-right patterning has remained surprisingly low. Here, we used whole-exome sequencing with Copy Number Variation (CNV) analysis to delineate the underlying molecular cause in 35 mainly consanguineous families with laterality defects. We identified causative gene variants in 14 families with a majority of mutations detected in genes previously associated with PCD, including two small homozygous CNVs. None of the patients were previously clinically diagnosed with PCD, underlining the importance of genetic diagnostics for PCD diagnosis and adequate clinical management. Identified variants in non-PCD-associated genes included variants in PKD1L1 and PIFO, suggesting that dysfunction of these genes results in laterality defects in humans. Furthermore, we detected candidate variants in GJA1 and ACVR2B possibly associated with situs inversus. The low mutation detection rate of this study, in line with other previously published studies, points toward the possibility of non-coding genetic variants, putative genetic mosaicism, epigenetic, or environmental effects promoting laterality defects.

Available:
pdf (1822 KB)   doi:10.3389/fgene.2022.861236   pmid:35547246   BibTeX Entry ( Antony_Gulec_Yilmaz_Gezdirici-Spect_Genet_Varia-2022 )

CRISPRtracrRNA: robust approach for CRISPR tracrRNA detection

Alexander Mitrofanov, Marcus Ziemann, Omer S. Alkhnbashi, Wolfgang R. Hess, Rolf Backofen

In: Bioinformatics, 2022, 38(Suppl_2), ii42-ii48

MOTIVATION: The CRISPR-Cas9 system is a Type II CRISPR system that has rapidly become the most versatile and widespread tool for genome engineering. It consists of two components, the Cas9 effector protein, and a single guide RNA that combines the spacer (for identifying the target) with the tracrRNA, a trans-activating small RNA required for both crRNA maturation and interference. While there are well-established methods for screening Cas effector proteins and CRISPR arrays, the detection of tracrRNA remains the bottleneck in detecting Class 2 CRISPR systems. RESULTS: We introduce a new pipeline CRISPRtracrRNA for screening and evaluation of tracrRNA candidates in genomes. This pipeline combines evidence from different components of the Cas9-sgRNA complex. The core is a newly developed structural model via covariance models from a sequence-structure alignment of experimentally validated tracrRNAs. As additional evidence, we determine the terminator signal (required for the tracrRNA transcription) and the RNA-RNA interaction between the CRISPR array repeat and the 5'-part of the tracrRNA. Repeats are detected via an ML-based approach (CRISPRidentify). Providing further evidence, we detect the cassette containing the Cas9 (Type II CRISPR systems) and Cas12 (Type V CRISPR systems) effector protein. Our tool is the first for detecting tracrRNA for Type V systems. AVAILABILITY AND IMPLEMENTATION: The implementation of CRISPRtracrRNA is available on GitHub upon requesting access permission (https://github.com/BackofenLab/CRISPRtracrRNA). Data generated in this study can be obtained upon request to the corresponding person: Rolf Backofen (backofen@informatik.uni-freiburg.de). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
pdf (462 KB)   doi:10.1093/bioinformatics/btac466   pmid:36124799   BibTeX Entry ( Mitrofanov_Ziemann_Alkhnbashi-CRISP_robus_appro-2022 )

RiboReport - benchmarking tools for ribosome profiling-based identification of open reading frames in bacteria

Rick Gelhausen, Teresa Müller, Sarah L Svensson, Omer S Alkhnbashi, Cynthia M Sharma, Florian Eggenhofer, Rolf Backofen

In: Briefings in Bioinformatics, 01 2022

Small proteins encoded by short open reading frames (ORFs) with 50 codons or fewer are emerging as an important class of cellular macromolecules in diverse organisms. However, they often evade detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation in genomic regions previously thought to be non-coding, driving the development of ORF detection tools using Ribo-seq data. However, only a handful of tools have been designed for bacteria, and these have not yet been systematically compared. Here, we aimed to identify tools that use Ribo-seq data to correctly determine the translational status of annotated bacterial ORFs and also discover novel translated regions with high sensitivity. To this end, we generated a large set of annotated ORFs from four diverse bacterial organisms, manually labeled for their translation status based on Ribo-seq data, which are available for future benchmarking studies. This set was used to investigate the predictive performance of seven Ribo-seq-based ORF detection tools (REPARATION_blast, DeepRibo, Ribo-TISH, PRICE, smORFer, ribotricer and SPECtre), as well as IRSOM, which uses coding potential and RNA-seq coverage only. DeepRibo and REPARATION_blast robustly predicted translated ORFs, including sORFs, with no significant difference for ORFs in close proximity to other genes versus stand-alone genes. However, no tool predicted a set of novel, experimentally verified sORFs with high sensitivity. Start codon predictions with smORFer show the value of initiation site profiling data to further improve the sensitivity of ORF prediction tools in bacteria. Overall, we find that bacterial tools perform well for sORF detection, although there is potential for improving their performance, applicability, usability and reproducibility.

Publication note:
bbab549

Available:
pdf (48702 KB)   doi:10.1093/bib/bbab549   BibTeX Entry ( riboreport2022 )

Pluripotency factors determine gene expression repertoire at zygotic genome activation

Meijiang Gao, Marina Veil, Marcus Rosenblatt, Aileen Julia Riesle, Anna Gebhard, Helge Hass, Lenka Buryanova, Lev Y. Yampolsky, Bjorn Gruning, Sergey V. Ulianov, Jens Timmer, Daria Onichtchouk

In: Nat Commun, 2022, 13(1), 788

Awakening of zygotic transcription in animal embryos relies on maternal pioneer transcription factors. The interplay of global and specific functions of these proteins remains poorly understood. Here, we analyze chromatin accessibility and time-resolved transcription in single and double mutant zebrafish embryos lacking pluripotency factors Pou5f3 and Sox19b. We show that two factors modify chromatin in a largely independent manner. We distinguish four types of direct enhancers by differential requirements for Pou5f3 or Sox19b. We demonstrate that changes in chromatin accessibility of enhancers underlie the changes in zygotic expression repertoire in the double mutants. Pou5f3 or Sox19b promote chromatin accessibility of enhancers linked to the genes involved in gastrulation and ventral fate specification. The genes regulating mesendodermal and dorsal fates are primed for activation independently of Pou5f3 and Sox19b. Strikingly, simultaneous loss of Pou5f3 and Sox19b leads to premature expression of genes, involved in regulation of organogenesis and differentiation.

Available:
pdf (10295 KB)   doi:10.1038/s41467-022-28434-1   pmid:35145080   BibTeX Entry ( Gao_Veil_Rosenblatt-Pluri_facto_deter-2022 )

Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework

Matthias Fahrner, Melanie Christine Foll, Bjorn Andreas Gruning, Matthias Bernt, Hannes Rost, Oliver Schilling

In: Gigascience, 2022, 11

BACKGROUND: Data-independent acquisition (DIA) has become an important approach in global, mass spectrometric proteomic studies because it provides in-depth insights into the molecular variety of biological systems. However, DIA data analysis remains challenging owing to the high complexity and large data and sample size, which require specialized software and vast computing infrastructures. Most available open-source DIA software necessitates basic programming skills and covers only a fraction of a complete DIA data analysis. In consequence, DIA data analysis often requires usage of multiple software tools and compatibility thereof, severely limiting the usability and reproducibility. FINDINGS: To overcome this hurdle, we have integrated a suite of open-source DIA tools in the Galaxy framework for reproducible and version-controlled data processing. The DIA suite includes OpenSwath, PyProphet, diapysef, and swath2stats. We have compiled functional Galaxy pipelines for DIA processing, which provide a web-based graphical user interface to these pre-installed and pre-configured tools for their use on freely accessible, powerful computational resources of the Galaxy framework. This approach also enables seamless sharing workflows with full configuration in addition to sharing raw data and results. We demonstrate the usability of an all-in-one DIA pipeline in Galaxy by the analysis of a spike-in case study dataset. Additionally, extensive training material is provided to further increase access for the proteomics community. CONCLUSION: The integration of an open-source DIA analysis suite in the web-based and user-friendly Galaxy framework in combination with extensive training material empowers a broad community of researchers to perform reproducible and transparent DIA data analysis.

Available:
pdf (3569 KB)   doi:10.1093/gigascience/giac005   pmid:35166338   BibTeX Entry ( Fahrner_Foll_Gruning-Democ_data_acqui-2022 )

Gut microbiota drives age-related oxidative stress and mitochondrial damage in microglia via the metabolite N(6)-carboxymethyllysine

Omar Mossad, Berenice Batut, Bahtiyar Yilmaz, Nikolaos Dokalis, Charlotte Mezo, Elisa Nent, Lara Susann Nabavi, Melanie Mayer, Feres Jose Mocayar Maron, Joerg M. Buescher, Mercedes Gomez de Aguero, Antal Szalay, Tim Lammermann, Andrew J. Macpherson, Stephanie C. Ganal-Vonarburg, Rolf Backofen, Daniel Erny, Marco Prinz, Thomas Blank

In: Nat Neurosci, 2022, 25(3), 295-305

Microglial function declines during aging. The interaction of microglia with the gut microbiota has been well characterized during development and adulthood but not in aging. Here, we compared microglial transcriptomes from young-adult and aged mice housed under germ-free and specific pathogen-free conditions and found that the microbiota influenced aging-associated changes in microglial gene expression. The absence of gut microbiota diminished oxidative stress and ameliorated mitochondrial dysfunction in microglia from the brains of aged mice. Unbiased metabolomic analyses of serum and brain tissue revealed the accumulation of N(6)-carboxymethyllysine (CML) in the microglia of the aging brain. CML mediated a burst of reactive oxygen species and impeded mitochondrial activity and ATP reservoirs in microglia. We validated the age-dependent rise in CML levels in the sera and brains of humans. Finally, a microbiota-dependent increase in intestinal permeability in aged mice mediated the elevated levels of CML. This study adds insight into how specific features of microglia from aged mice are regulated by the gut microbiota.

Available:
pdf (6854 KB)   doi:10.1038/s41593-022-01027-3   pmid:35241804   BibTeX Entry ( Mossad_Batut_Yilmaz-Gut_micro_drive-2022 )

Ten simple rules for making a software tool workflow-ready

Paul Brack, Peter Crowther, Stian Soiland-Reyes, Stuart Owen, Douglas Lowe, Alan R. Williams, Quentin Groom, Mathias Dillen, Frederik Coppens, Bjorn Gruning, Ignacio Eguinoa, Philip Ewels, Carole Goble

In: PLoS Comput Biol, 2022, 18(3), e1009823

No Abstract available

Available:
pdf (521 KB)   doi:10.1371/journal.pcbi.1009823   pmid:35324885   BibTeX Entry ( Brack_Crowther_Soiland-Reyes-Ten_simpl_rules-2022 )

Galaxy workflows for fragment-based virtual screening: a case study on the SARS-CoV-2 main protease

Simon Bray, Tim Dudgeon, Rachael Skyner, Rolf Backofen, Bjorn Gruning, Frank von Delft

In: J Cheminform, 2022, 14(1), 22

We present several workflows for protein-ligand docking and free energy calculation for use in the workflow management system Galaxy. The workflows are composed of several widely used open-source tools, including rDock and GROMACS, and can be executed on public infrastructure using either Galaxy's graphical interface or the command line. We demonstrate the utility of the workflows by running a high-throughput virtual screening of around 50000 compounds against the SARS-CoV-2 main protease, a system which has been the subject of intense study in the last year.

Available:
pdf (1958 KB)   doi:10.1186/s13321-022-00588-6   pmid:35414112   BibTeX Entry ( Bray_Dudgeon_Skyner-Galax_workf_for-2022 )

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update

In: Nucleic Acids Res, 2022, 50(W1), W345-W351

Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.

Available:
pdf (1760 KB)   doi:10.1093/nar/gkac247   pmid:35446428   BibTeX Entry ( The_Galax_platf-NAR2022 )

MaxQuant and MSstats in Galaxy Enable Reproducible Cloud-Based Analysis of Quantitative Proteomics Experiments for Everyone

Niko Pinter, Damian Glatzer, Matthias Fahrner, Klemens Frohlich, James Johnson, Bjorn Andreas Gruning, Bettina Warscheid, Friedel Drepper, Oliver Schilling, Melanie Christine Foll

In: J Proteome Res, 2022, 21(6), 1558-1565

Quantitative mass spectrometry-based proteomics has become a high-throughput technology for the identification and quantification of thousands of proteins in complex biological samples. Two frequently used tools, MaxQuant and MSstats, allow for the analysis of raw data and finding proteins with differential abundance between conditions of interest. To enable accessible and reproducible quantitative proteomics analyses in a cloud environment, we have integrated MaxQuant (including TMTpro 16/18plex), Proteomics Quality Control (PTXQC), MSstats, and MSstatsTMT into the open-source Galaxy framework. This enables the web-based analysis of label-free and isobaric labeling proteomics experiments via Galaxy's graphical user interface on public clouds. MaxQuant and MSstats in Galaxy can be applied in conjunction with thousands of existing Galaxy tools and integrated into standardized, sharable workflows. Galaxy tracks all metadata and intermediate results in analysis histories, which can be shared privately for collaborations or publicly, allowing full reproducibility and transparency of published analysis. To further increase accessibility, we provide detailed hands-on training materials. The integration of MaxQuant and MSstats into the Galaxy framework enables their usage in a reproducible way on accessible large computational infrastructures, hence realizing the foundation for high-throughput proteomics data science for everyone.

Available:
pdf (3851 KB)   doi:10.1021/acs.jproteome.2c00051   pmid:35503992   BibTeX Entry ( Pinter_Glatzer_Fahrner-MaxQu_and_MSsta-2022 )

Expanding the Galaxy's reference data

Nagampalli VijayKrishna, Jayadev Joshi, Nate Coraor, Jennifer Hillman-Jackson, Dave Bouvier, Marius van den Beek, Ignacio Eguinoa, Frederik Coppens, John Davis, Michal Stolarczyk, Nathan C. Sheffield, Simon Gladman, Gianmauro Cuccuru, Bjorn Gruning, Nicola Soranzo, Helena Rasche, Bradley W. Langhorst, Matthias Bernt, Dan Fornika, David Anderson de Lima Morais, Michel Barrette, Peter van Heusden, Mauro Petrillo, Antonio Puertas-Gallardo, Alex Patak, Hans-Rudolf Hotz, Daniel Blankenberg

In: Bioinform Adv, 2022, 2(1), vbac030

SUMMARY: Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. In addition, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie's remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS (CernVM File System) repository from GalaxyProject.org, with mirrors across the USA, Canada, Europe and Australia, enabling easy use outside of Galaxy. AVAILABILITY AND IMPLEMENTATION: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license. Access to existing data is also available through CVMFS, with instructions at https://galaxyproject.org/admin/reference-data-repo/. No new data were generated or analyzed in support of this research.

Available:
pdf (716 KB)   doi:10.1093/bioadv/vbac030   pmid:35669346   BibTeX Entry ( VijayKrishna_Joshi_Coraor-Expan_the_Galax-2022 )

Loop detection using Hi-C data with HiCExplorer

Joachim Wolff, Rolf Backofen, Bjorn Gruning

In: Gigascience, 2022, 11

BACKGROUND: Chromatin loops are an essential factor in the structural organization of the genome; however, their detection in Hi-C interaction matrices is a challenging and compute-intensive task. The approach presented here, integrated into the HiCExplorer software, shows a chromatin loop detection algorithm that applies a strict candidate selection based on continuous negative binomial distributions and performs a Wilcoxon rank-sum test to detect enriched Hi-C interactions. RESULTS: HiCExplorer's loop detection has a high detection rate and accuracy. It is the fastest available CPU implementation and utilizes all threads offered by modern multicore platforms. CONCLUSIONS: HiCExplorer's method to detect loops by using a continuous negative binomial function combined with the donut approach from HiCCUPS leads to reliable and fast computation of loops. All the loop-calling algorithms investigated provide differing results, which intersect by ~50% at most. The tested in situ Hi-C data contain a large amount of noise; achieving better agreement between loop calling algorithms will require cleaner Hi-C data and therefore future improvements to the experimental methods that generate the data.

Available:
pdf (4951 KB)   doi:10.1093/gigascience/giac061   pmid:35809047   BibTeX Entry ( Wolff_Backofen_Gruning-Loop_detec_using-2022 )

The antileukemic activity of decitabine upon PML/RARA-negative AML blasts is supported by all-trans retinoic acid: in vitro and in vivo evidence for cooperation

Ruth Meier, Gabriele Greve, Dennis Zimmer, Helena Bresser, Bettina Berberich, Ralitsa Langova, Julia Stomper, Anne Rubarth, Lars Feuerbach, Daniel B. Lipka, Joschka Hey, Bjorn Gruning, Benedikt Brors, Justus Duyster, Christoph Plass, Heiko Becker, Michael Lubbert

In: Blood Cancer J, 2022, 12(8), 122

The prognosis of AML patients with adverse genetics, such as a complex, monosomal karyotype and TP53 lesions, is still dismal even with standard chemotherapy. DNA-hypomethylating agent monotherapy induces an encouraging response rate in these patients. When combined with decitabine (DAC), all-trans retinoic acid (ATRA) resulted in an improved response rate and longer overall survival in a randomized phase II trial (DECIDER; NCT00867672). The molecular mechanisms governing this in vivo synergism are unclear. We now demonstrate cooperative antileukemic effects of DAC and ATRA on AML cell lines U937 and MOLM-13. By RNA-sequencing, derepression of >1200 commonly regulated transcripts following the dual treatment was observed. Overall chromatin accessibility (interrogated by ATAC-seq) and, in particular, at motifs of retinoic acid response elements were affected by both single-agent DAC and ATRA, and enhanced by the dual treatment. Cooperativity regarding transcriptional induction and chromatin remodeling was demonstrated by interrogating the HIC1, CYP26A1, GBP4, and LYZ genes; in vivo gene derepression was demonstrated by expression studies on peripheral blood blasts from AML patients receiving DAC + ATRA. The two drugs also cooperated in derepression of transposable elements, more effectively in U937 (mutated TP53) than MOLM-13 (intact TP53), resulting in a "viral mimicry" response. In conclusion, we demonstrate that in vitro and in vivo, the antileukemic and gene-derepressive epigenetic activity of DAC is enhanced by ATRA.

Available:
pdf (5499 KB)   doi:10.1038/s41408-022-00715-4   pmid:35995769   BibTeX Entry ( Meier_Greve_Zimmer-The_antil_activ-2022 )

Catching the Wave: Detecting Strain-Specific SARS-CoV-2 Peptides in Clinical Samples Collected during Infection Waves from Diverse Geographical Locations

Subina Mehta, Valdemir M. Carvalho, Andrew T. Rajczewski, Olivier Pible, Bjorn A. Gruning, James E. Johnson, Reid Wagner, Jean Armengaud, Timothy J. Griffin, Pratik D. Jagtap

In: Viruses, 2022, 14(10)

The Coronavirus disease 2019 (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) resulted in a major health crisis worldwide with its continuously emerging new strains, resulting in new viral variants that drive "waves" of infection. PCR or antigen detection assays have been routinely used to detect clinical infections; however, the emergence of these newer strains has presented challenges in detection. One of the alternatives has been to detect and characterize variant-specific peptide sequences from viral proteins using mass spectrometry (MS)-based methods. MS methods can potentially help in both diagnostics and vaccine development by understanding the dynamic changes in the viral proteome associated with specific strains and infection waves. In this study, we developed an accessible, flexible, and shareable bioinformatics workflow that was implemented in the Galaxy Platform to detect variant-specific peptide sequences from MS data derived from the clinical samples. We demonstrated the utility of the workflow by characterizing published clinical data from across the world during various pandemic waves. Our analysis identified six SARS-CoV-2 variant-specific peptides suitable for confident detection by MS in commonly collected clinical samples.

Available:
pdf (1409 KB)   doi:10.3390/v14102205   pmid:36298760   BibTeX Entry ( Mehta_Carvalho_Rajczewski-Catch_the_Wave-2022 )

Genome wide CRISPR screen for Pasteurella multocida toxin (PMT) binding proteins reveals LDL Receptor Related Protein 1 (LRP1) as crucial cellular receptor

Julian Schoellkopf, Thomas Mueller, Lena Hippchen, Teresa Mueller, Raphael Reuten, Rolf Backofen, Joachim Orth, Gudula Schmidt

In: PLoS Pathog, 2022, 18(12), e1010781

PMT is a protein toxin produced by Pasteurella multocida serotypes A and D. As causative agent of atrophic rhinitis in swine, it leads to rapid degradation of the nasal turbinate bone. The toxin acts as a deamidase to modify a crucial glutamine in heterotrimeric G proteins, which results in constitutive activation of the G proteins and permanent stimulation of numerous downstream signaling pathways. Using a lentiviral based genome wide CRISPR knockout screen in combination with a lethal toxin chimera, consisting of full length inactive PMT and the catalytic domain of diphtheria toxin, we identified the LRP1 gene encoding the Low-Density Lipoprotein Receptor-related protein 1 as a critical host factor for PMT function. Loss of LRP1 reduced PMT binding and abolished the cellular response and deamidation of heterotrimeric G proteins, confirming LRP1 to be crucial for PMT uptake. Expression of LRP1 or cluster 4 of LRP1 restored intoxication of the knockout cells. In summary our data demonstrate LRP1 as crucial host entry factor for PMT intoxication by acting as its primary cell surface receptor.

Available:
pdf (1528 KB)   doi:10.1371/journal.ppat.1010781   pmid:36516199   BibTeX Entry ( Schoellkopf_Mueller_Hippchen-Genom_wide_CRISP-2022 )

Distinct SARS-CoV-2 RNA fragments activate Toll-like receptors 7 and 8 and induce cytokine release from human macrophages and microglia

Thomas Wallach, Martin Raden, Lukas Hinkelmann, Mariam Brehm, Dominik Rabsch, Hannah Weidling, Christina Kruger, Helmut Kettenmann, Rolf Backofen, Seija Lehnardt

In: Front Immunol, 2022, 13, 1066456

INTRODUCTION: The pandemic coronavirus disease 19 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and is marked by thromboembolic events and an inflammatory response throughout the body, including the brain. METHODS: Employing the machine learning approach BrainDead we systematically screened for SARS-CoV-2 genome-derived single-stranded (ss) RNA fragments with high potential to activate the viral RNA-sensing innate immune receptors Toll-like receptor (TLR)7 and/or TLR8. Analyzing HEK TLR7/8 reporter cells we tested such RNA fragments with respect to their potential to induce activation of human TLR7 and TLR8 and to activate human macrophages, as well as iPSC-derived human microglia, the resident immune cells in the brain. RESULTS: We experimentally validated several sequence-specific RNA fragment candidates out of the SARS-CoV-2 RNA fragments predicted in silico as activators of human TLR7 and TLR8. Moreover, these SARS-CoV-2 ssRNAs induced cytokine release from human macrophages and iPSC-derived human microglia in a sequence- and species-specific fashion. DISCUSSION: Our findings determine TLR7 and TLR8 as key sensors of SARS-CoV-2-derived ssRNAs and may deepen our understanding of the mechanisms by which this virus triggers, but also modulates, an inflammatory response through innate immune signaling.

Available:
pdf (1591 KB)   doi:10.3389/fimmu.2022.1066456   pmid:36713399   BibTeX Entry ( Wallach_Raden_Hinkelmann-Disti_SARS_RNA-2022 )

An accessible infrastructure for artificial intelligence using a Docker-based JupyterLab in Galaxy

Anup Kumar, Gianmauro Cuccuru, Bjorn Gruning, Rolf Backofen

In: Gigascience, 2022, 12

BACKGROUND: Artificial intelligence (AI) programs that train on large datasets require powerful compute infrastructure consisting of several CPU cores and GPUs. JupyterLab provides an excellent framework for developing AI programs, but it needs to be hosted on such an infrastructure to enable faster training of AI programs using parallel computing. FINDINGS: An open-source, docker-based, and GPU-enabled JupyterLab infrastructure is developed that runs on the public compute infrastructure of Galaxy Europe consisting of thousands of CPU cores, many GPUs, and several petabytes of storage to rapidly prototype and develop end-to-end AI projects. Using a JupyterLab notebook, long-running AI model training programs can also be executed remotely to create trained models, represented in open neural network exchange (ONNX) format, and other output datasets in Galaxy. Other features include Git integration for version control, the option of creating and executing pipelines of notebooks, and multiple dashboards and packages for monitoring compute resources and visualization, respectively. CONCLUSIONS: These features make JupyterLab in Galaxy Europe highly suitable for creating and managing AI projects. A recent scientific publication that predicts infected regions in COVID-19 computed tomography scan images is reproduced using various features of JupyterLab on Galaxy Europe. In addition, ColabFold, a faster implementation of AlphaFold2, is accessed in JupyterLab to predict the 3-dimensional structure of protein sequences. JupyterLab is accessible in 2 ways: one as an interactive Galaxy tool and the other by running the underlying Docker container. In both ways, long-running training can be executed on Galaxy's compute infrastructure. Scripts to create the Docker container are available under MIT license at https://github.com/usegalaxy-eu/gpu-jupyterlab-docker.

Available:
pdf (1010 KB)   doi:10.1093/gigascience/giad028   pmid:37099385   BibTeX Entry ( Kumar_Cuccuru_Gruning-acces_infra_for-2022 )

Temporal annotation of high-resolution intra-annual wood density information of Eucalyptus urophylla and its correlation with hydroclimatic conditions

Gleice Gomes Rodrigues, Martin Raden, Luciana Duque Silva, Hans-Peter Kahle

In: Dendrochronologia, 2022, 74, 125978

Three different Eucalyptus urophylla clones grown under two different spacing regimes in an experimental site in the state of São Paulo, Brazil, were analyzed to test effects of clone identity, spacing, cambial age and hydroclimatic conditions on high-resolution intra-annual wood density profiles. Since distinct periodic tree-ring boundaries were not visible on the stem cross-sectional surfaces, finding an alternative method for synchronization of density profiles was crucial for the analysis. The challenge was to generate intra- and inter-tree synchronized density profiles that possess high amplitude variation and low phase variation. Thus, we developed a protocol and workflow of how such high-resolution density profiles can be spatially aligned and temporally annotated to enable correlation analyses between trees and with time series of environmental stimuli. Mean wood density was significantly different between clones, but not between the spacings. Wood density increased significantly with increasing cambial age and decreasing growth rate. Principal component analysis showed that the overall variability in the temporally annotated density profiles is dominated by a highly significant common signal. We found significant negative correlation values for precipitation, indicating that water supply is the main driver of stem growth at the site, and providing evidence for the correctness of the method. The developed workflow can easily be adjusted to the analysis of other intra-annual tree-ring features like anatomical xylem cell traits or isotopic signals in the wood. It has a large potential to be used as a general guideline for the synchronization of intra-annual tree-ring traits, especially when distinct tree-ring boundaries are missing, as is often the case under tropical climatic conditions. The workflow supports the development of spatially aligned and temporally annotated chronologies under non-annual growth rhythms.

Available:
pdf (1498 KB)   doi:10.1016/j.dendro.2022.125978   BibTeX Entry ( Rodrigues2022 )

How to do RNA-RNA interaction prediction? A use-case driven handbook using IntaRNA

Martin Raden, Milad Miladi

In: Methods in Molecular Biology, 2022

Computational prediction of RNA-RNA interactions (RRI) is a central methodology for the specific investigation of inter-molecular RNA interactions and regulatory effects of non-coding RNAs like eukaryotic microRNAs or prokaryotic small RNAs. Available methods can be classified according to their underlying prediction strategies, each implicating specific capabilities and restrictions often not transparent to the non-expert user. Within this work, we review seven classes of RRI prediction strategies and discuss advantages and limitations of respective tools, since such knowledge is essential for selecting the right tool in the first place. Currently, accessibility-based approaches provide the most reliable RRI predictions. Thus, we discuss how IntaRNA, one of the state-of-the-art accessibility-based tools, can be applied in various use cases of computational RRI prediction. Detailed hands-on examples both for specific RRI prediction as well as large-scale target prediction are provided to illustrate the flexibility and capabilities of IntaRNA. Each example uses realistic data from the literature and is accompanied by instructions on how to interpret respective results to enable non-expert users to comprehensively understand and utilize IntaRNA's features for RRI predictions.

Publication note:
(in production)

Available:
BibTeX Entry ( Raden-IntaRNA-handson-2021 )

A distinct CD38+CD45RA+ population of CD4+, CD8+, and double-negative T cells is controlled by FAS

Maria Elena Maccari, Sebastian Fuchs, Patrick Kury, Geoffroy Andrieux, Simon Volkl, Bertram Bengsch, Myriam Ricarda Lorenz, Maximilian Heeg, Jan Rohr, Sabine Jagle, Carla N. Castro, Miriam Gross, Ursula Warthorst, Christoph Konig, Ilka Fuchs, Carsten Speckmann, Julian Thalhammer, Friedrich G. Kapp, Markus G. Seidel, Gregor Duckers, Stefan Schonberger, Catharina Schutz, Marita Fuhrer, Robin Kobbe, Dirk Holzinger, Christian Klemann, Petr Smisek, Stephen Owens, Gerd Horneff, Reinhard Kolb, Nora Naumann-Bartsch, Maurizio Miano, Julian Staniek, Marta Rizzi, Tomas Kalina, Pascal Schneider, Anika Erxleben, Rolf Backofen, Arif Ekici, Charlotte M. Niemeyer, Klaus Warnatz, Bodo Grimbacher, Hermann Eibel, Andreas Mackensen, Andreas Philipp Frei, Klaus Schwarz, Melanie Boerries, Stephan Ehl, Anne Rensing-Ehl

In: J Exp Med, 2021, 218(2)

The identification and characterization of rare immune cell populations in humans can be facilitated by their growth advantage in the context of specific genetic diseases. Here, we use autoimmune lymphoproliferative syndrome to identify a population of FAS-controlled TCRαβ+ T cells. They include CD4+, CD8+, and double-negative T cells and can be defined by a CD38+CD45RA+T-BET- expression pattern. These unconventional T cells are present in healthy individuals, are generated before birth, are enriched in lymphoid tissue, and do not expand during acute viral infection. They are characterized by a unique molecular signature that is unambiguously different from other known T cell differentiation subsets and independent of CD4 or CD8 expression. Functionally, FAS-controlled T cells represent highly proliferative, noncytotoxic T cells with an IL-10 cytokine bias. Mechanistically, regulation of this physiological population is mediated by FAS and CTLA4 signaling, and its survival is enhanced by mTOR and STAT3 signals. Genetic alterations in these pathways result in expansion of FAS-controlled T cells, which can cause significant lymphoproliferative disease.

Available:
pdf (8196 KB)   doi:10.1084/jem.20192191   pmid:33170215   BibTeX Entry ( Maccari_Fuchs_Kury-disti_popul_and-2021 )

Robust and efficient single-cell Hi-C clustering with approximate k-nearest neighbor graphs

Joachim Wolff, Rolf Backofen, Bjorn Gruning

In: Bioinformatics, 2021, 37(22), 4006-4013

MOTIVATION: Hi-C technology provides insights into the 3D organization of the chromatin, and the single-cell Hi-C method enables researchers to gain knowledge about the chromatin state at the level of individual cells. Single-cell Hi-C interaction matrices are high dimensional and very sparse. To cluster thousands of single-cell Hi-C interaction matrices, they are flattened and compiled into one matrix. Depending on the resolution, this matrix can have a few million or even billions of features; therefore, computations can be memory intensive. We present a single-cell Hi-C clustering approach using an approximate nearest neighbors method based on locality-sensitive hashing to reduce the dimensions and the computational resources. RESULTS: The presented method can process a 10 kb single-cell Hi-C dataset with 2600 cells and needs 40 GB of memory, while competitive approaches are not computable even with 1 TB of memory. It can be shown that the differentiation of the cells by their chromatin folding properties and, therefore, the quality of the clustering of single-cell Hi-C data is advantageous compared to competitive algorithms. AVAILABILITY AND IMPLEMENTATION: The presented clustering algorithm is part of the scHiCExplorer, is available on GitHub at https://github.com/joachimwolff/scHiCExplorer, and as a conda package via the bioconda channel. The approximate nearest neighbors implementation is available via https://github.com/joachimwolff/sparse-neighbors-search and as a conda package via the bioconda channel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
pdf (1434 KB)   doi:10.1093/bioinformatics/btab394   pmid:34021764   BibTeX Entry ( Wolff_Backofen_Gruning-Robus_and_effic-2021 )

StoatyDive: Evaluation and classification of peak profiles for sequencing data

Florian Heyl, Rolf Backofen

In: Gigascience, 2021, 10(6)

BACKGROUND: The prediction of binding sites (peak-calling) is a common task in the data analysis of methods such as cross-linking immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq). The predicted binding sites are often further analyzed to predict sequence motifs or structure patterns. When looking at a typical result of such high-throughput experiments, the obtained peak profiles differ largely on a genomic level. Thus, a tool is missing that evaluates and classifies the predicted peaks on the basis of their shapes. We hereby present StoatyDive, a tool that can be used to filter for specific peak profile shapes of sequencing data such as CLIP. FINDINGS: With StoatyDive we are able to classify peak profile shapes from CLIP-seq data of the histone stem-loop-binding protein (SLBP). We compare the results to existing tools and show that StoatyDive finds more distinct peak shape clusters for CLIP data. Furthermore, we present StoatyDive's capabilities as a quality control tool and as a filter to pick different shapes based on biological or technical questions for other CLIP data from different RNA binding proteins with different biological functions and numbers of RNA recognition motifs. We finally show that proteins involved in splicing, such as RBM22 and U2AF1, have potentially sharper-shaped peaks than other RNA binding proteins. CONCLUSION: StoatyDive finally fills the demand for a peak shape clustering tool for CLIP-Seq data that fine-tunes downstream analysis steps such as structure or sequence motif predictions and that acts as a quality control.

Available:
pdf (2600 KB)   doi:10.1093/gigascience/giab045   pmid:34143874   BibTeX Entry ( Heyl_Backofen-Stoat_Evalu_and-2021 )

Structure-aware machine learning identifies microRNAs operating as Toll-like receptor 7/8 ligands

Martin Raden, Thomas Wallach, Milad Miladi, Yuanyuan Zhai, Christina Kruger, Zoe J. Mossmann, Paul Dembny, Rolf Backofen, Seija Lehnardt

In: RNA Biol, 2021, 18(sup1), 268-277

MicroRNAs (miRNAs) can serve as activation signals for membrane receptors, a recently discovered function that is independent of the miRNAs' conventional role in post-transcriptional gene regulation. Here, we introduce a machine learning approach, BrainDead, to identify oligonucleotides that act as ligands for single-stranded RNA-detecting Toll-like receptors (TLR)7/8, thereby triggering an immune response. BrainDead was trained on activation data obtained from in vitro experiments on murine microglia, incorporating sequence and intra-molecular structure, as well as inter-molecular homo-dimerization potential of candidate RNAs. The method was applied to analyse all known human miRNAs regarding their potential to induce TLR7/8 signalling and microglia activation. We validated the predicted functional activity of subsets of high- and low-scoring miRNAs experimentally, of which a selection has been linked to Alzheimer's disease. High agreement between predictions and experiments confirms the robustness and power of BrainDead. The results provide new insight into the mechanisms of how miRNAs act as TLR ligands. Eventually, BrainDead implements a generic machine learning methodology for learning and predicting the functions of short RNAs in any context.

Available:
pdf (1041 KB)   doi:10.1080/15476286.2021.1940697   pmid:34241565   BibTeX Entry ( Raden_Wallach_Miladi-Struc_machi_learn-2021 )

CdrS Is a Global Transcriptional Regulator Influencing Cell Division in Haloferax volcanii

Yan Liao, Verena Vogel, Sabine Hauber, Jurgen Bartel, Omer S. Alkhnbashi, Sandra Maass, Thandi S. Schwarz, Rolf Backofen, Dorte Becher, Iain G. Duggin, Anita Marchfelder

In: mBio, 2021, 12(4), e0141621

Transcriptional regulators that integrate cellular and environmental signals to control cell division are well known in bacteria and eukaryotes, but their existence is poorly understood in archaea. We identified a conserved gene (cdrS) that encodes a small protein and is highly transcribed in the model archaeon Haloferax volcanii. The cdrS gene could not be deleted, but CRISPR interference (CRISPRi)-mediated repression of the cdrS gene caused slow growth and cell division defects and changed the expression of multiple genes and their products associated with cell division, protein degradation, and metabolism. Consistent with this complex regulatory network, overexpression of cdrS inhibited cell division, whereas overexpression of the operon encoding both CdrS and a tubulin-like cell division protein (FtsZ2) stimulated division. Chromatin immunoprecipitation-DNA sequencing (ChIP-Seq) identified 18 DNA-binding sites of the CdrS protein, including one upstream of the promoter for a cell division gene, ftsZ1, and another upstream of the essential gene dacZ, encoding diadenylate cyclase involved in c-di-AMP signaling, which is implicated in the regulation of cell division. These findings suggest that CdrS is a transcription factor that plays a central role in a regulatory network coordinating metabolism and cell division. IMPORTANCE Cell division is a central mechanism of life and is essential for growth and development. Members of the Bacteria and Eukarya have different mechanisms for cell division, which have been studied in detail. In contrast, cell division in members of the Archaea is still understudied, and its regulation is poorly understood. Interestingly, different cell division machineries appear in members of the Archaea, with the Euryarchaeota using a cell division apparatus based on the tubulin-like cytoskeletal protein FtsZ, as in bacteria. Here, we identify the small protein CdrS as essential for survival and a central regulator of cell division in the euryarchaeon Haloferax volcanii. CdrS also appears to coordinate other cellular pathways, including synthesis of signaling molecules and protein degradation. Our results show that CdrS plays a sophisticated role in cell division, including regulation of numerous associated genes. These findings are expected to initiate investigations into conditional regulation of division in archaea.

Available:
pdf (2996 KB)   doi:10.1128/mBio.01416-21   pmid:34253062   BibTeX Entry ( Liao_Vogel_Hauber-CdrS_Globa_Trans-2021 )

Chemotherapy-induced transposable elements activate MDA5 to enhance haematopoietic regeneration

Thomas Clapes, Aikaterini Polyzou, Pia Prater, Sagar, Antonio Morales-Hernandez, Mariana Galvao Ferrarini, Natalie Kehrer, Stylianos Lefkopoulos, Veronica Bergo, Barbara Hummel, Nadine Obier, Daniel Maticzka, Anne Bridgeman, Josip S. Herman, Ibrahim Ilik, Lheanna Klaeyle, Jan Rehwinkel, Shannon McKinney-Freeman, Rolf Backofen, Asifa Akhtar, Nina Cabezas-Wallscheid, Ritwick Sawarkar, Rita Rebollo, Dominic Grun, Eirini Trompouki

In: Nat Cell Biol, 2021, 23(7), 704-717

Haematopoietic stem cells (HSCs) are normally quiescent, but have evolved mechanisms to respond to stress. Here, we evaluate haematopoietic regeneration induced by chemotherapy. We detect robust chromatin reorganization followed by increased transcription of transposable elements (TEs) during early recovery. TE transcripts bind to and activate the innate immune receptor melanoma differentiation-associated protein 5 (MDA5) that generates an inflammatory response that is necessary for HSCs to exit quiescence. HSCs that lack MDA5 exhibit an impaired inflammatory response after chemotherapy and retain their quiescence, with consequent better long-term repopulation capacity. We show that the overexpression of ERV and LINE superfamily TE copies in wild-type HSCs, but not in Mda5(-/-) HSCs, results in their cycling. By contrast, after knockdown of LINE1 family copies, HSCs retain their quiescence. Our results show that TE transcripts act as ligands that activate MDA5 during haematopoietic regeneration, thereby enabling HSCs to mount an inflammatory response necessary for their exit from quiescence.

Available:
pdf (6254 KB)   doi:10.1038/s41556-021-00707-9   pmid:34253898   BibTeX Entry ( Clapes_Polyzou_Prater-Chemo_trans_eleme-2021 )

The temperature-regulated DEAD-box RNA helicase CrhR interactome: Autoregulation and photosynthesis-related transcripts

Anzhela Migur, Florian Heyl, Janina Fuss, Afshan Srikumar, Bruno Huettel, Claudia Steglich, Jogadhenu S. S. Prakash, Richard Reinhardt, Rolf Backofen, George W. Owttrim, Wolfgang R. Hess

In: J Exp Bot, 2021

RNA helicases play crucial functions in RNA biology. In plants, RNA helicases are encoded by large gene families, performing roles in abiotic stress responses, development, the post-transcriptional regulation of gene expression as well as house-keeping functions. Several of these RNA helicases are targeted to the organelles, mitochondria and chloroplasts. Cyanobacteria are the direct evolutionary ancestors of plant chloroplasts. The cyanobacterium Synechocystis 6803 encodes a single DEAD-box RNA helicase, CrhR, that is induced by a range of abiotic stresses, including low temperature. Though the ΔcrhR mutant exhibits a severe cold-sensitive phenotype, the physiological function(s) performed by CrhR have not been described. To identify transcripts interacting with CrhR, we performed RNA co-immunoprecipitation with extracts from a Synechocystis crhR deletion mutant expressing the FLAG-tagged native CrhR or a K57A mutated version with an anticipated enhanced RNA binding. The composition of the interactome was strikingly biased towards photosynthesis-associated and redox-controlled transcripts. A transcript highly enriched in all experiments was the crhR mRNA, suggesting an auto-regulatory molecular mechanism. The identified interactome explains the described physiological role of CrhR in response to the redox poise of the photosynthetic electron transport chain and characterizes CrhR as an enzyme with a diverse range of transcripts as molecular targets.

Available:
pdf (1825 KB)   doi:10.1093/jxb/erab416   pmid:34499142   BibTeX Entry ( Migur_Heyl_Fuss-The_tempe_DEAD-2021 )

Transcriptome-wide in vivo mapping of cleavage sites for the compact cyanobacterial ribonuclease E reveals insights into its function and substrate recognition

Ute A. Hoffmann, Florian Heyl, Said N. Rogh, Thomas Wallner, Rolf Backofen, Wolfgang R. Hess, Claudia Steglich, Annegret Wilde

In: Nucleic Acids Res, 2021, 49(22), 13075-13091

Ribonucleases are crucial enzymes in RNA metabolism and post-transcriptional regulatory processes in bacteria. Cyanobacteria encode the two essential ribonucleases RNase E and RNase J. Cyanobacterial RNase E is shorter than homologues in other groups of bacteria and lacks both the chloroplast-specific N-terminal extension as well as the C-terminal domain typical for RNase E of enterobacteria. In order to investigate the function of RNase E in the model cyanobacterium Synechocystis sp. PCC 6803, we engineered a temperature-sensitive RNase E mutant by introducing two site-specific mutations, I65F and the spontaneously occurred V94A. This enabled us to perform RNA-seq after the transient inactivation of RNase E by a temperature shift (TIER-seq) and to map 1472 RNase-E-dependent cleavage sites. We inferred a dominating cleavage signature consisting of an adenine at the -3 and a uridine at the +2 position within a single-stranded segment of the RNA. The data identified mRNAs likely regulated jointly by RNase E and an sRNA and potential 3' end-derived sRNAs. Our findings substantiate the pivotal role of RNase E in post-transcriptional regulation and suggest the redundant or concerted action of RNase E and RNase J in cyanobacteria.

Available:
pdf (2730 KB)   doi:10.1093/nar/gkab1161   pmid:34871439   BibTeX Entry ( Hoffmann_Heyl_Rogh-Trans_vivo_mappi-NAR2021 )

ChiRA: an integrated framework for chimeric read analysis from RNA-RNA interactome and RNA structurome data

Pavankumar Videm, Anup Kumar, Oleg Zharkov, Bjorn Andreas Gruning, Rolf Backofen

In: Gigascience, 2021, 10(2)

BACKGROUND: With the advances in next-generation sequencing technologies, it is possible to determine RNA-RNA interaction and RNA structure predictions on a genome-wide level. The reads from these experiments usually are chimeric, with each arm generated from one of the interaction partners. Owing to short read lengths, often these sequenced arms ambiguously map to multiple locations. Thus, inferring the origin of these can be quite complicated. Here we present ChiRA, a generic framework for sensitive annotation of these chimeric reads, which in turn can be used to predict the sequenced hybrids. RESULTS: Grouping reference loci on the basis of aligned common reads and quantification improved the handling of the multi-mapped reads in contrast to common strategies such as the selection of the longest hit or a random choice among all hits. On benchmark data ChiRA improved the number of correct alignments to the reference up to 3-fold. It is shown that the genes that belong to the common read loci share the same protein families or similar pathways. In published data, ChiRA could detect 3 times more new interactions compared to existing approaches. In addition, ChiRAViz can be used to visualize and filter large chimeric datasets intuitively. CONCLUSION: ChiRA tool suite provides a complete analysis and visualization framework along with ready-to-use Galaxy workflows and tutorials for RNA-RNA interactome and structurome datasets. Common read loci built by ChiRA can rescue multi-mapped reads on paralogous genes without requiring any information on gene relations. We showed that ChiRA is sensitive in detecting new RNA-RNA interactions from published RNA-RNA interactome datasets.

Available:
pdf (2245 KB)   doi:10.1093/gigascience/giaa158   pmid:33511995   BibTeX Entry ( Videm_Kumar_Zharkov-ChiRA_integ_frame-2021 )

RNAProt: an efficient and feature-rich RNA binding protein binding site predictor

Michael Uhl, Van Dinh Tran, Florian Heyl, Rolf Backofen

In: Gigascience, 2021, 10(8)

BACKGROUND: Cross-linking and immunoprecipitation followed by next-generation sequencing (CLIP-seq) is the state-of-the-art technique used to experimentally determine transcriptome-wide binding sites of RNA-binding proteins (RBPs). However, it relies on gene expression, which can be highly variable between conditions and thus cannot provide a complete picture of the RBP binding landscape. This creates a demand for computational methods to predict missing binding sites. Although there exist various methods using traditional machine learning and lately also deep learning, we encountered several problems: many of these are not well documented or maintained, making them difficult to install and use, or are not even available. In addition, there can be efficiency issues, as well as little flexibility regarding options or supported features. RESULTS: Here, we present RNAProt, an efficient and feature-rich computational RBP binding site prediction framework based on recurrent neural networks. We compare RNAProt with 1 traditional machine learning approach and 2 deep-learning methods, demonstrating its state-of-the-art predictive performance and better run time efficiency. We further show that its implemented visualizations capture known binding preferences and thus can help to understand what is learned. Since RNAProt supports various additional features (including user-defined features, which no other tool offers), we also present their influence on benchmark set performance. Finally, we show the benefits of incorporating additional features, specifically structure information, when learning the binding sites of a hairpin loop binding RBP. CONCLUSIONS: RNAProt provides a complete framework for RBP binding site predictions, from data set generation over model training to the evaluation of binding preferences and prediction. It offers state-of-the-art predictive performance, as well as superior run time efficiency, while at the same time supporting more features and input types than any other tool available so far. RNAProt is easy to install and use, comes with comprehensive documentation, and is accompanied by informative statistics and visualizations. All this makes RNAProt a valuable tool to apply in future RBP binding site research.

Available:
pdf (1353 KB)   doi:10.1093/gigascience/giab054   pmid:34406415   BibTeX Entry ( Uhl_Tran_Heyl-RNAPr_effic_and-2021 )

A SARS-CoV-2 sequence submission tool for the European Nucleotide Archive

Miguel Roncoroni, Bert Droesbeke, Ignacio Eguinoa, Kim De Ruyck, Flora D'Anna, Dilmurat Yusuf, Björn Grüning, Rolf Backofen, Frederik Coppens

In: Bioinformatics, June 2021

SUMMARY: Many aspects of the global response to the COVID-19 pandemic are enabled by the fast and open publication of SARS-CoV-2 genetic sequence data. The European Nucleotide Archive (ENA) is the European recommended open repository for genetic sequences. In this work, we present a tool for submitting raw sequencing reads of SARS-CoV-2 to ENA. The tool features a single-step submission process, a graphical user interface, tabular-formatted metadata and the possibility to remove human reads prior to submission. A Galaxy wrap of the tool allows users with little or no bioinformatic knowledge to do bulk sequencing read submissions. The tool is also packed in a Docker container to ease deployment. AVAILABILITY: CLI ENA upload tool is available at github.com/usegalaxy-eu/ena-upload-cli (DOI 10.5281/zenodo.4537621); Galaxy ENA upload tool at toolshed.g2.bx.psu.edu/view/iuc/ena_upload/382518f24d6d and https://github.com/galaxyproject/tools-iuc/tree/master/tools/ena_upload (development); and ENA upload Galaxy container at github.com/ELIXIR-Belgium/ena-upload-container (DOI 10.5281/zenodo.4730785).

Available:
doi:10.1093/bioinformatics/btab421   BibTeX Entry ( Gruning_Yusuf-SARS_CoV_2_submission_ENA_Bioinformatics2021 )

CRISPRloci: comprehensive and accurate annotation of CRISPR-Cas system

Omer S. Alkhnbashi, Alexander Mitrofanov, Robson Bonidia, Martin Raden, Van Dinh Tran, Florian Eggenhofer, Shiraz A. Shah, Ekrem Öztürk, Victor A. Padilha, Danilo S. Sanches, Andre C.P.L.F. de Carvalho, Rolf Backofen

In: Nucleic Acids Research, 2021

CRISPR-Cas systems are adaptive immune systems in prokaryotes, providing resistance against invading viruses and plasmids. The identification of CRISPR loci is currently a non-standardized, ambiguous process, requiring the manual combination of multiple tools, where existing tools detect only parts of the CRISPR-systems, and lack quality control, annotation and assessment capabilities of the detected CRISPR loci. Our CRISPRloci server provides the first resource for the prediction and assessment of all possible CRISPR loci. The server integrates a series of advanced Machine Learning tools within a seamless web interface featuring: (i) prediction of all CRISPR arrays in the correct orientation; (ii) definition of CRISPR leaders for each locus; and (iii) annotation of cas genes and their unambiguous classification. As a result, CRISPRloci is able to accurately determine the CRISPR array and associated information, such as: the Cas subtypes; cassette boundaries; accuracy of the repeat structure, orientation and leader sequence; virus-host interactions; self-targeting; as well as the annotation of cas genes, all of which have been missing from existing tools. This annotation is presented in an interactive interface, making it easy for scientists to gain an overview of the CRISPR system in their organism of interest. Predictions are also rendered in GFF format, enabling in-depth genome browser inspection. In summary, CRISPRloci constitutes a full suite for CRISPR-Cas system characterization that offers annotation quality previously available only after manual inspection.

Available:
pdf (1498 KB)   doi:10.1093/nar/gkab456   BibTeX Entry ( Alkhnbashi-CRISPRloci )

An integrated database of small RNAs and their interplay with transcriptional gene regulatory networks in corynebacteria

Mariana T.D. Parise, Doglas Parise, Flavia F. Aburjaile, Anne C.P. Gomide, Rodrigo B. Kato, Martin Raden, Rolf Backofen, Vasco A. de Carvalho Azevedo, Jan Baumbach

In: Frontiers in Microbiology, 2021

Small RNAs (sRNAs) are one of the key players in the post-transcriptional regulation of bacterial gene expression. These molecules, together with transcription factors, form regulatory networks and greatly influence the bacterial regulatory landscape. Little is known concerning sRNAs and their influence on the regulatory machinery in the genus Corynebacterium, despite its medical, veterinary and biotechnological importance. We integrate sRNAs and their regulatory interactions into the transcriptional regulatory networks of six corynebacterial species, covering four human and animal pathogens, and integrate this data into the CoryneRegNet database. To this end, we predicted sRNAs to regulate 754 genes, including 206 transcription factors, in corynebacterial gene regulatory networks. Amongst them, the sRNA Cd-NCTC13129-sRNA-2 is predicted to directly regulate ydfH, which indirectly regulates 66 genes, including the global regulator glxR in C. diphtheriae. All of the sRNA-enriched regulatory networks of the genus Corynebacterium are made publicly available in the newest release of CoryneRegNet (www.exbio.wzw.tum.de/coryneregnet/) to aid in providing valuable insights and to guide future experiments.

Publication note:
(epub ahead of print)

Available:
pdf (9549 KB)   doi:10.3389/fmicb.2021.656435   BibTeX Entry ( Parise-2021 )

MicroRNA-100-5p and microRNA-298-5p released from apoptotic cortical neurons are endogenous Toll-like receptor 7/8 ligands that contribute to neurodegeneration

Thomas Wallach, Zoe Mossmann, Michal Szczepek, Max Wetzel, Rui Machado, Martin Raden, Milad Miladi, Gunnar Kleinau, Christina Krüger, Paul Dembny, Drew Adler, Yuanyuan Zhai, Omar Dzaye, Matthias Futschik, Rolf Backofen, Patrick Scheerer, Seija Lehnardt

In: Molecular Neurodegeneration, 2021, 16(80)

Results: We identified a specific pattern of miRNAs released from apoptotic cortical neurons that activate TLR7 and/or TLR8, depending on sequence and species. Exposure of microglia and macrophages to certain miRNA classes released from apoptotic neurons resulted in the sequence-specific production of distinct cytokines/chemokines and increased phagocytic activity. Out of those miRNAs, miR-100-5p and miR-298-5p, which have consistently been linked to neurodegenerative diseases, entered microglia, located to their endosomes, and directly bound to human TLR8. The miRNA-TLR interaction required novel sequence features, but no specific structure formation of mature miRNA. As a consequence of miR-100-5p- and miR-298-5p-induced TLR activation, cortical neurons underwent cell-autonomous apoptosis. Presence of miR-100-5p and miR-298-5p in cerebrospinal fluid led to neurodegeneration and microglial accumulation in the murine cerebral cortex through TLR7 signaling. Conclusion: Our data demonstrate that specific miRNAs are released from apoptotic cortical neurons, serve as endogenous TLR7/8 ligands, and thereby trigger further neuronal apoptosis in the CNS. Our findings underline the recently discovered role of miRNAs as extracellular signaling molecules, particularly in the context of neurodegeneration. Keywords: Extracellular microRNAs, Endogenous Toll-like receptor ligands, Cortical neurons, Neuronal apoptosis, Microglia, Neurodegeneration, miRNA microarray

Available:
pdf (5116 KB)   doi:10.1186/s13024-021-00498-5   BibTeX Entry ( Wallach-2021 )

Effects of intra-seasonal drought on kinetics of tracheid differentiation and seasonal growth dynamics of Norway spruce along an elevational gradient

Dominik F. Stangler, Hans-Peter Kahle, Martin Raden, Elena Larysch, Thomas Seifert, Heinrich Spiecker

In: Forests, 2021, 12(3), 274

Research Highlights: Our results provide novel perspectives on the effectiveness and collapse of compensatory mechanisms of tracheid development of Norway spruce during intra-seasonal drought and the environmental control of intra-annual density fluctuations. Background and Objectives: This study aimed to compare and integrate complementary methods of investigating intra-annual wood formation dynamics to gain a better understanding of the endogenous and environmental control of tree-ring development and the impact of anticipated climatic changes on forest growth and productivity. Materials and Methods: We performed an integrated analysis of xylogenesis observations, quantitative wood anatomy and point-dendrometer measurements of Norway spruce (Picea abies (L.) Karst.) trees growing along an elevational gradient in south-western Germany during a growing season with an anomalous dry June followed by an extraordinary humid July. Results: Strong endogenous control of tree-ring formation was suggested at the highest elevation where the decreasing rates of tracheid enlargement and wall thickening during drought were effectively compensated by increased cell differentiation duration. A shift to environmental control of tree-ring formation during drought was indicated at the lowest elevation, where we detected absence of compensatory mechanisms, eventually stimulating the formation of an intra-annual density fluctuation. Transient drought stress in June also led to bimodal patterns and decreasing daily rates of stem radial displacement, radial xylem growth and woody biomass production. Comparing xylogenesis data with dendrometer measurements showed ambivalent results and it appears that with decreasing daily rates of radial xylem growth, the signal-to-noise ratio in dendrometer time series between growth and fluctuations of tree water status becomes increasingly detrimental. 
Conclusions: Our study provides new perspectives into the complex interplay between rates and durations of tracheid development during dry-wet cycles, and thereby contributes to an improved and mechanistic understanding of the environmental control of wood formation processes leading to the formation of intra-annual density fluctuations in tree-rings of Norway spruce.

Available:
supplement.pdf (4644 KB)   pdf (4795 KB)   doi:10.3390/f12030274   BibTeX Entry ( Stangler-2021 )

Structure-aware machine learning classification of oligonucleotide-induced immune response identifies microRNAs operating as Toll-like receptor 7/8 ligands

Martin Raden, Thomas Wallach, Milad Miladi, Yuanyuan Zhai, Christina Krüger, Zoe J. Mossmann, Paul Dembny, Rolf Backofen, Seija Lehnardt

In: RNA Biology, 2021, 18(sup1), 268-277

MicroRNAs (miRNAs) are about 22 nucleotides long and have been linked to various human diseases. They can serve as activation signals for membrane receptors, a recently discovered function that is independent of the miRNAs' conventional role in post-transcriptional gene regulation. Here, we introduce a machine learning approach, BrainDead, to identify oligonucleotides that act as ligands for single-stranded RNA-detecting Toll-like receptors (TLR)7/8, thereby triggering an immune response. BrainDead was trained on activation data obtained from in vitro experiments on murine microglia, the resident immune cells in the brain, incorporating sequence, intra-molecular structure, as well as inter-molecular homo-dimerization potential of candidate RNAs. It was applied to analyze all known human miRNAs regarding their potential to induce TLR7/8 signaling and microglia activation. We validated the predicted functional activity of subsets of high- and low-scoring miRNAs experimentally, of which a selection has been linked to Alzheimer's disease, the most common cause of dementia in humans. High agreement of predictions and experiments confirms the robustness and power of BrainDead. The results provide new insight into the mechanisms how miRNAs act as TLR ligands. Eventually, BrainDead implements a generic machine learning methodology for learning and predicting functions of short RNAs in any context.

Publication note:
(MR, TW, MM contributed equally)

Available:
pdf (984 KB)   doi:10.1080/15476286.2021.1940697   BibTeX Entry ( Raden-2021-BrainDead )

Tool recommender system in Galaxy using deep learning

Anup Kumar, Helena Rasche, Björn A. Grüning, Rolf Backofen

In: GigaScience, 2021, 10(1)

Background: Galaxy is a web-based and open-source scientific data-processing platform. Researchers compose pipelines in Galaxy to analyse scientific data. These pipelines, also known as workflows, can be complex and difficult to create from thousands of tools, especially for researchers new to Galaxy. To help researchers with creating workflows, a system is developed to recommend tools that can facilitate further data analysis. Findings: A model is developed to recommend tools using a deep learning approach by analysing workflows composed by researchers on the European Galaxy server. The higher-order dependencies in workflows, represented as directed acyclic graphs, are learned by training a gated recurrent units neural network, a variant of a recurrent neural network. In the neural network training, the weights of tools used are derived from their usage frequencies over time and the sequences of tools are uniformly sampled from training data. Hyperparameters of the neural network are optimized using Bayesian optimization. Mean accuracy of 98 percent in recommending tools is achieved for the top-1 metric. Conclusions: The model is accessed by a Galaxy API to provide researchers with recommended tools in an interactive manner using multiple user interface integrations on the European Galaxy server. High-quality and highly used tools are shown at the top of the recommendations. The scripts and data to create the recommendation system are available under MIT license at https://github.com/anuprulez/galaxy_tool_recommendation.

Publication note:
giaa152

Available:
pdf (1830 KB)   doi:10.1093/gigascience/giaa152   BibTeX Entry ( Kumar-tool-prediction-galaxy-2021 )

Intuitive, reproducible high-throughput molecular dynamics in Galaxy: a tutorial

Simon A. Bray, Tharindu Senapathi, Christopher B. Barnett, Björn A. Grüning

In: Journal of Cheminformatics, September 2020, 12(1)

This paper is a tutorial developed for the data analysis platform Galaxy. The purpose of Galaxy is to make high-throughput computational data analysis, such as molecular dynamics, a structured, reproducible and transparent process. In this tutorial we focus on 3 questions: How are protein-ligand systems parameterized for molecular dynamics simulation? What kind of analysis can be carried out on molecular trajectories? How can high-throughput MD be used to study multiple ligands? After finishing you will have learned about force-fields and MD parameterization, how to conduct MD simulation and analysis for a protein-ligand system, and understand how different molecular interactions contribute to the binding affinity of ligands to the Hsp90 protein.

Available:
pdf (4419 KB)   doi:10.1186/s13321-020-00451-6   BibTeX Entry ( Bray-2020-htmd )

The ChemicalToolbox: reproducible, user-friendly cheminformatics analysis on the Galaxy platform

Simon A. Bray, Xavier Lucas, Anup Kumar, Björn A. Grüning

In: Journal of Cheminformatics, 06 2020, 12(1)

Here, we introduce the ChemicalToolbox, a publicly available web server for performing cheminformatics analysis. The ChemicalToolbox provides an intuitive, graphical interface for common tools for downloading, filtering, visualizing and simulating small molecules and proteins. The ChemicalToolbox is based on Galaxy, an open-source web-based platform which enables accessible and reproducible data analysis. There is already an active Galaxy cheminformatics community using and developing tools. Based on their work, we provide four example workflows which illustrate the capabilities of the ChemicalToolbox, covering assembly of a compound library, hole filling, protein-ligand docking, and construction of a quantitative structure-activity relationship (QSAR) model. These workflows may be modified and combined flexibly, together with the many other tools available, to fit the needs of a particular project. The ChemicalToolbox is hosted on the European Galaxy server and may be accessed via https://cheminformatics.usegalaxy.eu.

Available:
pdf (928 KB)   doi:10.1186/s13321-020-00442-7   BibTeX Entry ( Bray-2020-ChemicalToolbox )

CRISPRidentify: identification of CRISPR arrays using machine learning approach

Alexander Mitrofanov, Omer S. Alkhnbashi, Sergey A. Shmakov, Kira S. Makarova, Eugene V. Koonin, Rolf Backofen

In: Nucleic Acids Res, 2020

CRISPR-Cas are adaptive immune systems that degrade foreign genetic elements in archaea and bacteria. In carrying out their immune functions, CRISPR-Cas systems heavily rely on RNA components. These CRISPR (cr) RNAs are repeat-spacer units that are produced by processing of pre-crRNA, the transcript of CRISPR arrays, and guide Cas protein(s) to the cognate invading nucleic acids, enabling their destruction. Several bioinformatics tools have been developed to detect CRISPR arrays based solely on DNA sequences, but all these tools employ the same strategy of looking for repetitive patterns, which might correspond to CRISPR array repeats. The identified patterns are evaluated using a fixed, built-in scoring function, and arrays exceeding a cut-off value are reported. Here, we instead introduce a data-driven approach that uses machine learning to detect and differentiate true CRISPR arrays from false ones based on several features. Our CRISPR detection tool, CRISPRidentify, performs three steps: detection, feature extraction and classification based on manually curated sets of positive and negative examples of CRISPR arrays. The identified CRISPR arrays are then reported to the user accompanied by detailed annotation. We demonstrate that our approach identifies not only previously detected CRISPR arrays, but also CRISPR array candidates not detected by other tools. Compared to other methods, our tool has a drastically reduced false positive rate. In contrast to the existing tools, our approach not only provides the user with the basic statistics on the identified CRISPR arrays but also produces a certainty score as a practical measure of the likelihood that a given genomic region is a CRISPR array.

Available:
pdf (1177 KB)   doi:10.1093/nar/gkaa1158   pmid:33290505   BibTeX Entry ( Mitrofanov_Alkhnbashi_Shmakov-CRISP_ident_CRISP-NAR2020 )

Casboundary: automated definition of integral Cas cassettes

Victor A Padilha, Omer S Alkhnbashi, Van Dinh Tran, Shiraz A Shah, André C P L F Carvalho, Rolf Backofen

In: Bioinformatics, 11 2020

CRISPR-Cas are important systems found in most archaeal and many bacterial genomes, providing adaptive immunity against mobile genetic elements in prokaryotes. The CRISPR-Cas systems are encoded by a set of consecutive cas genes, here termed cassette. The identification of cassette boundaries is key for finding cassettes in the CRISPR research field. This is often carried out by using Hidden Markov Models and manual annotation. In this article, we propose the first method able to automatically define the cassette boundaries. In addition, we present a Cas-type predictive model used by the method to assign each gene located in the region defined by a cassette’s boundaries a Cas label from a set of pre-defined Cas types. Furthermore, the proposed method can detect potentially new cas genes and decompose a cassette into its modules. We evaluate the predictive performance of our proposed method on data collected from the two most recent CRISPR classification studies. In our experiments, we obtain an average similarity of 0.86 between the predicted and expected cassettes. Besides, we achieve F-scores above 0.9 for the classification of cas genes of known types and 0.73 for the unknown ones. Finally, we conduct two additional study cases, where we investigate the occurrence of potentially new cas genes and the occurrence of module exchange between different genomes. Availability: https://github.com/BackofenLab/Casboundary. Contact: alkhanbo@informatik.uni-freiburg.de or backofen@informatik.uni-freiburg.de. Supplementary data are available at Bioinformatics online.

Publication note:
btaa984

Available:
pdf (609 KB)   doi:10.1093/bioinformatics/btaa984   BibTeX Entry ( casboundary )

Adaptation induced by self-targeting in a type I-B CRISPR-Cas system

Aris-Edda Stachler, Julia Wortz, Omer S. Alkhnbashi, Israela Turgeman-Grott, Rachel Smith, Thorsten Allers, Rolf Backofen, Uri Gophna, Anita Marchfelder

In: Journal of Biological Chemistry, 2020, 295(39), 13502-13515

Haloferax volcanii is, to our knowledge, the only prokaryote known to tolerate CRISPR-Cas-mediated damage to its genome in the WT background; the resulting cleavage of the genome is repaired by homologous recombination restoring the WT version. In mutant Haloferax strains with enhanced self-targeting, cell fitness decreases and microhomology-mediated end joining becomes active, generating deletions in the targeted gene. Here we use self-targeting to investigate adaptation in H. volcanii CRISPR-Cas type I-B. We show that self-targeting and genome breakage events that are induced by self-targeting, such as those catalyzed by active transposases, can generate DNA fragments that are used by the CRISPR-Cas adaptation machinery for integration into the CRISPR loci. Low cellular concentrations of self-targeting crRNAs resulted in acquisition of large numbers of spacers originating from the entire genomic DNA. In contrast, high concentrations of self-targeting crRNAs resulted in lower acquisition that was mostly centered on the targeting site. Furthermore, we observed naive spacer acquisition at a low level in WT Haloferax cells and with higher efficiency upon overexpression of the Cas proteins Cas1, Cas2, and Cas4. Taken together, these findings indicate that naive adaptation is a regulated process in H. volcanii that operates at low basal levels and is induced by DNA breaks.

Available:
pdf (3063 KB)   doi:10.1074/jbc.RA120.014030   pmid:32723866   BibTeX Entry ( Stachler_Wortz_Alkhnbashi-Adapt_induc_self-JBC2020 )

Improving CLIP-seq data analysis by incorporating transcript information

Michael Uhl, Van Dinh Tran, Rolf Backofen

In: BMC Genomics, 2020, 21(1), 1-8

Current peak callers for identifying RNA-binding protein (RBP) binding sites from CLIP-seq data take into account genomic read profiles, but they ignore the underlying transcript information, that is, information regarding splicing events. So far, no studies have examined this issue more closely. Here we show that current peak callers are susceptible to false peak calling near exon borders. We quantify its extent in publicly available datasets, which turns out to be substantial. By providing a tool called CLIPcontext for automatic transcript and genomic context sequence extraction, we further demonstrate that context choice affects the performances of RBP binding site prediction tools. Moreover, we show that known motifs of exon-binding RBPs are often enriched in transcript context sites, which should enable the recovery of more authentic binding sites. Finally, we discuss possible strategies on how to integrate transcript information into future workflows. Our results demonstrate the importance of incorporating transcript information in CLIP-seq data analysis. Taking advantage of the underlying transcript information should therefore become an integral part of future peak calling and downstream analysis tools.

Available:
pdf (856 KB)   doi:10.1186/s12864-020-07297-0   BibTeX Entry ( uhl2020improving )

Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants

Kira S. Makarova, Yuri I. Wolf, Jaime Iranzo, Sergey A. Shmakov, Omer S. Alkhnbashi, Stan J. J. Brouns, Emmanuelle Charpentier, David Cheng, Daniel H. Haft, Philippe Horvath, Sylvain Moineau, Francisco J. M. Mojica, David Scott, Shiraz A. Shah, Virginijus Siksnys, Michael P. Terns, Ceslovas Venclovas, Malcolm F. White, Alexander F. Yakunin, Winston Yan, Feng Zhang, Roger A. Garrett, Rolf Backofen, John van der Oost, Rodolphe Barrangou, Eugene V. Koonin

In: Nat Rev Microbiol, 2020, 18(2), 67-83

The number and diversity of known CRISPR-Cas systems have substantially increased in recent years. Here, we provide an updated evolutionary classification of CRISPR-Cas systems and cas genes, with an emphasis on the major developments that have occurred since the publication of the latest classification, in 2015. The new classification includes 2 classes, 6 types and 33 subtypes, compared with 5 types and 16 subtypes in 2015. A key development is the ongoing discovery of multiple, novel class 2 CRISPR-Cas systems, which now include 3 types and 17 subtypes. A second major novelty is the discovery of numerous derived CRISPR-Cas variants, often associated with mobile genetic elements that lack the nucleases required for interference. Some of these variants are involved in RNA-guided transposition, whereas others are predicted to perform functions distinct from adaptive immunity that remain to be characterized experimentally. The third highlight is the discovery of numerous families of ancillary CRISPR-linked genes, often implicated in signal transduction. Together, these findings substantially clarify the functional diversity and evolutionary history of CRISPR-Cas.

Available:
pdf (2443 KB)   doi:10.1038/s41579-019-0299-x   pmid:31857715   BibTeX Entry ( Makarova_Wolf_Iranzo-Evolu_class_CRISP-2020 )

A global data-driven census of Salmonella small proteins and their potential functions in bacterial virulence

Elisa Venturini, Sarah L Svensson, Sandra Maass, Rick Gelhausen, Florian Eggenhofer, Lei Li, Amy K Cain, Julian Parkhill, Dörte Becher, Rolf Backofen, Lars Barquist, Cynthia M Sharma, Alexander J Westermann, Jörg Vogel

In: microLife, 10 2020, 1(1)

Small proteins are an emerging class of gene products with diverse roles in bacterial physiology. However, a full understanding of their importance has been hampered by insufficient genome annotations and a lack of comprehensive characterization in microbes other than Escherichia coli. We have taken an integrative approach to accelerate the discovery of small proteins and their putative virulence-associated functions in Salmonella Typhimurium. We merged the annotated small proteome of Salmonella with new small proteins predicted with in silico and experimental approaches. We then exploited existing and newly generated global datasets that provide information on small open reading frame expression during infection of epithelial cells (dual RNA-seq), contribution to bacterial fitness inside macrophages (Transposon-directed insertion sequencing), and potential engagement in molecular interactions (Grad-seq). This integrative approach suggested a new role for the small protein MgrB beyond its known function in regulating PhoQ. We demonstrate a virulence and motility defect of a Salmonella delta-mgrB mutant and reveal an effect of MgrB in regulating the Salmonella transcriptome and proteome under infection-relevant conditions. Our study highlights the power of interpreting available omics datasets with a focus on small proteins, and may serve as a blueprint for a data integration-based survey of small proteins in diverse bacteria.

Publication note:
uqaa002

Available:
pdf (3366 KB)   doi:10.1093/femsml/uqaa002   BibTeX Entry ( uqaa002 )

HRIBO: high-throughput analysis of bacterial ribosome profiling data

Rick Gelhausen, Sarah L Svensson, Kathrin Froschauer, Florian Heyl, Lydia Hadjeras, Cynthia M Sharma, Florian Eggenhofer, Rolf Backofen

In: Bioinformatics, 11 2020

Ribosome profiling (Ribo-seq) is a powerful approach based on deep sequencing of cDNA libraries generated from ribosome-protected RNA fragments to explore the translatome of a cell, and is especially useful for the detection of small proteins (50–100 amino acids) that are recalcitrant to many standard biochemical and in silico approaches. While pipelines are available to analyze Ribo-seq data, none are designed explicitly for the automatic processing and analysis of data from bacteria, nor are they focused on the discovery of unannotated open reading frames (ORFs). We present HRIBO (High-throughput annotation by Ribo-seq), a workflow to enable reproducible and high-throughput analysis of bacterial Ribo-seq data. The workflow performs all required pre-processing and quality control steps. Importantly, HRIBO outputs annotation-independent ORF predictions based on two complementary bacteria-focused tools, and integrates them with additional feature information and expression values. This facilitates the rapid and high-confidence discovery of novel ORFs and their prioritization for functional characterization. HRIBO is a free and open source project available under the GPL-3 license at: https://github.com/RickGelhausen/HRIBO.

Publication note:
btaa959

Available:
pdf (1652 KB)   doi:10.1093/bioinformatics/btaa959   BibTeX Entry ( btaa959 )

NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy

Willem de Koning, Milad Miladi, Saskia Hiltemann, Astrid Heikema, John P Hays, Stephan Flemming, Marius van den Beek, Dana A Mustafa, Rolf Backofen, Björn Grüning, Andrew P Stubbs

In: GigaScience, 10 2020, 9(10)

Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing “nanopore” platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. The Galaxy platform provides a user-friendly interface to computational command line–based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed “NanoGalaxy” is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.

Publication note:
giaa105

Available:
pdf (735 KB)   doi:10.1093/gigascience/giaa105   BibTeX Entry ( Miladi-2020-nanogalaxy )

pyGenomeTracks: reproducible plots for multivariate genomic data sets

Lucille Lopez-Delisle, Leily Rabbani, Joachim Wolff, Vivek Bhardwaj, Rolf Backofen, Björn Grüning, Fidel Ramirez, Thomas Manke

In: Bioinformatics, 08 2020

Generating publication ready plots to display multiple genomic tracks can pose a serious challenge. Making desirable and accurate figures requires considerable effort. This is usually done by hand or by using a vector graphic software. pyGenomeTracks (PGT) is a modular plotting tool that easily combines multiple tracks. It enables a reproducible and standardized generation of highly customizable and publication ready images. PGT is available through a graphical interface on https://usegalaxy.eu and through the command line. It is provided on conda via the bioconda channel, on pip and it is openly developed on github: https://github.com/deeptools/pyGenomeTracks. Supplementary data are available at Bioinformatics online.

Publication note:
btaa692

Available:
pdf (405 KB)   doi:10.1093/bioinformatics/btaa692   BibTeX Entry ( lopez2020pygenometracks )

A single-cell RNA-sequencing training and analysis suite using the Galaxy framework

Mehmet Tekman, Berenice Batut, Alexander Ostrovsky, Christophe Antoniewski, Dave Clements, Fidel Ramirez, Graham J Etherington, Hans-Rudolf Hotz, Jelle Scholtalbers, Jonathan R Manning, Lea Bellenger, Maria A Doyle, Mohammad Heydarian, Ni Huang, Nicola Soranzo, Pablo Moreno, Stefan Mautner, Irene Papatheodorou, Anton Nekrutenko, James Taylor, Daniel Blankenberg, Rolf Backofen, Bjoern Gruening

In: GigaScience, 10 2020, 9(10)

The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically driven methods needed to process and understand these ever-growing datasets. Here we outline several Galaxy workflows and learning resources for single-cell RNA-sequencing, with the aim of providing a comprehensive analysis environment paired with a thorough user learning experience that bridges the knowledge gap between the computational methods and the underlying cell biology. The Galaxy reproducible bioinformatics framework provides tools, workflows, and trainings that not only enable users to perform 1-click 10x preprocessing but also empower them to demultiplex raw sequencing from custom tagged and full-length sequencing protocols. The downstream analysis supports a range of high-quality interoperable suites separated into common stages of analysis: inspection, filtering, normalization, confounder removal, and clustering. The teaching resources cover concepts from computer science to cell biology. Access to all resources is provided at the singlecell.usegalaxy.eu portal. The reproducible and training-oriented Galaxy framework provides a sustainable high-performance computing environment for users to run flexible analyses on both 10x and alternative platforms. The tutorials from the Galaxy Training Network along with the frequent training workshops hosted by the Galaxy community provide a means for users to learn, publish, and teach single-cell RNA-sequencing analysis.

Publication note:
giaa102

Available:
pdf (2149 KB)   doi:10.1093/gigascience/giaa102   BibTeX Entry ( Tekman-2020-galaxy )

Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization

Joachim Wolff, Leily Rabbani, Ralf Gilsbach, Gautier Richard, Thomas Manke, Rolf Backofen, Björn A Grüning

In: Nucleic Acids Research, 04 2020, 48(W1), W177-W184

The Galaxy HiCExplorer provides a web service at https://hicexplorer.usegalaxy.eu. It enables the integrative analysis of chromosome conformation by providing tools and computational resources to pre-process, analyse and visualize Hi-C, Capture Hi-C (cHi-C) and single-cell Hi-C (scHi-C) data. Since the last publication, Galaxy HiCExplorer has been expanded considerably with new tools to facilitate the analysis of cHi-C and to provide an in-depth analysis of Hi-C data. Moreover, it supports the analysis of scHi-C data by offering a broad range of tools. With the help of the standard graphical user interface of Galaxy, presented workflows, extensive documentation and tutorials, novices as well as Hi-C experts are supported in their Hi-C data analysis with Galaxy HiCExplorer.

Available:
pdf (1020 KB)   doi:10.1093/nar/gkaa220   BibTeX Entry ( wolff2020galaxy )

Scool: a new data storage format for single-cell Hi-C data

Joachim Wolff, Nezar Abdennur, Rolf Backofen, Björn Grüning

In: Bioinformatics, 11 2020

Single-cell Hi-C research currently lacks an efficient, easy to use and shareable data storage format. Recent studies have used a variety of sub-optimal solutions: publishing raw data only, text based interaction matrices, or reusing established Hi-C storage formats for single interaction matrices. These approaches are storage and pre-processing intensive, require long labour time and are often error-prone. The single-cell cooler file format (scool) provides an efficient, user-friendly and storage-saving approach for single-cell Hi-C data. It is a flavour of the established cooler format and guarantees stable API support. The single-cell cooler format is part of the cooler file format as of API version 0.8.9. It is available via pip, conda and github: https://github.com/mirnylab/cooler. Supplementary data are available at Bioinformatics online.

Publication note:
btaa924

Available:
pdf (274 KB)   doi:10.1093/bioinformatics/btaa924   BibTeX Entry ( wolff2020scool )

Nephronophthisis gene products display RNA-binding properties and are recruited to stress granules

Luisa Estrada Mallarino, Christina Engel, Ibrahim Avsar Ilik, Daniel Maticzka, Florian Heyl, Barbara Müller, Toma A Yakulov, Jörn Dengjel, Rolf Backofen, Asifa Akhtar

In: Scientific Reports, 2020, 10(1), 1-14

Mutations of cilia-associated molecules cause multiple developmental defects that are collectively termed ciliopathies. However, several ciliary proteins, involved in gating access to the cilium, also assume localizations at other cellular sites including the nucleus, where they participate in DNA damage responses to maintain tissue integrity. Molecular insight into how these molecules execute such diverse functions remains limited. A mass spectrometry screen for ANKS6-interacting proteins suggested an involvement of ANKS6 in RNA processing and/or binding. Comparing the RNA-binding properties of the known RNA-binding protein BICC1 with the three ankyrin-repeat proteins ANKS3, ANKS6 (NPHP16) and INVERSIN (NPHP2) confirmed that certain nephronophthisis (NPH) family members can interact with RNA molecules. We also observed that BICC1 and INVERSIN associate with stress granules in response to translational inhibition. Furthermore, BICC1 recruits ANKS3 and ANKS6 into TIA-1-positive stress granules after exposure to hippuristanol. Our findings uncover a novel function of NPH family members, and provide further evidence that NPH family members together with BICC1 are involved in stress responses to maintain tissue and organ integrity.

Available:
doi:10.1038/s41598-020-72905-8   BibTeX Entry ( Mallarino2020nephro )

Temporospatial distribution and transcriptional profile of retinal microglia in the oxygen-induced retinopathy mouse model

Myriam Boeck, Adrian Thien, Julian Wolf, Nora Hagemeyer, Yannik Laich, Dilmurat Yusuf, Rolf Backofen, Peipei Zhang, Stefaniya Boneva, Andreas Stahl, Ingo Hilgendorf, Hansjürgen Agostini, Marco Prinz, Peter Wieghofer, Günther Schlunck, Anja Schlecht, Clemens Lange

In: Glia, September 2020

Myeloid cells such as resident retinal microglia (MG) or infiltrating blood-derived macrophages (MΦ) accumulate in areas of retinal ischemia and neovascularization (RNV) and modulate neovascular eye disease. Their temporospatial distribution and biological function in this process, however, remain unclarified. Using state-of-the-art methods, including cell-specific reporter mice and high-throughput RNA sequencing (RNA Seq), this study determined the extent of MG proliferation and MΦ infiltration in areas with retinal ischemia and RNV in Cx3cr1CreERT2:Rosa26-tdTomato mice and examined the transcriptional profile of MG in the mouse model of oxygen-induced retinopathy (OIR). For RNA Seq, tdTomato-positive retinal MG were sorted by flow cytometry followed by Gene ontology (GO) cluster analysis. Furthermore, intraperitoneal injections of the cell proliferation marker 5-ethynyl-2'-deoxyuridine (EdU) were performed from postnatal day (p) 12 to p16. We found that MG is the predominant myeloid cell population while MΦ rarely appears in areas of RNV. Thirty percent of retinal MG in areas of RNV were EdU-positive indicating a considerable local MG cell expansion. GO cluster analysis revealed an enrichment of clusters related to cell division, tubulin binding, ATPase activity, protein kinase regulatory activity, and chemokine receptor binding in MG in the OIR model compared to untreated controls. In conclusion, activated retinal MG alter their transcriptional profile, exhibit considerable proliferative ability and are by far the most frequent myeloid cell population in areas of ischemia and RNV in the OIR model thus presenting a potential target for future therapeutic approaches.

Available:
pdf (12968 KB)   doi:10.1002/glia.23810   BibTeX Entry ( Yusuf-Tempo_transcript_retinal_microglia_Glia2020 )

Generation of pure monocultures of human microglia-like cells from induced pluripotent stem cells

Poulomi Banerjee, Evdokia Paza, Emma M Perkins, Owen G James, Boyd Kenkhuis, Amy F Lloyd, Karen Burr, David Story, Dilmurat Yusuf, Xin He, Rolf Backofen, Owen Dando, Siddharthan Chandran, Josef Priller

In: Stem Cell Res., October 2020, 49, 102046

Microglia are resident tissue macrophages of the central nervous system (CNS) that arise from erythromyeloid progenitors during embryonic development. They play essential roles in CNS development, homeostasis and response to disease. Since microglia are difficult to procure from the human brain, several protocols have been developed to generate microglia-like cells from human induced pluripotent stem cells (hiPSCs). However, some concerns remain over the purity and quality of in vitro generated microglia. Here, we describe a new protocol that does not require co-culture with neural cells and yields cultures of 100% P2Y12+ 95% TMEM119+ ramified human microglia-like cells (hiPSC-MG). In the presence of neural precursor cell-conditioned media, hiPSC-MG expressed high levels of human microglia signature genes, including SALL1, CSF1R, P2RY12, TMEM119, TREM2, HEXB and SIGLEC11, as revealed by whole-transcriptome analysis. Stimulation of hiPSC-MG with lipopolysaccharide resulted in downregulation of P2Y12 expression, induction of IL1B mRNA expression and increase in cell capacitance. HiPSC-MG were phagocytically active and maintained their cell identity after transplantation into murine brain slices and human brain spheroids. Together, our new protocol for the generation of microglia-like cells from human iPSCs will facilitate the study of human microglial function in health and disease.

Available:
pdf (9171 KB)   doi:10.1016/j.scr.2020.102046   BibTeX Entry ( Yusuf-microglia_stem_cell_StemCellRes2020 )

CRISPRcasIdentifier: Machine learning for accurate identification and classification of CRISPR-Cas systems

Victor A. Padilha, Omer S. Alkhnbashi, Shiraz A. Shah, Andre C. P. L. F. de Carvalho, Rolf Backofen

In: Gigascience, 2020, 9(6)

BACKGROUND: CRISPR-Cas genes are extraordinarily diverse and evolve rapidly when compared to other prokaryotic genes. With the rapid increase in newly sequenced archaeal and bacterial genomes, manual identification of CRISPR-Cas systems is no longer viable. Thus, an automated approach is required for advancing our understanding of the evolution and diversity of these systems and for finding new candidates for genome engineering in eukaryotic models. RESULTS: We introduce CRISPRcasIdentifier, a new machine learning-based tool that combines regression and classification models for the prediction of potentially missing proteins in instances of CRISPR-Cas systems and the prediction of their respective subtypes. In contrast to other available tools, CRISPRcasIdentifier can both detect cas genes and extract potential association rules that reveal functional modules for CRISPR-Cas systems. In our experimental benchmark on the most recently published and comprehensive CRISPR-Cas system dataset, CRISPRcasIdentifier was compared with recent and state-of-the-art tools. According to the experimental results, CRISPRcasIdentifier presented the best Cas protein identification and subtype classification performance. CONCLUSIONS: Overall, our tool greatly extends the classification of CRISPR cassettes and, for the first time, predicts missing Cas proteins and association rules between Cas proteins. Additionally, we investigated the properties of CRISPR subtypes. The proposed tool relies not only on the knowledge of manual CRISPR annotation but also on models trained using machine learning.

Available:
pdf (755 KB)   doi:10.1093/gigascience/giaa062   pmid:32556168   BibTeX Entry ( Padilha_Alkhnbashi_Shah-CRISP_Machi_learn-2020 )

Heterogeneous networks integration for disease-gene prioritization with node kernels

Van Dinh Tran, Alessandro Sperduti, Rolf Backofen, Fabrizio Costa

In: Bioinformatics, 2020, 36(9), 2649-2656

MOTIVATION: The identification of disease-gene associations is a task of fundamental importance in human health research. A typical approach consists in first encoding large gene/protein relational datasets as networks due to the natural and intuitive property of graphs for representing objects' relationships and then utilizing graph-based techniques to prioritize genes for successive low-throughput validation assays. Since different types of interactions between genes yield distinct gene networks, there is the need to integrate different heterogeneous sources to improve the reliability of prioritization systems. RESULTS: We propose an approach based on three phases: first, we merge all sources in a single network, then we partition the integrated network according to edge density introducing a notion of edge type to distinguish the parts and finally, we employ a novel node kernel suitable for graphs with typed edges. We show how the node kernel can generate a large number of discriminative features that can be efficiently processed by linear regularized machine learning classifiers. We report state-of-the-art results on 12 disease-gene associations and on a time-stamped benchmark containing 42 newly discovered associations. AVAILABILITY AND IMPLEMENTATION: Source code: https://github.com/dinhinfotech/DiGI.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
pdf (464 KB)   doi:10.1093/bioinformatics/btaa008   pmid:31990289   BibTeX Entry ( Tran_Sperduti_Backofen-Heter_netwo_integ-2020 )

The locality dilemma of Sankoff-like RNA alignments

Teresa Müller, Milad Miladi, Frank Hutter, Ivo Hofacker, Sebastian Will, Rolf Backofen

In: Bioinformatics, 2020

Motivation: Elucidating the functions of non-coding RNAs by homology has been strongly limited due to fundamental computational and modeling issues. While existing simultaneous alignment and folding (SAF) algorithms successfully align homologous RNAs with precisely known boundaries (global SAF), the more pressing problem of identifying new classes of homologous RNAs in the genome (local SAF) is intrinsically more difficult and much less understood. Typically, the length of local alignments is strongly overestimated and alignment boundaries are dramatically mispredicted. We hypothesize that local SAF approaches are compromised this way due to a score bias, which is caused by the contribution of RNA structure similarity to their overall alignment score. Results: In the light of this hypothesis, we study pairwise local SAF systematically for the first time, based on a novel local RNA alignment benchmark set and quality measure. First, we vary the relative influence of structure similarity compared to sequence similarity. Putting more emphasis on the structure component leads to overestimating the length of local alignments. This clearly shows the bias of current scores and strongly hints at the structure component as its origin. Second, we study the interplay of several important scoring parameters by learning parameters for local and global SAF. The divergence of these optimized parameter sets underlines the fundamental obstacles for local SAF. Third, by introducing a position-wise correction term in local SAF, we constructively solve its principal issues.
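Illustration (not from the paper): the effect of a per-position correction on local alignment length can be sketched with plain sequence-only Smith-Waterman. This is a simplified stand-in, not a SAF algorithm, and the constant `correction` subtracted per alignment column is a hypothetical choice rather than the correction term introduced by the authors. The function name, scoring values, and example sequences are likewise assumptions made for illustration.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-2, correction=0):
    """Smith-Waterman local alignment score with a per-column penalty.

    Returns (best_score, alignment_length). `correction` is subtracted once
    for every alignment column; it is a hypothetical stand-in for a
    position-wise correction term, not the term used in the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    length = [[0] * cols for _ in range(rows)]
    best_score, best_length = 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # (candidate score, resulting alignment length) for each move
            candidates = [
                (score[i - 1][j - 1] + sub - correction, length[i - 1][j - 1] + 1),
                (score[i - 1][j] + gap - correction, length[i - 1][j] + 1),
                (score[i][j - 1] + gap - correction, length[i][j - 1] + 1),
            ]
            cand_score, cand_length = max(candidates, key=lambda c: c[0])
            if cand_score > 0:
                score[i][j], length[i][j] = cand_score, cand_length
            # otherwise the cell stays at 0: the local alignment restarts here
            if score[i][j] > best_score:
                best_score, best_length = score[i][j], length[i][j]
    return best_score, best_length


# Two well-conserved blocks separated by two mismatching positions.
seq_a = "AAAATTCCCC"
seq_b = "AAAAGGCCCC"
uncorrected = local_alignment(seq_a, seq_b)
corrected = local_alignment(seq_a, seq_b, correction=1)
```

In this toy case the uncorrected score bridges the mismatching region and reports one alignment spanning both blocks, whereas the per-column penalty makes the bridge unprofitable and the reported alignment covers a single conserved block.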

Publication note:
(accepted for publication)

Available:
pdf (704 KB)   supplement.pdf (747 KB)   BibTeX Entry ( Mueller-LocalAlign-2020 )

CopomuS - ranking compensatory mutations to guide RNA-RNA interaction verification experiments

Martin Raden, Fabio Gutmann, Michael Uhl, Rolf Backofen

In: International Journal of Molecular Sciences, 2020, 21(11), 3852

In silico RNA-RNA interaction prediction is widely applied to identify putative interaction partners and to assess interaction details in base pair resolution. To verify specific interactions, in vitro evidence can be obtained via compensatory mutation experiments. Unfortunately, the selection of compensatory mutations is non-trivial and typically based on subjective ad hoc decisions. To support the decision process, we introduce our COmPensatOry MUtation Selector CopomuS. CopomuS evaluates the effects of mutations on RNA-RNA interaction formation using a set of objective criteria, and outputs a reliable ranking of compensatory mutation candidates. For RNA-RNA interaction assessment, the state-of-the-art IntaRNA prediction tool is applied. We investigate characteristics of successfully verified RNA-RNA interactions from the literature, which guided the design of CopomuS. Finally, we evaluate its performance based on experimentally validated compensatory mutations of prokaryotic sRNAs and their target mRNAs. CopomuS predictions highly agree with known results, making it a valuable tool to support the design of verification experiments for RNA-RNA interactions. It is part of the IntaRNA package and available as stand-alone webserver for ad hoc application.

Publication note:
(This article belongs to the Special Issue RNA Structure Prediction)

Available:
pdf (565 KB)   doi:10.3390/ijms21113852   BibTeX Entry ( Raden-2020-CopomuS )

MutaRNA: analysis and visualization of mutation-induced changes in RNA structure

Milad Miladi, Martin Raden, Sven Diederichs, Rolf Backofen

In: Nucleic Acids Research, 05 2020

RNA molecules fold into complex structures as a result of intramolecular interactions between their nucleotides. The function of many non-coding RNAs and some cis-regulatory elements of messenger RNAs highly depends on their fold. Single-nucleotide variants (SNVs) and other types of mutations can disrupt the native function of an RNA element by altering its base pairing pattern. Identifying the effect of a mutation on an RNA’s structure is, therefore, a crucial step in evaluating the impact of mutations on the post-transcriptional regulation and function of RNAs within the cell. Even though a single nucleotide variation can have striking impacts on the structure formation, interpreting and comparing the impact usually needs expertise and meticulous efforts. Here, we present MutaRNA, a web server for visualization and interpretation of mutation-induced changes on the RNA structure in an intuitive and integrative fashion. To this end, probabilities of base pairing and position-wise unpaired probabilities of wildtype and mutated RNA sequences are computed and compared. Differential heatmap-like dot plot representations in combination with circular plots and arc diagrams help to identify local structure aberrations, which are otherwise hidden in standard outputs. Eventually, MutaRNA provides a comprehensive and comparative overview of the mutation-induced changes in base pairing potentials and accessibility. The MutaRNA web server is freely available at http://rna.informatik.uni-freiburg.de/MutaRNA.

Publication note:
(MM and MR contributed equally)

Available:
pdf (1184 KB)   doi:10.1093/nar/gkaa331   BibTeX Entry ( Miladi-2020-MutaRNA )

The potential of intra-annual density information for crossdating of short tree-ring series

Martin Raden, Alexander Mattheis, Hans-Peter Kahle, Heinrich Spiecker, Rolf Backofen

In: Dendrochronologia, 2020, 60, 125679

The crossdating of tree-ring series is typically based on tree-ring width sequences, which is a crude abstraction of the growth signal stored in tree rings. In contrast, intra-annual wood density data allows a much more detailed comparison of wood growth processes and new measurement techniques scale well to measure large amounts of samples. Thus, chronologies of intra-annual densitometric curves can be built. Here, we investigate to what extent intra-annual wood density information can improve crossdating. We evaluate different approaches on a data set of Norway spruce trees (Picea abies) and compare the results to standard methods that are based on ring width or maximum density. Our results show that intra-annual densitometric data indeed increases crossdating success rate notably for short tree ring series that cover less than 25 years.

Available:
pdf (395 KB)   supplement.zip (6173 KB)   doi:10.1016/j.dendro.2020.125679   BibTeX Entry ( Raden-crossdating-2019 )

How to do RNA-RNA interaction prediction? A use-case driven handbook using IntaRNA

Martin Raden, Milad Miladi

In: Methods in Molecular Biology, 2020

Computational prediction of RNA-RNA interactions (RRI) is a central methodology for the specific investigation of inter-molecular RNA interactions and regulatory effects of non-coding RNAs like eukaryotic microRNAs or prokaryotic small RNAs. Available methods can be classified according to their underlying prediction strategies, each implicating specific capabilities and restrictions often not transparent to the non-expert user. Within this work, we review seven classes of RRI prediction strategies and discuss advantages and limitations of respective tools, since such knowledge is essential for selecting the right tool in the first place. Currently, accessibility-based approaches provide the most reliable RRI predictions. Thus, we discuss how IntaRNA, one of the state-of-the-art accessibility-based tools, can be applied in various use cases of computational RRI prediction. Detailed hands-on examples both for specific RRI prediction as well as large-scale target prediction are provided to illustrate the flexibility and capabilities of IntaRNA. Each example uses realistic data from the literature and is accompanied by instructions on how to interpret respective results to enable non-expert users to comprehensively understand and utilize IntaRNA's features for RRI predictions.

Publication note:
(under review)

Available:
BibTeX Entry ( Raden-IntaRNA-handson-2019 )

The impact of various seed, accessibility and interaction constraints on sRNA target prediction - a systematic assessment

Martin Raden, Teresa Müller, Stefan Mautner, Rick Gelhausen, Rolf Backofen

In: BMC Bioinformatics, 2020, 21, 15

Seed and accessibility constraints are core features to enable highly accurate sRNA target screens based on RNA-RNA interaction prediction. Currently, available tools provide different (sets of) constraints and default parameter sets. Thus, it is hard, if not impossible, for users to estimate the influence of individual restrictions on the prediction results. Here, we present a systematic assessment of the impact of established and new constraints on sRNA target prediction on both a qualitative and a computational level. This is done exemplarily based on the performance of IntaRNA, one of the most exact sRNA target prediction tools. IntaRNA provides various ways to constrain considered seed interactions, e.g. based on seed length, its accessibility, minimal unpaired probabilities, or energy thresholds, besides analogous constraints for the overall interaction. Thus, our results reveal the impact of individual constraints and their combinations. This provides both a guide for users on what is important and recommendations for existing and upcoming sRNA target prediction approaches. We show on a large sRNA target screen benchmark data set that, only by altering the parameter set, IntaRNA recovers 30 percent more verified interactions while becoming five times faster. This exemplifies the potential of seed, accessibility and interaction constraints for sRNA target prediction.

Available:
supplement.pdf (1952 KB)   pdf (839 KB)   doi:10.1186/s12859-019-3143-4   BibTeX Entry ( Raden-IntaRNA-benchmark-2019 )

pourRNA - a time- and memory-efficient approach for the guided exploration of RNA energy landscapes

Gregor Entzian, Martin Raden

In: Bioinformatics, 2020, 36(2), 462-469

Motivation: The folding dynamics of RNAs are typically studied via coarse-grained models of the underlying energy landscape to face the exponential growth of the RNA secondary structure space. Still, studies of exact folding kinetics based on gradient basin abstractions are currently limited to short sequence lengths due to vast memory requirements. In order to compute exact transition rates between gradient basins, state-of-the-art approaches apply global flooding schemes that require memorizing the whole structure space at once. pourRNA tackles this problem via local flooding techniques where memorization is limited to the structure ensembles of individual gradient basins. Results: Compared to the only available tool for exact gradient basin based macro state transition rates (namely barriers), pourRNA computes the same exact transition rates up to ten times faster and requires two orders of magnitude less memory for sequences that are still computationally accessible for exhaustive enumeration. Parallelized computation as well as additional heuristics further speed up computations while still producing high quality transition model approximations. The introduced heuristics enable a guided trade-off between model quality and required computational resources. We introduce and evaluate a macroscopic direct-path heuristic to efficiently compute refolding energy barrier estimations for the co-transcriptionally trapped RNA sv11 of length 115nt. Finally, we also show how pourRNA can be used to identify folding funnels and their respective energetically lowest minima. Availability: pourRNA is freely available at https://github.com/ViennaRNA/pourRNA

Available:
supplement.pdf (1230 KB)   pdf (931 KB)   doi:10.1093/bioinformatics/btz583   BibTeX Entry ( Entzian-2019 )

Biomolecular Reaction and Interaction Dynamics Global Environment (BRIDGE)

Tharindu Senapathi, Simon Bray, Christopher B Barnett, Björn Grüning, Kevin J Naidoo

In: Bioinformatics, 02 2019, 35(18), 3508-3509

The pathway from genomics through proteomics and onto a molecular description of biochemical processes makes the discovery of drugs and biomaterials possible. A research framework common to genomics and proteomics is needed to conduct biomolecular simulations that will connect biological data to the dynamic molecular mechanisms of enzymes and proteins. Novice biomolecular modelers are faced with the daunting task of complex setups and a myriad of possible choices preventing their use of molecular simulations and their ability to conduct reliable and reproducible computations that can be shared with collaborators and verified for procedural accuracy. We present the foundations of Biomolecular Reaction and Interaction Dynamics Global Environment (BRIDGE) developed on the Galaxy platform that makes possible fundamental molecular dynamics of proteins through workflows and pipelines via commonly used packages, such as NAMD, GROMACS and CHARMM. BRIDGE can be used to set up and simulate biological macromolecules, perform conformational analysis from trajectory data and conduct data analytics of large scale protein motions using statistical rigor. We illustrate the basic BRIDGE simulation and analytics capabilities on a previously reported CBH1 protein simulation. Publicly available at https://github.com/scientificomputing/BRIDGE and https://usegalaxy.eu. Supplementary data are available at Bioinformatics online.

Available:
pdf (253 KB)   doi:10.1093/bioinformatics/btz107   BibTeX Entry ( Bray-2019-BRIDGE )

The RNA workbench 2.0: next generation RNA data analysis

Jörg Fallmann, Pavankumar Videm, Andrea Bagnacani, Berenice Batut, Maria A. Doyle, Tomas Klingstrom, Florian Eggenhofer, Peter F. Stadler, Rolf Backofen, Björn Grüning

In: Nucleic Acids Res, 2019, 47(W1), W511-W515

RNA has become one of the major research topics in molecular biology. As a central player in key processes regulating gene expression, RNA is in the focus of many efforts to decipher the pathways that govern the transition of genetic information to a fully functional cell. As more and more researchers join this endeavour, there is a rapidly growing demand for comprehensive collections of tools that cover the diverse layers of RNA-related research. However, increasing amounts of data, from diverse types of experiments, addressing different aspects of biological questions need to be consolidated and integrated into a single framework. Only then is it possible to connect findings from e.g. RNA-Seq experiments and methods for e.g. target predictions. To address these needs, we present the RNA Workbench 2.0, an updated online resource for RNA related analysis. With the RNA Workbench we created a comprehensive set of analysis tools and workflows that enables researchers to analyze their data without the need for sophisticated command-line skills. This update takes the established framework to the next level, providing not only a containerized infrastructure for analysis, but also a ready-to-use platform for hands-on training, analysis, data exploration, and visualization. The new framework is available at https://rna.usegalaxy.eu, and login is free and open to all users. The containerized version can be found at https://github.com/bgruening/galaxy-rna-workbench.

Available:
pdf (820 KB)   doi:10.1093/nar/gkz353   pmid:31073612   BibTeX Entry ( Fallmann_Videm_Bagnacani-The_RNA_workb-NAR2019 )

FLASH: ultra-fast protocol to identify RNA–protein interactions in cells

Ibrahim Avsar Ilik, Tugce Aktas, Daniel Maticzka, Rolf Backofen, Asifa Akhtar

In: Nucleic Acids Research, 12 2019, 48(3), e15-e15

Determination of the in vivo binding sites of RNA-binding proteins (RBPs) is paramount to understanding their function and how they affect different aspects of gene regulation. With hundreds of RNA-binding proteins identified in human cells, a flexible, high-resolution, high-throughput, highly multiplexible and radioactivity-free method to determine their binding sites has not been described to date. Here we report FLASH (Fast Ligation of RNA after some sort of Affinity Purification for High-throughput Sequencing), which uses a special adapter design and an optimized protocol to determine protein–RNA interactions in living cells. The entire FLASH protocol, starting from cells on plates to a sequencing library, takes 1.5 days. We demonstrate the flexibility, speed and versatility of FLASH by using it to determine RNA targets of both tagged and endogenously expressed proteins under diverse conditions in vivo.

Available:
pdf (4567 KB)   doi:10.1093/nar/gkz1141   BibTeX Entry ( Ilik-2020-flash )

Noncoding deletions reveal a gene that is critical for intestinal function

Danit Oz-Levi, Tsviya Olender, Ifat Bar-Joseph, Yiwen Zhu, Dina Marek-Yagel, Iros Barozzi, Marco Osterwalder, Anna Alkelai, Elizabeth K Ruzzo, Yujun Han

In: Nature, 2019, 571(7763), 107-111

Large-scale genome sequencing is poised to provide a substantial increase in the rate of discovery of disease-associated mutations, but the functional interpretation of such mutations remains challenging. Here we show that deletions of a sequence on human chromosome 16 that we term the intestine-critical region (ICR) cause intractable congenital diarrhoea in infants. Reporter assays in transgenic mice show that the ICR contains a regulatory sequence that activates transcription during the development of the gastrointestinal system. Targeted deletion of the ICR in mice caused symptoms that recapitulated the human condition. Transcriptome analysis revealed that an unannotated open reading frame (Percc1) flanks the regulatory sequence, and the expression of this gene was lost in the developing gut of mice that lacked the ICR. Percc1-knockout mice displayed phenotypes similar to those observed upon ICR deletion in mice and patients, whereas an ICR-driven Percc1 transgene was sufficient to rescue the phenotypes found in mice that lacked the ICR. Together, our results identify a gene that is critical for intestinal function and underscore the need for targeted in vivo studies to interpret the growing number of clinical genetic findings that do not affect known protein-coding genes.

Available:
doi:10.1038/s41586-019-1312-2   BibTeX Entry ( Tekman-2019-noncode )

Progress Towards Graph Optimization: Efficient Learning of Vector to Graph Space Mappings

Stefan Mautner, Rolf Backofen, Fabrizio Costa

In: ESANN 2019 - Proceedings, 2019

Optimization in vector space domains is well understood. However, in high dimensional settings or when dealing with structured data such as sequences and graphs, optimization becomes difficult. A possible strategy is to map graphs to vector codes and use machine learning to learn a map from codes back to graphs. This in turn allows standard optimization techniques over vectors to be employed to optimize graphs. Here we propose an approach to invert a vector mapping based on a combination of graph kernels and graph grammars. We evaluate the proposed approach in an artificial setup and on real molecular graphs.

Available:
pdf (1503 KB)   BibTeX Entry ( mautner_esann19 )

Universal readout for graph convolutional neural networks

Nicolò Navarin, Dinh Van Tran, Alessandro Sperduti

In: 2019 International Joint Conference on Neural Networks (IJCNN), 2019, 1--7

Several machine learning problems can be naturally defined over graph data. Recently, many researchers have been focusing on the definition of neural networks for graphs. The core idea is to learn a hidden representation for the graph vertices, with a convolutive or recurrent mechanism. When considering discriminative tasks on graphs, such as classification or regression, one critical component to design is the readout function, i.e. the mapping from the set of vertex representations to a fixed-size vector (or the output). Different approaches have been presented in the literature, but recent approaches tend to be complex, making the training of the whole network harder. In this paper, we frame the problem in the setting of learning over sets. Adopting recently proposed theorems over functions defined on sets, we propose a simple but powerful formulation for a readout layer that can encode or approximate arbitrarily well any continuous permutation-invariant function over sets. Experimental results on real-world graph datasets show that, compared to other approaches, the proposed readout architecture can improve the predictive performance of Graph Neural Networks while being computationally more efficient.

Available:
BibTeX Entry ( navarin2019universal )

GraphClust2: Annotation and discovery of structured RNAs with scalable and accessible integrative clustering

Milad Miladi, Eteri Sokhoyan, Torsten Houwaart, Steffen Heyne, Fabrizio Costa, Björn Grüning, Rolf Backofen

In: GigaScience, 12 2019, 8(12)

RNA plays essential roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available. Hundreds of thousands of non-coding RNAs have been detected; however, their annotation is lagging behind. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 bridges the gap between high-throughput sequencing and structural RNA analysis and provides an integrative solution by incorporating diverse experimental and genomic data in an accessible manner via the Galaxy framework. GraphClust2 can efficiently cluster and annotate large datasets of RNAs and supports structure-probing data. We demonstrate that the annotation performance of clustering functional RNAs can be considerably improved. Furthermore, an off-the-shelf procedure is introduced for identifying locally conserved structure candidates in long RNAs. We suggest the presence and the sparseness of phylogenetically conserved local structures for a collection of long non-coding RNAs. By clustering data from 2 cross-linking immunoprecipitation experiments, we demonstrate the benefits of GraphClust2 for motif discovery under the presence of biological and methodological biases. Finally, we uncover prominent targets of double-stranded RNA binding protein Roquin-1, such as BCOR’s 3′ untranslated region that contains multiple binding stem-loops that are evolutionarily conserved.

Publication note:
giz150

Available:
supplement.pdf (971 KB)   pdf (2902 KB)   doi:10.1093/gigascience/giz150   BibTeX Entry ( Miladi-GraphClust2-2019 )

IntaRNAhelix - Composing RNA-RNA interactions from stable inter-molecular helices boosts bacterial sRNA target prediction

Rick Gelhausen, Sebastian Will, Ivo L. Hofacker, Rolf Backofen, Martin Raden

In: Journal of Bioinformatics and Computational Biology, 2019, 17(5), 1940009

Efficient computational tools for the identification of putative target RNAs regulated by prokaryotic sRNAs rely on thermodynamic models of RNA secondary structures. While they typically predict RNA-RNA interaction complexes accurately, they yield many highly-ranked false positives in target screens. One obvious source of this low specificity appears to be the inability of current secondary-structure-based models to reflect steric constraints, which nevertheless govern the kinetic formation of RNA-RNA interactions. For example, often (even thermodynamically favorable) extensions of short initial kissing hairpin interactions are kinetically prohibited, since this would require unwinding of intra-molecular helices as well as sterically impossible bending of the interaction helix. Another source is the consideration of unstable and thus unlikely subinteractions that enable better scoring of longer interactions. In consequence, the efficient prediction methods that do not consider such effects show a high false positive rate. To increase the prediction accuracy we devise IntaRNAhelix, a dynamic programming algorithm that length-restricts the runs of consecutive inter-molecular base pairs (perfect canonical stackings), which we hypothesize to implicitly model the steric and kinetic effects. The novel method is implemented by extending the state-of-the-art tool IntaRNA. Our comprehensive bacterial sRNA target prediction benchmark demonstrates significant improvements of the prediction accuracy and enables more than 4-times faster computations. These results indicate, supporting our hypothesis, that stable helix composition increases the accuracy of interaction prediction models compared to the current state-of-the-art approach.

Available:
pdf (612 KB)   doi:10.1142/S0219720019400092   BibTeX Entry ( Gelhausen-IntaRNAhelix-2019 )

A pan-cancer analysis of synonymous mutations

Yogita Sharma, Milad Miladi, Sandeep Dukare, Karine Boulay, Maiwen Caudron-Herger, Matthias Gross, Rolf Backofen, Sven Diederichs

In: Nature communications, 2019, 10(1), 2569

Synonymous mutations have been viewed as silent mutations, since they only affect the DNA and mRNA, but not the amino acid sequence of the resulting protein. Nonetheless, recent studies suggest their significant impact on splicing, RNA stability, RNA folding, translation or co-translational protein folding. Hence, we compile 659194 synonymous mutations found in human cancer and characterize their properties. We provide the user-friendly, comprehensive resource for synonymous mutations in cancer, SynMICdb ( http://SynMICdb.dkfz.de ), which also contains orthogonal information about gene annotation, recurrence, mutation loads, cancer association, conservation, alternative events, impact on mRNA structure and a SynMICdb score. Notably, synonymous and missense mutations are depleted at the 5'-end of the coding sequence as well as at the ends of internal exons independent of mutational signatures. For patient-derived synonymous mutations in the oncogene KRAS, we indicate that single point mutations can have a relevant impact on expression as well as on mRNA secondary structure.

Available:
pdf (1715 KB)   doi:10.1038/s41467-019-10489-2   pmid:31189880   BibTeX Entry ( Sharma_Miladi_Dukare-pan_analy_synon-2019 )

Fast and Accurate Structure Probability Estimation for Simultaneous Alignment and Folding of RNAs

Milad Miladi, Martin Raden, Sebastian Will, Rolf Backofen

In: Katharina T. Huber, Dan Gusfield, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019), Leibniz International Proceedings in Informatics (LIPIcs), 2019, 143, 14:1--14:13

Motivation: Simultaneous alignment and folding (SAF) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of O(n^6) in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce its extreme complexity by proposing simplified energy models or imposing restrictions on the predicted alignments. Results: Here we introduce a novel variant of Sankoff's algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities that allow accurate capture of structure probabilities, which obey a conditional dependency. Supporting modifications with surgical precision, this model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains). Pankov benefits from the speed-up of excluding unreliable base-pairing without compromising the loop-based free energy model of Sankoff's algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA. Pankov is developed as a branch of the LocARNA package and available at https://github.com/mmiladi/Pankov.

Available:
pdf (1582 KB)   doi:10.4230/LIPIcs.WABI.2019.14   BibTeX Entry ( Miladi-Pankov-2019 )

ShaKer: RNA SHAPE prediction using graph kernel

Stefan Mautner, Soheila Montaseri, Milad Miladi, Martin Raden, Fabrizio Costa, Rolf Backofen

In: Bioinformatics, 07 2019, 35(14), i354-i359

SHAPE experiments are used to probe the structure of RNA molecules. We present ShaKer to predict SHAPE data for RNA using a graph-kernel-based machine learning approach that is trained on experimental SHAPE information. While other available methods require a manually curated reference structure, ShaKer predicts reactivity data based on sequence input only and by sampling the ensemble of possible structures. Thus, ShaKer is well placed to enable experiment-driven, transcriptome-wide SHAPE data prediction to enable the study of RNA structuredness and to improve RNA structure and RNA-RNA interaction prediction. For performance evaluation we use accuracy and accessibility, compared to experimental SHAPE data and competing methods. We can show that ShaKer outperforms its competitors and is able to predict high quality SHAPE annotations even when no reference structure is provided. ShaKer is freely available at https://github.com/BackofenLab/ShaKer

Available:
pdf (459 KB)   doi:10.1093/bioinformatics/btz395   BibTeX Entry ( Mautner-SHAKER-2019 )

The RNA workbench 2.0: next generation RNA data analysis

Jörg Fallmann, Pavankumar Videm, Andrea Bagnacani, Berenice Batut, Maria A Doyle, Tomas Klingstrom, Florian Eggenhofer, Peter F Stadler, Rolf Backofen, Björn Grüning

In: Nucleic Acids Research, 05 2019

RNA has become one of the major research topics in molecular biology. As a central player in key processes regulating gene expression, RNA is in the focus of many efforts to decipher the pathways that govern the transition of genetic information to a fully functional cell. As more and more researchers join this endeavour, there is a rapidly growing demand for comprehensive collections of tools that cover the diverse layers of RNA-related research. However, increasing amounts of data, from diverse types of experiments, addressing different aspects of biological questions need to be consolidated and integrated into a single framework. Only then is it possible to connect findings from e.g. RNA-Seq experiments and methods for e.g. target predictions. To address these needs, we present the RNA Workbench 2.0, an updated online resource for RNA related analysis. With the RNA Workbench we created a comprehensive set of analysis tools and workflows that enables researchers to analyze their data without the need for sophisticated command-line skills. This update takes the established framework to the next level, providing not only a containerized infrastructure for analysis, but also a ready-to-use platform for hands-on training, analysis, data exploration, and visualization. The new framework is available at https://rna.usegalaxy.eu, and login is free and open to all users. The containerized version can be found at https://github.com/bgruening/galaxy-rna-workbench.

Available:
pdf (821 KB)   doi:10.1093/nar/gkz353   BibTeX Entry ( fallmann_2019_workbench2 )

CRISPR-Cas bioinformatics

Omer S. Alkhnbashi, Tobias Meier, Alexander Mitrofanov, Rolf Backofen, Bjorn Voss

In: Methods, 2019

Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated proteins (Cas) are essential genetic elements in many archaeal and bacterial genomes, playing a key role in a prokaryote adaptive immune system against invasive foreign elements. In recent years, the CRISPR-Cas system has also been engineered to facilitate target gene editing in eukaryotic genomes. Bioinformatics played an essential role in the detection and analysis of CRISPR systems and here we review the bioinformatics-based efforts that pushed the field of CRISPR-Cas research further. We discuss the bioinformatics tools that have been published over the last few years and, finally, present the most popular tools for the design of CRISPR-Cas9 guides.

Available:
doi:10.1016/j.ymeth.2019.07.013   pmid:31326596   BibTeX Entry ( Alkhnbashi_Meier_Mitrofanov-CRISP_bioin-2019 )

Cross-cleavage activity of Cas6b in crRNA processing of two different CRISPR-Cas systems in Methanosarcina mazei Go1

Lisa Nickel, Andrea Ulbricht, Omer S. Alkhnbashi, Konrad U. Forstner, Liam Cassidy, Katrin Weidenbach, Rolf Backofen, Ruth A. Schmitz

In: RNA Biol, 2019, 16(4), 492-503

The clustered regularly interspaced short palindromic repeat (CRISPR) system is a prokaryotic adaptive defense system against foreign nucleic acids. In the methanoarchaeon Methanosarcina mazei Go1, two types of CRISPR-Cas systems are present (type I-B and type III-C). Both loci encode a Cas6 endonuclease, Cas6b-IB and Cas6b-IIIC, typically responsible for maturation of functional short CRISPR RNAs (crRNAs). To evaluate potential cross cleavage activity, we biochemically characterized both Cas6b proteins regarding their crRNA binding behavior and their ability to process pre-crRNA from the respective CRISPR array in vivo. Maturation of crRNA was studied in the respective single deletion mutants by northern blot and RNA-Seq analysis demonstrating that in vivo primarily Cas6b-IB is responsible for crRNA processing of both CRISPR arrays. Tentative protein level evidence for the translation of both Cas6b proteins under standard growth conditions was detected, arguing for different activities or a potential non-redundant role of Cas6b-IIIC within the cell. Conservation of both Cas6 endonucleases was observed in several other M. mazei isolates, though a wide variety was displayed. In general, repeat and leader sequence conservation revealed a close correlation in the M. mazei strains. The repeat sequences from both CRISPR arrays from M. mazei Go1 contain the same sequence motif with differences only in two nucleotides. These data stand in contrast to all other analyzed M. mazei isolates, which have at least one additional CRISPR array with repeats belonging to another sequence motif. This conforms to the finding that Cas6b-IB is the crucial and functional endonuclease in M. mazei Go1. Abbreviations: sRNA: small RNA; crRNA: CRISPR RNA; pre-crRNAs: Precursor CRISPR RNA; CRISPR: clustered regularly interspaced short palindromic repeats; Cas: CRISPR associated; nt: nucleotide; RNP: ribonucleoprotein; RBS: ribosome binding site.

Available:
doi:10.1080/15476286.2018.1514234   pmid:30153081   BibTeX Entry ( Nickel_Ulbricht_Alkhnbashi-Cross_activ_Cas-2019 )

The nuts and bolts of the Haloferax CRISPR-Cas system I-B

Lisa-Katharina Maier, Aris-Edda Stachler, Jutta Brendel, Britta Stoll, Susan Fischer, Karina A. Haas, Thandi S. Schwarz, Omer S. Alkhnbashi, Kundan Sharma, Henning Urlaub, Rolf Backofen, Uri Gophna, Anita Marchfelder

In: RNA Biol, 2019, 16(4), 469-480

Invading genetic elements pose a constant threat to prokaryotic survival, requiring an effective defence. Eleven years ago, the arsenal of known defence mechanisms was expanded by the discovery of the CRISPR-Cas system. Although CRISPR-Cas is present in the majority of archaea, research often focuses on bacterial models. Here, we provide a perspective based on insights gained studying CRISPR-Cas system I-B of the archaeon Haloferax volcanii. The system relies on more than 50 different crRNAs, whose stability and maintenance critically depend on the proteins Cas5 and Cas7, which bind the crRNA and form the Cascade complex. The interference machinery requires a seed sequence and can interact with multiple PAM sequences. H. volcanii stands out as the first example of an organism that can tolerate autoimmunity via the CRISPR-Cas system while maintaining a constitutively active system. In addition, the H. volcanii system was successfully developed into a tool for gene regulation.

Available:
doi:10.1080/15476286.2018.1460994   pmid:29649958   BibTeX Entry ( Maier_Stachler_Brendel-The_nuts_and-2019 )

CRISPR-Cas systems in multicellular cyanobacteria

Shengwei Hou, Manuel Brenes-Alvarez, Viktoria Reimann, Omer S. Alkhnbashi, Rolf Backofen, Alicia M. Muro-Pastor, Wolfgang R. Hess

In: RNA Biol, 2019, 16(4), 518-529

Novel CRISPR-Cas systems possess substantial potential for genome editing and manipulation of gene expression. The types and numbers of CRISPR-Cas systems vary substantially between different organisms. Some filamentous cyanobacteria harbor > 40 different putative CRISPR repeat-spacer cassettes, while the number of cas gene instances is much lower. Here we addressed the types and diversity of CRISPR-Cas systems and of CRISPR-like repeat-spacer arrays in 171 publicly available genomes of multicellular cyanobacteria. The number of 1328 repeat-spacer arrays exceeded the total of 391 encoded Cas1 proteins suggesting a tendency for fragmentation or the involvement of alternative adaptation factors. The model cyanobacterium Anabaena sp. PCC 7120 contains only three cas1 genes but hosts three Class 1, possibly one Class 2 and five orphan repeat-spacer arrays, all of which exhibit crRNA-typical expression patterns suggesting active transcription, maturation and incorporation into CRISPR complexes. The CRISPR-Cas system within the element interrupting the Anabaena sp. PCC 7120 fdxN gene, as well as analogous arrangements in other strains, occupy the genetic elements that become excised during the differentiation-related programmed site-specific recombination. This fact indicates the propensity of these elements for the integration of CRISPR-cas systems and points to a previously not recognized connection. The gene all3613 resembling a possible Class 2 effector protein is linked to a short repeat-spacer array and a single tRNA gene, similar to its homologs in other cyanobacteria. The diversity and presence of numerous CRISPR-Cas systems in DNA elements that are programmed for homologous recombination make filamentous cyanobacteria a prolific resource for their study. 
Abbreviations: Cas: CRISPR associated sequences; CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats; C2c: Class 2 candidate; SDR: small dispersed repeat; TSS: transcriptional start site; UTR: untranslated region.

Available:
pdf (2742 KB)   doi:10.1080/15476286.2018.1493330   pmid:29995583   BibTeX Entry ( Hou_Brenes-Alvarez_Reimann-CRISP_syste_multi-2019 )

Comprehensive search for accessory proteins encoded with archaeal and bacterial type III CRISPR-cas gene cassettes reveals 39 new cas gene families

Shiraz A. Shah, Omer S. Alkhnbashi, Juliane Behler, Wenyuan Han, Qunxin She, Wolfgang R. Hess, Roger A. Garrett, Rolf Backofen

In: RNA Biol, 2019, 16(4), 530-542

A study was undertaken to identify conserved proteins that are encoded adjacent to cas gene cassettes of Type III CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats - CRISPR associated) interference modules. Type III modules have been shown to target and degrade dsDNA, ssDNA and ssRNA and are frequently intertwined with cofunctional accessory genes, including genes encoding CRISPR-associated Rossmann Fold (CARF) domains. Using a comparative genomics approach, and defining a Type III association score accounting for coevolution and specificity of flanking genes, we identified and classified 39 new Type III associated gene families. Most archaeal and bacterial Type III modules were seen to be flanked by several accessory genes, around half of which did not encode CARF domains and remain of unknown function. Northern blotting and interference assays in Synechocystis confirmed that one particular non-CARF accessory protein family was involved in crRNA maturation. Non-CARF accessory genes were generally diverse, encoding nuclease, helicase, protease, ATPase, transporter and transmembrane domains with some encoding no known domains. We infer that additional families of non-CARF accessory proteins remain to be found. The method employed is scalable for potential application to metagenomic data once automated pipelines for annotation of CRISPR-Cas systems have been developed. All accessory genes found in this study are presented online in a readily accessible and searchable format for researchers to audit their model organism of choice: http://accessory.crispr.dk .

Available:
pdf (629 KB)   doi:10.1080/15476286.2018.1483685   pmid:29911924   BibTeX Entry ( Shah_Alkhnbashi_Behler-Compr_searc_for-2019 )

Constraint Maximal Inter-molecular Helix Lengths within RNA-RNA Interaction Prediction Improves Bacterial sRNA Target Prediction

Rick Gelhausen, Sebastian Will, Ivo L. Hofacker, Rolf Backofen, Martin Raden

2019

Efficient computational tools for the identification of putative target RNAs regulated by prokaryotic sRNAs rely on thermodynamic models of RNA secondary structures. While they typically predict RNA-RNA interaction complexes accurately, they yield many highly-ranked false positives in target screens. One obvious source of this low specificity appears to be the inability of current secondary-structure-based models to reflect steric constraints, which nevertheless govern the kinetic formation of RNA-RNA interactions. For example, often (even thermodynamically favorable) extensions of short initial kissing hairpin interactions are kinetically prohibited, since this would require unwinding of intra-molecular helices as well as sterically impossible bending of the interaction helix. In consequence, the efficient prediction methods, which do not consider such effects, predict over-long helices. To increase the prediction accuracy, we devise a dynamic programming algorithm that length-restricts the runs of consecutive inter-molecular base pairs (perfect canonical stackings), which we hypothesize to implicitly model the steric and kinetic effects. The novel method is implemented by extending the state-of-the-art tool IntaRNA. Our comprehensive bacterial sRNA target prediction benchmark demonstrates significant improvements of the prediction accuracy and enables 3-4 times faster computations. These results indicate, supporting our hypothesis, that length-limitations on inter-molecular subhelices increase the accuracy of interaction prediction models compared to the current state-of-the-art approach.

Available:
pdf (371 KB)   doi:10.5220/0007689701310140   BibTeX Entry ( Gelhausen-helixLength-2019 )

Integration of accessibility data from structure probing into RNA-RNA interaction prediction

Milad Miladi, Soheila Montaseri, Rolf Backofen, Martin Raden

In: Bioinformatics, 2019, 35(16), 2862-2864

Experimental structure probing data has been shown to improve thermodynamics-based RNA secondary structure prediction. To this end, chemical reactivity information (as provided e.g. by SHAPE) is incorporated, which encodes whether or not individual nucleotides are involved in intra-molecular structure. Since inter-molecular RNA-RNA interactions are often confined to unpaired RNA regions, SHAPE data is even more promising to improve interaction prediction. Here we show how such experimental data can be incorporated seamlessly into accessibility-based RNA-RNA interaction prediction approaches, as implemented in IntaRNA. This is possible via the computation and use of unpaired probabilities that incorporate the structure probing information. We show that experimental SHAPE data can significantly improve RNA-RNA interaction prediction. We evaluate our approach by investigating interactions of a spliceosomal U1 snRNA transcript with its target splice sites. When SHAPE data is incorporated, known target sites are predicted with increased precision and specificity. Keywords: RNA-RNA interaction prediction, accessibility, RNA structure probing, RNA secondary structure, chemical footprinting, SHAPE. Availability: https://github.com/BackofenLab/IntaRNA

Available:
supplement.pdf (3548 KB)   pdf (268 KB)   doi:10.1093/bioinformatics/bty1029   BibTeX Entry ( Miladi-2018-shape )

uORF-Tools - Workflow for the determination of translation-regulatory upstream open reading frames

Anica Scholz, Florian Eggenhofer, Rick Gelhausen, Björn Grüning, Kathi Zarnack, Bernhard Brüne, Rolf Backofen, Tobias Schmid

In: PLOS ONE, 09 2019, 14, e0222459

Ribosome profiling (ribo-seq) provides a means to analyze active translation by determining ribosome occupancy in a transcriptome-wide manner. The vast majority of ribosome protected fragments (RPFs) resides within the protein-coding sequence of mRNAs. However, reads are also commonly found within the transcript leader sequence (TLS) (aka 5' untranslated region) preceding the main open reading frame (ORF), indicating the translation of regulatory upstream ORFs (uORFs). Here, we present a workflow for the identification of translation-regulatory uORFs. Specifically, uORF-Tools uses Ribo-TISH to identify uORFs within a given dataset and generates a uORF annotation file. In addition, a comprehensive human uORF annotation file, based on 35 ribo-seq files, is provided, which can serve as an alternative input file for the workflow. To assess the translation-regulatory activity of the uORFs, stimulus-induced changes in the ratio of the RPFs residing in the main ORFs relative to those found in the associated uORFs are determined. The resulting output file allows for the easy identification of candidate uORFs, which have translation-inhibitory effects on their associated main ORFs. uORF-Tools is available as a free and open Snakemake workflow at https://github.com/Biochemistry1-FFM/uORF-Tools. It is easily installed and all necessary tools are provided in a version-controlled manner, which also ensures lasting usability. uORF-Tools is designed for intuitive use and requires only limited computing times and resources.

Available:
pdf (1132 KB)   doi:10.1371/journal.pone.0222459   BibTeX Entry ( uORF-Tools )

DOT1L promotes progenitor proliferation and primes neuronal layer identity in the developing cerebral cortex

Henriette Franz, Alejandro Villarreal, Stefanie Heidrich, Pavankumar Videm, Fabian Kilpert, Ivan Mestres, Federico Calegari, Rolf Backofen, Thomas Manke, Tanja Vogel

In: Nucleic Acids Research, 10 2018, 47(1), 168-183

Cortical development is controlled by transcriptional programs, which are orchestrated by transcription factors. Yet, stable inheritance of spatio-temporal activity of factors influencing cell fate and localization in different layers is only partly understood. Here we find that deletion of Dot1l in the murine telencephalon leads to cortical layering defects, indicating DOT1L activity and chromatin methylation at H3K79 impact on the cell cycle, and influence transcriptional programs conferring upper layer identity in early progenitors. Specifically, DOT1L prevents premature differentiation by increasing expression of genes that regulate asymmetric cell division (Vangl2, Cenpj). Loss of DOT1L results in reduced numbers of progenitors expressing genes including SoxB1 gene family members. Loss of DOT1L also leads to altered cortical distribution of deep layer neurons that express either TBR1, CTIP2 or SOX5, and less activation of transcriptional programs that are characteristic for upper layer neurons (Satb2, Pou3f3, Cux2, SoxC family members). Data from three different mouse models suggest that DOT1L balances transcriptional programs necessary for proper neuronal composition and distribution in the six cortical layers. Furthermore, because loss of DOT1L in the pre-neurogenic phase of development impairs specifically generation of SATB2-expressing upper layer neurons, our data suggest that DOT1L primes upper layer identity in cortical progenitors.

Available:
pdf (26148 KB)   doi:10.1093/nar/gky953   BibTeX Entry ( franz_villarreal-dot1l_progenitor-2018 )

FOXG1 Regulates PRKAR2B Transcriptionally and Posttranscriptionally via miR200 in the Adult Hippocampus

Stefan C. Weise, Ganeshkumar Arumugam, Alejandro Villarreal, Pavankumar Videm, Stefanie Heidrich, Nils Nebel, Verónica I. Dumit, Farahnaz Sananbenesi, Viktoria Reimann, Madeline Craske, Oliver Schilling, Wolfgang R. Hess, Andre Fischer, Rolf Backofen, Tanja Vogel

In: Molecular Neurobiology, Dec 2018

Rett syndrome is a complex neurodevelopmental disorder that is mainly caused by mutations in MECP2. However, mutations in FOXG1 cause a less frequent form of atypical Rett syndrome, called FOXG1 syndrome. FOXG1 is a key transcription factor crucial for forebrain development, where it maintains the balance between progenitor proliferation and neuronal differentiation. Using genome-wide small RNA sequencing and quantitative proteomics, we identified that FOXG1 affects the biogenesis of miR200b/a/429 and interacts with the ATP-dependent RNA helicase, DDX5/p68. Both FOXG1 and DDX5 associate with the microprocessor complex, whereby DDX5 recruits FOXG1 to DROSHA. RNA-Seq analyses of Foxg1cre/+ hippocampi and N2a cells overexpressing miR200 family members identified cAMP-dependent protein kinase type II-beta regulatory subunit (PRKAR2B) as a target of miR200 in neural cells. PRKAR2B inhibits postsynaptic functions by attenuating protein kinase A (PKA) activity; thus, increased PRKAR2B levels may contribute to neuronal dysfunctions in FOXG1 syndrome. Our data suggest that FOXG1 regulates PRKAR2B expression both on transcriptional and posttranscriptional levels.

Available:
pdf (3369 KB)   doi:10.1007/s12035-018-1444-7   BibTeX Entry ( Weise2018 )

On filter size in graph convolutional networks

Dinh V Tran, Nicolò Navarin, Alessandro Sperduti

In: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), 2018, 1534--1541

Recently, many researchers have been focusing on the definition of neural networks for graphs. The basic component for many of these approaches remains the graph convolution idea proposed almost a decade ago. In this paper, we extend this basic component, following an intuition derived from the well-known convolutional filters over multi-dimensional tensors. In particular, we derive a simple, efficient and effective way to introduce a hyper-parameter on graph convolutions that influences the filter size, i.e., its receptive field over the considered graph. We show with experimental results on real-world graph datasets that the proposed graph convolutional filter improves the predictive performance of Deep Graph Convolutional Networks.

Available:
pdf (1573 KB)   BibTeX Entry ( Tran-2018-filter )
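
Illustrative note on the abstract above: the idea of a hyper-parameter that controls the receptive field of a graph filter can be sketched in a few lines. The sketch below is not the authors' formulation; it assumes an unweighted mean over the k-hop neighbourhood as a stand-in for a learned filter, and the names `k_hop_neighbors` and `mean_filter` are hypothetical.

```python
def k_hop_neighbors(adj, node, k):
    """Return all nodes reachable from `node` in at most k steps (node included)."""
    frontier = {node}
    seen = {node}
    for _ in range(k):
        # Expand one hop and keep only nodes not visited before.
        frontier = {nbr for u in frontier for nbr in adj[u]} - seen
        seen |= frontier
    return seen


def mean_filter(adj, features, k):
    """Average each node's scalar feature over its k-hop receptive field.

    Assumption: an unweighted mean replaces the learned filter weights.
    """
    out = {}
    for node in adj:
        field = k_hop_neighbors(adj, node, k)
        out[node] = sum(features[v] for v in field) / len(field)
    return out


# Path graph 0 - 1 - 2 - 3 with a single non-zero feature at node 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}

one_hop = mean_filter(adj, features, k=1)
two_hop = mean_filter(adj, features, k=2)
```

With k=1 node 2 does not see the signal at node 0, whereas with k=2 it does; a larger k therefore widens the receptive field at the cost of more smoothing.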

GraphDDP: a graph-embedding approach to detect differentiation pathways in single-cell-data using prior class knowledge

Fabrizio Costa, Dominic Grun, Rolf Backofen

In: Nat Commun, 2018, 9(1), 3685

Cell types can be characterized by expression profiles derived from single-cell RNA-seq. Subpopulations are identified via clustering, yielding intuitive outcomes that can be validated by marker genes. Clustering, however, implies a discretization that cannot capture the continuous nature of differentiation processes. One could give up the detection of subpopulations and directly estimate the differentiation process from cell profiles. A combination of both types of information, however, is preferable. Crucially, clusters can serve as anchor points of differentiation trajectories. Here we present GraphDDP, which integrates both viewpoints in an intuitive visualization. GraphDDP starts from a user-defined cluster assignment and then uses a force-based graph layout approach on two types of carefully constructed edges: one emphasizing cluster membership, the other, based on density gradients, emphasizing differentiation trajectories. We show on intestinal epithelial cells and myeloid progenitor data that GraphDDP allows the identification of differentiation pathways that cannot be easily detected by other approaches.

Available:
pdf (1696 KB)   doi:10.1038/s41467-018-05988-7   pmid:30206223   BibTeX Entry ( Costa_Grun_Backofen-Graph_graph_appro-2018 )

Analysis of the androgen receptor-regulated lncRNA landscape identifies a role for ARLNC1 in prostate cancer progression

Yajia Zhang, Sethuramasundaram Pitchiaya, Marcin Cieslik, Yashar S. Niknafs, Jean C-Y Tien, Yasuyuki Hosono, Matthew K. Iyer, Sahr Yazdani, Shruthi Subramaniam, Sudhanshu K. Shukla, Xia Jiang, Lisha Wang, Tzu-Ying Liu, Michael Uhl, Alexander R. Gawronski, Yuanyuan Qiao, Lanbo Xiao, Saravana M. Dhanasekaran, Kristin M. Juckette, Lakshmi P. Kunju, Xuhong Cao, Utsav Patel, Mona Batish, Girish C. Shukla, Michelle T. Paulsen, Mats Ljungman, Hui Jiang, Rohit Mehra, Rolf Backofen, Cenk S. Sahinalp, Susan M. Freier, Andrew T. Watt, Shuling Guo, John T. Wei, Felix Y. Feng, Rohit Malik, Arul M. Chinnaiyan

In: Nat Genet, 2018, 50(6), 814-824

The androgen receptor (AR) plays a critical role in the development of the normal prostate as well as prostate cancer. Using an integrative transcriptomic analysis of prostate cancer cell lines and tissues, we identified ARLNC1 (AR-regulated long noncoding RNA 1) as an important long noncoding RNA that is strongly associated with AR signaling in prostate cancer progression. Not only was ARLNC1 induced by the AR protein, but ARLNC1 stabilized the AR transcript via RNA-RNA interaction. ARLNC1 knockdown suppressed AR expression, global AR signaling and prostate cancer growth in vitro and in vivo. Taken together, these data support a role for ARLNC1 in maintaining a positive feedback loop that potentiates AR signaling during prostate cancer progression and identify ARLNC1 as a novel therapeutic target.

Available:
pdf (3275 KB)   doi:10.1038/s41588-018-0120-1   pmid:29808028   BibTeX Entry ( Zhang_Pitchiaya_Cieslik-Analy_the_andro-2018 )

Bioconda: sustainable and comprehensive software distribution for the life sciences

Björn Grüning, Andreas Dale, Ryan Sjödin, Brad A. Chapman, Jillian Rowe, Christopher H. Tomkins-Tinch, Renan Valieris, Bioconda Team, Johannes Köster

In: Nature Methods, 2018, 15(7), 475-476

We present Bioconda (https://bioconda.github.io), a distribution of bioinformatics software for the lightweight, multi-platform and language-agnostic package manager, Conda. Currently, Bioconda offers a collection of over 2900 software tools, which are continuously maintained, updated, and extended by a growing global community of more than 200 contributors. Bioconda improves analysis reproducibility by allowing users to define isolated environments with defined software versions, all of which are easily installed and managed without administrative privileges.

Publication note:
preprint at bioRxiv https://doi.org/10.1101/207092

Available:
doi:10.1038/s41592-018-0046-7   BibTeX Entry ( Gruening-2018-bioconda )
Links:
https://doi.org/10.1101/207092

Interactive implementations of thermodynamics-based RNA structure and RNA-RNA interaction prediction approaches for example-driven teaching

Martin Raden, Mostafa M Mohamed, Syed M Ali, Rolf Backofen

In: PLOS Comput. Biol, 2018, 14(8), e1006341

The investigation of RNA-based regulation of cellular processes is becoming an increasingly important part of biological or medical research. For the analysis of this type of data, RNA-related prediction tools are integrated into many pipelines and workflows. In order to correctly apply and tune these programs, the user has to have a precise understanding of their limitations and concepts. Within this manuscript, we provide the mathematical foundations and extract the algorithmic ideas that are core to state-of-the-art RNA structure and RNA-RNA interaction prediction algorithms. To allow the reader to change and adapt the algorithms or to play with different inputs, we provide an open-source web interface to JavaScript implementations and visualizations of each algorithm. The conceptual, teaching-focused presentation enables a high-level survey of the approaches while providing sufficient details for understanding important concepts. This is boosted by the simple generation and study of examples using the web interface available under http://rna.informatik.uni-freiburg.de/Teaching/. In combination, we provide a valuable resource for teaching, learning and understanding the discussed prediction tools and thus enable a more informed analysis of RNA-related effects.

Available:
pdf (1032 KB)   doi:10.1371/journal.pcbi.1006341   BibTeX Entry ( Raden-2018-teaching )
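
Illustrative note on the abstract above: the dynamic-programming idea behind RNA secondary structure prediction can be shown with base-pair maximization in the style of Nussinov. This is a deliberately simplified sketch, not the thermodynamics-based energy models the paper discusses and not the paper's JavaScript code. The pairing rules (Watson-Crick plus G-U) and the default minimum loop length of 3 are assumptions chosen for illustration.

```python
def can_pair(a, b):
    """Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def nussinov(seq, min_loop=3):
    """Return the maximum number of nested base pairs in `seq`.

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    A pair (k, j) is allowed only if at least `min_loop` bases lie between.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    # case 2: j pairs with k, splitting the interval in two
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


hairpin_pairs = nussinov("GGGAAACCC")
```

For the hairpin-like input `GGGAAACCC` the three G-C pairs enclose the `AAA` loop, so the recursion finds three pairs.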

Freiburg RNA tools: a central online resource for RNA-focused research and teaching

Martin Raden, Syed M Ali, Omer S Alkhnbashi, Anke Busch, Fabrizio Costa, Jason A Davis, Florian Eggenhofer, Rick Gelhausen, Jens Georg, Steffen Heyne, Michael Hiller, Kousik Kundu, Robert Kleinkauf, Steffen C Lott, Mostafa M Mohamed, Alexander Mattheis, Milad Miladi, Andreas S Richter, Sebastian Will, Joachim Wolff, Patrick R Wright, Rolf Backofen

In: Nucleic Acids Research, 2018, 46(W1), W25-W29

The Freiburg RNA tools webserver is a well established online resource for RNA-focused research. It provides a unified user interface and comprehensive result visualization for efficient command line tools. The webserver includes RNA-RNA interaction prediction (IntaRNA, CopraRNA, metaMIR), sRNA homology search (GLASSgo), sequence-structure alignments (LocARNA, MARNA, CARNA, ExpaRNA), CRISPR repeat classification (CRISPRmap), sequence design (antaRNA, INFO-RNA, SECISDesign), structure aberration evaluation of point mutations (RaSE), and RNA/protein-family models visualization (CMV), and other methods. Open education resources offer interactive visualizations of RNA structure and RNA-RNA interaction prediction as well as basic and advanced sequence alignment algorithms. The services are freely available at http://rna.informatik.uni-freiburg.de.

Available:
supplement.pdf (2488 KB)   pdf (108 KB)   doi:10.1093/nar/gky329   BibTeX Entry ( Raden-2018-websrv )

Combinatorial Omics Analysis Reveals Perturbed Lysosomal Homeostasis in Collagen VII-deficient Keratinocytes

Kerstin Thriene, Bjorn Andreas Gruning, Olivier Bornert, Anika Erxleben, Juna Leppert, Ioannis Athanasiou, Ekkehard Weber, Dimitra Kiritsi, Alexander Nystrom, Thomas Reinheckel, Rolf Backofen, Cristina Has, Leena Bruckner-Tuderman, Jorn Dengjel

In: Mol Cell Proteomics, 2018, 17(4), 565-579

The extracellular matrix protein collagen VII is part of the microenvironment of stratified epithelia and critical in organismal homeostasis. Mutations in the encoding gene COL7A1 lead to the skin disorder dystrophic epidermolysis bullosa (DEB) and are linked to skin fragility and progressive inflammation-driven fibrosis that facilitates aggressive skin cancer. So far, these changes have been linked to mesenchymal alterations, the epithelial consequences of collagen VII loss remaining under-addressed. As epithelial dysfunction is a principal initiator of fibrosis, we performed a comprehensive transcriptome and proteome profiling of primary human keratinocytes from DEB and control subjects to generate global and detailed images of dysregulated epidermal molecular pathways linked to loss of collagen VII. These revealed downregulation of interaction partners of collagen VII on mRNA and protein level, but also increased abundance of S100 pro-inflammatory proteins in primary DEB keratinocytes. Increased TGF-beta signaling because of loss of collagen VII was associated with enhanced activity of lysosomal proteases in both keratinocytes and skin of collagen VII-deficient individuals. Thus, loss of a single structural protein, collagen VII, has extra- and intracellular consequences, resulting in inflammatory processes that enable tissue destabilization and promote keratinocyte-driven, progressive fibrosis.

Available:
pdf (3478 KB)   doi:10.1074/mcp.RA117.000437   pmid:29326176   BibTeX Entry ( Thriene_Gruning_Bornert-Combi_Omics_Analy-2018 )

Distinct epigenetic programs regulate cardiac myocyte development and disease in the human heart in vivo

Ralf Gilsbach, Martin Schwaderer, Sebastian Preissl, Bjorn A. Gruning, David Kranzhofer, Pedro Schneider, Thomas G. Nuhrenberg, Sonia Mulero-Navarro, Dieter Weichenhan, Christian Braun, Martina Dressen, Adam R. Jacobs, Harald Lahm, Torsten Doenst, Rolf Backofen, Markus Krane, Bruce D. Gelb, Lutz Hein

In: Nat Commun, 2018, 9(1), 391

Epigenetic mechanisms and transcription factor networks essential for differentiation of cardiac myocytes have been uncovered. However, reshaping of the epigenome of these terminally differentiated cells during fetal development, postnatal maturation, and in disease remains unknown. Here, we investigate the dynamics of the cardiac myocyte epigenome during development and in chronic heart failure. We find that prenatal development and postnatal maturation are characterized by a cooperation of active CpG methylation and histone marks at cis-regulatory and genic regions to shape the cardiac myocyte transcriptome. In contrast, pathological gene expression in terminal heart failure is accompanied by changes in active histone marks without major alterations in CpG methylation and repressive chromatin marks. Notably, cis-regulatory regions in cardiac myocytes are significantly enriched for cardiovascular disease-associated variants. This study uncovers distinct layers of epigenetic regulation not only during prenatal development and postnatal maturation but also in diseased human cardiac myocytes.

Available:
pdf (7419 KB)   doi:10.1038/s41467-017-02762-z   pmid:29374152   BibTeX Entry ( Gilsbach_Schwaderer_Preissl-Disti_epige_progr-2018 )

hnRNP R and its main interactor, the noncoding RNA 7SK, coregulate the axonal transcriptome of motoneurons

Michael Briese, Lena Saal-Bauernschubert, Changhe Ji, Mehri Moradi, Hanaa Ghanawi, Michael Uhl, Silke Appenzeller, Rolf Backofen, Michael Sendtner

In: Proc. Natl. Acad. Sci. USA, 2018, 115(12), E2859-E2868

Disturbed RNA processing and subcellular transport contribute to the pathomechanisms of motoneuron diseases such as amyotrophic lateral sclerosis and spinal muscular atrophy. RNA-binding proteins are involved in these processes, but the mechanisms by which they regulate the subcellular diversity of transcriptomes, particularly in axons, are not understood. Heterogeneous nuclear ribonucleoprotein R (hnRNP R) interacts with several proteins involved in motoneuron diseases. It is located in axons of developing motoneurons, and its depletion causes defects in axon growth. Here, we used individual nucleotide-resolution cross-linking and immunoprecipitation (iCLIP) to determine the RNA interactome of hnRNP R in motoneurons. We identified approximately 3,500 RNA targets, predominantly with functions in synaptic transmission and axon guidance. Among the RNA targets identified by iCLIP, the noncoding RNA 7SK was the top interactor of hnRNP R. We detected 7SK in the nucleus and also in the cytosol of motoneurons. In axons, 7SK localized in close proximity to hnRNP R, and depletion of hnRNP R reduced axonal 7SK. Furthermore, suppression of 7SK led to defective axon growth that was accompanied by axonal transcriptome alterations similar to those caused by hnRNP R depletion. Using a series of 7SK-deletion mutants, we show that the function of 7SK in axon elongation depends on its interaction with hnRNP R but not with the PTEF-B complex involved in transcriptional regulation. These results propose a role for 7SK as an essential interactor of hnRNP R to regulate its function in axon maintenance.

Available:
pdf (2095 KB)   doi:10.1073/pnas.1721670115   pmid:29507242   BibTeX Entry ( Briese_Saal-Bauernschubert_Ji-hnRNP_and_its-PNAS2018 )

CMV - Visualization for RNA and Protein family models and their comparisons

Florian Eggenhofer, Ivo L. Hofacker, Rolf Backofen, Christian Honer Zu Siederdissen

In: Bioinformatics, 2018

Summary: A standard method for the identification of novel RNAs or proteins is homology search via probabilistic models. One approach relies on the definition of families, which can be encoded as covariance models (CMs) or Hidden Markov Models (HMMs). While being powerful tools, their complexity makes it tedious to investigate them in their (default) tabulated form. This specifically applies to the interpretation of comparisons between multiple models as in family clans. The Covariance model visualization tools (CMV) visualize CMs or HMMs to: I) Obtain an easily interpretable representation of HMMs and CMs; II) Put them in context with the structural sequence alignments they have been created from; III) Investigate results of model comparisons and highlight regions of interest. Availability: Source code (http://www.github.com/eggzilla/cmv), web-service (http://rna.informatik.uni-freiburg.de/CMVS). Contact: egg@informatik.uni-freiburg.de, choener@bioinf.uni-leipzig.de. Supplementary information: Supplementary data available online.

Available:
pdf (266 KB)   doi:10.1093/bioinformatics/bty158   pmid:29554223   BibTeX Entry ( Eggenhofer_Hofacker_Backofen-CMV_Visua_for-2018 )

The nuts and bolts of the Haloferax CRISPR-Cas system I-B

Lisa-Katharina Maier, Aris-Edda Stachler, Jutta Brendel, Britta Stoll, Susan Fischer, Karina Haas, Thandi Schwarz, Omer S. Alkhnbashi, Kundan Sharma, Henning Urlaub, Rolf Backofen, Uri Gophna, Anita Marchfelder

In: RNA Biol, 2018, 1-39

Invading genetic elements pose a constant threat to prokaryotic survival, requiring an effective defence. Eleven years ago, the arsenal of known defence mechanisms was expanded by the discovery of the CRISPR-Cas system. Although CRISPR-Cas is present in the majority of archaea, research often focuses on bacterial models. Here, we provide a perspective based on insights gained studying CRISPR-Cas system I-B of the archaeon Haloferax volcanii. The system relies on more than 50 different crRNAs, whose stability and maintenance critically depend on the proteins Cas5 and Cas7, which bind the crRNA and form the Cascade complex. The interference activity requires a seed sequence and can interact with multiple PAM sequences. H. volcanii stands out as the first example of an organism that can tolerate autoimmunity via the CRISPR-Cas system while maintaining a constitutively active system. In addition, the H. volcanii system was successfully developed into a tool for gene regulation.

Available:
doi:10.1080/15476286.2018.1460994   pmid:29649958   BibTeX Entry ( Maier_Stachler_Brendel-The_nuts_and-2018 )

Comparative RNA Genomics

Rolf Backofen, Jan Gorodkin, Ivo L. Hofacker, Peter F. Stadler

In: Methods Mol Biol, 2018, 1704, 363-400

Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.

Available:
doi:10.1007/978-1-4939-7463-4_14   pmid:29277874   BibTeX Entry ( Backofen_Gorodkin_Hofacker-Compa_RNA_Genom-2018 )

In vitro iCLIP-based modeling uncovers how the splicing factor U2AF2 relies on regulation by cofactors

F. X. Reymond Sutandy, Stefanie Ebersberger, Lu Huang, Anke Busch, Maximilian Bach, Hyun-Seo Kang, Jorg Fallmann, Daniel Maticzka, Rolf Backofen, Peter F. Stadler, Kathi Zarnack, Michael Sattler, Stefan Legewie, Julian Konig

In: Genome Res, 2018

Alternative splicing generates distinct mRNA isoforms and is crucial for proteome diversity in eukaryotes. The RNA-binding protein (RBP) U2AF2 is central to splicing decisions, as it recognizes 3' splice sites and recruits the spliceosome. We establish "in vitro iCLIP" experiments, in which recombinant RBPs are incubated with long transcripts, to study how U2AF2 recognizes RNA sequences and how this is modulated by trans-acting RBPs. We measure U2AF2 affinities at hundreds of binding sites and compare in vitro and in vivo binding landscapes by mathematical modeling. We find that trans-acting RBPs extensively regulate U2AF2 binding in vivo, including enhanced recruitment to 3' splice sites and clearance of introns. Using machine learning, we identify and experimentally validate novel trans-acting RBPs (including FUBP1, CELF6, and PCBP1) that modulate U2AF2 binding and affect splicing outcomes. Our study offers a blueprint for the high-throughput characterization of in vitro mRNP assembly and in vivo splicing regulation.

Available:
doi:10.1101/gr.229757.117   pmid:29643205   BibTeX Entry ( Sutandy_Ebersberger_Huang-vitro_iCLIP_model-2018 )

MechRNA: prediction of lncRNA mechanisms from RNA-RNA and RNA-protein interactions

Alexander R. Gawronski, Michael Uhl, Yajia Zhang, Yen-Yi Lin, Yashar S. Niknafs, Varune R. Ramnarine, Rohit Malik, Felix Feng, Arul M. Chinnaiyan, Colin C. Collins, S. Cenk Sahinalp, Rolf Backofen

In: Bioinformatics, 2018

Motivation: Long non-coding RNAs (lncRNAs) are defined as transcripts longer than 200 nucleotides that do not get translated into proteins. Often these transcripts are processed (spliced, capped, polyadenylated) and some are known to have important biological functions. However, most lncRNAs have unknown or poorly understood functions. Nevertheless, because of their potential role in cancer, lncRNAs are receiving a lot of attention, and the need for computational tools to predict their possible mechanisms of action is greater than ever. Fundamentally, most of the known lncRNA mechanisms involve RNA-RNA and/or RNA-protein interactions. Through accurate predictions of each kind of interaction and integration of these predictions, it is possible to elucidate potential mechanisms for a given lncRNA. Approach: Here we introduce MechRNA, a pipeline for corroborating RNA-RNA interaction prediction and protein binding prediction for identifying possible lncRNA mechanisms involving specific targets or on a transcriptome-wide scale. The first stage uses a version of IntaRNA2 with added functionality for efficient prediction of RNA-RNA interactions with very long input sequences, allowing for large-scale analysis of lncRNA interactions with little or no loss of optimality. The second stage integrates protein binding information pre-computed by GraphProt, for both the lncRNA and the target. The final stage involves inferring the most likely mechanism for each lncRNA/target pair. This is achieved by generating candidate mechanisms from the predicted interactions, the relative locations of these interactions and correlation data, followed by selection of the most likely mechanistic explanation using a combined p-value. Results: We applied MechRNA on a number of recently identified cancer-related lncRNAs (PCAT1, PCAT29, ARLnc1) and also on two well-studied lncRNAs (PCA3 and 7SL). This led to the identification of hundreds of high confidence potential targets for each lncRNA and corresponding mechanisms. These predictions include the known competitive mechanism of 7SL with HuR for binding on the tumor suppressor TP53, as well as mechanisms expanding what is known about PCAT1 and ARLnc1 and their targets BRCA2 and AR, respectively. For PCAT1-BRCA2, the mechanism involves competitive binding with HuR, which we confirmed using HuR immunoprecipitation assays. Availability: MechRNA is available for download at https://bitbucket.org/compbio/mechrna. Contact: backofen@informatik.uni-freiburg.de, cenksahi@indiana.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Available:
doi:10.1093/bioinformatics/bty208   pmid:29617966   BibTeX Entry ( Gawronski_Uhl_Zhang-MechR_predi_lncRN-2018 )
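
Illustrative note on the abstract above: the abstract mentions selecting a mechanism by a combined p-value but does not say how the p-values are combined. The sketch below assumes Fisher's method for independent tests, which is one common choice and may differ from what MechRNA does; the function name is hypothetical. It uses the closed-form chi-square survival function for an even number of degrees of freedom, so only the standard library is needed.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (an assumed choice).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, whose survival function is
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total


# Hypothetical inputs: one p-value per evidence type for a candidate mechanism.
combined = fisher_combined_pvalue([0.05, 0.05])
```

Two moderately small p-values combine into a smaller one (about 0.0175 here), while a single p-value is returned unchanged.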

GLASSgo - Automated and reliable detection of sRNA homologs from a single input sequence

Steffen C. Lott, Richard A Schäfer, Martin Mann, Rolf Backofen, Wolfgang R Hess, Björn Voss, Jens Georg

In: Frontiers in Genetics, 2018, 9, 124

Bacterial small RNAs (sRNAs) are important post-transcriptional regulators of gene expression. The functional and evolutionary characterization of sRNAs requires the identification of homologs, which is frequently challenging due to their heterogeneity, short length and partly, little sequence conservation. We developed the GLobal Automatic Small RNA Search go (GLASSgo) algorithm to identify sRNA homologs in complex genomic databases starting from a single sequence. GLASSgo combines an iterative BLAST strategy with pairwise identity filtering and a graph-based clustering method that utilizes RNA secondary structure information. We tested the specificity, sensitivity and runtime of GLASSgo, BLAST and the combination RNAlien/cmsearch in a typical use case scenario on 40 bacterial sRNA families. The sensitivity of the tested methods was similar, while the specificity of GLASSgo and RNAlien/cmsearch was significantly higher than that of BLAST. GLASSgo was on average about 87 times faster than RNAlien/cmsearch, and only about 7.5 times slower than BLAST, which shows that GLASSgo optimizes the trade-off between speed and accuracy in the task of finding sRNA homologs. GLASSgo is fully automated, whereas BLAST often recovers only parts of homologs and RNAlien/cmsearch requires extensive additional bioinformatic work to get a comprehensive set of homologs. GLASSgo is available as an easy-to-use web server to find homologous sRNAs in large databases.

Available:
supplement.zip (11263 KB)   pdf (3142 KB)   doi:10.3389/fgene.2018.00124   BibTeX Entry ( Georg:2018 )

Structure and interaction prediction in prokaryotic RNA biology

Patrick R. Wright, Martin Mann, Rolf Backofen

In: Microbiol Spectrum, 2018, 6(2)

Many years of research in RNA biology have soundly established the importance of RNA based regulation far beyond most early traditional presumptions. Importantly, the advances in "wet" laboratory techniques have produced unprecedented amounts of data that require efficient and precise computational analysis schemes and algorithms. Hence, many in silico methods that attempt topological and functional classification of novel putative RNA based regulators are available. In this review we technically outline thermodynamics-based standard RNA secondary structure and RNA-RNA interaction prediction approaches that have proven valuable to the RNA research community in the past and present. For these, we highlight their usability with a special focus on prokaryotic organisms and also briefly mention recent advances in whole genome interactomics and how this may influence the field of predictive RNA research.

Available:
supplement.pdf (366 KB)   pdf (719 KB)   doi:10.1128/microbiolspec.RWR-0001-2017   BibTeX Entry ( Wright-2018 )

Workflow for a Computational Analysis of an sRNA Candidate in Bacteria

Patrick R. Wright, Jens Georg

In: Methods Mol Biol, 2018, 1737, 3-30

Computational methods can often facilitate the functional characterization of individual sRNAs and furthermore allow high-throughput analysis on large numbers of sRNA candidates. This chapter outlines a potential workflow for computational sRNA analyses and describes in detail methods for homolog detection, target prediction, and functional characterization based on enrichment analysis. The cyanobacterial sRNA IsaR1 is used as a specific example. All methods are available as webservers and easily accessible for nonexpert users.

Available:
doi:10.1007/978-1-4939-7634-8_1   pmid:29484584   BibTeX Entry ( Wright_Georg-Workf_for_Compu-2018 )

uvCLAP is a fast and non-radioactive method to identify in vivo targets of RNA-binding proteins

Daniel Maticzka, Ibrahim Avsar Ilik, Tugce Aktas, Rolf Backofen, Asifa Akhtar

In: Nat Commun, 2018, 9(1), 1142

RNA-binding proteins (RBPs) play important and essential roles in eukaryotic gene expression regulating splicing, localization, translation, and stability of mRNAs. We describe ultraviolet crosslinking and affinity purification (uvCLAP), an easy-to-use, robust, reproducible, and high-throughput method to determine in vivo targets of RBPs. uvCLAP is fast and does not rely on radioactive labeling of RNA. We investigate binding of 15 RBPs from fly, mouse, and human cells to test the method's performance and applicability. Multiplexing of signal and control libraries enables straightforward comparison of samples. Experiments for most proteins achieve high enrichment of signal over background. A point mutation and a natural splice isoform that change the RBP subcellular localization dramatically alter target selection without changing the targeted RNA motif, showing that compartmentalization of RBPs can be used as an elegant means to generate RNA target specificity.

Available:
pdf (1599 KB)   doi:10.1038/s41467-018-03575-4   pmid:29559621   BibTeX Entry ( Maticzka_Ilik_Aktas-uvCLA_fast_and-2018 )

The RNA-binding protein ARPP21 controls dendritic branching by functionally opposing the miRNA it hosts

Frederick Rehfeld, Daniel Maticzka, Sabine Grosser, Pina Knauff, Murat Eravci, Imre Vida, Rolf Backofen, F. Gregory Wulczyn

In: Nat Commun, 2018, 9(1), 1235

About half of mammalian miRNA genes lie within introns of protein-coding genes, yet little is known about functional interactions between miRNAs and their host genes. The intronic miRNA miR-128 regulates neuronal excitability and dendritic morphology of principal neurons during mouse cerebral cortex development. Its conserved host genes, R3hdm1 and Arpp21, are predicted RNA-binding proteins. Here we use iCLIP to characterize ARPP21 recognition of uridine-rich sequences with high specificity for 3'UTRs. ARPP21 antagonizes miR-128 activity by co-regulating a subset of miR-128 target mRNAs enriched for neurodevelopmental functions. Protein-protein interaction data and functional assays suggest that ARPP21 acts as a positive post-transcriptional regulator by interacting with the translation initiation complex eIF4F. This molecular antagonism is reflected in inverse activities during dendritogenesis: miR-128 overexpression or knockdown of ARPP21 reduces dendritic complexity; ectopic ARPP21 leads to an increase. Thus, we describe a unique example of convergent function by two products of a single gene.

Available:
pdf (1775 KB)   doi:10.1038/s41467-018-03681-3   pmid:29581509   BibTeX Entry ( Rehfeld_Maticzka_Grosser-The_RNA_prote-2018 )

MICA: Multiple interval-based curve alignment

Martin Mann, Hans-Peter Kahle, Matthias Beck, Bela Johannes Bender, Heinrich Spiecker, Rolf Backofen

In: SoftwareX, 2018, 7, 53-58

MICA enables the automatic synchronization of discrete data curves. To this end, characteristic points of the curves’ shapes are identified. These landmarks are used within a heuristic curve registration approach to align profile pairs by mapping similar characteristics onto each other. In combination with a progressive alignment scheme, this enables the computation of multiple curve alignments. Multiple curve alignments are needed to derive meaningful representative consensus data of measured time or data series. MICA was already successfully applied to generate representative profiles of tree growth data based on intra-annual wood density profiles or cell formation data. The MICA package provides a command-line and graphical user interface. The R interface enables the direct embedding of multiple curve alignment computation into larger analysis pipelines. Source code, binaries and documentation are freely available at https://github.com/BackofenLab/MICA

Available:
pdf (429 KB)   doi:10.1016/j.softx.2018.02.003   BibTeX Entry ( Mann-MICA-2018 )

A mutually exclusive stem-loop arrangement in roX2 RNA is essential for X-chromosome regulation in Drosophila

Ibrahim Avsar Ilik, Daniel Maticzka, Plamen Georgiev, Noel Marie Gutierrez, Rolf Backofen, Asifa Akhtar

In: Genes Dev, 2017, 31(19), 1973-1987

The X chromosome provides an ideal model system to study the contribution of RNA-protein interactions in epigenetic regulation. In male flies, roX long noncoding RNAs (lncRNAs) harbor several redundant domains to interact with the ubiquitin ligase male-specific lethal 2 (MSL2) and the RNA helicase Maleless (MLE) for X-chromosomal regulation. However, how these interactions provide the mechanics of spreading remains unknown. By using the uvCLAP (UV cross-linking and affinity purification) methodology, which provides unprecedented information about RNA secondary structures in vivo, we identified the minimal functional unit of roX2 RNA. By using wild-type and various MLE mutant derivatives, including a catalytically inactive MLE derivative, MLE(GET), we show that the minimal roX RNA contains two mutually exclusive stem-loops that exist in a peculiar structural arrangement: When one stem-loop is unwound by MLE, an alternate structure can form, likely trapping MLE in this perpetually structured region. We show that this functional unit is necessary for dosage compensation, as mutations that disrupt this formation lead to male lethality. Thus, we propose that roX2 lncRNA contains an MLE-dependent affinity switch to enable reversible interactions of the MSL complex to allow dosage compensation of the X chromosome.

Available:
pdf (11543 KB)   doi:10.1101/gad.304600.117   pmid:29066499   BibTeX Entry ( Ilik_Maticzka_Georgiev-mutua_exclu_stem-2017 )

Combinatorial ensemble miRNA target prediction of co-regulation networks with non-prediction data

Jason A. Davis, Sita J. Saunders, Martin Mann, Rolf Backofen

In: Nucleic Acids Research, 2017, 45(15), 8745-8757

MicroRNAs (miRNAs) are key regulators of cell-fate decisions in development and disease with a vast array of target interactions that can be investigated using computational approaches. For this study, we developed metaMIR, a combinatorial approach to identify miRNAs that co-regulate identified subsets of genes from a user-supplied list. We based metaMIR predictions on an improved dataset of human miRNA-target interactions, compiled using a machine-learning-based meta-analysis of established algorithms. Simultaneously, the inverse dataset of negative interactions not likely to occur was extracted to increase classifier performance, as measured using an expansive set of experimentally validated interactions from a variety of sources. In a second differential mode, candidate miRNAs are predicted by indicating genes to be targeted and others to be avoided to potentially increase specificity of results. As an example, we investigate the neural crest, a transient structure in vertebrate development where miRNAs play a pivotal role. Patterns of metaMIR-predicted miRNA regulation alone partially recapitulated functional relationships among genes, and separate differential analysis revealed miRNA candidates that would downregulate components implicated in cancer progression while not targeting tumour suppressors. Such an approach could aid in therapeutic application of miRNAs to reduce unintended effects. The utility is available at http://rna.informatik.uni-freiburg.de/metaMIR/.

Available:
med (4 KB)   doi:10.1093/nar/gkx605   pmid:28911111   BibTeX Entry ( Davis_Saunders_Mann-Combi_ensem_miRNA-NAR2017 )

MetSV, a novel archaeal lytic virus targeting Methanosarcina strains

Katrin Weidenbach, Lisa Nickel, Horst Neve, Omer S. Alkhnbashi, Sven Kunzel, Anne Kupczok, Thorsten Bauersachs, Liam Cassidy, Andreas Tholey, Rolf Backofen, Ruth A. Schmitz

In: J Virol, 2017

A novel archaeal lytic virus targeting species of the genus Methanosarcina was isolated using Methanosarcina mazei strain Go1 as host. Due to its spherical morphology the virus was designated Methanosarcina spherical virus (MetSV). Molecular analysis demonstrated that MetSV contains double-stranded linear DNA with a genome size of 10,567 bp containing 22 open reading frames (ORFs), all oriented in the same direction. Functions were predicted for some of these ORFs, such as DNA polymerase, ATPase, DNA-binding protein, and envelope (structural) protein. MetSV-derived spacers in CRISPR loci were detected in several published Methanosarcina draft genomes using bioinformatic tools, revealing the potential PAM motif (TTA/T). Transcription and expression of several predicted viral ORFs were validated by RT-PCR, PAGE analysis and LC-MS based proteomics. Analysis of core lipids by APCI mass spectrometry showed that MetSV and M. mazei both contain archaeol and glycerol dialkyl glycerol tetraether without cyclopentane moiety (GDGT-0). The MetSV host range is limited to Methanosarcina strains growing as single cells (M. mazei, M. barkeri and M. soligelidi). In contrast, strains growing as sarcina-like aggregates were apparently protected from infection. Heterogeneity related to morphology phases in M. mazei cultures allowed acquisition of resistance to MetSV after challenge by growing as sarcina-like aggregates. CRISPR/Cas mediated resistance was excluded since neither of the two CRISPR arrays showed MetSV-derived spacer acquisition. Based on these findings, we propose that changing the morphology from single cells to sarcina-like aggregates upon rearrangement of the envelope structure prevents infection and subsequent lysis by MetSV. IMPORTANCE: Methanoarchaea are among the most abundant organisms on the planet since they are present in high numbers in major anaerobic environments. They convert various carbon sources, e.g., acetate, methylamines or methanol, to methane and carbon dioxide; thus they have a significant impact on the emission of major greenhouse gases. Today very little is known about viruses specifically infecting methanoarchaea, which most probably impact the abundance of methanoarchaea in microbial consortia. Here we characterize the first identified Methanosarcina-infecting virus (MetSV) and show a mechanism for acquiring resistance against MetSV. Based on our results we propose that growth as sarcina-like aggregates prevents infection and subsequent lysis. These findings allow new insights into virus-host relationship in methanogenic community structures, their dynamics and their phase heterogeneity. Moreover, the availability of a specific virus provides new possibilities to deepen our knowledge on defence mechanisms of potential hosts and offers tools for genetic manipulation.

Available:
pdf (2792 KB)   doi:10.1128/JVI.00955-17   pmid:28878086   BibTeX Entry ( Weidenbach_Nickel_Neve-MetSV_novel_archa-2017 )

RIP-Seq Suggests Translational Regulation by L7Ae in Archaea

Michael Daume, Michael Uhl, Rolf Backofen, Lennart Randau

In: MBio, 2017, 8(4)

L7Ae is a universal archaeal protein that recognizes and stabilizes kink-turn (k-turn) motifs in RNA substrates. These structural motifs are widespread in nature and are found in many functional RNA species, including ribosomal RNAs. Synthetic biology approaches utilize L7Ae/k-turn interactions to control gene expression in eukaryotes. Here, we present results of comprehensive RNA immunoprecipitation sequencing (RIP-Seq) analysis of genomically tagged L7Ae from the hyperthermophilic archaeon Sulfolobus acidocaldarius. A large set of interacting noncoding RNAs was identified. In addition, several mRNAs, including the l7ae transcript, were found to contain k-turn motifs that facilitate L7Ae binding. In vivo studies showed that L7Ae autoregulates the translation of its mRNA by binding to a k-turn motif present in the 5' untranslated region (UTR). A green fluorescent protein (GFP) reporter system was established in Escherichia coli and verified conservation of L7Ae-mediated feedback regulation in Archaea. Mobility shift assays confirmed binding to a k-turn in the transcript of nop5-fibrillarin, suggesting that the expression of all C/D box sRNP core proteins is regulated by L7Ae. These studies revealed that L7Ae-mediated gene regulation evolved in archaeal organisms, generating new tools for the modulation of synthetic gene circuits in bacteria. IMPORTANCE: L7Ae is an essential archaeal protein that is known to structure ribosomal RNAs and small RNAs (sRNAs) by binding to their kink-turn motifs. Here, we utilized RIP-Seq methodology to achieve a first global analysis of RNA substrates for L7Ae. Several novel interactions with noncoding RNA molecules (e.g., with the universal signal recognition particle RNA) were discovered. In addition, L7Ae was found to bind to mRNAs, including its own transcript's 5' untranslated region. This feedback-loop control is conserved in most archaea and was incorporated into a reporter system that was utilized to control gene expression in bacteria. These results demonstrate that L7Ae-mediated gene regulation evolved originally in archaeal organisms. The feedback-controlled reporter gene system can easily be adapted for synthetic biology approaches that require strict gene expression control.

Available:
pdf (2489 KB)   doi:10.1128/mBio.00730-17   pmid:28765217   BibTeX Entry ( Daume_Uhl_Backofen-RIP_Sugge_Trans-2017 )

Mechanism of beta-actin mRNA Recognition by ZBP1

Giuseppe Nicastro, Adela M. Candel, Michael Uhl, Alain Oregioni, David Hollingworth, Rolf Backofen, Stephen R. Martin, Andres Ramos

In: Cell Rep, 2017, 18(5), 1187-1199

Zipcode binding protein 1 (ZBP1) is an oncofetal RNA-binding protein that mediates the transport and local translation of beta-actin mRNA by the KH3-KH4 di-domain, which is essential for neuronal development. The high-resolution structures of KH3-KH4 with their respective target sequences show that KH4 recognizes a non-canonical GGA sequence via an enlarged and dynamic hydrophobic groove, whereas KH3 binding to a core CA sequence occurs with low specificity. A data-informed kinetic simulation of the two-step binding reaction reveals that the overall reaction is driven by the second binding event and that the moderate affinities of the individual interactions favor RNA looping. Furthermore, the concentration of ZBP1, but not of the target RNA, modulates the interaction, which explains the functional significance of enhanced ZBP1 expression during embryonic development.

Available:
pdf (3158 KB)   doi:10.1016/j.celrep.2016.12.091   pmid:28147274   BibTeX Entry ( Nicastro_Candel_Uhl-Mecha_beta_mRNA-2017 )

An Efficient Semi-supervised Learning Approach to Predict SH2 Domain Mediated Interactions

Kousik Kundu, Rolf Backofen

In: Methods Mol Biol, 2017, 1555, 83-97

The Src homology 2 (SH2) domain is an important subclass of modular protein domains that plays an indispensable role in several biological processes in eukaryotes. SH2 domains specifically bind to the phosphotyrosine residue of their binding peptides to facilitate various molecular functions. For determining the subtle binding specificities of SH2 domains, it is very important to understand the intriguing mechanisms by which these domains recognize their target peptides in a complex cellular environment. Several attempts have been made to predict SH2-peptide interactions using high-throughput data. However, these high-throughput data are often affected by a low signal-to-noise ratio. Furthermore, the prediction methods have several additional shortcomings, such as the linearity problem, high computational complexity, etc. Thus, computational identification of SH2-peptide interactions using high-throughput data remains challenging. Here, we propose a machine learning approach based on an efficient semi-supervised learning technique for the prediction of 51 SH2 domain mediated interactions in the human proteome. In our study, we have successfully employed several strategies to tackle the major problems in computational identification of SH2-peptide interactions.

Available:
pdf (240 KB)   doi:10.1007/978-1-4939-6762-9_6   pmid:28092029   BibTeX Entry ( Kundu_Backofen-Effic_Semi_Learn-2017 )

CRISPR and Salty: CRISPR-Cas Systems in Haloarchaea

Lisa-Katharina Maier, Omer S. Alkhnbashi, Rolf Backofen, Anita Marchfelder

In: Béatrice Clouet-d'Orval, RNA Metabolism and Gene Expression in Archaea, 2017, 243-269

CRISPR-Cas (CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats and Cas: CRISPR associated) systems are unique defence mechanisms since they are able to adapt to new invaders and are heritable. CRISPR-Cas systems facilitate the sequence-specific elimination of invading genetic elements in prokaryotes and are found in 45% of bacteria and 85% of archaea. Their general features have been studied in detail, but subtype- and species-specific variations await investigation. Haloarchaea is one of few archaeal classes in which CRISPR-Cas systems have been investigated in more than one genus. Here, we summarize the available information on CRISPR-Cas defence in three Haloarchaea: Haloferax volcanii, Haloferax mediterranei and Haloarcula hispanica. Haloarchaea share type I CRISPR-Cas systems, with subtype I-B being dominant. Type I-B systems rely on Cas proteins Cas5, Cas7, and Cas8b for the interference reaction and these proteins have been shown to form a Cascade (CRISPR-associated complex for antiviral defence)-like complex in Hfx. (Haloferax) volcanii. Cas6b is the endonuclease for crRNA (CRISPR RNA) maturation in type I-B systems but the protein is dispensable for interference in Hfx. volcanii. Haloarchaea share a common repeat sequence and crRNA-processing pattern. A prerequisite for successful invader recognition in Hfx. volcanii is base pairing over a ten-nucleotide-long non-contiguous seed sequence. Moreover, Hfx. volcanii and Har. (Haloarcula) hispanica each rely on certain specific PAM (protospacer adjacent motif) sequences to elicit interference, but they share only one PAM sequence. Primed adaptation in Har. hispanica relies on another set of PAM sequences.

Available:
pdf (1134 KB)   doi:10.1007/978-3-319-65795-0_11   BibTeX Entry ( Maier-Alkhnbash-book-chapter )

sRNA154 a newly identified regulator of nitrogen fixation in Methanosarcina mazei strain Go1

Daniela Prasse, Konrad U. Förstner, Dominik Jäger, Rolf Backofen, Ruth A. Schmitz

In: RNA Biol, 2017, 13(11), 1544-1558

Trans-encoded sRNA154 is exclusively expressed under nitrogen (N)-deficiency in Methanosarcina mazei strain Go1. The sRNA154 deletion strain showed a significant decrease in growth under N-limitation, pointing toward a regulatory role of sRNA154 in N-metabolism. Aiming to elucidate its regulatory function we characterized sRNA154 by means of biochemical and genetic approaches. 24 homologs of sRNA154 were identified in recently reported draft genomes of Methanosarcina strains, demonstrating high conservation in sequence and predicted secondary structure with two highly conserved single-stranded loops. Transcriptome studies of sRNA154 deletion mutants by an RNA-seq approach uncovered nifH- and nrpA-mRNA, encoding the alpha-subunit of nitrogenase and the transcriptional activator of the nitrogen fixation (nif)-operon, as potential targets besides other components of the N-metabolism. Furthermore, results obtained from stability, complementation and western blot analysis, as well as in silico target predictions combined with electrophoretic mobility shift-assays, argue for a stabilizing effect of sRNA154 on the polycistronic nif-mRNA and nrpA-mRNA by binding with both loops. Further identified N-related targets were studied, demonstrating that translation initiation of glnA2-mRNA, encoding glutamine synthetase2, appears to be affected by sRNA154 masking the ribosome binding site, whereas glnA1-mRNA appears to be stabilized by sRNA154. Overall, we propose that sRNA154 has a crucial regulatory role in N-metabolism in M. mazei by stabilizing the polycistronic mRNA encoding nitrogenase and glnA1-mRNA, as well as allowing a feed-forward regulation of nif-gene expression by stabilizing nrpA-mRNA. Consequently, sRNA154 represents the first archaeal sRNA for which a positive posttranscriptional regulation is demonstrated as well as inhibition of translation initiation.

Available:
doi:10.1080/15476286.2017.1306170   pmid:28296572   BibTeX Entry ( Prasse_Forstner_Jager-sRNA_newly_ident-2017 )

Jupyter and Galaxy: Easing entry barriers into complex data analyses for biomedical researchers

Bjorn A. Gruning, Eric Rasche, Boris Rebolledo-Jaramillo, Carl Eberhard, Torsten Houwaart, John Chilton, Nate Coraor, Rolf Backofen, James Taylor, Anton Nekrutenko

In: PLoS Comput Biol, 2017, 13(5), e1005425

What does it take to convert a heap of sequencing data into a publishable result? First, common tools are employed to reduce primary data (sequencing reads) to a form suitable for further analyses (i.e., the list of variable sites). The subsequent exploratory stage is much more ad hoc and requires the development of custom scripts and pipelines, making it problematic for biomedical researchers. Here, we describe a hybrid platform combining common analysis pathways with the ability to explore data interactively. It aims to fully encompass and simplify the "raw data-to-publication" pathway and make it reproducible.

Available:
doi:10.1371/journal.pcbi.1005425   pmid:28542180   BibTeX Entry ( Gruning_Rasche_Rebolledo-Jaramillo-Jupyt_and_Galax-2017 )

RNA-bioinformatics: Tools, services and databases for the analysis of RNA-based regulation

Rolf Backofen, Jan Engelhardt, Anika Erxleben, Jorg Fallmann, Bjorn Gruning, Uwe Ohler, Nikolaus Rajewsky, Peter F. Stadler

In: J Biotechnol, 2017, 261, 76-84

The importance of RNA-based regulation is becoming more and more evident. Genome-wide sequencing efforts have shown that the majority of the DNA in eukaryotic genomes is transcribed. Advanced high-throughput techniques like CLIP for the genome-wide detection of RNA-protein interactions have shown that post-transcriptional regulation by RNA-binding proteins matches the complexity of transcriptional regulation. The need for a specialized and integrated analysis of RNA-based data has led to the foundation of the RNA Bioinformatics Center (RBC) within the German Network of Bioinformatics Infrastructure (de.NBI). This paper describes the tools, services and databases provided by the RBC, and shows example applications. Furthermore, we have set up an RNA workbench within the Galaxy framework. For easy dissemination, we offer a virtualized version of Galaxy (via Galaxy Docker) enabling other groups to use our RNA workbench in a very simple way.

Available:
pdf (539 KB)   doi:10.1016/j.jbiotec.2017.05.019   pmid:28554830   BibTeX Entry ( Backofen_Engelhardt_Erxleben-RNA_Tools_servi-2017 )

The RNA workbench: best practices for RNA and high-throughput sequencing bioinformatics in Galaxy

Bjorn A. Gruning, Jorg Fallmann, Dilmurat Yusuf, Sebastian Will, Anika Erxleben, Florian Eggenhofer, Torsten Houwaart, Berenice Batut, Pavankumar Videm, Andrea Bagnacani, Markus Wolfien, Steffen C. Lott, Youri Hoogstrate, Wolfgang R. Hess, Olaf Wolkenhauer, Steve Hoffmann, Altuna Akalin, Uwe Ohler, Peter F. Stadler, Rolf Backofen

In: Nucleic Acids Research, 2017, 45(W1), W560-W566

RNA-based regulation has become a major research topic in molecular biology. The analysis of epigenetic and expression data is therefore incomplete if RNA-based regulation is not taken into account. Thus, it is increasingly important but not yet standard to combine RNA-centric data and analysis tools with other types of experimental data such as RNA-seq or ChIP-seq. Here, we present the RNA workbench, a comprehensive set of analysis tools and consolidated workflows that enable the researcher to combine these two worlds. Based on the Galaxy framework, the workbench guarantees simple access, easy extension, flexible adaption to personal and security needs, and sophisticated analyses that are independent of command-line knowledge. Currently, it includes more than 50 bioinformatics tools that are dedicated to different research areas of RNA biology including RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. The workbench is developed and maintained by experts in RNA bioinformatics and the Galaxy framework. Together with the growing community evolving around this workbench, we are committed to keeping the workbench up-to-date for future standards and needs, providing researchers with a reliable and robust framework for RNA data analysis. AVAILABILITY: The RNA workbench is available at https://github.com/bgruening/galaxy-rna-workbench.

Available:
pdf (597 KB)   doi:10.1093/nar/gkx409   pmid:28582575   BibTeX Entry ( Gruning_Fallmann_Yusuf-The_RNA_workb-NAR2017 )

Recent advances in RNA folding

Jorg Fallmann, Sebastian Will, Jan Engelhardt, Bjorn Gruning, Rolf Backofen, Peter F. Stadler

In: J Biotechnol, 2017, 261, 97-104

In the realm of nucleic acid structures, secondary structure forms a conceptually important intermediate level of description and explains the dominating part of the free energy of structure formation. Secondary structures are well conserved over evolutionary time-scales and for many classes of RNAs evolve slower than the underlying primary sequences. Given the close link between structure and function, secondary structure is routinely used as a basis to explain experimental findings. Recent technological advances, finally, have made it possible to assay secondary structure directly using high-throughput methods. From a computational biology point of view, secondary structures have a special role because they can be computed efficiently using exact dynamic programming algorithms. In this contribution we provide a short overview of RNA folding algorithms, recent additions and variations and address methods to align, compare, and cluster RNA structures, followed by a tabular summary of the most important software suites in the field.

Available:
pdf (340 KB)   doi:10.1016/j.jbiotec.2017.07.007   pmid:28690134   BibTeX Entry ( Fallmann_Will_Engelhardt-Recen_advan_RNA-2017 )
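
Illustration for the entry above: the abstract notes that secondary structures can be computed efficiently with exact dynamic programming. The sketch below shows the idea with a Nussinov-style base-pair maximization, which is a deliberately simplified stand-in and not one of the energy-based algorithms the review covers. The function name, the allowed pair set (including G-U wobble pairs) and the minimum loop length of 3 are assumptions chosen for illustration.

```python
def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in `seq`.

    Simplified Nussinov-style recursion: every pair scores 1 and a
    hairpin loop must enclose at least `min_loop` unpaired bases.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, splitting the interval in two
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAACCC"))
```

The table is filled by increasing interval length, so the run time is cubic in the sequence length. Production folding tools replace the pair count with a thermodynamic energy model.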

IntaRNA 2.0: enhanced and customizable prediction of RNA-RNA interactions

Martin Mann, Patrick R. Wright, Rolf Backofen

In: Nucleic Acids Research, 2017, 45(W1), W435-W439

The IntaRNA algorithm enables fast and accurate prediction of RNA-RNA hybrids by incorporating seed constraints and interaction site accessibility. Here, we introduce IntaRNAv2, which enables enhanced parameterization as well as fully customizable control over the prediction modes and output formats. Based on up-to-date benchmark data, the enhanced predictive quality is shown and further improvements due to more restrictive seed constraints are highlighted. The extended web interface provides visualizations of the new minimal energy profiles for RNA-RNA interactions. These allow a detailed investigation of interaction alternatives and can reveal potential interaction site multiplicity. IntaRNAv2 is freely available (source and binary), and distributed via the conda package manager. Furthermore, it has been included in the Galaxy workflow framework and its already established web interface enables ad hoc usage.

Available:
pdf (285 KB)   supplement.pdf (724 KB)   doi:10.1093/nar/gkx279   pmid:28472523   BibTeX Entry ( Mann_Wright_Backofen-IntaR_enhan_and-NAR2017 )

DHX9 suppresses RNA processing defects originating from the Alu invasion of the human genome

Tugce Aktas, Ibrahim Avsar Ilik, Daniel Maticzka, Vivek Bhardwaj, Cecilia Pessoa Rodrigues, Gerhard Mittler, Thomas Manke, Rolf Backofen, Asifa Akhtar

In: Nature, 2017, 544(7648), 115-119

Transposable elements are viewed as 'selfish genetic elements', yet they contribute to gene regulation and genome evolution in diverse ways. More than half of the human genome consists of transposable elements. Alu elements belong to the short interspersed nuclear element (SINE) family of repetitive elements, and with over 1 million insertions they make up more than 10% of the human genome. Despite their abundance and the potential evolutionary advantages they confer, Alu elements can be mutagenic to the host as they can act as splice acceptors, inhibit translation of mRNAs and cause genomic instability. Alu elements are the main targets of the RNA-editing enzyme ADAR and the formation of Alu exons is suppressed by the nuclear ribonucleoprotein HNRNPC, but the broad effect of massive secondary structures formed by inverted-repeat Alu elements on RNA processing in the nucleus remains unknown. Here we show that DHX9, an abundant nuclear RNA helicase, binds specifically to inverted-repeat Alu elements that are transcribed as parts of genes. Loss of DHX9 leads to an increase in the number of circular-RNA-producing genes and amount of circular RNAs, translational repression of reporters containing inverted-repeat Alu elements, and transcriptional rewiring (the creation of mostly nonsensical novel connections between exons) of susceptible loci. Biochemical purifications of DHX9 identify the interferon-inducible isoform of ADAR (p150), but not the constitutively expressed ADAR isoform (p110), as an RNA-independent interaction partner. Co-depletion of ADAR and DHX9 augments the double-stranded RNA accumulation defects, leading to increased circular RNA production, revealing a functional link between these two enzymes. Our work uncovers an evolutionarily conserved function of DHX9. We propose that it acts as a nuclear RNA resolvase that neutralizes the immediate threat posed by transposon insertions and allows these elements to evolve as tools for the post-transcriptional regulation of gene expression.

Available:
doi:10.1038/nature21715   pmid:28355180   BibTeX Entry ( Aktas_Avsar_Ilik_Maticzka-DHX_suppr_RNA-2017 )

RNAscClust: clustering RNA sequences using structure conservation and graph based motifs

Milad Miladi, Alexander Junge, Fabrizio Costa, Stefan E Seemann, Jakob Hull Havgaard, Jan Gorodkin, Rolf Backofen

In: Bioinformatics, 2017, 33(14), 2089-2096

Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take the compensatory base pair changes obtained from structure conservation in orthologous sequences into account. Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments. RNAscClust is available at http://www.bioinf.uni-freiburg.de/Software/RNAscClust. Supplementary data are available at Bioinformatics online.

Available:
pdf (410 KB)   med (4 KB)   doi:10.1093/bioinformatics/btx114   BibTeX Entry ( Miladi-Junge-rnascclust-2017 )

Computational analysis of CLIP-seq data

Michael Uhl, Torsten Houwaart, Gianluca Corrado, Patrick R. Wright, Rolf Backofen

In: Methods, 2017, 118-119, 60-72

CLIP-seq experiments are currently the most important means for determining the binding sites of RNA binding proteins on a genome-wide level. The computational analysis can be divided into three steps. In the first pre-processing stage, raw reads have to be trimmed and mapped to the genome. This step has to be specifically adapted for each CLIP-seq protocol. The next step is peak calling, which is required to remove unspecific signals and to determine bona fide protein binding sites on target RNAs. Here, both protocol-specific approaches as well as generic peak callers are available. Despite some peak callers being more widely used, each peak caller has its specific assets and drawbacks, and it might be advantageous to compare the results of several methods. Although peak calling is often the final step in many CLIP-seq publications, an important follow-up task is the determination of binding models from CLIP-seq data. This is central because CLIP-seq experiments are highly dependent on the transcriptional state of the cell in which the experiment was performed. Thus, relying solely on binding sites determined by CLIP-seq from different cells or conditions can lead to a high false negative rate. This shortcoming can, however, be circumvented by applying models that predict additional putative binding sites.

Available:
med (3 KB)   pdf (1183 KB)   doi:10.1016/j.ymeth.2017.02.006   pmid:28254606   BibTeX Entry ( Uhl_Houwaart_Corrado-Compu_Analy_CLIP-2017 )

Differentiation of ncRNAs from small mRNAs in Escherichia coli O157:H7 EDL933 (EHEC) by combined RNAseq and RIBOseq - ryhB encodes the regulatory RNA RyhB and a peptide, RyhP

Klaus Neuhaus, Richard Landstorfer, Svenja Simon, Steffen Schober, Patrick R. Wright, Cameron Smith, Rolf Backofen, Romy Wecko, Daniel A. Keim, Siegfried Scherer

In: BMC Genomics, 2017, 18(1), 216

BACKGROUND: While NGS allows rapid global detection of transcripts, it remains difficult to distinguish ncRNAs from short mRNAs. To detect potentially translated RNAs, we developed an improved protocol for bacterial ribosomal footprinting (RIBOseq). This allowed distinguishing ncRNA from mRNA in EHEC. A high ratio of ribosomal footprints per transcript (ribosomal coverage value, RCV) is expected to indicate a translated RNA, while a low RCV should point to a non-translated RNA. RESULTS: Based on their low RCV, 150 novel non-translated EHEC transcripts were identified as putative ncRNAs, representing both antisense and intergenic transcripts, 74 of which had expressed homologs in E. coli MG1655. Bioinformatics analysis predicted statistically significant target regulons for 15 of the intergenic transcripts; experimental analysis revealed 4-fold or higher differential expression of 46 novel ncRNAs in different growth media. Out of 329 annotated EHEC ncRNAs, 52 showed an RCV similar to protein-coding genes; of those, 16 had RIBOseq patterns matching annotated genes in other Enterobacteriaceae, and 11 seem to possess a Shine-Dalgarno sequence, suggesting that such ncRNAs may encode small proteins instead of being solely non-coding. To support that the RIBOseq signals are reflecting translation, we tested the ribosomal-footprint covered ORF of ryhB and found a phenotype for the encoded peptide under iron-limiting conditions. CONCLUSION: Determination of the RCV is a useful approach for a rapid first-step differentiation between bacterial ncRNAs and small mRNAs. Further, many known ncRNAs may encode proteins as well.

Available:
pdf (1442 KB)   doi:10.1186/s12864-017-3586-9   pmid:28245801   BibTeX Entry ( Neuhaus_Landstorfer_Simon-Diffe_ncRNA_from-2017 )
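
Illustration for the entry above: the abstract defines the ribosomal coverage value (RCV) as the ratio of ribosomal footprints per transcript. A minimal sketch of such a ratio follows. The RPKM normalization, the function names and the cutoff of 0.35 used in the example are assumptions for illustration only; the abstract does not state the normalization or any threshold.

```python
from typing import Optional


def rpkm(read_count: int, length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (length_nt * total_mapped_reads)


def ribosomal_coverage_value(footprint_reads: int, rna_reads: int,
                             length_nt: int, total_footprint_reads: int,
                             total_rna_reads: int) -> Optional[float]:
    """Ratio of normalized RIBOseq signal to normalized RNAseq signal.

    Returns None when the feature has no RNAseq reads, because the
    ratio is undefined without detectable transcription.
    """
    transcript = rpkm(rna_reads, length_nt, total_rna_reads)
    if transcript == 0:
        return None
    footprint = rpkm(footprint_reads, length_nt, total_footprint_reads)
    return footprint / transcript


def looks_translated(rcv: Optional[float], cutoff: float) -> bool:
    """Hypothetical decision rule: a high RCV suggests translation."""
    return rcv is not None and rcv >= cutoff


if __name__ == "__main__":
    rcv = ribosomal_coverage_value(200, 100, 300, 1_000_000, 1_000_000)
    print(rcv, looks_translated(rcv, cutoff=0.35))
```

Normalizing both libraries before taking the ratio keeps the value comparable between features of different length and between experiments of different sequencing depth.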

Structural constraints and enzymatic promiscuity in the Cas6-dependent generation of crRNAs

Viktoria Reimann, Omer S. Alkhnbashi, Sita J. Saunders, Ingeborg Scholz, Stephanie Hein, Rolf Backofen, Wolfgang R. Hess

In: Nucleic Acids Res, 2017, 45(2), 915-925

A hallmark of defense mechanisms based on clustered regularly interspaced short palindromic repeats (CRISPR) and associated sequences (Cas) are the crRNAs that guide these complexes in the destruction of invading DNA or RNA. Three separate CRISPR-Cas systems exist in the cyanobacterium Synechocystis sp. PCC 6803. Based on genetic and transcriptomic evidence, two associated endoribonucleases, Cas6-1 and Cas6-2a, were postulated to be involved in crRNA maturation from CRISPR1 or CRISPR2, respectively. Here, we report a promiscuity of both enzymes to process in vitro not only their cognate transcripts, but also the respective non-cognate precursors, whereas they are specific in vivo. Moreover, while most of the repeats serving as substrates were cleaved in vitro, some were not. RNA structure predictions suggested that the context sequence surrounding a repeat can interfere with its stable folding. Indeed, structure accuracy calculations of the hairpin motifs within the repeat sequences explained the majority of analyzed cleavage reactions, making this a good measure for predicting successful cleavage events. We conclude that the cleavage of CRISPR1 and CRISPR2 repeat instances requires a stable formation of the characteristic hairpin motif, which is similar between the two types of repeats. The influence of surrounding sequences might partially explain variations in crRNA abundances and should be considered when designing artificial CRISPR arrays.

Available:
pdf (2849 KB)   doi:10.1093/nar/gkw786   pmid:27599840   BibTeX Entry ( Reimann_Alkhnbashi_Saunders-Struc_const_and-NAR2017 )

GenToS: Use of Orthologous Gene Information to Prioritize Signals from Human GWAS

Anselm S. Hoppmann, Pascal Schlosser, Rolf Backofen, Ekkehart Lausch, Anna Kottgen

In: PLoS One, 2016, 11(9), e0162466

Genome-wide association studies (GWAS) evaluate associations between genetic variants and a trait or disease of interest free of prior biological hypotheses. GWAS require stringent correction for multiple testing, with genome-wide significance typically defined as association p-value <5*10-8. This study presents a new tool that uses external information about genes to prioritize SNP associations (GenToS). For a given list of candidate genes, GenToS calculates an appropriate statistical significance threshold and then searches for trait-associated variants in summary statistics from human GWAS. It thereby allows for identifying trait-associated genetic variants that do not meet genome-wide significance. The program additionally tests for enrichment of significant candidate gene associations in the human GWAS data compared to the number expected by chance. As proof of principle, this report used external information from a comprehensive resource of genetically manipulated and systematically phenotyped mice. Based on selected murine phenotypes for which human GWAS data for corresponding traits were publicly available, several candidate gene input lists were derived. Using GenToS for the investigation of candidate genes underlying murine skeletal phenotypes in data from a large human discovery GWAS meta-analysis of bone mineral density resulted in the identification of significantly associated variants in 29 genes. Index variants in 28 of these loci were subsequently replicated in an independent GWAS replication step, highlighting that they are true positive associations. One signal, COL11A1, has not been discovered through GWAS so far and represents a novel human candidate gene for altered bone mineral density. The number of observed genes that contained significant SNP associations in human GWAS based on murine candidate gene input lists was much greater than the number expected by chance across several complex human traits (enrichment p-value as low as 10-10). 
GenToS can be used with any candidate gene list, any GWAS summary file, runs on a desktop computer and is freely available.
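
The two statistical steps named in the abstract (a significance threshold for the candidate list and an enrichment test against chance) can be illustrated with a minimal sketch. The abstract does not say how GenToS derives its threshold or which enrichment test it uses, so the Bonferroni correction and the binomial tail test below are assumptions, and both function names are hypothetical rather than part of GenToS.

```python
from math import comb


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold for n_tests independent tests.

    Assumption: a plain Bonferroni correction; GenToS may count the
    independent variants in the candidate genes differently.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def binomial_enrichment_p(n_genes: int, n_hits: int, p_chance: float) -> float:
    """Upper-tail probability P(X >= n_hits) for X ~ Binomial(n_genes, p_chance).

    Assumption: each candidate gene independently contains a significant
    association with probability p_chance under the null hypothesis.
    """
    return sum(
        comb(n_genes, k) * p_chance**k * (1 - p_chance) ** (n_genes - k)
        for k in range(n_hits, n_genes + 1)
    )


# Roughly one million independent tests at alpha = 0.05 give the
# conventional genome-wide threshold of 5e-8.
genome_wide = bonferroni_threshold(1_000_000)

# A candidate list covering far fewer independent variants yields a
# less stringent threshold (the count 2000 is illustrative only).
candidate = bonferroni_threshold(2_000)
```

Under these assumptions, restricting the search to a candidate list relaxes the threshold because fewer tests are corrected for, which is how variants below genome-wide significance can still be reported.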

Available:
pdf (1759 KB)   doi:10.1371/journal.pone.0162466   pmid:27612175   BibTeX Entry ( Hoppmann_Schlosser_Backofen-GenTo_Use_Ortho-2016 )

SnoReport 2.0: new features and a refined Support Vector Machine to improve snoRNA identification

Joao Victor de Araujo Oliveira, Fabrizio Costa, Rolf Backofen, Peter Florian Stadler, Maria Emilia Machado Telles Walter, Jana Hertel

In: BMC Bioinformatics, 2016, 17(Suppl 18), 464

BACKGROUND: snoReport uses RNA secondary structure prediction combined with machine learning as the basis to identify the two main classes of small nucleolar RNAs, the box H/ACA snoRNAs and the box C/D snoRNAs. Here, we present snoReport 2.0, which substantially improves and extends the original method by: extracting new features for both box C/D and H/ACA box snoRNAs; developing a more sophisticated technique in the SVM training phase with recent data from vertebrate organisms and a careful choice of the SVM parameters C and gamma; and using updated versions of tools and databases used for the construction of the original version of snoReport. To validate the new version and to demonstrate its improved performance, we tested snoReport 2.0 in different organisms. RESULTS: Results of the training and test phases of boxes H/ACA and C/D snoRNAs, in both versions of snoReport, are discussed. Validation on real data was performed to evaluate the predictions of snoReport 2.0. Our program was applied to a set of previously annotated sequences, some of them experimentally confirmed, of humans, nematodes, drosophilids, platypus, chickens and leishmania. We significantly improved the predictions for vertebrates, since the training phase used information of these organisms, but H/ACA box snoRNA identification was also improved for the other organisms. CONCLUSION: We presented snoReport 2.0, to predict H/ACA box and C/D box snoRNAs, an efficient method to find true positives and avoid false positives in vertebrate organisms. The H/ACA box snoRNA classifier showed an F-score of 93 % (an improvement of 10 % over the previous version), while the C/D box snoRNA classifier showed an F-score of 94 % (an improvement of 14 %). Besides, both classifiers exhibited performance measures above 90 %. These results show that snoReport 2.0 avoids false positives and false negatives, allowing high-quality snoRNA prediction. 
In the validation phase, snoReport 2.0 predicted 67.43 % of vertebrate organisms for both classes. For nematodes and drosophilids, 69 % and 76.67 % of H/ACA box snoRNAs were predicted, respectively, showing that snoReport 2.0 performs well at identifying snoRNAs in vertebrates and also H/ACA box snoRNAs in invertebrate organisms.
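
For readers unfamiliar with the F-score reported above, the following minimal sketch shows how it is computed from classifier counts. The abstract does not state which F-score variant was used, so the balanced F1 (harmonic mean of precision and recall) is an assumption here, and the counts in the example are illustrative, not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    tp, fp, fn are the numbers of true positives, false positives
    and false negatives produced by a classifier.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 90 correct predictions, 10 false alarms, 10 misses.
example = f_score(tp=90, fp=10, fn=10)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a high F-score requires that both false positives and false negatives stay low, which matches the claim made in the conclusion.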

Available:
pdf (2678 KB)   doi:10.1186/s12859-016-1345-6   pmid:28105919   BibTeX Entry ( de_Araujo_Oliveira_Costa_Backofen-SnoRe_new_featu-2016 )

The lncRNA landscape of breast cancer reveals a role for DSCAM-AS1 in breast cancer progression

Yashar S. Niknafs, Sumin Han, Teng Ma, Corey Speers, Chao Zhang, Kari Wilder-Romans, Matthew K. Iyer, Sethuramasundaram Pitchiaya, Rohit Malik, Yasuyuki Hosono, John R. Prensner, Anton Poliakov, Udit Singhal, Lanbo Xiao, Steven Kregel, Ronald F. Siebenaler, Shuang G. Zhao, Michael Uhl, Alexander Gawronski, Daniel F. Hayes, Lori J. Pierce, Xuhong Cao, Colin Collins, Rolf Backofen, Cenk S. Sahinalp, James M. Rae, Arul M. Chinnaiyan, Felix Y. Feng

In: Nat Commun, 2016, 7, 12791

Molecular classification of cancers into subtypes has resulted in an advance in our understanding of tumour biology and treatment response across multiple tumour types. However, to date, cancer profiling has largely focused on protein-coding genes, which comprise <1% of the genome. Here we leverage a compendium of 58,648 long noncoding RNAs (lncRNAs) to subtype 947 breast cancer samples. We show that lncRNA-based profiling categorizes breast tumours by their known molecular subtypes in breast cancer. We identify a cohort of breast cancer-associated and oestrogen-regulated lncRNAs, and investigate the role of the top prioritized oestrogen receptor (ER)-regulated lncRNA, DSCAM-AS1. We demonstrate that DSCAM-AS1 mediates tumour progression and tamoxifen resistance and identify hnRNPL as an interacting protein involved in the mechanism of DSCAM-AS1 action. By highlighting the role of DSCAM-AS1 in breast cancer biology and treatment resistance, this study provides insight into the potential clinical implications of lncRNAs in breast cancer.

Available:
pdf (1451 KB)   doi:10.1038/ncomms12791   pmid:27666543   BibTeX Entry ( Niknafs_Han_Ma-The_lncRN_lands-2016 )

Differential transcriptional responses to Ebola and Marburg virus infection in bat and human cells

Martin Holzer, Verena Krahling, Fabian Amman, Emanuel Barth, Stephan H. Bernhart, Victor A. O. Carmelo, Maximilian Collatz, Gero Doose, Florian Eggenhofer, Jan Ewald, Jorg Fallmann, Lasse M. Feldhahn, Markus Fricke, Juliane Gebauer, Andreas J. Gruber, Franziska Hufsky, Henrike Indrischek, Sabina Kanton, Jorg Linde, Nelly Mostajo, Roman Ochsenreiter, Konstantin Riege, Lorena Rivarola-Duarte, Abdullah H. Sahyoun, Sita J. Saunders, Stefan E. Seemann, Andrea Tanzer, Bertram Vogel, Stefanie Wehner, Michael T. Wolfinger, Rolf Backofen, Jan Gorodkin, Ivo Grosse, Ivo Hofacker, Steve Hoffmann, Christoph Kaleta, Peter F. Stadler, Stephan Becker, Manja Marz

In: Sci Rep, 2016, 6, 34589

The unprecedented outbreak of Ebola in West Africa resulted in over 28,000 cases and 11,000 deaths, underlining the need for a better understanding of the biology of this highly pathogenic virus to develop specific counter strategies. Two filoviruses, the Ebola and Marburg viruses, result in a severe and often fatal infection in humans. However, bats are natural hosts and survive filovirus infections without obvious symptoms. The molecular basis of this striking difference in the response to filovirus infections is not well understood. We report a systematic overview of differentially expressed genes, activity motifs and pathways in human and bat cells infected with the Ebola and Marburg viruses, and we demonstrate that the replication of filoviruses is more rapid in human cells than in bat cells. We also found that the most strongly regulated genes upon filovirus infection are chemokine ligands and transcription factors. We observed a strong induction of the JAK/STAT pathway, of several genes encoding inhibitors of MAP kinases (DUSP genes) and of PPP1R15A, which is involved in ER stress-induced cell death. We used comparative transcriptomics to provide a data resource that can be used to identify cellular responses that might allow bats to survive filovirus infections.

Available:
pdf (5669 KB)   doi:10.1038/srep34589   pmid:27713552   BibTeX Entry ( Holzer_Krahling_Amman-Diffe_trans_respo-2016 )

RNA-binding protein HuR and the members of the miR-200 family play an unconventional role in the regulation of c-Jun mRNA

Giorgia Del Vecchio, Francesca De Vito, Sita J. Saunders, Adele Risi, Cecilia Mannironi, Irene Bozzoni, Carlo Presutti

In: RNA, 2016, 22(10), 1510-21

Post-transcriptional gene regulation is a fundamental step for coordinating cellular response in a variety of processes. RNA-binding proteins (RBPs) and microRNAs (miRNAs) are the most important factors responsible for this regulation. Here we report that different components of the miR-200 family are involved in c-Jun mRNA regulation with the opposite effect. While miR-200b inhibits c-Jun protein production, miR-200a tends to increase the JUN amount through a stabilization of its mRNA. This action is dependent on the presence of the RBP HuR that binds the 3'UTR of c-Jun mRNA in a region including the mir-200a binding site. The position of the binding site is fundamental; by mutating this site, we demonstrate that the effect is not micro-RNA specific. These results indicate that miR-200a triggers a microRNA-mediated stabilization of c-Jun mRNA, promoting the binding of HuR with c-Jun mRNA. This is the first example of a positive regulation exerted by a microRNA on an important oncogene in proliferating cells.

Available:
pdf (1206 KB)   doi:10.1261/rna.057588.116   pmid:27473170   BibTeX Entry ( Del_Vecchio_De_Vito_Saunders-RNA_prote_HuR-2016 )

Plasticity of archaeal C/D box sRNA biogenesis

Vanessa Tripp, Roman Martin, Alvaro Orell, Omer S. Alkhnbashi, Rolf Backofen, Lennart Randau

In: Mol Microbiol, 2016, 103(1), 151-164

Archaeal and eukaryotic organisms contain sets of C/D box s(no)RNAs with guide sequences that determine ribose 2'-O-methylation sites of target RNAs. The composition of these C/D box sRNA sets is highly variable between organisms and results in varying RNA modification patterns which are important for ribosomal RNA folding and stability. Little is known about the genomic organization of C/D box sRNA genes in archaea. Here, we aimed to obtain first insights into the biogenesis of these archaeal C/D box sRNAs and analyzed the genetic context of more than 300 archaeal sRNA genes. We found that the majority of these genes do not possess independent promoters but are rather located at positions that allow for co-transcription with neighboring genes and their start or stop codons were frequently incorporated into the conserved boxC and D motifs. The biogenesis of plasmid-encoded C/D box sRNA variants was analyzed in vivo in Sulfolobus acidocaldarius. It was found that C/D box sRNA maturation occurs independent of their genetic context and relies solely on the presence of intact RNA kink-turn structures. The observed plasticity of C/D box sRNA biogenesis is suggested to enable their accelerated evolution and, consequently, allow for adjustments of the RNA modification landscape.

Available:
pdf (1947 KB)   doi:10.1111/mmi.13549   pmid:27743417   BibTeX Entry ( Tripp_Martin_Orell-Plast_archa_box-2016 )

Characterizing leader sequences of CRISPR loci

Omer S. Alkhnbashi, Shiraz A. Shah, Roger A. Garrett, Sita J. Saunders, Fabrizio Costa, Rolf Backofen

In: Bioinformatics, 2016, 32(17), i576-i585

MOTIVATION: The CRISPR-Cas system is an adaptive immune system in many archaea and bacteria, which provides resistance against invading genetic elements. The first phase of CRISPR-Cas immunity is called adaptation, in which small DNA fragments are excised from genetic elements and are inserted into a CRISPR array generally adjacent to its so called leader sequence at one end of the array. It has been shown that transcription initiation and adaptation signals of the CRISPR array are located within the leader. However, apart from promoters, there is very little knowledge of sequence or structural motifs or their possible functions. Leader properties have mainly been characterized through transcriptional initiation data from single organisms but large-scale characterization of leaders has remained challenging due to their low level of sequence conservation. RESULTS: We developed a method to successfully detect leader sequences by focusing on the consensus repeat of the adjacent CRISPR array and weak upstream conservation signals. We applied our tool to the analysis of a comprehensive genomic database and identified several characteristic properties of leader sequences specific to archaea and bacteria, ranging from distinctive sizes to preferential indel localization. CRISPRleader provides a full annotation of the CRISPR array, its strand orientation as well as conserved core leader boundaries that can be uploaded to any genome browser. In addition, it outputs reader-friendly HTML pages for conserved leader clusters from our database. AVAILABILITY AND IMPLEMENTATION: CRISPRleader and multiple sequence alignments for all 195 leader clusters are available at http://www.bioinf.uni-freiburg.de/Software/CRISPRleader/ CONTACT: costa@informatik.uni-freiburg.de or backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
pdf (979 KB)   doi:10.1093/bioinformatics/btw454   pmid:27587677   BibTeX Entry ( Alkhnbashi_Shah_Garrett-Chara_leade_seque-2016 )

Photorhabdus-nematode symbiosis is dependent on hfq-mediated regulation of secondary metabolites

Nicholas J. Tobias, Antje K. Heinrich, Helena Eresmann, Patrick R. Wright, Nick Neubacher, Rolf Backofen, Helge B. Bode

In: Environ Microbiol, 2016, 19(1), 119-129

Photorhabdus luminescens maintains a symbiotic relationship with the nematodes Heterorhabditis bacteriophora and together they infect and kill insect larvae. To maintain this symbiotic relationship, the bacteria must produce an array of secondary metabolites to assist in the development and replication of nematodes. The regulatory mechanisms surrounding production of these compounds are mostly unknown. The global post-transcriptional regulator, Hfq, is widespread in bacteria and performs many functions, one of which is the facilitation of sRNA binding to target mRNAs, with recent research thoroughly exploring its various pleiotropic effects. Here we generate and characterize an hfq deletion mutant and show that in the absence of hfq, the bacteria are no longer able to maintain a healthy symbiosis with nematodes due to the abolishment of the production of all known secondary metabolites. RNAseq led us to produce a second deletion of a known repressor, HexA, in the same strain, which restored both metabolite production and symbiosis.

Available:
doi:10.1111/1462-2920.13502   pmid:27555343   BibTeX Entry ( Tobias_Heinrich_Eresmann-Photo_symbi_depen-2016 )

MicroRNA Profiling in Aqueous Humor of Individual Human Eyes by Next-Generation Sequencing

Thomas Wecker, Klaus Hoffmeier, Anne Plotner, Bjorn Andreas Gruning, Ralf Horres, Rolf Backofen, Thomas Reinhard, Gunther Schlunck

In: Invest Ophthalmol Vis Sci, 2016, 57(4), 1706-13

PURPOSE: Extracellular microRNAs (miRNAs) in aqueous humor were suggested to have a role in transcellular signaling and may serve as disease biomarkers. The authors adopted next-generation sequencing (NGS) techniques to further characterize the miRNA profile in single samples of 60 to 80 µL human aqueous humor. METHODS: Samples were obtained at the outset of cataract surgery in nine independent, otherwise healthy eyes. Four samples were used to extract RNA and generate sequencing libraries, followed by an adapter-driven amplification step, electrophoretic size selection, sequencing, and data analysis. Five samples were used for quantitative PCR (qPCR) validation of NGS results. Published NGS data on circulating miRNAs in blood were analyzed in comparison. RESULTS: One hundred fifty-eight miRNAs were consistently detected by NGS in all four samples; an additional 59 miRNAs were present in at least three samples. The aqueous humor miRNA profile shows some overlap with published NGS-derived inventories of circulating miRNAs in blood plasma with high prevalence of human miR-451a, -21, and -16. In contrast to blood, miR-184, -4448, -30a, -29a, -29c, -19a, -30d, -205, -24, -22, and -3074 were detected among the 20 most prevalent miRNAs in aqueous humor. Relative expression patterns of miR-451a, -202, and -144 suggested by NGS were confirmed by qPCR. CONCLUSIONS: Our data illustrate the feasibility of miRNA analysis by NGS in small individual aqueous humor samples. Intraocular cells as well as blood plasma contribute to the extracellular aqueous humor miRNome. The data suggest possible roles of miRNA in intraocular cell adhesion and signaling by TGF-beta and Wnt, which are important in intraocular pressure regulation and glaucoma.

Available:
pdf (869 KB)   doi:10.1167/iovs.15-17828   pmid:27064390   BibTeX Entry ( Wecker_Hoffmeier_Plotner-Micro_Profi_Aqueo-2016 )

Spatiotemporal alignment of radial tracheid diameter profiles of submontane Norway spruce

D.F. Stangler, M. Mann, H.-P. Kahle, E. Rosskopf, S. Fink, H. Spiecker

In: Dendrochronologia, 2016, 37, 33-45

Studying intra-annual wood formation dynamics provides valuable information on how tree growth and forests are affected by environmental changes and climatic extreme events. This study aims to evaluate and quantify synergetic potentials emerging from a combination of current state-of-the-art techniques used to monitor intra-annual wood formation processes. Norway spruce trees were studied in detail during the growing season 2009 with weekly sampling of microcores, high resolution point-dendrometers and wood anatomical analysis. The combination of the applied techniques allowed us to convert the spatial scales of radial tracheid diameter profiles to seasonal time scales and to synchronize fluctuations in intra-annual cell diameter profiles. This spatiotemporal information was used to validate the recently introduced software MICA (Multiple interval-based curve alignment). In comparison to the conventional approach of averaging profiles of tree ring variables, the MICA aligned profiles exhibit a significantly higher synchronicity of the averaged data points. We also demonstrate two new features in the MICA application that enable the extrapolation of spatiotemporal information between intra-annual profiles for the construction of robust mean (consensus) profiles that are representative of the population dynamics. By using a set of complementary techniques in an integrated approach, this study highlights a new methodological framework that can contribute to a better understanding of the environmental control of wood formation during the growing season.

Available:
doi:10.1016/j.dendro.2015.12.001   BibTeX Entry ( Stangler:16 )

Global RNA recognition patterns of post-transcriptional regulators Hfq and CsrA revealed by UV crosslinking in vivo

Erik Holmqvist, Patrick R. Wright, Lei Li, Thorsten Bischler, Lars Barquist, Richard Reinhardt, Rolf Backofen, Jorg Vogel

In: EMBO J, 2016, 35(9), 991-1011

The molecular roles of many RNA-binding proteins in bacterial post-transcriptional gene regulation are not well understood. Approaches combining in vivo UV crosslinking with RNA deep sequencing (CLIP-seq) have begun to revolutionize the transcriptome-wide mapping of eukaryotic RNA-binding protein target sites. We have applied CLIP-seq to chart the target landscape of two major bacterial post-transcriptional regulators, Hfq and CsrA, in the model pathogen Salmonella Typhimurium. By detecting binding sites at single-nucleotide resolution, we identify RNA preferences and structural constraints of Hfq and CsrA during their interactions with hundreds of cellular transcripts. This reveals 3'-located Rho-independent terminators as a universal motif involved in Hfq-RNA interactions. Additionally, Hfq preferentially binds 5' to sRNA-target sites in mRNAs, and 3' to seed sequences in sRNAs, reflecting a simple logic in how Hfq facilitates sRNA-mRNA interactions. Importantly, global knowledge of Hfq sites significantly improves sRNA-target predictions. CsrA binds AUGGA sequences in apical loops and targets many Salmonella virulence mRNAs. Overall, our generic CLIP-seq approach will bring new insights into post-transcriptional gene regulation by RNA-binding proteins in diverse bacterial species.

Available:
pdf (1339 KB)   doi:10.15252/embj.201593360   pmid:27044921   BibTeX Entry ( Holmqvist_Wright_Li-Globa_RNA_recog-2016 )

antaRNA - Multi-objective inverse folding of pseudoknot RNA using ant-colony optimization

R. Kleinkauf, T. Houwaart, R. Backofen, M. Mann

In: BMC Bioinformatics, 2015, 16(1), 1-7

Background: Many functional RNA molecules fold into pseudoknot structures, which are often essential for the formation of an RNA's 3D structure. Currently the design of RNA molecules, which fold into a specific structure (known as RNA inverse folding) within biotechnological applications, is lacking the feature of incorporating pseudoknot structures into the design. Hairpin-(H)- and kissing hairpin-(K)-type pseudoknots cover a wide range of biologically functional pseudoknots and can be represented on a secondary structure level. Results: The RNA inverse folding program antaRNA, which takes secondary structure, target GC-content and sequence constraints as input, is extended to provide solutions for such H- and K-type pseudoknotted secondary structure constraints. We demonstrate the easy and flexible interchangeability of modules within the antaRNA framework by incorporating pKiss as a structure prediction tool capable of predicting the mentioned pseudoknot types. The performance of the approach is demonstrated on a subset of the Pseudobase++ dataset. Conclusions: This new service is available via a standalone version and is also part of the Freiburg RNA Tools webservice. Furthermore, antaRNA is available in Galaxy and is part of the RNA-workbench Docker image.

Available:
pdf (761 KB)   doi:10.1186/s12859-015-0815-6   BibTeX Entry ( Kleinkauf-web-2015 )

SimiRa: A tool to identify coregulation between microRNAs and RNA-binding proteins

Martin Preusse, Carsten Marr, Sita Saunders, Daniel Maticzka, Heiko Lickert, Rolf Backofen, Fabian Theis

In: RNA Biology, 2015, 12(9), 998-1009

microRNAs and microRNA-independent RNA-binding proteins are 2 classes of post-transcriptional regulators that have been shown to cooperate in gene-expression regulation. We compared the genome-wide target sets of microRNAs and RBPs identified by recent CLIP-Seq technologies, finding that RBPs have distinct target sets and favor gene interaction network hubs. To identify microRNAs and RBPs with a similar functional context, we developed simiRa, a tool that compares enriched functional categories such as pathways and GO terms. We applied simiRa to the known functional cooperation between Pumilio family proteins and miR-221/222 in the regulation of tumor suppressor gene p27 and show that the cooperation is reflected by similar enriched categories but not by target genes. SimiRa also predicts possible cooperation of microRNAs and RBPs beyond direct interaction on the target mRNA for the nuclear RBP TAF15. To further facilitate research into cooperation of microRNAs and RBPs, we made simiRa available as a web tool that displays the functional neighborhood and similarity of microRNAs and RBPs: http://vsicb-simira.helmholtz-muenchen.de.

Available:
pdf (930 KB)   doi:10.1080/15476286.2015.1068496   pmid:26383775   BibTeX Entry ( Preusse_Marr_Saunders-SimiR_tool_ident_coreg-RNABiol2015 )

An updated evolutionary classification of CRISPR-Cas systems

Kira S. Makarova, Yuri I. Wolf, Omer S. Alkhnbashi, Fabrizio Costa, Shiraz A. Shah, Sita J. Saunders, Rodolphe Barrangou, Stan J. J. Brouns, Emmanuelle Charpentier, Daniel H. Haft, Philippe Horvath, Sylvain Moineau, Francisco J. M. Mojica, Rebecca M. Terns, Michael P. Terns, Malcolm F. White, Alexander F. Yakunin, Roger A. Garrett, John van der Oost, Rolf Backofen, Eugene V. Koonin

In: Nat Rev Microbiol, 2015, 13(11), 722-736

The evolution of CRISPR-cas loci, which encode adaptive immune systems in archaea and bacteria, involves rapid changes, in particular numerous rearrangements of the locus architecture and horizontal transfer of complete loci or individual modules. These dynamics complicate straightforward phylogenetic classification, but here we present an approach combining the analysis of signature protein families and features of the architecture of cas loci that unambiguously partitions most CRISPR-cas loci into distinct classes, types and subtypes. The new classification retains the overall structure of the previous version but is expanded to now encompass two classes, five types and 16 subtypes. The relative stability of the classification suggests that the most prevalent variants of CRISPR-Cas systems are already known. However, the existence of rare, currently unclassifiable variants implies that additional types and subtypes remain to be characterized.

Available:
pdf (531 KB)   doi:10.1038/nrmicro3569   pmid:26411297   BibTeX Entry ( Makarova_Wolf_Alkhnbashi-updat_evolu_class-2015 )

RC3H1 post-transcriptionally regulates A20 mRNA and modulates the activity of the IKK/NF-kappaB pathway

Yasuhiro Murakawa, Michael Hinz, Janina Mothes, Anja Schuetz, Michael Uhl, Emanuel Wyler, Tomoharu Yasuda, Guido Mastrobuoni, Caroline C. Friedel, Lars Dolken, Stefan Kempa, Marc Schmidt-Supprian, Nils Bluthgen, Rolf Backofen, Udo Heinemann, Jana Wolf, Claus Scheidereit, Markus Landthaler

In: Nat Commun, 2015, 6, 7367

The RNA-binding protein RC3H1 (also known as ROQUIN) promotes TNFalpha mRNA decay via a 3'UTR constitutive decay element (CDE). Here we applied PAR-CLIP to human RC3H1 to identify approximately 3,800 mRNA targets with >16,000 binding sites. A large number of sites are distinct from the consensus CDE and revealed a structure-sequence motif with U-rich sequences embedded in hairpins. RC3H1 binds preferentially short-lived and DNA damage-induced mRNAs, indicating a role of this RNA-binding protein in the post-transcriptional regulation of the DNA damage response. Intriguingly, RC3H1 affects expression of the NF-kappaB pathway regulators such as IkappaBalpha and A20. RC3H1 uses ROQ and Zn-finger domains to contact a binding site in the A20 3'UTR, demonstrating a not yet recognized mode of RC3H1 binding. Knockdown of RC3H1 resulted in increased A20 protein expression, thereby interfering with IkappaB kinase and NF-kappaB activities, demonstrating that RC3H1 can modulate the activity of the IKK/NF-kappaB pathway.

Available:
pdf (3294 KB)   doi:10.1038/ncomms8367   pmid:26170170   BibTeX Entry ( Murakawa_Hinz_Mothes-post_regul_mRNA-2015 )

Deciphering the Epigenetic Code of Cardiac Myocyte Transcription

Sebastian Preissl, Martin Schwaderer, Alexandra Raulf, Michael Hesse, Bjorn A. Gruning, Claudia Kobele, Rolf Backofen, Bernd K. Fleischmann, Lutz Hein, Ralf Gilsbach

In: Circ Res, 2015, 117, 413-423

RATIONALE: Epigenetic mechanisms are crucial for cell identity and transcriptional control. The heart consists of different cell types including cardiac myocytes, endothelial cells, fibroblasts and others. Therefore, cell type-specific analysis is needed to gain mechanistic insight into the regulation of gene expression in cardiac myocytes. While cytosolic mRNA represents steady-state levels, nuclear mRNA more closely reflects transcriptional activity. To unravel epigenetic mechanisms of transcriptional control, cell type-specific analysis of nuclear mRNA and epigenetic modifications is crucial. OBJECTIVE: The aim was to purify cardiac myocyte nuclei from hearts of different species by magnetic- or fluorescent-assisted sorting and to determine the nuclear and cellular RNA expression profiles and epigenetic marks in a cardiac myocyte-specific manner. METHODS AND RESULTS: Frozen cardiac tissue samples were used to isolate cardiac myocyte nuclei. High sorting purity was confirmed for cardiac myocyte nuclei isolated from mice, rats and humans. Deep sequencing of nuclear RNA revealed a major fraction of nascent, unspliced RNA in contrast to results obtained from purified cardiac myocytes. Cardiac myocyte nuclear and cellular RNA expression profiles showed differences especially for metabolic genes. Genome-wide maps of the transcriptional elongation mark H3K36me3 were generated by chromatin-immunoprecipitation. Transcriptome and epigenetic data confirmed the high degree of cardiac myocyte-specificity of our protocol. An integrative analysis of nuclear mRNA and histone mark occurrence indicated a major impact of the chromatin state on transcriptional activity in cardiac myocytes. CONCLUSIONS: This study establishes cardiac myocyte-specific sorting of nuclei as a universal method to investigate epigenetic and transcriptional processes in cardiac myocytes of different origins. These data sets provide novel insight into cardiac myocyte transcription.

Available:
pdf (2167 KB)   doi:10.1161/CIRCRESAHA.115.306337   pmid:26105955   BibTeX Entry ( Preissl_Schwaderer_Raulf-Decip_the_Epige-2015 )

antaRNA - Ant Colony Based RNA Sequence Design

R. Kleinkauf, M. Mann, R. Backofen

In: Bioinformatics, 2015, 31(19), 3114-3121

Motivation: RNA sequence design has been studied for at least as long as the classical folding problem. While for the latter the functional fold of an RNA molecule is to be found, inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology, reliable RNA sequence design becomes a crucial step to generate novel biochemical components. Results: In this article, the computational tool antaRNA is presented. It is capable of compiling RNA sequences for a given structure that comply in addition with an adjustable full range objective GC-content distribution, specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics and its superior performance is shown on biological datasets. Availability: http://www.bioinf.uni-freiburg.de/Software/antaRNA

Available:
pdf (553 KB)   doi:10.1093/bioinformatics/btv319   BibTeX Entry ( Kleinkauf2015 )

Comparative analysis of the antioxidant properties of Icelandic and Hawaiian lichens

Kehau Hagiwara, Patrick R. Wright, Nicole K. Tabandera, Dovi Kelman, Rolf Backofen, Sesselja Omarsdottir, Anthony D. Wright

In: Environ Microbiol, 2015, 18(8), 2319-2325

Antioxidant activity of symbiotic organisms known as lichens is an intriguing field of research because of its strong contribution to their ability to withstand extremes of physical and biological stress (e.g. desiccation, temperature, UV radiation and microbial infection). We present a comparative study on the antioxidant activities of 76 Icelandic and 41 Hawaiian lichen samples assessed employing the DPPH- and FRAP-based antioxidant assays. Utilizing this unprecedented sample size, we show that while highest individual sample activity is present in the Icelandic dataset, the overall antioxidant activity is higher for lichens found in Hawaii. Furthermore, we report that lichens from the genus Peltigera that have been described as strong antioxidant producers in studies on Chinese, Russian and Turkish lichens also show high antioxidant activities in both Icelandic and Hawaiian lichen samples. Finally, we show that opportunistic sampling of lichens in both Iceland and Hawaii will yield high numbers of lichen species that exclusively include green algae as photobiont.

Available:
pdf (1184 KB)   doi:10.1111/1462-2920.12850   pmid:25808912   BibTeX Entry ( Hagiwara_Wright_Tabandera-Compa_analy_the-2015 )

An active immune defense with a minimal CRISPR (clustered regularly interspaced short palindromic repeats) RNA and without the Cas6 protein

Lisa-Katharina Maier, Aris-Edda Stachler, Sita J. Saunders, Rolf Backofen, Anita Marchfelder

In: Journal of Biological Chemistry, 2015, 290(7), 4192-201

The prokaryotic immune system CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR-associated) is a defense system that protects prokaryotes against foreign DNA. The short CRISPR RNAs (crRNAs) are central components of this immune system. In CRISPR-Cas systems type I and III, crRNAs are generated by the endonuclease Cas6. We developed a Cas6b-independent crRNA maturation pathway for the Haloferax type I-B system in vivo that expresses a functional crRNA, which we termed independently generated crRNA (icrRNA). The icrRNA is effective in triggering degradation of an invader plasmid carrying the matching protospacer sequence. The Cas6b-independent maturation of the icrRNA allowed mutation of the repeat sequence without interfering with signals important for Cas6b processing. We generated 23 variants of the icrRNA and analyzed them for activity in the interference reaction. icrRNAs with deletions or mutations of the 3' handle are still active in triggering an interference reaction. The complete 3' handle could be removed without loss of activity. However, manipulations of the 5' handle mostly led to loss of interference activity. Furthermore, we could show that in the presence of an icrRNA a strain without Cas6b (Δcas6b) is still active in interference.

Available:
pdf (2076 KB)   doi:10.1074/jbc.M114.617506   pmid:25512373   BibTeX Entry ( Maier_Stachler_Saunders-activ_immun_defen-JBC2015 )

Cell type specific gene expression analysis of prostate needle biopsies resolves tumor tissue heterogeneity

Malte Kronig, Max Walter, Vanessa Drendel, Martin Werner, Cordula A. Jilg, Andreas S. Richter, Rolf Backofen, David McGarry, Marie Follo, Wolfgang Schultze-Seemann, Roland Schule

In: Oncotarget, 2015, 6(2), 1302-14

A lack of cell surface markers for the specific identification, isolation and subsequent analysis of living prostate tumor cells hampers progress in the field. Specific characterization of tumor cells and their microenvironment in a multi-parameter molecular assay could significantly improve prognostic accuracy for the heterogeneous prostate tumor tissue. Novel functionalized gold nanoparticles allow fluorescence-based detection of absolute mRNA expression levels in living cells by fluorescent activated flow cytometry (FACS). We made use of this technique to separate prostate tumor and benign cells in human prostate needle biopsies based on the expression levels of the tumor marker alpha-methylacyl-CoA racemase (AMACR). We combined RNA and protein detection of living cells by FACS to gate for epithelial cell adhesion molecule (EPCAM) positive tumor and benign cells, EPCAM/CD45 double negative mesenchymal cells and CD45 positive infiltrating lymphocytes. EPCAM positive epithelial cells were further sub-gated into AMACR high and low expressing cells. Two hundred cells from each population and several biopsies from the same patient were analyzed using a multiplexed gene expression profile to generate a cell type resolved profile of the specimen. This technique provides the basis for the clinical evaluation of cell type resolved gene expression profiles as pre-therapeutic prognostic markers for prostate cancer.

Available:
pdf (3675 KB)   pmid:25514598   BibTeX Entry ( Kronig_Walter_Drendel-Cell_type_speci-2015 )

The role of Cas8 in type I CRISPR interference

Simon D. B. Cass, Karina A. Haas, Britta Stoll, Omer Alkhnbashi, Kundan Sharma, Henning Urlaub, Rolf Backofen, Anita Marchfelder, Edward L. Bolt

In: Biosci Rep, 2015, 35(4), e00197

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat) systems provide bacteria and archaea with adaptive immunity to repel invasive genetic elements. Type I systems use "Cascade" ribonucleoprotein complexes to target invader DNA, by base pairing CRISPR RNA (crRNA) to protospacers. Cascade identifies PAMs (Protospacer Adjacent Motifs) on invader DNA, triggering R-loop formation and subsequent DNA degradation by Cas3. Cas8 is a candidate PAM recognition factor in some Cascades. We analysed Cas8 homologues from type IB CRISPR systems in archaea Haloferax volcanii (Hvo) and Methanothermobacter thermautotrophicus (Mth). Cas8 was essential for CRISPR interference in Hvo, and purified Mth Cas8 protein responded to PAM sequence when binding to nucleic acids. Cas8 interacted physically with Cas5-Cas7-crRNA complex, stimulating binding to PAM containing substrates. Mutation of conserved Cas8 amino acid residues abolished interference in vivo, and altered catalytic activity of Cas8 protein in vitro. This is experimental evidence that Cas8 is important for targeting Cascade to invader DNA.

Available:
pdf (815 KB)   doi:10.1042/BSR20150043   pmid:25940458   BibTeX Entry ( Cass_Haas_Stoll-The_role_Cas-2015 )

SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

Sebastian Will, Christina Otto, Milad Miladi, Mathias Möhl, Rolf Backofen

In: Bioinformatics, 2015, 31(15), 2489-2496

MOTIVATION: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (at least quartic time). RESULTS: Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. AVAILABILITY AND IMPLEMENTATION: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. CONTACT: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online.

Available:
pdf (445 KB)   doi:10.1093/bioinformatics/btv185   pmid:25838465   BibTeX Entry ( Will_Otto_Miladi-SPARS_quadr_time-2015 )

Biological and bioinformatical approaches to study crosstalk of long-non-coding RNAs and chromatin-modifying proteins

Rolf Backofen, Tanja Vogel

In: Cell Tissue Res, 2014, 356(3), 507-26

Long-non-coding RNA (lncRNA) regulates gene expression through transcriptional and epigenetic regulation as well as alternative splicing in the nucleus. In addition, regulation is achieved at the levels of mRNA translation, storage and degradation in the cytoplasm. During recent years, several studies have described the interaction of lncRNAs with enzymes that confer so-called epigenetic modifications, such as DNA methylation, histone modifications and chromatin structure or remodelling. LncRNA interaction with chromatin-modifying enzymes (CME) is an emerging field that confers another layer of complexity in transcriptional regulation. Given that CME-lncRNA interactions have been identified in many biological processes, ranging from development to disease, comprehensive understanding of underlying mechanisms is important to inspire basic and translational research in the future. In this review, we highlight recent findings to extend our understanding about the functional interdependencies between lncRNAs and CMEs that activate or repress gene expression. We focus on recent highlights of molecular and functional roles for CME-lncRNAs and provide an interdisciplinary overview of recent technical and methodological developments that have improved biological and bioinformatical approaches for detection and functional studies of CME-lncRNA interaction.

Available:
pdf (3074 KB)   doi:10.1007/s00441-014-1885-x   pmid:24820400   BibTeX Entry ( Backofen_Vogel-Biolo_and_bioin-2014 )

ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs

Christina Otto, Mathias Mohl, Steffen Heyne, Mika Amit, Gad M. Landau, Rolf Backofen, Sebastian Will

In: BMC Bioinformatics, 2014, 15(1), 6602

Background: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no rescue, since single sequence structure prediction is highly unreliable. Results: The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known 'simultaneous alignment and folding' of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P's sequence-structure motifs. Resulting in the very fast RNA alignment approach ExpLoc-P, we utilize the best chain as anchor constraints for the sequence-structure alignment tool LocARNA. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up over 30-fold for the longest benchmark sequences (≈400 nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P. Conclusions: ExpaRNA-P's novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications.

Available:
pdf (1879 KB)   doi:10.1186/s12859-014-0404-0   pmid:25551362   BibTeX Entry ( Otto_Mohl_Heyne-ExpaR_simul_exact-2014 )

An Active Immune Defence with a Minimal CRISPR (clustered regularly interspaced short palindromic repeats) RNA and Without the Cas6 Protein

Lisa-Katharina Maier, Aris-Edda Stachler, Sita J. Saunders, Rolf Backofen, Anita Marchfelder

In: Journal of Biological Chemistry, 2014, 290(7), 4192-4201

The prokaryotic immune system CRISPR-Cas is a defence system that protects prokaryotes against foreign DNA. The short CRISPR RNAs (crRNAs) are central components of this immune system. In CRISPR-Cas systems type I and III, crRNAs are generated by the endonuclease Cas6. We developed a Cas6b-independent crRNA maturation pathway for the Haloferax type I-B system in vivo that expresses a functional crRNA, which we termed independently generated crRNA (icrRNA). The icrRNA is effective in triggering degradation of an invader plasmid carrying the matching protospacer sequence. The Cas6b-independent maturation of the icrRNA allowed mutation of the repeat sequence without interfering with signals important for Cas6b processing. We generated 23 variants of the icrRNA and analysed them for activity in the interference reaction. icrRNAs with deletions or mutations of the 3' handle are still active in triggering an interference reaction. The complete 3' handle could be removed without loss of activity. However, manipulations of the 5' handle mostly led to loss of interference activity. Furthermore, we could show that in the presence of an icrRNA a strain without Cas6b (Δcas6b) is still active in interference.

Available:
pdf (2507 KB)   doi:10.1074/jbc.M114.617506   pmid:25512373   BibTeX Entry ( Maier_Stachler_Saunders-Activ_Immun_Defen-JBC2014 )

Graph-distance distribution of the Boltzmann ensemble of RNA secondary structures

Jing Qin, Markus Fricke, Manja Marz, Peter F. Stadler, Rolf Backofen

In: Algorithms Mol Biol, 2014, 9, 19

BACKGROUND: Large RNA molecules are often composed of multiple functional domains whose spatial arrangement strongly influences their function. Pre-mRNA splicing, for instance, relies on the spatial proximity of the splice junctions that can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Albeit a crude measure, the distribution of spatial distances in thermodynamic equilibrium harbors useful information on the shape of the molecule that in turn can give insights into the interplay of its functional domains. RESULT: Spatial distance can be approximated by the graph-distance in RNA secondary structure. We show here that the equilibrium distribution of graph-distances between a fixed pair of nucleotides can be computed in polynomial time by means of dynamic programming. While a naive implementation would yield recursions with a very high time complexity of O(n^6 D^5) for sequence length n and D distinct distance values, it is possible to reduce this to O(n^4) for practical applications in which predominantly small distances are of interest. Further reductions, however, seem to be difficult. Therefore, we introduced sampling approaches that are much easier to implement. They are also theoretically favorable for several real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules. CONCLUSIONS: The graph-distance distribution can be computed using a dynamic programming approach. Although a crude approximation of reality, our initial results indicate that the graph-distance can be related to the smFRET data. The additional file and the software of our paper are available from http://www.rna.uni-jena.de/RNAgraphdist.html.
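
Illustration (not part of the publication): the graph-distance mentioned in the abstract above can be sketched for a single, fixed secondary structure. The sketch below assumes dot-bracket input without pseudoknots and unit weight for both backbone and base-pair edges; the function names are hypothetical. It does not compute the equilibrium distribution over the Boltzmann ensemble that the paper addresses, only the distance within one structure.

```python
from collections import deque


def pair_table(dot_bracket):
    """Return a dict mapping each paired position to its partner (0-based)."""
    stack, partner = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced opening bracket")
    return partner


def graph_distance(dot_bracket, start, end):
    """Shortest path between two nucleotides in the structure graph.

    Assumption: every backbone edge (i, i+1) and every base-pair edge
    has weight 1, so a plain breadth-first search is sufficient.
    """
    n = len(dot_bracket)
    if not (0 <= start < n and 0 <= end < n):
        raise ValueError("position out of range")
    partner = pair_table(dot_bracket)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if i == end:
            return dist[i]
        neighbours = [i - 1, i + 1]
        if i in partner:
            neighbours.append(partner[i])
        for j in neighbours:
            if 0 <= j < n and j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    raise ValueError("end position not reachable")
```

For the hairpin "((((....))))" the two ends are paired, so graph_distance("((((....))))", 0, 11) is 1, whereas the unfolded chain "............" gives 11 for the same pair of positions.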

Available:
pdf (1447 KB)   doi:10.1186/1748-7188-9-19   pmid:25285153   BibTeX Entry ( Qin_Fricke_Marz-Graph_distr_the-2014 )

Dynamic DNA methylation orchestrates cardiomyocyte development, maturation and disease

Ralf Gilsbach, Sebastian Preissl, Bjorn A. Gruning, Tilman Schnick, Lukas Burger, Vladimir Benes, Andreas Wurch, Ulrike Bonisch, Stefan Gunther, Rolf Backofen, Bernd K. Fleischmann, Dirk Schubeler, Lutz Hein

In: Nat Commun, 2014, 5, 5288

The heart is a highly specialized organ with essential function for the organism throughout life. The significance of DNA methylation in shaping the phenotype of the heart remains only partially known. Here we generate and analyse DNA methylomes from highly purified cardiomyocytes of neonatal, adult healthy and adult failing hearts. We identify large genomic regions that are differentially methylated during cardiomyocyte development and maturation. Demethylation of cardiomyocyte gene bodies correlates strongly with increased gene expression. Silencing of demethylated genes is characterized by the polycomb mark H3K27me3 or by DNA methylation. De novo methylation by DNA methyltransferases 3A/B causes repression of fetal cardiac genes, including essential components of the cardiac sarcomere. Failing cardiomyocytes partially resemble neonatal methylation patterns. This study establishes DNA methylation as a highly dynamic process during postnatal growth of cardiomyocytes and their adaptation to pathological stress in a process tightly linked to gene regulation and activity.

Available:
pdf (3311 KB)   doi:10.1038/ncomms6288   pmid:25335909   BibTeX Entry ( Gilsbach_Preissl_Gruning-Dynam_DNA_methy-2014 )

Autosomal dominant immune dysregulation syndrome in humans with CTLA4 mutations

Desiree Schubert, Claudia Bode, Rupert Kenefeck, Tie Zheng Hou, James B. Wing, Alan Kennedy, Alla Bulashevska, Britt-Sabina Petersen, Alejandro A. Schaffer, Bjorn A. Gruning, Susanne Unger, Natalie Frede, Ulrich Baumann, Torsten Witte, Reinhold E. Schmidt, Gregor Dueckers, Tim Niehues, Suranjith Seneviratne, Maria Kanariou, Carsten Speckmann, Stephan Ehl, Anne Rensing-Ehl, Klaus Warnatz, Mirzokhid Rakhmanov, Robert Thimme, Peter Hasselblatt, Florian Emmerich, Toni Cathomen, Rolf Backofen, Paul Fisch, Maximilian Seidl, Annette May, Annette Schmitt-Graeff, Shinji Ikemizu, Ulrich Salzer, Andre Franke, Shimon Sakaguchi, Lucy S. K. Walker, David M. Sansom, Bodo Grimbacher

In: Nat Med, 2014, 20(12), 1410-1416

The protein cytotoxic T lymphocyte antigen-4 (CTLA-4) is an essential negative regulator of immune responses, and its loss causes fatal autoimmunity in mice. We studied a large family in which five individuals presented with a complex, autosomal dominant immune dysregulation syndrome characterized by hypogammaglobulinemia, recurrent infections and multiple autoimmune clinical features. We identified a heterozygous nonsense mutation in exon 1 of CTLA4. Screening of 71 unrelated patients with comparable clinical phenotypes identified five additional families (nine individuals) with previously undescribed splice site and missense mutations in CTLA4. Clinical penetrance was incomplete (eight adults of a total of 19 genetically proven CTLA4 mutation carriers were considered unaffected). However, CTLA-4 protein expression was decreased in regulatory T cells (Treg cells) in both patients and carriers with CTLA4 mutations. Whereas Treg cells were generally present at elevated numbers in these individuals, their suppressive function, CTLA-4 ligand binding and transendocytosis of CD80 were impaired. Mutations in CTLA4 were also associated with decreased circulating B cell numbers. Taken together, mutations in CTLA4 resulting in CTLA-4 haploinsufficiency or impaired ligand binding result in disrupted T and B cell homeostasis and a complex immune dysregulation syndrome.

Available:
pdf (4742 KB)   doi:10.1038/nm.3746   pmid:25329329   BibTeX Entry ( Schubert_Bode_Kenefeck-Autos_domin_immun-2014 )

Atom Mapping with Constraint Programming

Martin Mann, Feras Nahar, Norah Schnorr, Rolf Backofen, Peter F. Stadler, Christoph Flamm

In: BMC Algorithms for Molecular Biology, 2014, 9(1), 23

Chemical reactions are rearrangements of chemical bonds. Each atom in an educt molecule thus appears again in a specific position of one of the reaction products. This bijection between educt and product atoms is not reported by chemical reaction databases, however, so that the 'Atom Mapping Problem' of finding this bijection is left as an important computational task for many practical applications in computational chemistry and systems biology. Elementary chemical reactions feature a cyclic imaginary transition state (ITS) that imposes additional restrictions on the bijection between educt and product atoms that are not taken into account by previous approaches. We demonstrate that Constraint Programming is well-suited to solving the Atom Mapping Problem in this setting. The performance of our approach is evaluated for a manually curated subset of chemical reactions from the KEGG database featuring various ITS cycle layouts and reaction mechanisms.

Publication note:
In Thematic series on Constraints and Bioinformatics

Available:
supplement.pdf (95 KB)   pdf (487 KB)   doi:10.1186/s13015-014-0023-3   BibTeX Entry ( Mann-CAM-14 )

Exact methods for lattice protein models

Martin Mann, Rolf Backofen

In: Bio-Algorithms and Med-Systems, 2014, 10(4), 213-225

Lattice protein models are well studied abstractions of globular proteins. By discretizing the structure space and simplifying the energy model over regular proteins, they enable detailed studies of protein structure formation and evolution. But even in the simplest lattice protein models, the prediction of optimal structures is computationally hard. Therefore, often heuristic approaches are applied to find such conformations. Commonly, heuristic methods find only locally optimal solutions. Nevertheless, there exist methods that guarantee to predict globally optimal structures. Currently only one such exact approach is publicly available, namely the Constraint-based Protein Structure Prediction (CPSP) method and variants. Here, we review exact approaches and derived methods. We discuss fundamental concepts like hydrophobic core construction and their use in optimal structure prediction as well as possible applications like combinations of different energy models.
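
Illustration (not part of the publication): a minimal sketch of the kind of simplified energy model the abstract above refers to. It assumes the widely used HP model, which the abstract does not name explicitly: residues are hydrophobic (H) or polar (P), the chain is a self-avoiding walk on an integer lattice, and each pair of H residues that are lattice neighbours but not chain neighbours contributes -1. The function name and input format are hypothetical, and this evaluates a given conformation only; it is not the CPSP method.

```python
def hp_energy(sequence, coords):
    """Energy of a lattice conformation under the (assumed) HP model.

    sequence: string over {"H", "P"}
    coords:   list of integer tuples, one lattice point per residue
              (works for 2D square or 3D cubic lattices)
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords differ in length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    # Consecutive residues must occupy neighbouring lattice points.
    for a, b in zip(coords, coords[1:]):
        if sum(abs(x - y) for x, y in zip(a, b)) != 1:
            raise ValueError("chain is not connected on the lattice")

    index = {point: i for i, point in enumerate(coords)}
    contacts = 0
    for i, point in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only in the positive direction of each axis so that
        # every contact is counted exactly once.
        for axis in range(len(point)):
            shifted = list(point)
            shifted[axis] += 1
            j = index.get(tuple(shifted))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

For example, folding "HPPH" into a unit square, hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]), brings the two H residues into contact and gives -1, while the fully extended chain has energy 0. Exact methods such as those reviewed in the paper search for conformations that minimise this kind of energy.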

Available:
pdf (2067 KB)   doi:10.1515/bams-2014-0014   BibTeX Entry ( Mann-Backofen_2014 )

CRISPRstrand: predicting repeat orientations to determine the crRNA-encoding strand at CRISPR loci

Omer S. Alkhnbashi, Fabrizio Costa, Shiraz A. Shah, Roger A. Garrett, Sita J. Saunders, Rolf Backofen

In: Bioinformatics, 2014, 30(17), i489-i496

MOTIVATION: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, inter-spaced by stretches of variable length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs. RESULTS: We present a fast and accurate tool to determine the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats based on an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising >4500 CRISPRs and yielded a remarkable performance of 0.95 AUC ROC (area under the curve of the receiver operator characteristic). In addition, we show that accurate orientation information greatly improved detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0. AVAILABILITY: CRISPRmap and CRISPRstrand are available at http://rna.informatik.uni-freiburg.de/CRISPRmap. 
CONTACT: backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Publication note:
In the proceedings of the 13th European Conference on Computational Biology (ECCB) 2014.

Available:
pdf (550 KB)   doi:10.1093/bioinformatics/btu459   pmid:25161238   BibTeX Entry ( Alkhnbashi_Costa_Shah-CRISP_predi_repea-2014 )

BlockClust: efficient clustering and classification of non-coding RNAs from short read RNA-seq profiles

Pavankumar Videm, Dominic Rose, Fabrizio Costa, Rolf Backofen

In: Bioinformatics, 2014, 30(12), i274-i282

SUMMARY: Non-coding RNAs (ncRNAs) play a vital role in many cellular processes such as RNA splicing, translation and gene regulation. However, the vast majority of ncRNAs still have no functional annotation. One prominent approach for putative function assignment is clustering of transcripts according to sequence and secondary structure. However, sequence information is changed by post-transcriptional modifications, and secondary structure is only a proxy for the true 3D conformation of the RNA polymer. A different type of information that does not suffer from these issues and that can be used for the detection of RNA classes is the pattern of processing and its traces in small RNA-seq read data. Here we introduce BlockClust, an efficient approach to detect transcripts with similar processing patterns. We propose a novel way to encode expression profiles in compact discrete structures, which can then be processed using fast graph-kernel techniques. We perform both unsupervised clustering and develop family specific discriminative models; finally we show how the proposed approach is scalable, accurate and robust across different organisms, tissues and cell lines. Availability: The whole BlockClust galaxy workflow including all tool dependencies is available at http://toolshed.g2.bx.psu.edu/view/rnateam/blockclust_workflow. CONTACT: backofen@informatik.uni-freiburg.de; costa@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online.

Available:
pdf (1006 KB)   doi:10.1093/bioinformatics/btu270   pmid:24931994   BibTeX Entry ( Videm_Rose_Costa-Block_effic_clust-2014 )

Two separate modules of the conserved regulatory RNA AbcR1 address multiple target mRNAs in and outside of the translation initiation region

Aaron Overloeper, Alexander Kraus, Rosemarie Gurski, Patrick R. Wright, Jens Georg, Wolfgang R. Hess, Franz Narberhaus

In: RNA Biol, 2014, 11(5), 624-40

The small RNA AbcR1 regulates the expression of ABC transporters in the plant pathogen Agrobacterium tumefaciens, the plant symbiont Sinorhizobium meliloti, and the human pathogen Brucella abortus. A combination of proteomic and bioinformatic approaches suggested dozens of AbcR1 targets in A. tumefaciens. Several of these newly discovered targets are involved in the uptake of amino acids, their derivatives, and sugars. Among the latter is the periplasmic sugar-binding protein ChvE, a component of the virulence signal transduction system. We examined 16 targets and their interaction with AbcR1 in close detail. In addition to the previously described mRNA interaction site of AbcR1 (M1), the CopraRNA program predicted a second functional module (M2) as target-binding site. Both M1 and M2 contain single-stranded anti-SD motifs. Using mutated AbcR1 variants, we systematically tested by band shift experiments, which sRNA region is responsible for mRNA binding and gene regulation. On the target site, we find that AbcR1 interacts with some mRNAs in the translation initiation region and with others far into their coding sequence. Our data show that AbcR1 is a versatile master regulator of nutrient uptake systems in A. tumefaciens and related bacteria.

Available:
pdf (3941 KB)   doi:10.4161/rna.29145   pmid:24921646   BibTeX Entry ( Overloper_Kraus_Gurski-Two_separ_modul-2014 )

Local Exact Pattern Matching for Non-Fixed RNA Structures

Mika Amit, Rolf Backofen, Steffen Heyne, Gad M. Landau, Mathias Möhl, Christina Otto, Sebastian Will

In: IEEE/ACM Trans. Comput. Biology Bioinform., 2014, 11(1), 219-230

Detecting local common sequence-structure regions of RNAs is a biologically important problem. Detecting such regions allows biologists to identify functionally relevant similarities between the inspected molecules. We developed dynamic programming algorithms for finding common structure-sequence patterns between two RNAs. The RNAs are given by their sequence and a set of potential base pairs with associated probabilities. In contrast to prior work on local pattern matching of RNAs, we support the breaking of arcs. This allows us to add flexibility over matching only fixed structures; potentially matching only a similar subset of specified base pairs. We present an O(n^3) algorithm for local exact pattern matching between two nested RNAs, and an O(n^3 log n) algorithm for one nested RNA and one bounded-unlimited RNA. In addition, an algorithm for approximate pattern matching is introduced that for two given nested RNAs and a number k, finds the maximal local pattern matching score between the two RNAs with at most k mismatches in O(n^3 k^2) time. Finally, we present an O(n^3) algorithm for finding the most similar subforest between two nested RNAs.

Available:
pdf (784 KB)   doi:10.1109/TCBB.2013.2297113   BibTeX Entry ( amit14:_local_exact_patter_match_non_fixed_struc )

Simultaneous Alignment and Folding of Protein Sequences

Jerome Waldispuhl, Charles W. O'Donnell, Sebastian Will, Srinivas Devadas, Rolf Backofen, Bonnie Berger

In: J Comput Biol, 2014, 21(7), 477-491

Accurate comparative analysis tools for low-homology proteins remain a difficult challenge in computational biology, especially sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane beta-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise and multiple sequence alignment tools in the most difficult low-sequence homology case. It also improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families (partiFold-Align is available at http://partifold.csail.mit.edu/ ).

Available:
pdf (579 KB)   doi:10.1089/cmb.2013.0163   pmid:24766258   BibTeX Entry ( Waldispuhl_ODonnell_Will-Simul_Align_and-JCB2014 )

CopraRNA and IntaRNA: predicting small RNA targets, networks and interaction domains

Patrick R. Wright, Jens Georg, Martin Mann, Dragos A. Sorescu, Andreas S. Richter, Steffen Lott, Robert Kleinkauf, Wolfgang R. Hess, Rolf Backofen

In: Nucleic Acids Res, 2014, 42(Web Server issue), W119-23

CopraRNA (Comparative prediction algorithm for small RNA targets) is the most recent asset to the Freiburg RNA Tools webserver. It incorporates and extends the functionality of the existing tool IntaRNA (Interacting RNAs) in order to predict targets, interaction domains and consequently the regulatory networks of bacterial small RNA molecules. The CopraRNA prediction results are accompanied by extensive postprocessing methods such as functional enrichment analysis and visualization of interacting regions. Here, we introduce the functionality of the CopraRNA and IntaRNA webservers and give detailed explanations on their postprocessing functionalities. Both tools are freely accessible at http://rna.informatik.uni-freiburg.de.

Publication note:
PRW, JG and MM contributed equally to this work

Available:
pdf (668 KB)   doi:10.1093/nar/gku359   pmid:24838564   BibTeX Entry ( Wright_Georg_Mann-Copra_and_IntaR-NAR2014 )

Bioinformatics of prokaryotic RNAs

Rolf Backofen, Fabian Amman, Fabrizio Costa, Sven Findeiss, Andreas S. Richter, Peter F. Stadler

In: RNA Biol, 2014, 11(5)

The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes, and the need to characterize their non-coding components as well, depend heavily on computational methods and workflows, many of which have been developed or at least adapted specifically for use with bacterial and archaeal data. This review provides an overview of the state of the art of RNA bioinformatics, focusing on applications to prokaryotes.

Available:
pdf (1834 KB)   pmid:24755880   BibTeX Entry ( Backofen_Amman_Costa-Bioin_proka_RNAs-2014 )

MOF-associated complexes ensure stem cell identity and Xist repression

Tomasz Chelmicki, Friederike Dundar, Matthew Turley, Tasneem Khanam, Tugce Aktas, Fidel Ramirez, Anne-Valerie Gendrel, Patrick R. Wright, Pavankumar Videm, Rolf Backofen, Edith Heard, Thomas Manke, Asifa Akhtar

In: Elife, 2014, 3, e02024

Histone acetyl transferases (HATs) play distinct roles in many cellular processes and are frequently misregulated in cancers. Here, we study the regulatory potential of MYST1-(MOF)-containing MSL and NSL complexes in mouse embryonic stem cells (ESCs) and neuronal progenitors. We find that both complexes influence transcription by targeting promoters as well as TSS-distal enhancers. In contrast to flies, the MSL complex is not exclusively enriched on the X chromosome yet it is crucial for mammalian X chromosome regulation as it specifically regulates Tsix, the major repressor of Xist lncRNA. MSL depletion leads to decreased Tsix expression, reduced REX1 recruitment, and consequently, enhanced accumulation of Xist and variable numbers of inactivated X chromosomes during early differentiation. The NSL complex provides additional, Tsix-independent repression of Xist by maintaining pluripotency. MSL and NSL complexes therefore act synergistically by using distinct pathways to ensure a fail-safe mechanism for the repression of X inactivation in ESCs.

Available:
pdf (2263 KB)   doi:10.7554/eLife.02024   pmid:24842875   BibTeX Entry ( Chelmicki_Dundar_Turley-MOF_compl_ensur-2014 )

Lineage-specific splicing of a brain-enriched alternative exon promotes glioblastoma progression

Roberto Ferrarese, Griffith R. Harsh IV, Ajay K. Yadav, Eva Bug, Daniel Maticzka, Wilfried Reichardt, Stephen M. Dombrowski, Tyler E. Miller, Anie P. Masilamani, Fangping Dai, Hyunsoo Kim, Michael Hadler, Denise M. Scholtens, Irene L. Y. Yu, Jurgen Beck, Vinodh Srinivasasainagendra, Fabrizio Costa, Nicoleta Baxan, Dietmar Pfeifer, Dominik V. Elverfeldt, Rolf Backofen, Astrid Weyerbrock, Christine W. Duarte, Xiaolin He, Marco Prinz, James P. Chandler, Hannes Vogel, Arnab Chakravarti, Jeremy N. Rich, Maria S. Carro, Markus Bredel

In: J Clin Invest, 2014, 124(7), 2861-2876

Tissue-specific alternative splicing is critical for the emergence of tissue identity during development, yet the role of this process in malignant transformation is undefined. Tissue-specific splicing involves evolutionarily conserved, alternative exons that represent only a minority of the total alternative exons identified. Many of these conserved exons have functional features that influence signaling pathways to profound biological effect. Here, we determined that lineage-specific splicing of a brain-enriched cassette exon in the membrane-binding tumor suppressor annexin A7 (ANXA7) diminishes endosomal targeting of the EGFR oncoprotein, consequently enhancing EGFR signaling during brain tumor progression. ANXA7 exon splicing was mediated by the ribonucleoprotein PTBP1, which is normally repressed during neuronal development. PTBP1 was highly expressed in glioblastomas due to loss of a brain-enriched microRNA (miR-124) and to PTBP1 amplification. The alternative ANXA7 splicing trait was present in precursor cells, suggesting that glioblastoma cells inherit the trait from a potential tumor-initiating ancestor and that these cells exploit this trait through accumulation of mutations that enhance EGFR signaling. Our data illustrate that lineage-specific splicing of a tissue-regulated alternative exon in a constituent of an oncogenic pathway eliminates tumor suppressor functions and promotes glioblastoma progression. This paradigm may offer a general model as to how tissue-specific regulatory mechanisms can reprogram normal developmental processes into oncogenic ones.

Available:
pdf (7740 KB)   doi:10.1172/JCI68836   pmid:24865424   BibTeX Entry ( Ferrarese_Harsh_GR_Yadav-Linea_splic_brain-2014 )

Activation of a GPCR leads to eIF4G phosphorylation at the 5' cap and to IRES-dependent translation

Kelly Leon, Thomas Boulo, Astrid Musnier, Julia Morales, Christophe Gauthier, Laurence Dupuy, Steffen Heyne, Rolf Backofen, Anne Poupon, Patrick Cormier, Eric Reiter, Pascale Crepieux

In: J Mol Endocrinol, 2014, 52(3), 373-82

The control of mRNA translation has been mainly explored in response to activated tyrosine kinase receptors. In contrast, mechanistic details on the translational machinery are far less available in the case of ligand-bound G protein-coupled receptors (GPCRs). In this study, using the FSH receptor (FSH-R) as a model receptor, we demonstrate that part of the translational regulations occurs by phosphorylation of the translation pre-initiation complex scaffold protein, eukaryotic initiation factor 4G (eIF4G), in HEK293 cells stably expressing the FSH-R. This phosphorylation event occurred when eIF4G was bound to the mRNA 5' cap, and probably involves mammalian target of rapamycin. This regulation might contribute to cap-dependent translation in response to FSH. The cap-binding protein eIF4E also had its phosphorylation level enhanced upon FSH stimulation. We also show that FSH-induced signaling not only led to cap-dependent translation but also to internal ribosome entry site (IRES)-dependent translation of some mRNA. These data add detailed information on the molecular bases underlying the regulation of selective mRNA translation by a GPCR, and a topological model recapitulating these mechanisms is proposed.

Available:
doi:10.1530/JME-14-0009   pmid:24711644   BibTeX Entry ( Leon_Boulo_Musnier-Activ_GPCR_leads-2014 )

MoDPepInt: an interactive web server for prediction of modular domain-peptide interactions

Kousik Kundu, Martin Mann, Fabrizio Costa, Rolf Backofen

In: Bioinformatics, 2014, 30(18), 2668-2669

SUMMARY: MoDPepInt (Modular Domain Peptide Interaction) is a new easy-to-use web server for the prediction of binding partners for modular protein domains. Currently, we offer models for SH2, SH3 and PDZ domains via the tools SH2PepInt, SH3PepInt and PDZPepInt, respectively. More specifically, our server offers predictions for 51 SH2 human domains and 69 SH3 human domains via single domain models, and predictions for 226 PDZ domains across several species, via 43 multidomain models. All models are based on support vector machines with different kernel functions ranging from polynomial, to Gaussian, to advanced graph kernels. In this way, we model non-linear interactions between amino acid residues. Results were validated on manually curated datasets achieving competitive performance against various state-of-the-art approaches. Availability and implementation: The MoDPepInt server is available under the URL http://modpepint.informatik.uni-freiburg.de/ CONTACT: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online.

Available:
pdf (296 KB)   doi:10.1093/bioinformatics/btu350   pmid:24872426   BibTeX Entry ( Kundu_Mann_Costa-MoDPe_inter_web-2014 )

Memory efficient RNA energy landscape exploration

Martin Mann, Marcel Kucharik, Christoph Flamm, Michael T. Wolfinger

In: Bioinformatics, 2014, 30(18), 2584-2591

Energy landscapes provide a valuable means for studying the folding dynamics of short RNA molecules in detail by modeling all possible structures and their transitions. Higher abstraction levels based on a macro-state decomposition of the landscape enable the study of larger systems; however, they are still restricted by huge memory requirements of exact approaches. We present a highly parallelizable local enumeration scheme that enables the computation of exact macro-state transition models with highly reduced memory requirements. The approach is evaluated on RNA secondary structure landscapes using a gradient basin definition for macro-states. Furthermore, we demonstrate the need for exact transition models by comparing two barrier-based approaches and perform a detailed investigation of gradient basins in RNA energy landscapes. Source code is part of the C++ Energy Landscape Library available at http://www.bioinf.uni-freiburg.de/Software/.
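The gradient basin definition mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the local enumeration scheme of the paper and not the API of the Energy Landscape Library: it assumes a tiny, fully enumerated toy landscape, and the names `gradient_basins`, `energy` and `neighbours` as well as the tie-breaking rule are hypothetical choices made for illustration.

```python
def gradient_basins(energy, neighbours):
    """Assign every state to the local minimum reached by a gradient walk.

    energy:     dict mapping state -> energy value
    neighbours: dict mapping state -> iterable of adjacent states
    A gradient walk always moves to the lowest-energy neighbour that is
    strictly lower than the current state (ties broken by state name,
    which is an assumption of this sketch).
    """
    def descend(state):
        while True:
            lower = [n for n in neighbours[state] if energy[n] < energy[state]]
            if not lower:
                return state  # no lower neighbour: local minimum reached
            state = min(lower, key=lambda n: (energy[n], n))

    return {state: descend(state) for state in energy}


# Toy landscape: five states arranged in a chain a - b - c - d - e.
energy = {"a": -3.0, "b": -1.0, "c": 0.0, "d": -2.0, "e": -4.0}
neighbours = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
basins = gradient_basins(energy, neighbours)
```

In this toy example the states a and b fall into the basin of the minimum a, while c, d and e fall into the basin of e. Each basin corresponds to one macro-state of the decomposition.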

Available:
pdf (1008 KB)   doi:10.1093/bioinformatics/btu337   arXiv:1404.0270   pmid:24833804   BibTeX Entry ( Mann_basin_14 )

Computational Prediction of RNA-RNA Interactions

Rolf Backofen

In: Methods Mol Biol, 2014, 1097, 417-35

We describe different tools and approaches for RNA-RNA interaction prediction. Recognition of ncRNA targets is predominantly governed by two principles, namely the stability of the duplex between the two interacting RNAs and the internal structure of both mRNA and ncRNA. Thus, approaches can be distinguished into different major categories depending on how they consider inter- and intramolecular structure. The first class completely neglects the internal structure and measures only the stability of the duplex. The second class of approaches abstracts from specific intramolecular structures and uses an ensemble-based approach to calculate the effect of internal structure on a putative binding site, thus measuring the accessibility of the binding sites. Since accessibility-based approaches can handle only one continuous interaction site, two additional types of approaches were introduced which predict a joint structure for the interacting RNAs. Since this problem is NP-complete, the approaches can handle only a restricted class of joint structures. The first are co-folding approaches, which predict a joint structure that is nested when both sequences are concatenated. The last and most complex class of approaches impose only the restriction that they discard zipper-like structures. Finally, we will discuss the use of conservation information in RNA-target prediction.

Available:
doi:10.1007/978-1-62703-709-9_19   pmid:24639170   BibTeX Entry ( Backofen-Compu_Predi_RNA-2014 )

Cluster based prediction of PDZ-peptide interactions

Kousik Kundu, Rolf Backofen

In: BMC Genomics, 2014, 15(Suppl 1), S5

BACKGROUND: PDZ domains are one of the most promiscuous protein recognition modules that bind with short linear peptides and play an important role in cellular signaling. Recently, a few high-throughput techniques (e.g. protein microarray screen, phage display) have been applied to determine in-vitro binding specificity of PDZ domains. Currently, many computational methods are available to predict PDZ-peptide interactions but they often provide domain specific models and/or have a limited domain coverage. RESULTS: Here, we composed the largest set of PDZ domains derived from human, mouse, fly and worm proteomes and defined binding models for PDZ domain families to improve the domain coverage and prediction specificity. For that purpose, we first identified a novel set of 138 PDZ families, comprising 548 PDZ domains from aforementioned organisms, based on efficient clustering according to their sequence identity. For 43 PDZ families, covering 226 PDZ domains with available interaction data, we built specialized models using a support vector machine approach. The advantage of family-wise models is that they can also be used to determine the binding specificity of a newly characterized PDZ domain with sufficient sequence identity to the known families. Since most current experimental approaches provide only positive data, we have to cope with the class imbalance problem. Thus, to enrich the negative class, we introduced a powerful semi-supervised technique to generate high confidence non-interaction data. We report competitive predictive performance with respect to state-of-the-art approaches. CONCLUSIONS: Our approach has several contributions. First, we show that domain coverage can be increased by applying an accurate clustering technique. Second, we developed an approach based on a semi-supervised strategy to get high confidence negative data. Third, we allowed high order correlations between the amino acid positions in the binding peptides.
Fourth, our method is general enough and will easily be applicable to other peptide recognition modules such as SH2 domains and finally, we performed a genome-wide prediction for 101 human and 102 mouse PDZ domains and uncovered novel interactions with biological relevance. We make all the predictive models and genome-wide predictions freely available to the scientific community.

Available:
pdf (1355 KB)   doi:10.1186/1471-2164-15-S1-S5   pmid:24564547   BibTeX Entry ( Kundu_Backofen-Clust_based_predi-2014 )

A complex of Cas proteins 5, 6, and 7 is required for the biogenesis and stability of crRNAs in Haloferax volcanii

Jutta Brendel, Britta Stoll, Sita J. Lange, Kundan Sharma, Christof Lenz, Aris-Edda Stachler, Lisa-Katharina Maier, Hagen Richter, Lisa Nickel, Ruth A. Schmitz, Lennart Randau, Thorsten Allers, Henning Urlaub, Rolf Backofen, Anita Marchfelder

In: Journal of Biological Chemistry, 2014, 289(10), 7164-77

The CRISPR-Cas (clustered regularly interspaced short palindromic repeats/CRISPR-associated) system is a prokaryotic defence mechanism against foreign genetic elements. A plethora of CRISPR-Cas versions exist, with more than 40 different Cas protein families and several different molecular approaches to fight the invading DNA. One of the key players in the system is the crRNA, which directs the invader-degrading Cas protein complex to the invader. The CRISPR-Cas types I and III use the Cas6 protein to generate mature crRNAs. Here, we show that the Cas6 protein is necessary for crRNA production but that additional Cas proteins that form a Cascade-like complex are needed for crRNA stability in the CRISPR-Cas type I-B system in Haloferax volcanii in vivo. Deletion of the cas6 gene results in loss of mature crRNAs and interference. However, cells that have the complete cas gene cluster (cas1-8) removed and are transformed with the cas6 gene are not able to produce and stably maintain mature crRNAs. crRNA production and stability is rescued only if cas5, 6 and 7 are present. Mutational analysis of the cas6 gene reveals three amino acids (His41, Gly256 and Gly258) that are essential for pre-crRNA cleavage, while the mutation of two amino acids (Ser115 and Ser224) leads to an increase of crRNA amounts. This is the first systematic in vivo analysis of Cas6 protein variants. In addition, we show that the H. volcanii I-B system contains a Cascade-like complex with a Cas7, Cas5 and Cas6 core that protects the crRNA.

Available:
pdf (6367 KB)   doi:10.1074/jbc.M113.508184   pmid:24459147   BibTeX Entry ( Brendel_Stoll_Lange-compl_Cas_prote-JBC2014 )

GraphProt: modeling binding preferences of RNA-binding proteins

Daniel Maticzka, Sita J. Lange, Fabrizio Costa, Rolf Backofen

In: Genome Biol, 2014, 15(1), R17

We present GraphProt, a computational framework for learning sequence- and structure-binding preferences of RNA-binding proteins (RBPs) from high-throughput experimental data. We benchmark GraphProt, demonstrating that the modeled binding preferences conform to the literature, and showcase the biological relevance and two applications of GraphProt models. First, estimated binding affinities correlate with experimental measurements. Second, predicted Ago2 targets display higher levels of expression upon Ago2 knockdown, whereas control targets do not. Computational binding models, such as those provided by GraphProt, are essential to predict RBP-binding sites and affinities in all tissues. GraphProt is freely available at http://www.bioinf.uni-freiburg.de/Software/GraphProt.

Available:
pdf (3088 KB)   doi:10.1186/gb-2014-15-1-r17   pmid:24451197   BibTeX Entry ( Maticzka_Lange_Costa-Graph_model_bindi-2014 )

LocARNAscan: Incorporating thermodynamic stability in sequence and structure-based RNA homology search

Sebastian Will, Michael F. Siebauer, Steffen Heyne, Jan Engelhardt, Peter F. Stadler, Kristin Reiche, Rolf Backofen

In: Algorithms Mol Biol, 2013, 8(1), 14

BACKGROUND: The search for distant homologs has become an important issue in genome annotation. A particular difficulty is posed by divergent homologs that have lost recognizable sequence similarity. This same problem also arises in the recognition of novel members of large classes of RNAs such as snoRNAs or microRNAs that consist of families unrelated by common descent. Current homology search tools for structured RNAs are either based entirely on sequence similarity (such as blast or hmmer) or combine sequence and secondary structure. The most prominent example of the latter class of tools is Infernal. Alternatives are descriptor-based methods. In most practical applications published to date, however, the information contained in covariance models or manually prescribed search patterns is dominated by sequence information. Here we ask two related questions: (1) Is secondary structure alone informative for homology search and the detection of novel members of RNA classes? (2) To what extent is the thermodynamic propensity of the target sequence to fold into the correct secondary structure helpful for this task? RESULTS: Sequence-structure alignment can be used as an alternative search strategy. In this scenario, the query consists of a base pairing probability matrix, which can be derived either from a single sequence or from a multiple alignment representing a set of known representatives. Sequence information can be optionally added to the query. The target sequence is pre-processed to obtain local base pairing probabilities. As a search engine we devised a semi-global scanning variant of LocARNA's algorithm for sequence-structure alignment. The LocARNAscan tool is optimized for speed and low memory consumption. In benchmarking experiments on artificial data we observe that the inclusion of thermodynamic stability is helpful, albeit only in a regime of extremely low sequence information in the query.
We observe, furthermore, that the sensitivity is bounded in particular by the limited accuracy of the predicted local structures of the target sequence. CONCLUSIONS: Although we demonstrate that a purely structure-based homology search is feasible in principle, it is unlikely to outperform tools such as Infernal in most application scenarios, where a substantial amount of sequence information is typically available. The LocARNAscan approach will profit, however, from high throughput methods to determine RNA secondary structure. In transcriptome-wide applications, such methods will provide accurate structure annotations on the target side. AVAILABILITY: Source code of the free software LocARNAscan 1.0 and supplementary data are available at http://www.bioinf.uni-leipzig.de/Software/LocARNAscan.

Available:
pdf (766 KB)   doi:10.1186/1748-7188-8-14   pmid:23601347   BibTeX Entry ( Will_Siebauer_Heyne-LocAR_Incor_therm-2013 )

Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources

Dietrich Rebholz-Schuhmann, Senay Kafkas, Jee-Hyub Kim, Chen Li, Antonio Jimeno Yepes, Robert Hoehndorf, Rolf Backofen, Ian Lewin

In: J Biomed Semantics, 2013, 4(1), 28

MOTIVATION: The identification of protein and gene names (PGNs) from the scientific literature requires semantic resources: Terminological and lexical resources deliver the term candidates into PGN tagging solutions and the gold standard corpora (GSC) train them to identify term parameters and contextual features. Ideally all three resources, i.e. corpora, lexica and taggers, cover the same domain knowledge, and thus support identification of the same types of PGNs and cover all of them. Unfortunately, none of the three serves as a predominant standard and for this reason it is worth exploring, how these three resources comply with each other. We systematically compare different PGN taggers against publicly available corpora and analyze the impact of the included lexical resource in their performance. In particular, we determine the performance gains through false positive filtering, which contributes to the disambiguation of identified PGNs. RESULTS: In general, machine learning approaches (ML-Tag) for PGN tagging show higher F1-measure performance against the BioCreative-II and Jnlpba GSCs (exact matching), whereas the lexicon based approaches (LexTag) in combination with disambiguation methods show better results on FsuPrge and PennBio. The ML-Tag solutions balance precision and recall, whereas the LexTag solutions have different precision and recall profiles at the same F1-measure across all corpora. Higher recall is achieved with larger lexical resources, which also introduce more noise (false positive results). The ML-Tag solutions certainly perform best, if the test corpus is from the same GSC as the training corpus. As expected, the false negative errors characterize the test corpora and - on the other hand - the profiles of the false positive mistakes characterize the tagging solutions. 
LexTag solutions that are based on a large terminological resource in combination with false positive filtering produce better results, which, in addition, provide concept identifiers from a knowledge source in contrast to ML-Tag solutions. CONCLUSION: The standard ML-Tag solutions achieve high performance, but not across all corpora, and thus should be trained using several different corpora to reduce possible biases. The LexTag solutions have different profiles for their precision and recall performance, but with similar F1-measure. This result is surprising and suggests that they cover a portion of the most common naming standards, but cope differently with the term variability across the corpora. The false positive filtering applied to LexTag solutions does improve the results by increasing their precision without compromising significantly their recall. The harmonisation of the annotation schemes in combination with standardized lexical resources in the tagging solutions will enable their comparability and will pave the way for a shared standard.
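The observation that taggers can show different precision and recall profiles at the same F1-measure follows directly from how the measure is defined. The Python sketch below computes precision, recall and F1 under exact matching. The function name, the annotation tuples and the two example taggers are invented for illustration and are not data or code from the study.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 for two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold standard: (start, end, surface form) of each mention.
gold = {(0, 5, "BRCA1"), (10, 13, "p53"), (20, 24, "EGFR"), (30, 35, "PTBP1")}

# Tagger A is cautious: few predictions, all of them correct.
tagger_a = {(0, 5, "BRCA1"), (10, 13, "p53")}

# Tagger B finds every gold mention but adds as many false positives.
tagger_b = gold | {(40, 43, "was"), (50, 53, "and"), (60, 63, "the"), (70, 73, "for")}

scores_a = precision_recall_f1(tagger_a, gold)
scores_b = precision_recall_f1(tagger_b, gold)
```

Tagger A reaches precision 1.0 and recall 0.5, tagger B precision 0.5 and recall 1.0, and both end up with the same F1 of 2/3. Filtering false positives from tagger B would raise its precision without lowering its recall.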

Available:
pdf (3306 KB)   doi:10.1186/2041-1480-4-28   pmid:24112383   BibTeX Entry ( Rebholz-Schuhmann_Kafkas_Kim-Evalu_gold_stand-2013 )

A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources

Dietrich Rebholz-Schuhmann, Christoph Grabmuller, Silvestras Kavaliauskas, Samuel Croset, Peter Woollard, Rolf Backofen, Wendy Filsell, Dominic Clark

In: Drug Discov Today, 2013, 19(7), 882-889

In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker.

Available:
pdf (768 KB)   doi:10.1016/j.drudis.2013.10.024   pmid:24201223   BibTeX Entry ( Rebholz-Schuhmann_Grabmuller_Kavaliauskas-case_study_seman-2013 )

Evaluation and cross-comparison of lexical entities of biological interest (LexEBI)

Dietrich Rebholz-Schuhmann, Jee-Hyub Kim, Ying Yan, Abhishek Dixit, Caroline Friteyre, Robert Hoehndorf, Rolf Backofen, Ian Lewin

In: PLoS One, 2013, 8(10), e75185

MOTIVATION: Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical "term space" (the "Lexeome"), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal does not only require that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness). RESULT: This study compiles a resource for lexical terms of biomedical interest in a standard format (called "LexEBI"), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show only little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities, both do comprise enzymes leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions. CONCLUSION: LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). 
The resource provides the disease terms as open source content, and fully interlinks terms across resources.

Available:
pdf (1408 KB)   doi:10.1371/journal.pone.0075185   pmid:24124474   BibTeX Entry ( Rebholz-Schuhmann_Kim_Yan-Evalu_and_cross-2013 )

Distribution of Graph-Distances in Boltzmann Ensembles of RNA Secondary Structures

Rolf Backofen, Markus Fricke, Manja Marz, Jing Qin, Peter F. Stadler

In: Aaron Darling, Jens Stoye, Algorithms in Bioinformatics, Lecture Notes in Computer Science, 2013, 8126, 112-125

Large RNA molecules often carry multiple functional domains whose spatial arrangement is an important determinant of their function. Pre-mRNA splicing, furthermore, relies on the spatial proximity of the splice junctions that can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Albeit a crude measure, the distribution of spatial distances in thermodynamic equilibrium therefore provides useful information on the overall shape of the molecule and can provide insights into the interplay of its functional domains. Spatial distance can be approximated by the graph-distance in RNA secondary structure. We show here that the equilibrium distribution of graph-distances between arbitrary nucleotides can be computed in polynomial time by means of dynamic programming. A naive implementation would yield recursions with a very high time complexity of O(n^11). Although we were able to reduce this to O(n^6) for many practical applications, a further reduction seems difficult. We conclude, therefore, that sampling approaches, which are much easier to implement, are also theoretically favorable for most real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules.
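As a rough illustration of the quantity involved, the following Python sketch computes the graph-distance between two nucleotides in one given secondary structure. It is neither the dynamic programming recursion nor the sampling procedure from the paper. It assumes a dot-bracket string without pseudoknots, 0-based positions, and unit weight for both backbone bonds and base pairs; the function names are hypothetical.

```python
from collections import deque


def pair_table(structure):
    """Map each paired position to its partner for a dot-bracket string."""
    partner = {}
    stack = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            partner[i] = j
            partner[j] = i
    return partner


def graph_distance(structure, source, target):
    """Shortest path between two nucleotides via breadth-first search.

    Edges are backbone bonds (i, i+1) and base pairs, each of weight 1
    (the unit weights are an assumption of this sketch).
    """
    partner = pair_table(structure)
    n = len(structure)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        i = queue.popleft()
        if i == target:
            return dist[i]
        candidates = [i - 1, i + 1]
        if i in partner:
            candidates.append(partner[i])
        for j in candidates:
            if 0 <= j < n and j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return None  # target outside the sequence


hairpin = "((((....))))"
unfolded = "............"
end_to_end_folded = graph_distance(hairpin, 0, 11)
end_to_end_unfolded = graph_distance(unfolded, 0, 11)
```

In the hairpin the two ends are base-paired and therefore at distance 1, whereas in the unfolded chain they are 11 backbone steps apart. Evaluating such distances over structures sampled from the Boltzmann ensemble would give an empirical distance distribution, in the spirit of the sampling approaches the abstract recommends.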

Available:
pdf (2215 KB)   doi:10.1007/978-3-642-40453-5_10   BibTeX Entry ( backofen13:_distr_graph_distan_boltz_ensem )

Requirements for a successful defence reaction by the CRISPR-Cas subtype I-B system

Britta Stoll, Lisa-Katharina Maier, Sita J. Lange, Jutta Brendel, Susan Fischer, Rolf Backofen, Anita Marchfelder

In: Biochem Soc Trans, 2013, 41(6), 1444-8

Uptake of foreign mobile genetic elements is often detrimental and can result in cell death. For protection against invasion, prokaryotes have developed several defence mechanisms, which take effect at all stages of infection; an example is the recently discovered CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) immune system. This defence system directly degrades invading genetic material and is present in almost all archaea and many bacteria. Current data indicate a large variety of mechanistic molecular approaches. Although almost all archaea carry this defence weapon, only a few archaeal systems have been fully characterized. In the present paper, we summarize the prerequisites for the detection and degradation of invaders in the halophilic archaeon Haloferax volcanii. H. volcanii encodes a subtype I-B CRISPR-Cas system and the defence can be triggered by a plasmid-based invader. Six different target-interference motifs are recognized by the Haloferax defence and a 9-nt non-contiguous seed sequence is essential. The repeat sequence has the potential to fold into a minimal stem-loop structure, which is conserved in haloarchaea and might be recognized by the Cas6 endoribonuclease during the processing of CRISPR loci into mature crRNA (CRISPR RNA). Individual crRNA species were present in very different concentrations according to an RNA-Seq analysis and many were unable to trigger a successful defence reaction. Recognition of the plasmid invader does not depend on its copy number, but instead results indicate a dependency on the type of origin present on the plasmid.

Available:
pdf (309 KB)   doi:10.1042/BST20130098   pmid:24256235   BibTeX Entry ( Stoll_Maier_Lange-Requi_for_succe-2013 )

Comparative genomics boosts target prediction for bacterial small RNAs

Patrick R. Wright, Andreas S. Richter, Kai Papenfort, Martin Mann, Jorg Vogel, Wolfgang R. Hess, Rolf Backofen, Jens Georg

In: Proc. Natl. Acad. Sci. USA, 2013, 110(37), E3487-96

Small RNAs (sRNAs) constitute a large and heterogeneous class of bacterial gene expression regulators. Much like eukaryotic microRNAs, these sRNAs typically target multiple mRNAs through short seed pairing, thereby acting as global posttranscriptional regulators. In some bacteria, evidence for hundreds to possibly more than 1,000 different sRNAs has been obtained by transcriptome sequencing. However, the experimental identification of possible targets and, therefore, their confirmation as functional regulators of gene expression has remained laborious. Here, we present a strategy that integrates phylogenetic information to predict sRNA targets at the genomic scale and reconstructs regulatory networks upon functional enrichment and network analysis (CopraRNA, for Comparative Prediction Algorithm for sRNA Targets). Furthermore, CopraRNA precisely predicts the sRNA domains for target recognition and interaction. When applied to several model sRNAs, CopraRNA revealed additional targets and functions for the sRNAs CyaR, FnrS, RybB, RyhB, SgrS, and Spot42. Moreover, the mRNAs gdhA, lrp, marA, nagZ, ptsI, sdhA, and yobF-cspC were suggested as regulatory hubs targeted by up to seven different sRNAs. The verification of many previously undetected targets by CopraRNA, even for extensively investigated sRNAs, demonstrates its advantages and shows that CopraRNA-based analyses can compete with experimental target prediction approaches. A Web interface allows high-confidence target prediction and efficient classification of bacterial sRNAs.

Available:
pdf (4975 KB)   doi:10.1073/pnas.1303248110   pmid:23980183   BibTeX Entry ( Wright_Richter_Papenfort-Compa_genom_boost-PNAS2013 )

Tandem Stem-Loops in roX RNAs Act Together to Mediate X Chromosome Dosage Compensation in Drosophila

Ibrahim Avsar Ilik, Jeffrey J. Quinn, Plamen Georgiev, Filipe Tavares-Cadete, Daniel Maticzka, Sarah Toscano, Yue Wan, Robert C. Spitale, Nicholas Luscombe, Rolf Backofen, Howard Y. Chang, Asifa Akhtar

In: Mol Cell, 2013, 51(2), 156-73

Dosage compensation in Drosophila is an epigenetic phenomenon utilizing proteins and long noncoding RNAs (lncRNAs) for transcriptional upregulation of the male X chromosome. Here, by using UV crosslinking followed by deep sequencing, we show that two enzymes in the Male-Specific Lethal complex, MLE RNA helicase and MSL2 ubiquitin ligase, bind evolutionarily conserved domains containing tandem stem-loops in roX1 and roX2 RNAs in vivo. These domains constitute the minimal RNA unit present in multiple copies in diverse arrangements for nucleation of the MSL complex. MLE binds to these domains with distinct ATP-independent and ATP-dependent behavior. Importantly, we show that different roX RNA domains have overlapping function, since only combinatorial mutations in the tandem stem-loops result in severe loss of dosage compensation and consequently male-specific lethality. We propose that repetitive structural motifs in lncRNAs could provide plasticity during multiprotein complex assemblies to ensure efficient targeting in cis or in trans along chromosomes.

Available:
pdf (3307 KB)   doi:10.1016/j.molcel.2013.07.001   pmid:23870142   BibTeX Entry ( Ilik_Quinn_Georgiev-Tande_Stem_roX-2013 )

Salimabromide: Unexpected Chemistry from the Obligate Marine Myxobacterium Enhygromyxa salina

Stephan Felder, Sandra Dreisigacker, Stefan Kehraus, Edith Neu, Gabriele Bierbaum, Patrick R. Wright, Dirk Menche, Till F. Schaberle, Gabriele M. Konig

In: Chemistry, 2013, 19(28), 9319-24

Marine myxobacteria (Enhygromyxa, Plesiocystis, Pseudoenhygromyxa, Haliangium) are phylogenetically distant from their terrestrial counterparts. Salimabromide is the first natural product from the Plesiocystis/Enhygromyxa clade of obligatory marine myxobacteria. Salimabromide has a new tetracyclic carbon skeleton, comprising a brominated benzene ring, a furano lactone residue, and a cyclohexane ring, bridged by a seven-membered cyclic moiety. The absolute configuration was deduced from experimental and calculated CD data. Salimabromide revealed antibiotic activity towards Arthrobacter cristallopoietes.

Available:
pdf (327 KB)   doi:10.1002/chem.201301379   pmid:23703738   BibTeX Entry ( Felder_Dreisigacker_Kehraus-Salim_Unexp_Chemi-2013 )

CRISPRmap: an automated classification of repeat conservation in prokaryotic adaptive immune systems

Sita J. Lange, Omer S. Alkhnbashi, Dominic Rose, Sebastian Will, Rolf Backofen

In: Nucleic Acids Res, 2013, 41(17), 8034-44

Central to Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas systems are repeated RNA sequences that serve as Cas-protein-binding templates. Classification is based on the architectural composition of associated Cas proteins; considering repeat evolution is essential to complete the picture. We compiled the largest data set of CRISPRs to date, performed comprehensive, independent clustering analyses and identified a novel set of 40 conserved sequence families and 33 potential structure motifs for Cas-endoribonucleases with some distinct conservation patterns. Evolutionary relationships are presented as a hierarchical map of sequence and structure similarities for both a quick and detailed insight into the diversity of CRISPR-Cas systems. In a comparison with Cas-subtypes, I-C, I-E, I-F and type II were strongly coupled and the remaining type I and type III subtypes were loosely coupled to repeat and Cas1 evolution, respectively. Subtypes with a strong link to CRISPR evolution were almost exclusive to bacteria; nevertheless, we identified rare examples of potential horizontal transfer of I-C and I-E systems into archaeal organisms. Our easy-to-use web server provides an automated assignment of newly sequenced CRISPRs to our classification system and enables more informed choices on future hypotheses in CRISPR-Cas research: http://rna.informatik.uni-freiburg.de/CRISPRmap.

Publication note:
SJL, OSA and DR contributed equally to this work.

Available:
pdf (1547 KB)   doi:10.1093/nar/gkt606   pmid:23863837   BibTeX Entry ( Lange_Alkhnbashi_Rose-CRISP_autom_class-NAR2013 )

A graph kernel approach for alignment-free domain-peptide interaction prediction with an application to human SH3 domains

Kousik Kundu, Fabrizio Costa, Rolf Backofen

In: Bioinformatics, 2013, 29(13), i335-i343

MOTIVATION: State-of-the-art experimental data for determining binding specificities of peptide recognition modules (PRMs) is obtained by high-throughput approaches like peptide arrays. Most prediction tools applicable to this kind of data are based on an initial multiple alignment of the peptide ligands. Building an initial alignment can be error-prone, especially in the case of the proline-rich peptides bound by the SH3 domains. RESULTS: Here, we present a machine-learning approach based on an efficient graph-kernel technique to predict the specificity of a large set of 70 human SH3 domains, which are an important class of PRMs. The graph-kernel strategy allows us to (i) integrate several types of physico-chemical information for each amino acid, (ii) consider high-order correlations between these features and (iii) eliminate the need for an initial peptide alignment. We build specialized models for each human SH3 domain and achieve competitive predictive performance of 0.73 area under precision-recall curve, compared with 0.27 area under precision-recall curve for state-of-the-art methods based on position weight matrices. We show that better models can be obtained when we use information on the noninteracting peptides (negative examples), which is currently not used by the state-of-the-art approaches based on position weight matrices. To this end, we analyze two strategies to identify subsets of high confidence negative data. The techniques introduced here are more general and hence can also be used for any other protein domains, which interact with short peptides (i.e. other PRMs). AVAILABILITY: The program with the predictive models can be found at http://www.bioinf.uni-freiburg.de/Software/SH3PepInt/SH3PepInt.tar.gz. We also provide a genome-wide prediction for all 70 human SH3 domains, which can be found under http://www.bioinf.uni-freiburg.de/Software/SH3PepInt/Genome-Wide-Predictions.tar.gz.
CONTACT: backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
pdf (503 KB)   doi:10.1093/bioinformatics/btt220   pmid:23813002   BibTeX Entry ( Kundu_Costa_Backofen-graph_kerne_appro-2013 )

Atom Mapping with Constraint Programming

Martin Mann, Feras Nahar, Heinz Ekker, Rolf Backofen, Peter F. Stadler, Christoph Flamm

In: C. Schulte, Proc. of the 19th International Conference on Principles and Practice of Constraint Programming (CP'13), LNCS, 2013, 8124, 805-822

Chemical reactions consist of a rearrangement of bonds so that each atom in an educt molecule appears again in a specific position of a reaction product. In general this bijection between educt and product atoms is not reported by chemical reaction databases, leaving the Atom Mapping Problem as an important computational task for many practical applications in computational chemistry and systems biology. Elementary chemical reactions feature a cyclic imaginary transition state (ITS) that imposes additional restrictions on the bijection between educt and product atoms that are not taken into account by previous approaches. We demonstrate that Constraint Programming is well-suited to solving the Atom Mapping Problem in this setting. The performance of our approach is evaluated for a subset of chemical reactions from the KEGG database featuring various ITS cycle layouts and reaction mechanisms.

Available:
pdf (556 KB)   doi:10.1007/978-3-642-40627-0_59   BibTeX Entry ( Mann_atommapping_13 )
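
As a rough illustration of the Atom Mapping Problem described in the abstract above, the following Python sketch enumerates label-preserving bijections between educt and product atoms and keeps those that change the fewest bonds. It is a brute-force toy under stated assumptions, not the Constraint Programming model of the paper: the function names, the input encoding (atom label lists plus index pairs for bonds) and the "fewest changed bonds" objective are all hypothetical choices made for this example, and the cyclic ITS constraint is not modeled.

```python
from itertools import permutations


def bond_set(bonds):
    """Normalize (i, j) atom-index pairs into a set of unordered bonds."""
    return {frozenset(b) for b in bonds}


def atom_mappings(educt_atoms, educt_bonds, product_atoms, product_bonds):
    """Enumerate label-preserving bijections with the fewest changed bonds.

    Returns (min_changes, mappings); each mapping is a tuple m where m[i]
    is the product atom index assigned to educt atom i.
    Assumes both sides contain the same number of atoms.
    """
    e_bonds = bond_set(educt_bonds)
    p_bonds = bond_set(product_bonds)
    n = len(educt_atoms)
    best, best_maps = None, []
    for perm in permutations(range(len(product_atoms))):
        # Constraint: an atom can only map to an atom with the same label.
        if any(educt_atoms[i] != product_atoms[perm[i]] for i in range(n)):
            continue
        # Carry every educt bond over to product indices.
        mapped = {frozenset(perm[i] for i in b) for b in e_bonds}
        # Bonds broken plus bonds formed under this mapping.
        changes = len(mapped ^ p_bonds)
        if best is None or changes < best:
            best, best_maps = changes, [perm]
        elif changes == best:
            best_maps.append(perm)
    return best, best_maps


# Toy exchange reaction with made-up labels: A-B + C -> A + B-C
result = atom_mappings(["A", "B", "C"], [(0, 1)], ["B", "C", "A"], [(0, 1)])
print(result)
```

Enumerating all permutations is only feasible for a handful of atoms; it is used here solely to make the bijection and its bond-change cost concrete.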

Semi-Supervised Prediction of SH2-Peptide Interactions from Imbalanced High-Throughput Data

Kousik Kundu, Fabrizio Costa, Michael Huber, Michael Reth, Rolf Backofen

In: PLoS One, 2013, 8(5), e62732

Src homology 2 (SH2) domains are the largest family of the peptide-recognition modules (PRMs) that bind to phosphotyrosine containing peptides. Knowledge about binding partners of SH2-domains is key for a deeper understanding of different cellular processes. Given the high binding specificity of SH2, in-silico ligand peptide prediction is of great interest. Currently, however, only a few approaches have been published for the prediction of SH2-peptide interactions. Their main shortcomings range from limited coverage, to restrictive modeling assumptions (they are mainly based on position specific scoring matrices and do not take into consideration complex amino acid inter-dependencies) and high computational complexity. We propose a simple yet effective machine learning approach for a large set of known human SH2 domains. We used comprehensive data from micro-array and peptide-array experiments on 51 human SH2 domains. In order to deal with the high data imbalance problem and the high signal-to-noise ratio, we cast the problem in a semi-supervised setting. We report competitive predictive performance w.r.t. state-of-the-art. Specifically we obtain 0.83 AUC ROC and 0.93 AUC PR in comparison to 0.71 AUC ROC and 0.87 AUC PR previously achieved by the position specific scoring matrices (PSSMs) based SMALI approach. Our work provides three main contributions. First, we showed that better models can be obtained when the information on the non-interacting peptides (negative examples) is also used. Second, we improve performance when considering high order correlations between the ligand positions employing regularization techniques to effectively avoid overfitting issues. Third, we developed an approach to tackle the data imbalance problem using a semi-supervised strategy. Finally, we performed a genome-wide prediction of human SH2-peptide binding, uncovering several findings of biological relevance.
We make our models and genome-wide predictions, for all the 51 SH2-domains, freely available to the scientific community under the following URLs: http://www.bioinf.uni-freiburg.de/Software/SH2PepInt/SH2PepInt.tar.gz and http://www.bioinf.uni-freiburg.de/Software/SH2PepInt/Genome-wide-predictions.tar.gz, respectively.

Available:
pdf (859 KB)   doi:10.1371/journal.pone.0062732   pmid:23690949   BibTeX Entry ( Kundu_Costa_Huber-Semi_Predi_Inter-2013 )

Two CRISPR-Cas systems in Methanosarcina mazei strain Go1 display common processing features despite belonging to different types I and III

Lisa Nickel, Katrin Weidenbach, Dominik Jager, Rolf Backofen, Sita J. Lange, Nadja Heidrich, Ruth A. Schmitz

In: RNA Biol, 2013, 10(5), 779-791

The clustered regularly interspaced short palindromic repeats (CRISPR) system represents a highly adaptive and heritable defense system against foreign nucleic acids in bacteria and archaea. We analyzed the two CRISPR-Cas systems in Methanosarcina mazei strain Go1. Although belonging to different subtypes (I-B and III-B), the leaders and repeats of both loci are nearly identical. Also, despite many point mutations in each array, a common hairpin motif was identified in the repeats by a bioinformatics analysis and in vitro structural probing. The expression and maturation of CRISPR-derived RNAs (crRNAs) were studied in vitro and in vivo. Both respective potential Cas6b-type endonucleases were purified and their activity tested in vitro. Each protein showed significant activity and could cleave both repeats at the same processing site. Cas6b of subtype III-B, however, was significantly more efficient in its cleavage activity compared with Cas6b of subtype I-B. Northern blot and differential RNaseq analyses were performed to investigate in vivo transcription and maturation of crRNAs, revealing generally very low expression of both systems, whereas significant induction at high NaCl concentrations was observed. crRNAs derived proximal to the leader were generally more abundant than distal ones and in vivo processing sites were clarified for both loci, confirming the previously well-established 8 nt 5' repeat tags. The 3'-ends were more diverse, but generally ended in a prefix of the following repeat sequence (3'-tag). The analysis further revealed a 5'-hydroxy and 3'-phosphate termini architecture of small crRNAs specific for cleavage products of Cas6 endonucleases from type I-E and I-F and type III-B.

Available:
pdf (2000 KB)   doi:10.4161/rna.23928   pmid:23619576   BibTeX Entry ( Nickel_Weidenbach_Jager-Two_CRISP_syste-2013 )

Essential requirements for the detection and degradation of invaders by the Haloferax volcanii CRISPR/Cas system I-B

Lisa-Katharina Maier, Sita J. Lange, Britta Stoll, Karina A. Haas, Susan Fischer, Eike Fischer, Elke Duchardt-Ferner, Jens Wohnert, Rolf Backofen, Anita Marchfelder

In: RNA Biology, 2013, 10(5), 865-874

To fend off foreign genetic elements, prokaryotes have developed several defense systems. The most recently discovered defense system, CRISPR/Cas, is sequence-specific, adaptive and heritable. The two central components of this system are the Cas proteins and the CRISPR RNA. The latter consists of repeat sequences that are interspersed with spacer sequences. The CRISPR locus is transcribed into a precursor RNA that is subsequently processed into short crRNAs. CRISPR/Cas systems have been identified in bacteria and archaea, and data show that many variations of this system exist. We analyzed the requirements for a successful defense reaction in the halophilic archaeon Haloferax volcanii. Haloferax encodes a CRISPR/Cas system of the I-B subtype, about which very little is known. Analysis of the mature crRNAs revealed that they contain a spacer as their central element, which is preceded by an eight-nucleotide-long 5' handle that originates from the upstream repeat. The repeat sequences have the potential to fold into a minimal stem loop. Sequencing of the crRNA population indicated that not all of the spacers that are encoded by the three CRISPR loci are present in the same abundance. By challenging Haloferax with an invader plasmid, we demonstrated that the interaction of the crRNA with the invader DNA requires a 10-nucleotide-long seed sequence. In addition, we found that not all of the crRNAs from the three CRISPR loci are effective at triggering the degradation of invader plasmids. The interference does not seem to be influenced by the copy number of the invader plasmid.

Available:
pdf (1187 KB)   doi:10.4161/rna.24282   pmid:23594992   BibTeX Entry ( Maier_Lange_Stoll-Essen_requi_for-2013 )

The Graph Grammar Library - a generic framework for chemical graph rewrite systems

Martin Mann, Heinz Ekker, Christoph Flamm

In: Keith Duddy, Gerti Kappel, Theory and Practice of Model Transformations, Proc. of ICMT 2013, LNCS, 2013, 7909, 52-53

Graph rewrite systems are powerful tools to model and study complex problems in various fields of research. Their successful application to chemical reaction modelling on a molecular level was shown but no appropriate and simple system is available at the moment. The presented Graph Grammar Library (GGL) implements a generic Double Push Out approach for general graph rewrite systems. The framework focuses on a high level of modularity as well as high performance, using state-of-the-art algorithms and data structures, and comes with extensive documentation. The large GGL chemistry module enables extensive and detailed studies of chemical systems. It well meets the requirements and abilities envisioned by Yadav et al. (2004) for such chemical rewrite systems. Here, molecules are represented as undirected labeled graphs while chemical reactions are described by corresponding graph grammar rules. Besides the graph transformation, the GGL offers advanced cheminformatics algorithms, for instance to estimate energies of molecules or for aromaticity perception. These features are illustrated using a set of reactions from polyketide chemistry, a huge class of natural compounds of medical relevance. The graph grammar based simulation of chemical reactions offered by the GGL is a powerful tool for extensive cheminformatics studies on a molecular level. The GGL already provides rewrite rules for all enzymes listed in the KEGG LIGAND database and is freely available at http://www.tbi.univie.ac.at/software/GGL/.

Publication note:
Extended abstract and poster at ICMT, full article at arXiv.

Available:
pdf (1165 KB)   ICMT-poster.pdf (361 KB)   ICMT-paper.pdf (143 KB)   doi:10.1007/978-3-642-38883-5_5   arXiv:1304.1356   BibTeX Entry ( Mann_GGL_13 )

CRISPR-Cas Systems in the Cyanobacterium Synechocystis sp. PCC6803 Exhibit Distinct Processing Pathways Involving at Least Two Cas6 and a Cmr2 Protein

Ingeborg Scholz, Sita J. Lange, Stephanie Hein, Wolfgang R. Hess, Rolf Backofen

In: PLoS One, 2013, 8(2), e56470

The CRISPR-Cas (Clustered Regularly Interspaced Short Palindrome Repeats - CRISPR associated proteins) system provides adaptive immunity in archaea and bacteria. A hallmark of CRISPR-Cas is the involvement of short crRNAs that guide associated proteins in the destruction of invading DNA or RNA. We present three fundamentally distinct processing pathways in the cyanobacterium Synechocystis sp. PCC6803 for a subtype I-D (CRISPR1), and two type III systems (CRISPR2 and CRISPR3), which are located together on the plasmid pSYSA. Using high-throughput transcriptome analyses and assays of transcript accumulation we found all CRISPR loci to be highly expressed, but the individual crRNAs had profoundly varying abundances despite single transcription start sites for each array. In a computational analysis, CRISPR3 spacers with stable secondary structures displayed a greater ratio of degradation products. These structures might interfere with the loading of the crRNAs into RNP complexes, explaining the varying abundances. The maturation of CRISPR1 and CRISPR2 transcripts depends on at least two different Cas6 proteins. Mutation of gene sll7090, encoding a Cmr2 protein, led to the disappearance of all CRISPR3-derived crRNAs, providing in vivo evidence for a function of Cmr2 in the maturation, regulation of expression, Cmr complex formation or stabilization of CRISPR3 transcripts. Finally, we optimized CRISPR repeat structure prediction and the results indicate that the spacer context can influence individual repeat structures.

Publication note:
IS and SJL contributed equally to this work.

Available:
pdf (2295 KB)   doi:10.1371/journal.pone.0056470   pmid:23441196   BibTeX Entry ( Scholz_Lange_Hein-CRISP_Syste_the-2013 )

Comparative analysis of Cas6b processing and CRISPR RNA stability

Hagen Richter, Sita J. Lange, Rolf Backofen, Lennart Randau

In: RNA Biology, 2013, 10(5), 700-707

The prokaryotic antiviral defense system CRISPR (clustered regularly interspaced short palindromic repeats)/Cas (CRISPR-associated) employs short crRNAs (CRISPR RNAs) to target invading viral nucleic acids. A short spacer sequence of these crRNAs can be derived from a viral genome and recognizes a reoccurring attack of a virus via base complementarity. We analyzed the effect of spacer sequences on the maturation of crRNAs of the subtype I-B Methanococcus maripaludis C5 CRISPR cluster. The responsible endonuclease, termed Cas6b, bound non-hydrolyzable repeat RNA as a dimer and mature crRNA as a monomer. Comparative analysis of Cas6b processing of individual spacer-repeat-spacer RNA substrates and crRNA stability revealed the potential influence of spacer sequence and length on these parameters. Correlation of these observations with the variable abundance of crRNAs visualized by deep-sequencing analyses is discussed. Finally, insertion of spacer and repeat sequences with archaeal poly-T termination signals is suggested to be prevented in archaeal CRISPR/Cas systems.

Available:
pdf (2322 KB)   doi:10.4161/rna.23715   pmid:23392318   BibTeX Entry ( Richter_Lange_Backofen-CRISP_Compa_analy-2013 )

Regulatory RNAs in archaea: first target identification in Methanoarchaea

Daniela Prasse, Claudia Ehlers, Rolf Backofen, Ruth A. Schmitz

In: Biochem Soc Trans, 2013, 41(1), 344-9

sRNAs (small non-coding RNAs) representing important players in many cellular and regulatory processes have been identified in all three domains of life. In Eukarya and Bacteria, functions have been assigned for many sRNAs, whereas the sRNA populations in Archaea are considerably less well characterized. Recent analyses on a genome-wide scale particularly using high-throughput sequencing techniques demonstrated the presence of high numbers of sRNA candidates in several archaea. However, elucidation of the molecular mechanism of sRNA action, as well as understanding their physiological roles, is in general still challenging, particularly in Archaea, since efficient genetic tools are missing. The identification of cellular targets of identified archaeal sRNAs by experimental approaches or computational prediction programs has begun only recently. At present, targets have been identified for one archaeal sRNA, sRNA162 in Methanosarcina mazei, which interacts with the 5' region of its targets, a cis-encoded and a trans-encoded target, blurring the paradigm of a border between cis- and trans-encoded sRNAs. Besides, the first experimental implications have been obtained in Haloarchaea and Pyrobaculum that archaeal sRNAs also target 3' regions of mRNAs. The present review summarizes our current knowledge on archaeal sRNAs and their biological functions and targets.

Available:
pdf (373 KB)   doi:10.1042/BST20120280   pmid:23356309   BibTeX Entry ( Prasse_Ehlers_Backofen-Regul_RNAs_archa-2013 )

SPARSE: Quadratic Time Simultaneous Alignment and Folding of RNAs Without Sequence-Based Heuristics

Sebastian Will, Christina Schmiedl, Milad Miladi, Mathias Möhl, Rolf Backofen

In: Minghua Deng, Rui Jiang, Fengzhu Sun, Xuegong Zhang, Proceedings of the 17th International Conference on Research in Computational Molecular Biology (RECOMB 2013), LNCS, 2013, 7821, 289-290

Motivation: There is increasing evidence of pervasive transcription, resulting in hundreds of thousands of ncRNAs of unknown function. Standard computational analysis tasks for inferring functional annotations like clustering require fast and accurate RNA comparisons based on sequence and structure similarity. The gold standard for the latter is Sankoff’s algorithm, which simultaneously aligns and folds RNAs. Because of its extreme time complexity of O(n^6), numerous faster "Sankoff-style" approaches have been suggested. Several such approaches introduce heuristics based on sequence alignment, which compromises the alignment quality for RNAs with sequence identities below 60%. Avoiding such heuristics, as e.g. in LocARNA, has been assumed to prohibit time complexities better than O(n^4), which strongly limits large-scale applications. Results: Breaking this barrier, we introduce SPARSE (Sparse Prediction and Alignment of RNAs using Structure Ensembles), a novel quadratic time Sankoff-style approach that does not rely on sequence-based heuristics but employs structural properties of RNA ensembles; its O(n^2) complexity matches the one of sequence alignment. The approach is based on a novel lightweight Sankoff-style alignment model, for which we introduce the algorithm PARSE. For the first time it transfers the Sankoff-model completely to a lightweight energy model; thus, it is more expressive than all previous lightweight methods, which inherit the PMcomp model. In comparison to LocARNA and similar approaches, the novel model enables much stronger sparsification based on the RNA structure ensemble; consequently, SPARSE aligns and folds RNAs with similar alignment and better folding quality in significantly less time. Finally, SPARSE aligns ncRNAs from the challenging low sequence identity region more accurately than tools relying on sequence-based heuristics.
Conclusion: Our results indicate that a complete lightweight Sankoff-style model with stronger sparsification can increase the performance and accuracy of RNA alignment, where the potential of the model points far beyond the studied prototype. Not falling back on sequence comparison, SPARSE suggests itself for large scale similarity assessment of RNAs with moderate to very low sequence identity.

Available:
pdf (135 KB)   doi:10.1007/978-3-642-37195-0_28   BibTeX Entry ( Will_etal-SPARSE-RECOMB2013 )
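
The SPARSE abstract above uses plain sequence alignment as its reference point for quadratic complexity. As a reminder of what that baseline looks like, here is a minimal Python sketch of global sequence alignment scoring by dynamic programming (Needleman-Wunsch style). It is not SPARSE or PARSE and ignores structure entirely; the function name and the scoring parameters (match, mismatch, gap) are arbitrary assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Fills an (len(a)+1) x (len(b)+1) table row by row, so time is
    proportional to len(a) * len(b); only two rows are kept in memory.
    """
    n, m = len(a), len(b)
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Or place a gap in b (up) or in a (left).
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]


print(global_alignment_score("GCAU", "GAU"))
```

Each table cell is computed from three neighbours in constant time, which is where the quadratic bound comes from.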

Impact of the Energy Model on the Complexity of RNA Folding with Pseudoknots

Saad Sheikh, Rolf Backofen, Yann Ponty

In: Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, Helsinki, Finland, July 3-5, 2012. Proceedings, 2012, 321-333

Predicting the folding of an RNA sequence, while allowing general pseudoknots (PK), consists in finding a minimal free-energy matching of its n positions. Assuming independently contributing base-pairs, the problem can be solved in Θ(n^3)-time using a variant of the maximal weighted matching. By contrast, the problem was previously proven NP-Hard in the more realistic nearest-neighbor energy model. In this work, we consider an intermediate model, called the stacking-pairs energy model. We extend a result by Lyngsø, showing that RNA folding with PK is NP-Hard within a large class of parametrization for the model. We also show the approximability of the problem, by giving a practical Θ(n^3) algorithm that achieves at least a 5-approximation for any parametrization of the stacking model. This contrasts nicely with the nearest-neighbor version of the problem, which we prove cannot be approximated within any positive ratio, unless P = NP.

Available:
pdf (689 KB)   doi:10.1007/978-3-642-31265-6_26   BibTeX Entry ( sheikh12:_impac_energ_model_compl_rna_foldin_pseud )
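
To make the simplest model in the abstract above concrete (independently contributing base pairs, arbitrary pseudoknots allowed), the Python sketch below finds a minimum-energy matching of sequence positions by exhaustive search with memoization. This is an exponential-time toy for short sequences, not the cubic-time weighted-matching algorithm the abstract refers to; the energy table, the function name, and the absence of any minimum loop length are assumptions made purely for illustration.

```python
from functools import lru_cache

# Hypothetical per-pair energies (lower is more stable); not real parameters.
PAIR_ENERGY = {
    ("G", "C"): -3, ("C", "G"): -3,
    ("A", "U"): -2, ("U", "A"): -2,
    ("G", "U"): -1, ("U", "G"): -1,
}


def min_energy_matching(seq):
    """Minimum total energy over all matchings of positions in seq.

    Any set of disjoint pairs is allowed, including crossing pairs
    (pseudoknots), and every pair contributes independently.
    """
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(i, used):
        # 'used' is a bitmask of positions already paired with an earlier one.
        while i < n and (used >> i) & 1:
            i += 1
        if i >= n:
            return 0
        result = best(i + 1, used)  # leave position i unpaired
        for j in range(i + 1, n):
            if not (used >> j) & 1:
                energy = PAIR_ENERGY.get((seq[i], seq[j]))
                if energy is not None:
                    result = min(result, energy + best(i + 1, used | (1 << j)))
        return result

    return best(0, 0)


# G(0)-C(2) and A(1)-U(3) cross each other, which this model permits.
print(min_energy_matching("GACU"))
```

Because pairs do not interact in this model, the optimum is just the best set of disjoint pairs, which is why a matching algorithm suffices; the harder stacking and nearest-neighbor models discussed in the abstract are not captured here.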

Antioxidant activity of Hawaiian marine algae

Dovi Kelman, Ellen Kromkowski Posner, Karla J. McDermid, Nicole K. Tabandera, Patrick R. Wright, Anthony D. Wright

In: Mar Drugs, 2012, 10(2), 403-16

Marine algae are known to contain a wide variety of bioactive compounds, many of which have commercial applications in pharmaceutical, medical, cosmetic, nutraceutical, food and agricultural industries. Natural antioxidants, found in many algae, are important bioactive compounds that play an important role against various diseases and ageing processes through protection of cells from oxidative damage. In this respect, relatively little is known about the bioactivity of Hawaiian algae that could be a potential natural source of such antioxidants. The total antioxidant activity of organic extracts of 37 algal samples, comprising 30 species of Hawaiian algae from 27 different genera, was determined. The activity was determined by employing the FRAP (Ferric Reducing Antioxidant Power) assay. Of the algae tested, the extract of Turbinaria ornata was found to be the most active. Bioassay-guided fractionation of this extract led to the isolation of a variety of different carotenoids as the active principles. The major bioactive antioxidant compound was identified as the carotenoid fucoxanthin. These results show, for the first time, that numerous Hawaiian algae exhibit significant antioxidant activity, a property that could lead to their application in one of many useful healthcare or related products as well as in chemoprevention of a variety of diseases including cancer.

Available:
pdf (683 KB)   doi:10.3390/md10020403   pmid:22412808   BibTeX Entry ( Kelman_Posner_McDermid-Antio_activ_Hawai-2012 )

Abstract folding space analysis based on helices

Jiabin Huang, Rolf Backofen, Bjorn Voss

In: RNA, 2012, 18(12), 2135-47

RNA has many pivotal functions especially in the regulation of gene expression by ncRNAs. Identification of their structure is an important requirement for understanding their function. Structure prediction alone is often insufficient for this task, due to algorithmic problems, parameter inaccuracies, and biological peculiarities. Among the latter, there are base modifications, cotranscriptional folding leading to folding traps, and conformational switching as in the case of riboswitches. All these require more in-depth analysis of the folding space. The major drawback, which all methods have to cope with, is the exponential growth of the folding space. Therefore, methods are often limited in the sequence length they can analyze, or they make use of heuristics, sampling, or abstraction. Our approach adopts the abstraction strategy and remedies some problems of existing methods. We introduce a position-specific abstraction based on helices that we term helix index shapes, or hishapes for short. Utilizing a dynamic programming framework, we have implemented this abstraction in the program RNAHeliCes. Furthermore, we developed two hishape-based methods, one for energy barrier estimation, called HiPath, and one for abstract structure comparison, termed HiTed. We demonstrate the superior performance of HiPath compared to other existing methods and the competitive accuracy of HiTed. RNAHeliCes, together with HiPath and HiTed, are available for download at http://www.cyanolab.de/software/RNAHeliCes.htm.

Available:
pdf (1058 KB)   doi:10.1261/rna.033548.112   pmid:23104999   BibTeX Entry ( Huang:Backofen:Voss:Abstr_foldi_space:2012 )

Navigating the unexplored seascape of pre-miRNA candidates in single-genome approaches

Nuno D. Mendes, Steffen Heyne, Ana T. Freitas, Marie-France Sagot, Rolf Backofen

In: Bioinformatics, 2012, 28(23), 3034-41

MOTIVATION: The computational search for novel microRNA (miRNA) precursors often involves some sort of structural analysis with the aim of identifying which type of structures are prone to being recognized and processed by the cellular miRNA-maturation machinery. A natural way to tackle this problem is to perform clustering over the candidate structures along with known miRNA precursor structures. Mixed clusters allow then the identification of candidates that are similar to known precursors. Given the large number of pre-miRNA candidates that can be identified in single-genome approaches, even after applying several filters for precursor robustness and stability, a conventional structural clustering approach is unfeasible. RESULTS: We propose a method to represent candidate structures in a feature space, which summarizes key sequence/structure characteristics of each candidate. We demonstrate that proximity in this feature space is related to sequence/structure similarity, and we select candidates that have a high similarity to known precursors. Additional filtering steps are then applied to further reduce the number of candidates to those with greater transcriptional potential. Our method is compared with another single-genome method (TripletSVM) in two datasets, showing better performance in one and comparable performance in the other, for larger training sets. Additionally, we show that our approach allows for a better interpretation of the results. AVAILABILITY AND IMPLEMENTATION: The MinDist method is implemented using Perl scripts and is freely available at http://www.cravela.org/?mindist=1. CONTACT: backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
supplement.pdf (986 KB)   pdf (2048 KB)   doi:10.1093/bioinformatics/bts574   pmid:23052038   BibTeX Entry ( Mendes_Heyne_Freitas-Navig_the_unexp-2012 )

An archaeal sRNA targeting cis- and trans- encoded mRNAs via two distinct domains

Dominik Jäger, Sandy R. Pernitzsch, Andreas S. Richter, Rolf Backofen, Cynthia M. Sharma, Ruth A. Schmitz

In: Nucleic Acids Res, 2012, 40(21), 10964-79

We report on the characterization and target analysis of the small (s)RNA(162) in the methanoarchaeon Methanosarcina mazei. Using a combination of genetic approaches, transcriptome analysis and computational predictions, the bicistronic MM2441-MM2440 mRNA encoding the transcription factor MM2441 and a protein of unknown function was identified as a potential target of this sRNA, which due to processing accumulates as three stable 5' fragments in late exponential growth. Mobility shift assays using various mutants verified that the non-structured single-stranded linker region of sRNA(162) (SLR) base-pairs with the MM2440-MM2441 mRNA internally, thereby masking the predicted ribosome binding site of MM2441. This most likely leads to translational repression of the second cistron resulting in dis-coordinated operon expression. Analysis of mutant RNAs in vivo confirmed that the SLR of sRNA(162) is crucial for target interactions. Furthermore, our results indicate that sRNA(162)-controlled MM2441 is involved in regulating the metabolic switch between the carbon sources methanol and methylamine. Moreover, biochemical studies demonstrated that the 5' end of sRNA(162) targets the 5'-untranslated region of the cis-encoded MM2442 mRNA. Overall, this first study of archaeal sRNA/mRNA-target interactions unraveled that sRNA(162) acts as an antisense (as)RNA on cis- and trans-encoded mRNAs via two distinct domains, indicating that cis-encoded asRNAs can have larger target regulons than previously anticipated.

Available:
pdf (3805 KB)   supplement.pdf (2590 KB)   doi:10.1093/nar/gks847   pmid:22965121   BibTeX Entry ( Jager_Pernitzsch_Richter-archa_sRNA_targe-NAR2012 )

Characterization of CRISPR RNA processing in Clostridium thermocellum and Methanococcus maripaludis

Hagen Richter, Judith Zoephel, Jeanette Schermuly, Daniel Maticzka, Rolf Backofen, Lennart Randau

In: Nucleic Acids Res, 2012, 40(19), 9887-96

The CRISPR arrays found in many bacteria and most archaea are transcribed into a long precursor RNA that is processed into small clustered regularly interspaced short palindromic repeats (CRISPR) RNAs (crRNAs). These RNA molecules can contain fragments of viral genomes and mediate, together with a set of CRISPR-associated (Cas) proteins, the prokaryotic immunity against viral attacks. CRISPR/Cas systems are diverse and the Cas6 enzymes that process crRNAs vary between different subtypes. We analysed CRISPR/Cas subtype I-B and present the identification of novel Cas6 enzymes from the bacterial and archaeal model organisms Clostridium thermocellum and Methanococcus maripaludis C5. Methanococcus maripaludis Cas6b in vitro activity and specificity was determined. Two complementary catalytic histidine residues were identified. RNA-Seq analyses revealed in vivo crRNA processing sites, crRNA abundance and orientation of CRISPR transcription within these two organisms. Individual spacer sequences were identified with strong effects on transcription and processing patterns of a CRISPR cluster. These effects will need to be considered for the application of CRISPR clusters that are designed to produce synthetic crRNAs.

Available:
pdf (4447 KB)   doi:10.1093/nar/gks737   pmid:22879377   BibTeX Entry ( Richter_Zoephel_Schermuly-Chara_CRISP_RNA-NAR2012 )

Atom Mapping with Constraint Programming

Martin Mann, Heinz Ekker, Peter F. Stadler, Christoph Flamm

In: Proceedings of the Workshop on Constraint Based Methods for Bioinformatics (WCB 2012), 2012, 23-29

The mass flow in a chemical reaction network is determined by the propagation of atoms from educt to product molecules within each of the constituent chemical reactions. The Atom Mapping Problem for a given chemical reaction is the computational task of determining the correspondences of the atoms between educt and product molecules. We propose here a Constraint Programming approach to identify atom mappings for "elementary" reactions. These feature a cyclic imaginary transition state (ITS) imposing an additional strong constraint on the bijection between educt and product atoms. The ongoing work presented here identifies only chemically feasible ITSs by integrating the cyclic structure of the chemical transformation into the search.

Publication note:
http://www.bioinf.uni-freiburg.de/Events/WCB12/proceedings.pdf

Available:
pdf (407 KB)   BibTeX Entry ( Mann_atomMapping_12 )

Producing high-accuracy lattice models from protein atomic co-ordinates including side chains

Martin Mann, Rhodri Saunders, Cameron Smith, Rolf Backofen, Charlotte M. Deane

In: Advances in Bioinformatics, 2012, 2012(Article ID 148045), 6

Lattice models are a common abstraction used in the study of protein structure, folding, and refinement. They are advantageous because the discretisation of space can make extensive protein evaluations computationally feasible. Various approaches to the protein chain lattice fitting problem have been suggested but only a single backbone-only tool is available currently. We introduce LatFit, a new tool to produce high-accuracy lattice protein models. It generates both backbone-only and backbone-side-chain models in any user defined lattice. LatFit implements a new distance RMSD-optimisation fitting procedure in addition to the known coordinate RMSD method. The program is freely available for academic download and as a web-server: http://cpsp.informatik.uni-freiburg.de/LatFit/ We tested LatFit's accuracy and speed using a large non-redundant set of high resolution proteins (SCOP database) on three commonly used lattices: 3D cubic, face-centred cubic, and knight's walk. Fitting speed compared favourably to other methods, and both backbone-only and backbone-side-chain models show low deviation from the original data (about 1.5 Å RMSD in the FCC lattice). To our knowledge this represents the first comprehensive study of lattice quality for on-lattice protein models including side chains while LatFit is the only available tool for such models.

Publication note:
MM and RS contributed equally to this work.

Available:
data-used.txt (0 KB)   pdf (2136 KB)   doi:10.1155/2012/148045   BibTeX Entry ( Mann-Saunders:12 )

CARNA - alignment of RNA structure ensembles

Dragos A. Sorescu, Mathias Möhl, Martin Mann, Rolf Backofen, Sebastian Will

In: Nucleic Acids Res, 2012, 40(W1), W49-W53

Due to recent algorithmic progress, tools for the gold standard of comparative RNA analysis, namely Sankoff-style simultaneous alignment and folding, are now readily applicable. Such approaches, however, compare RNAs with respect to a simultaneously predicted, single, nested consensus structure. To make multiple alignment of RNAs available in cases where this limitation of the standard approach is critical, we introduce a web server that provides a complete and convenient interface to the RNA structure alignment tool 'CARNA'. This tool uniquely supports RNAs with multiple conserved structures per RNA and aligns pseudoknots intrinsically; these features are highly desirable for aligning riboswitches, RNAs with conserved folding pathways, or pseudoknots. We represent structural input and output information as base pair probability dot plots; this provides large flexibility in the input, ranging from fixed structures to structure ensembles, and enables immediate visual analysis of the results. In contrast to conventional Sankoff-style approaches, 'CARNA' optimizes all structural similarities in the input simultaneously, for example across an entire RNA structure ensemble. Even compared with already costly Sankoff-style alignment, 'CARNA' solves an intrinsically much harder problem by applying advanced, constraint-based, algorithmic techniques. Although 'CARNA' is specialized to the alignment of RNAs with several conserved structures, its performance on RNAs in general is on par with state-of-the-art general-purpose RNA alignment tools, as we show in a BRAliBase 2.1 benchmark. The web server is freely available at http://rna.informatik.uni-freiburg.de/CARNA.

Publication note:
DAS, MMö, and MMa contributed equally to this work.

Available:
pdf (387 KB)   doi:10.1093/nar/gks491   pmid:22689637   BibTeX Entry ( Sorescu_Mohl_Mann-CARNA_RNA_struc-NAR2012 )

GraphClust: alignment-free structural clustering of local RNA secondary structures

Steffen Heyne, Fabrizio Costa, Dominic Rose, Rolf Backofen

In: Bioinformatics, 2012, 28(12), i224-i232

MOTIVATION: Clustering according to sequence-structure similarity has now become a generally accepted scheme for ncRNA annotation. Its application to complete genomic sequences as well as whole transcriptomes is therefore desirable but hindered by extremely high computational costs. RESULTS: We present a novel linear-time, alignment-free method for comparing and clustering RNAs according to sequence and structure. The approach scales to datasets of hundreds of thousands of sequences. The quality of the retrieved clusters has been benchmarked against known ncRNA datasets and is comparable to state-of-the-art sequence-structure methods although achieving speedups of several orders of magnitude. A selection of applications aiming at the detection of novel structural ncRNAs are presented. Exemplarily, we predicted local structural elements specific to lincRNAs likely functionally associating involved transcripts to vital processes of the human nervous system. In total, we predicted 349 local structural RNA elements. AVAILABILITY: The GraphClust pipeline is available on request. CONTACT: backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
pdf (483 KB)   supplement.pdf (1000 KB)   doi:10.1093/bioinformatics/bts224   pmid:22689765   BibTeX Entry ( Heyne:Costa:Rose:Graph_align_struc:2012 )

Accessibility and conservation: General features of bacterial small RNA-mRNA interactions?

Andreas S. Richter, Rolf Backofen

In: RNA Biol, 2012, 9(7), 954-65

Bacterial small RNAs (sRNAs) are a class of structural RNAs that often regulate mRNA targets via post-transcriptional base pair interactions. We determined features that discriminate functional from non-functional interactions and assessed the influence of these features on genome-wide target predictions. For this purpose, we compiled a set of 71 experimentally verified sRNA-target pairs from Escherichia coli and Salmonella enterica. Furthermore, we collected full-length 5' untranslated regions by using genome-wide experimentally verified transcription start sites. Only interaction sites in sRNAs, but not in targets, show significant sequence conservation. In addition to this observation, we found that the base pairing between sRNAs and their targets is not conserved in general across more distantly related species. A closer inspection of RybB and RyhB sRNAs and their targets revealed that the base pairing complementarity is only conserved in a small subset of the targets. In contrast to conservation, accessibility of functional interaction sites is significantly higher in both sRNAs and targets in comparison to non-functional sites. Based on the above observations, we successfully used the following constraints to improve the specificity of genome-wide target predictions: the region of interaction initiation must be located in (1) highly accessible regions in both interaction partners and (2) unstructured conserved sRNA regions derived from reliability profiles of multiple sRNA alignments. Aligned sequences of homologous sRNAs, functional and non-functional targets, and a supplementary document with supplementary tables, figures and references are available at http://www.bioinf.uni-freiburg.de/Supplements/srna-interact-feat.

Available:
supplement.pdf (207 KB)   pdf (2350 KB)   doi:10.4161/rna.20294   pmid:22767260   BibTeX Entry ( Richter_Backofen-Acces_and_conse-2012 )

MicroRNAs in Cancer Translational Research: The Microcosm of Cancer Diagnosis, Prognosis and Therapy

Dominic Rose

In: Frontiers in Genetics, 2012, 3(42)

No Abstract available

Available:
pdf (226 KB)   doi:10.3389/fgene.2012.00042   BibTeX Entry ( ROSE_MICRORNAS_IN_CANCER_TRANSLATIONAL_RESEARCH_BOOK_REVIEW_FRONTIERS2012 )

Global or local? Predicting secondary structure and accessibility in mRNAs

Sita J. Lange, Daniel Maticzka, Mathias Möhl, Joshua N. Gagnon, Chris M. Brown, Rolf Backofen

In: Nucleic Acids Res, 2012, 40(12), 5215-26

Determining the structural properties of mRNA is key to understanding vital post-transcriptional processes. As experimental data on mRNA structure are scarce, accurate structure prediction is required to characterize RNA regulatory mechanisms. Although various structure prediction approaches are available, it is often unclear which to choose and how to set their parameters. Furthermore, no standard measure to compare predictions of local structure exists. We assessed the performance of different methods using two types of data: transcriptome-wide enzymatic probing information and a large, curated set of cis-regulatory elements. To compare the approaches, we introduced structure accuracy, a measure that is applicable to both global and local methods. Our results showed that local folding was more accurate than the classic global approach. We investigated how the locality parameters, maximum base pair span and window size, influenced the prediction performance. A span of 150 provided a reasonable balance between maximizing the number of accurately predicted base pairs, while minimizing effects of incorrect long-range predictions. We characterized the error at artificial sequence ends, which we reduced by setting the window size sufficiently greater than the maximum span. Our method, LocalFold, diminished all border effects and produced the most robust performance.

Publication note:
SJL and DM contributed equally to this work.

Available:
pdf (5230 KB)   doi:10.1093/nar/gks181   pmid:22373926   BibTeX Entry ( Lange:Maticzka:Mohl:Globa_local_Predi:NAR2012 )

Microstructure Alignment of Wood Density Profiles: an Approach to Equalize Radial Differences in Growth Rate

Bela Bender, Martin Mann, Rolf Backofen, Heinrich Spiecker

In: Trees - Structure and Function, 2012, 26(4), 1267-1274

We studied intra-annual wood density profiles of Douglas-fir tree rings (Pseudotsuga menziesii [Mirb.] Franco) in southwestern Germany. Growth rate varies differently over time throughout the circumference of trees. This leads to differences in wood formation, which can be observed in the shape of the density profiles of the same tree ring measured in different radial directions. Due to this spatial variation in density profiles, we need a reliable method to determine an average profile, which preserves the common characteristics of the data. To this end, we developed a multiple interval-based curve alignment (MICA) procedure. It identifies characteristic points within the profiles such as minima, maxima and inflection points. These reference points are shifted gradually against each other within a proportionally defined base line interval. Using our progressive alignment approach, we are able to calculate an average profile that represents very well the characteristics of all measured curves of a specific tree ring. We applied the procedure to get year-specific average profiles using various trees. This results in representative mean density profiles that preserve the density variations common to all aligned profiles. Individual noise is reduced, thereby enabling the analysis of the impact of weather variations on wood density.

Publication note:
BB and MM contributed equally to this work.

Available:
pdf (1081 KB)   doi:10.1007/s00468-012-0702-y   BibTeX Entry ( Bender:12 )

LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs

Sebastian Will, Tejal Joshi, Ivo L. Hofacker, Peter F. Stadler, Rolf Backofen

In: RNA, 2012, 18(5), 900-14

Current genomic screens for noncoding RNAs (ncRNAs) predict a large number of genomic regions containing potential structural ncRNAs. The analysis of these data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions. Existing methods struggle with these goals because they rely on sequence-based multiple sequence alignments, which regularly misalign RNA structure and therefore do not support identification of structural similarities. To overcome this limitation, we compute columnwise and global reliabilities of alignments based on sequence and structure similarity; we refer to these structure-based alignment reliabilities as STARs. The columnwise STARs of alignments, or STAR profiles, provide a versatile tool for the manual and automatic analysis of ncRNAs. In particular, we improve the boundary prediction of the widely used ncRNA gene finder RNAz by a factor of 3 from a median deviation of 47 to 13 nt. Post-processing RNAz predictions, LocARNA-P's STAR score allows much stronger discrimination between true- and false-positive predictions than RNAz's own evaluation. The improved accuracy, in this scenario increased from AUC 0.71 to AUC 0.87, significantly reduces the cost of successive analysis steps. The ready-to-use software tool LocARNA-P produces structure-based multiple RNA alignments with associated columnwise STARs and predicts ncRNA boundaries. We provide additional results, a web server for LocARNA/LocARNA-P, and the software package, including documentation and a pipeline for refining screens for structural ncRNA, at http://www.bioinf.uni-freiburg.de/Supplements/LocARNA-P/.

Available:
Supplement.pdf (1484 KB)   pdf (1607 KB)   doi:10.1261/rna.029041.111   pmid:22450757   BibTeX Entry ( Will_Joshi_Hofacker-LocAR_Accur_bound-2012 )

Local Exact Pattern Matching for Non-fixed RNA Structures

Mika Amit, Rolf Backofen, Steffen Heyne, Gad M. Landau, Mathias Möhl, Christina Schmiedl, Sebastian Will

In: Proceedings of the 23rd Annual Symposium on Combinatorial Pattern Matching (CPM 2012), LNCS, 2012, 7354, 306-320

Detecting local common sequence-structure regions of RNAs is a biologically meaningful problem. By detecting such regions, biologists are able to identify functional similarity between the inspected molecules. We developed dynamic programming algorithms for finding common structure-sequence patterns between two RNAs. The RNAs are given by their sequence and a set of potential base pairs with associated probabilities. In contrast to prior work which matches fixed structures, we support the arc breaking edit operation; this allows to match only a subset of the given base pairs. We present an O(n^3) algorithm for local exact pattern matching between two nested RNAs, and an O(n^3 log n) algorithm for one nested RNA and one bounded-unlimited RNA.

Available:
pdf (650 KB)   doi:10.1007/978-3-642-31265-6_25   BibTeX Entry ( Amit-local_exact_pattern_matching-CPM2012 )

Exact Pattern Matching for RNA Structure Ensembles

Christina Schmiedl, Mathias Möhl, Steffen Heyne, Mika Amit, Gad M. Landau, Sebastian Will, Rolf Backofen

In: Proceedings of the 16th International Conference on Research in Computational Molecular Biology (RECOMB 2012), LNCS, 2012, 7262, 245-260

ExpaRNA's core algorithm computes, for two fixed RNA structures, a maximal non-overlapping set of maximal exact matchings. We introduce an algorithm ExpaRNA-P that solves the lifted problem of finding such sets of exact matchings in entire Boltzmann-distributed structure ensembles of two RNAs. Due to a novel kind of structural sparsification, the new algorithm maintains the time and space complexity of the algorithm for fixed input structures. Furthermore, we generalized the chaining algorithm of ExpaRNA in order to compute a compatible subset of ExpaRNA-P's exact matchings. We show that ExpaRNA-P outperforms ExpaRNA in BRAliBase 2.1 benchmarks, where we pass the chained exact matchings as anchor constraints to the RNA alignment tool LocARNA. Compared to LocARNA, this novel approach shows similar accuracy but is six times faster.

Available:
pdf (1799 KB)   doi:10.1007/978-3-642-29627-7_27   BibTeX Entry ( Schmiedl:etal:_exparnap:RECOMB2012 )

Structure-based Whole Genome Realignment Reveals Many Novel Non-coding RNAs

Sebastian Will, Michael Yu, Bonnie Berger

In: Proceedings of the 16th International Conference on Research in Computational Molecular Biology (RECOMB 2012), LNCS, 2012, 7262, 341

Recent genome-wide computational screens that search for conservation of RNA secondary structure in whole genome alignments (WGAs) have predicted thousands of structural non-coding RNAs (ncRNAs). The sensitivity of such approaches, however, is limited due to their reliance on sequence-based whole-genome aligners, which regularly misalign structural ncRNAs. This suggests that many more structural ncRNAs may remain undetected. Structure-based alignment, which could increase the sensitivity, has been prohibitive for genome-wide screens due to its extreme computational costs. Breaking this barrier, we present the pipeline REAPR (RE-Alignment for de novo Prediction of structural ncRNA) that realigns whole genomes based on RNA sequence and structure and then evaluates the realignments for potential structural ncRNAs with a ncRNA predictor such as RNAz 2.0. For efficiency of the pipeline, we develop a novel banding realignment algorithm for the RNA multiple alignment tool LocARNA. This allows us to perform very fast structure-based realignment within limited deviation of the original multiple alignment from the WGA. We apply REAPR to the complete twelve Drosophila WGAs to predict ncRNAs across all these Drosophila species. Compared to direct prediction from the original WGA at the same False Discovery Rate (FDR), we predict twice as many high-confidence ncRNA candidates in D.melanogaster while less than doubling the run-time. As a novelty in ncRNA prediction, we control the FDR, going beyond the usual a posteriori FDR estimation. Applying the sequence-based alignment tool Muscle for realignment, we demonstrate that structure-based methods are necessary for effective prediction of originally misaligned ncRNAs. Comparing to recent screens of Drosophila and ENCODE we show that REAPR outperforms the widely-used de novo predictors RNAz, EvoFold, and CMfinder.
Finally, we reveal, with high confidence, a putative structural motif in the long ncRNA roX1 of D.melanogaster, known to regulate X chromosome dosage compensation in male flies. Interestingly, we recapitulate the Drosophila phylogeny, based on co-predicted ncRNAs across all genomes.

Available:
pdf (98 KB)   doi:10.1007/978-3-642-29627-7_35   BibTeX Entry ( Will:Yu:Berger:_structure_based:RECOMB2012 )

The SH2-domain of SHIP1 interacts with the SHIP1 C-terminus: Impact on SHIP1/Ig-alpha interaction

Oindrilla Mukherjee, Lars Weingarten, Inken Padberg, Catrin Pracht, Rileen Sinha, Thomas Hochdorfer, Stephan Kuppig, Rolf Backofen, Michael Reth, Michael Huber

In: Biochim Biophys Acta, 2012, 1823(2), 206-14

The SH2-containing inositol 5'-phosphatase, SHIP1, negatively regulates signal transduction from the B cell antigen receptor (BCR). The mode of coupling between SHIP1 and the BCR has not been elucidated so far. In comparison to wild-type cells, B cells expressing a mutant IgD- or IgM-BCR containing a C-terminally truncated Ig-alpha respond to pervanadate stimulation with markedly reduced tyrosine phosphorylation of SHIP1 and augmented activation of protein kinase B. This indicates that SHIP1 is capable of interacting with the C-terminus of Ig-alpha. Employing a system of fluorescence resonance energy transfer in S2 cells, we can clearly demonstrate interaction between the SH2-domain of SHIP1 and Ig-alpha. Furthermore, a fluorescently labeled SH2-domain of SHIP1 translocates to the plasma membrane in an Ig-alpha-dependent manner. Interestingly, whereas the SHIP1 SH2-domain can be pulled-down with phospho-peptides corresponding to the immunoreceptor tyrosine-based activation motif (ITAM) of Ig-alpha from detergent lysates, no interaction between full-length SHIP1 and the phosphorylated Ig-alpha ITAM can be observed. Further studies show that the SH2-domain of SHIP1 can bind to the C-terminus of the SHIP1 molecule, most probably by inter- as well as intra-molecular means, and that this interaction regulates the association between different forms of SHIP1 and Ig-alpha.

Available:
doi:10.1016/j.bbamcr.2011.11.019   pmid:22182704   BibTeX Entry ( Mukherjee:Weingarten:Padberg:The_SHIP_inter:2012 )

Data fusion of Fourier transform infrared spectra and powder X-ray diffraction patterns for pharmaceutical mixtures

Rahul V. Haware, Patrick R. Wright, Kenneth R. Morris, Mazen L. Hamad

In: J Pharm Biomed Anal, 2011, 56(5), 944-9

Fusing complex data from two disparate sources has been demonstrated to improve the accuracy in quantifying active ingredients in mixtures of pharmaceutical powders. A four-component simplex-centroid design was used to prepare blended powder mixtures of acetaminophen, caffeine, aspirin and ibuprofen. The blends were analyzed by Fourier transform infra-red spectroscopy (FTIR) and powder X-ray diffraction (PXRD). The FTIR and PXRD data were preprocessed and combined using two different data fusion methods: fusion of preprocessed data (FPD) and fusion of principal component scores (FPCS). A partial least square (PLS) model built on the FPD did not improve the root mean square error of prediction. However, a PLS model built on the FPCS yielded better accuracy prediction than PLS models built on individual FTIR and PXRD data sets. The improvement in prediction accuracy of the FPCS may be attributed to the removal of noise and data reduction associated with using PCA as a preprocessing tool. The present approach demonstrates the usefulness of data fusion for the information management of large data sets from disparate sources.

Available:
pdf (1286 KB)   doi:10.1016/j.jpba.2011.08.018   pmid:21873013   BibTeX Entry ( Haware_Wright_Morris-Data_fusio_Fouri-2011 )

Molecular evolution of the non-coding eosinophil granule ontogeny transcript

Dominic Rose, Peter F. Stadler

In: Front Genet, 2011, 2, 69

Eukaryotic genomes are pervasively transcribed. A large fraction of the transcriptional output consists of long, mRNA-like, non-protein-coding transcripts (mlncRNAs). The evolutionary history of mlncRNAs is still largely uncharted territory. In this contribution, we explore in detail the evolutionary traces of the eosinophil granule ontogeny transcript (EGOT), an experimentally confirmed representative of an abundant class of totally intronic non-coding transcripts (TINs). EGOT is located antisense to an intron of the ITPR1 gene. We computationally identify putative EGOT orthologs in the genomes of 32 different amniotes, including orthologs from primates, rodents, ungulates, carnivores, afrotherians, and xenarthrans, as well as putative candidates from basal amniotes, such as opossum or platypus. We investigate the EGOT gene phylogeny, analyze patterns of sequence conservation, and the evolutionary conservation of the EGOT gene structure. We show that EGO-B, the spliced isoform, may be present throughout the placental mammals, but most likely dates back even further. We demonstrate here for the first time that the whole EGOT locus is highly structured, containing several evolutionary conserved, and thermodynamic stable secondary structures. Our analyses allow us to postulate novel functional roles of a hitherto poorly understood region at the intron of EGO-B which is highly conserved at the sequence level. The region contains a novel ITPR1 exon and also conserved RNA secondary structures together with a conserved TATA-like element, which putatively acts as a promoter of an independent regulatory element.

Available:
pdf (2020 KB)   doi:10.3389/fgene.2011.00069   pmid:22303364   BibTeX Entry ( Rose:Stadler:Molec_evolu_the:2011 )

Accessibility and conservation in bacterial small RNA-mRNA interactions and implications for genome-wide target predictions

Andreas S. Richter, Rolf Backofen

In: Proceedings of the German Conference on Bioinformatics (GCB 2011), 2011

Bacterial small RNAs (sRNAs) are a class of structural RNAs that often regulate mRNA targets via post-transcriptional base pair interactions. In this study, we assessed the accessibility and conservation of the interaction sites and the influence of these features on genome-wide target predictions. For this purpose, we compiled a set of 71 experimentally verified sRNA-target pairs from Escherichia coli and Salmonella, and collected genome-wide information on 5' untranslated regions of annotated genes. Then, features of the confirmed interactions were compared to a set of non-interactions. We found that the interaction sites both in sRNA and target are more accessible than in the negative data. Furthermore, interaction sites in the sRNAs, but not in the targets, show high sequence conservation. The base pairing between sRNA and target was not found to be generally conserved across more distantly related species. We then present two approaches to constrain the region of interaction initiation to (1) well-accessible regions in both interaction partners or (2) unstructured conserved sRNA regions derived from reliability profiles of multiple sRNA alignments. Using these constraints, genome-wide target predictions were improved in terms of specificity.

Available:
pdf (244 KB)   supplement.pdf (89 KB)   BibTeX Entry ( Richter:Backofen:Acces_conse_bacte:2011 )

Sparsification in Algebraic Dynamic Programming

Mathias Möhl, Christina Schmiedl, Shay Zakov

In: Proceedings of the German Conference on Bioinformatics (GCB 2011), 2011

Sparsification is a technique to speed up dynamic programming algorithms which has been successfully applied to RNA structure prediction, RNA-RNA-interaction prediction, simultaneous alignment and folding, and pseudoknot prediction. So far, sparsification has been more a collection of loosely related examples and no general, well understood theory. In this work we propose a general theory to describe and implement sparsification in dynamic programming algorithms. The approach is formalized as an extension of Algebraic Dynamic Programming (ADP) which makes it applicable to a variety of algorithms and scoring schemes. In particular, this is the first approach that shows how to sparsify algorithms with scoring schemes that go beyond simple minimization or maximization, like enumeration of suboptimal solutions and approximation of the partition function. As an example, we show how to sparsify different variants of RNA structure prediction algorithms.

Available:
pdf (132 KB)   BibTeX Entry ( Moehl:Schmiedl:Zakov:SparsificationADP:2011 )

Sparse RNA folding: Time and space efficient algorithms

Rolf Backofen, Dekel Tsur, Shay Zakov, Michal Ziv-Ukelson

In: J. Discrete Algorithms, 2011, 9(1), 12-31

The currently fastest algorithm for RNA Single Strand Folding requires O(nZ) time and Θ(n^2) space, where n denotes the length of the input string and Z is a sparsity parameter satisfying n <= Z < n^2. We show how to reduce the time and space complexities of this algorithm in the sparse case. The space reduction is based on the observation that some solutions for sub-instances are not examined after a certain stage of the algorithm, and may be discarded from memory. The running time speed up is achieved by combining two independent sparsification criteria, which restrict the number of expressions that need to be examined in bottleneck computations of the algorithm. This yields an O(n^2 + PZ) time and Θ(Z) space algorithm, where P is a sparsity parameter satisfying P < n <= Z <= n(P + 1). For the base-pairing maximization variant, the time complexity is further reduced to O(LZ), where L denotes the maximum number of base-pairs in a folding of the input string and satisfies L <= n/2. The presented techniques also extend to the related RNA Simultaneous Alignment and Folding problem. For an input composed of two strings of lengths n and m, the time and space complexities are reduced from O(nmZ̃) and Θ(n^2 m^2) down to O(n^2 m^2 + P̃Z̃) and Θ(nm^2 + Z̃) respectively, where Z̃ and P̃ are sparsity parameters satisfying P̃

Available:
pdf (1256 KB)   BibTeX Entry ( backofen11:_spars_rna )

Structator: fast index-based search for RNA sequence-structure patterns

Fernando Meyer, Stefan Kurtz, Rolf Backofen, Sebastian Will, Michael Beckstette

In: BMC Bioinformatics, 2011, 12(1), 214

BACKGROUND: The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. RESULTS: We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This makes it possible to exploit base pairing information for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. CONCLUSIONS: The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator.
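
Illustrative sketch (not taken from the paper or from Structator): the inside-out matching order described in the abstract can be pictured with a small Python function that first locates the loop and then extends outward one base pair at a time. The function name, the pairing table (Watson-Crick plus G-U wobble) and the plain string scan used to find the loop are assumptions made for illustration; the published method uses an affix array for the bidirectional search, which this sketch does not implement.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def match_stem_loops(seq, loop, stem_len):
    """Return (start, end) spans, end exclusive, of hairpins in `seq` whose loop
    equals `loop` and which is enclosed by `stem_len` consecutive base pairs."""
    hits = []
    pos = seq.find(loop)  # step 1: match the loop region (here by a linear scan)
    while pos != -1:
        left, right = pos - 1, pos + len(loop)
        paired = 0
        # step 2: extend outward, checking one pairing position per iteration
        while (paired < stem_len and left >= 0 and right < len(seq)
               and (seq[left], seq[right]) in PAIRS):
            left -= 1
            right += 1
            paired += 1
        if paired == stem_len:
            hits.append((left + 1, right))
        pos = seq.find(loop, pos + 1)
    return hits


hairpins = match_stem_loops("AAGGGCGAAAGCCCUU", "GAAA", 3)
```

Because a loop occurrence is discarded as soon as one enclosing pair fails, most candidates are rejected after very few comparisons; this is the pruning effect the abstract attributes to exploiting base pairing information.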

Available:
pdf (338 KB)   doi:10.1186/1471-2105-12-214   pmid:21619640   BibTeX Entry ( Meyer:Kurtz:Backofen:Struc_fast_index:2011 )

Computational discovery of human coding and non-coding transcripts with conserved splice sites

Dominic Rose, Michael Hiller, Katharina Schutt, Jorg Hackermuller, Rolf Backofen, Peter F. Stadler

In: Bioinformatics, 2011, 27(14), 1894-900

MOTIVATION: Long non-coding RNAs (lncRNAs) resemble protein-coding mRNAs but do not encode proteins. Most lncRNAs are under lower sequence constraints than protein-coding genes and lack conserved secondary structures, making it hard to predict them computationally. RESULTS: We introduce an approach to predict spliced lncRNAs in vertebrate genomes combining comparative genomics and machine learning. It is based on detecting signatures of characteristic splice site evolution in vertebrate whole genome alignments. First, we predict individual splice sites, then assemble compatible sites into exon candidates, and finally predict multi-exon transcripts. Using a novel method to evaluate typical splice site substitution patterns that explicitly takes the species phylogeny into account, we show that individual splice sites can be accurately predicted. Since our approach relies only on predicted splice sites, it can uncover both coding and non-coding exons. We show that our predicted exons and partial transcripts are mostly non-coding and lack conserved secondary structures. These exons are of particular interest, since existing computational approaches cannot detect them. Transcriptome sequencing data indicate tissue-specific expression patterns of predicted exons and there is evidence that increasing sequencing depth and breadth will validate additional predictions. We also found a significant enrichment of predicted exons that form multi-exon transcript parts, and we experimentally validate such a novel multi-exon gene. Overall, we obtain 336 novel multi-exon transcript predictions from human intergenic regions. Our results indicate the existence of novel human transcripts that are conserved in evolution and our approach contributes to the completion of the human transcript catalog. 
AVAILABILITY AND IMPLEMENTATION: Predicted human splice sites, exons and gene structures together with a Perl implementation of the tree-based log-odds scoring and a supplementary PDF file containing additional figures and tables are available at: http://www.bioinf.uni-leipzig.de/publications/supplements/10-010. The five experimentally confirmed partial transcript isoforms have been deposited in GenBank under accession numbers HM587422-HM587426.

Available:
pdf (874 KB)   doi:10.1093/bioinformatics/btr314   pmid:21622663   BibTeX Entry ( Rose:Hiller:Schutt:Compu_disco_human:2011 )

The PETfold and PETcofold web servers for intra- and intermolecular structures of multiple RNA sequences

S. E. Seemann, P. Menzel, R. Backofen, J. Gorodkin

In: Nucleic Acids Research, 2011, 39, W107-11

The function of non-coding RNA genes largely depends on their secondary structure and the interaction with other molecules. Thus, an accurate prediction of secondary structure and RNA-RNA interaction is essential for the understanding of biological roles and pathways associated with a specific RNA gene. We present web servers to analyze multiple RNA sequences for common RNA structure and for RNA interaction sites. The web servers are based on the recent PET (Probabilistic Evolutionary and Thermodynamic) models PETfold and PETcofold, but add user friendly features ranging from a graphical layer to interactive usage of the predictors. Additionally, the web servers provide direct access to annotated RNA alignments, such as the Rfam 10.0 database and multiple alignments of 16 vertebrate genomes with human. The web servers are freely available at: http://rth.dk/resources/petfold/

Available:
pdf (659 KB)   doi:10.1093/nar/gkr248   pmid:21609960   BibTeX Entry ( Seemann:Menzel:Backofen:The_PETfo_and:NAR2011 )

The small RNA PhrS stimulates synthesis of the Pseudomonas aeruginosa quinolone signal

Elisabeth Sonnleitner, Nicolas Gonzalez, Theresa Sorger-Domenigg, Stephan Heeb, Andreas S. Richter, Rolf Backofen, Paul Williams, Alexander Huttenhofer, Dieter Haas, Udo Blasi

In: Mol Microbiol, 2011, 80(4), 868-85

Quorum sensing, a cell-to-cell communication system based on small signal molecules, is employed by the human pathogen Pseudomonas aeruginosa to regulate virulence and biofilm development. Moreover, regulation by small trans-encoded RNAs has become a focal issue in studies of virulence gene expression of bacterial pathogens. In this study, we have identified the small RNA PhrS as an activator of PqsR synthesis, one of the key quorum-sensing regulators in P. aeruginosa. Genetic studies revealed a novel mode of regulation by a sRNA, whereby PhrS uses a base-pairing mechanism to activate a short upstream open reading frame to which the pqsR gene is translationally coupled. Expression of phrS requires the oxygen-responsive regulator ANR. Thus, PhrS is the first bacterial sRNA that provides a regulatory link between oxygen availability and quorum sensing, which may impact on oxygen-limited growth in P. aeruginosa biofilms.

Available:
doi:10.1111/j.1365-2958.2011.07620.x   pmid:21375594   BibTeX Entry ( Sonnleitner:Gonzalez:Sorger-Domenigg:The_small_RNA:2011 )

Signatures of Co-translational Folding

Rhodri Saunders, Martin Mann, Charlotte Deane

In: Biotechnology Journal, Special issue: Protein folding in vivo, March 2011, 6(6), 742-751

Global and co-translational protein folding may both occur in vivo, and understanding the relationship between these folding mechanisms is pivotal to our understanding of protein structure formation. Within this study, over 1.5 million hydrophobic-polar sequences were classified based on their ability to attain a unique but not necessarily minimal energy conformation via co-translational folding. The sequence and structure properties of the sets were then compared to elucidate signatures of co-translational folding. The strongest signature of co-translational folding is a reduced number of possible favourable contacts in the amino-terminus. There is no evidence of fewer contacts, more local contacts, nor less compact structures. Co-translational folding does produce a more compact amino- than carboxy-terminal region and an amino-terminal biased set of core residues. In real proteins these signatures are also observed and found most strongly in proteins of the SCOP alpha/beta class where 71% have an amino-terminal set of core residues. The prominence of co-translational features in experimentally determined protein structures suggests that the importance of co-translational folding is currently underestimated.

Publication note:
RS and MM have contributed equally to this work.

Available:
pdf (388 KB)   supplement.pdf (417 KB)   doi:10.1002/biot.201000330   BibTeX Entry ( SaundersMann_11 )

Efficient exploration of discrete energy landscapes

Martin Mann, Konstantin Klemm

In: Phys. Rev. E, January 2011, 83(1), online

Many physical and chemical processes, such as folding of biopolymers, are best described as dynamics on large combinatorial energy landscapes. A concise approximate description of the dynamics is obtained by partitioning the micro-states of the landscape into macro-states. Since most landscapes of interest are not tractable analytically, the probabilities of transitions between macro-states need to be extracted numerically from the microscopic ones, typically by full enumeration of the state space or approximations using the Arrhenius law. Here we propose to approximate transition probabilities by a Markov chain Monte-Carlo method. For landscapes of the number partitioning problem and an RNA switch molecule we show that the method allows for accurate probability estimates with significantly reduced computational cost.
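
Illustrative sketch (not the authors' implementation): the idea of estimating macro-state transition probabilities by sampling instead of full enumeration can be pictured with a Metropolis walk on a toy one-dimensional landscape. The landscape, the neighbourhood (index ± 1), the temperature `kT`, the function name and the macro-state assignment are all hypothetical choices for this example; the paper's actual sampling scheme may differ.

```python
import math
import random


def estimate_macro_transitions(energies, macro_of, steps=20000, kT=1.0, seed=0):
    """Estimate P(macro a -> macro b) per step from one Metropolis trajectory.

    energies[i] is the energy of micro-state i; macro_of[i] is its macro-state.
    Neighbours of state i are i-1 and i+1 (a toy assumption)."""
    rng = random.Random(seed)
    counts = {}
    state = 0
    for _ in range(steps):
        proposal = state + rng.choice((-1, 1))
        nxt = state  # rejected or out-of-range proposals keep the current state
        if 0 <= proposal < len(energies):
            delta = energies[proposal] - energies[state]
            # Metropolis rule: always accept downhill, uphill with Boltzmann weight
            if delta <= 0 or rng.random() < math.exp(-delta / kT):
                nxt = proposal
        row = counts.setdefault(macro_of[state], {})
        row[macro_of[nxt]] = row.get(macro_of[nxt], 0) + 1
        state = nxt
    # normalise each row of counts into probabilities
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}


# Two basins (macro-states 0 and 1) separated by a barrier at index 2.
P = estimate_macro_transitions([0.0, 1.0, 2.0, 1.0, 0.0], [0, 0, 0, 1, 1])
```

Each row of `P` sums to one, and the off-diagonal entries approximate how often the walk leaves one macro-state for another in a single step.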

Available:
pdf (329 KB)   doi:10.1103/PhysRevE.83.011113   arXiv:0910.2559   BibTeX Entry ( Mann:Klemm:11 )

PETcofold: predicting conserved interactions and structures of two multiple alignments of RNA sequences

Stefan E. Seemann, Andreas S. Richter, Tanja Gesell, Rolf Backofen, Jan Gorodkin

In: Bioinformatics, 2011, 27(2), 211-219

MOTIVATION: Predicting RNA-RNA interactions is essential for determining the function of putative non-coding RNAs. Existing methods for the prediction of interactions are all based on single sequences. Since comparative methods have already been useful in RNA structure determination, we assume that conserved RNA-RNA interactions also imply conserved function. Of these, we further assume that a non-negligible amount of the existing RNA-RNA interactions have also acquired compensating base changes throughout evolution. We implement a method, PETcofold, that can take covariance information in intra-molecular and inter-molecular base pairs into account to predict interactions and secondary structures of two multiple alignments of RNA sequences. RESULTS: PETcofold's ability to predict RNA-RNA interactions was evaluated on a carefully curated dataset of 32 bacterial small RNAs and their targets, which was manually extracted from the literature. For evaluation of both RNA-RNA interaction and structure prediction, we were able to extract only a few high-quality examples: one vertebrate small nucleolar RNA and four bacterial small RNAs. For these we show that the prediction can be improved by our comparative approach. Furthermore, PETcofold was evaluated on controlled data with phylogenetically simulated sequences enriched for covariance patterns at the interaction sites. We observed increased performance with increased amounts of covariance. AVAILABILITY: The program PETcofold is available as source code and can be downloaded from http://rth.dk/resources/petcofold. CONTACT: gorodkin@rth.dk; backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
supplement.pdf (908 KB)   pdf (525 KB)   doi:10.1093/bioinformatics/btq634   pmid:21088024   BibTeX Entry ( Seemann:Richter:Gesell:PETco_predi_conse:2011 )

Fast RNA Structure Alignment for Crossing Input Structures

Rolf Backofen, Gad M. Landau, Mathias Möhl, Dekel Tsur, Oren Weimann

In: Journal of Discrete Algorithms, Gregory Kucherov, Esko Ukkonen, 2011, 9(1), 2-11

The complexity of pairwise RNA structure alignment depends on the structural restrictions assumed for both the input structures and the computed consensus structure. For arbitrarily crossing input and consensus structures, the problem is NP-hard. For non-crossing consensus structures, Jiang et al.'s algorithm [1] computes the alignment in O(n^2 m^2) time, where n and m denote the lengths of the two input sequences. If the input structures are also non-crossing, the problem corresponds to tree editing, which can be solved in O(m^2 n (1 + log(n/m))) time [2]. We present a new algorithm that solves the problem for d-crossing structures in O(d m^2 n log n) time, where d is a parameter that is one for non-crossing structures, bounded by n for crossing structures, and much smaller than n on most practical examples. Crossing input structures allow for applications where the input is not a fixed structure but is given as base-pair probability matrices. Keywords: RNA, sequence structure alignment, simultaneous alignment and folding

Available:
pdf (589 KB)   BibTeX Entry ( cpm09j )

Identification of functional elements and regulatory circuits by Drosophila modENCODE

Sushmita Roy, Jason Ernst, Peter V. Kharchenko, Pouya Kheradpour, Nicolas Negre, Matthew L. Eaton, Jane M. Landolin, Christopher A. Bristow, Lijia Ma, Michael F. Lin, Stefan Washietl, Bradley I. Arshinoff, Ferhat Ay, Patrick E. Meyer, Nicolas Robine, Nicole L. Washington, Luisa Di Stefano, Eugene Berezikov, Christopher D. Brown, Rogerio Candeias, Joseph W. Carlson, Adrian Carr, Irwin Jungreis, Daniel Marbach, Rachel Sealfon, Michael Y. Tolstorukov, Sebastian Will, Artyom A. Alekseyenko, Carlo Artieri, Benjamin W. Booth, Angela N. Brooks, Qi Dai, Carrie A. Davis, Michael O. Duff, Xin Feng, Andrey A. Gorchakov, Tingting Gu, Jorja G. Henikoff, Philipp Kapranov, Renhua Li, Heather K. MacAlpine, John Malone, Aki Minoda, Jared Nordman, Katsutomo Okamura, Marc Perry, Sara K. Powell, Nicole C. Riddle, Akiko Sakai, Anastasia Samsonova, Jeremy E. Sandler, Yuri B. Schwartz, Noa Sher, Rebecca Spokony, David Sturgill, Marijke van Baren, Kenneth H. Wan, Li Yang, Charles Yu, Elise Feingold, Peter Good, Mark Guyer, Rebecca Lowdon, Kami Ahmad, Justen Andrews, Bonnie Berger, Steven E. Brenner, Michael R. Brent, Lucy Cherbas, Sarah C. R. Elgin, Thomas R. Gingeras, Robert Grossman, Roger A. Hoskins, Thomas C. Kaufman, William Kent, Mitzi I. Kuroda, Terry Orr-Weaver, Norbert Perrimon, Vincenzo Pirrotta, James W. Posakony, Bing Ren, Steven Russell, Peter Cherbas, Brenton R. Graveley, Suzanna Lewis, Gos Micklem, Brian Oliver, Peter J. Park, Susan E. Celniker, Steven Henikoff, Gary H. Karpen, Eric C. Lai, David M. MacAlpine, Lincoln D. Stein, Kevin P. White, Manolis Kellis

In: Science, 2010, 330(6012), 1787-97

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.

Available:
pdf (4990 KB)   doi:10.1126/science.1198374   pmid:21177974   BibTeX Entry ( Roy:Ernst:Kharchenko:Ident_funct_eleme:2010 )

Sparsification of RNA structure prediction including pseudoknots

Mathias Möhl, Raheleh Salari, Sebastian Will, Rolf Backofen, S. Cenk Sahinalp

In: Algorithms Mol Biol, 2010, 5(1), 39

BACKGROUND: Although many RNA molecules contain pseudoknots, computational prediction of pseudoknotted RNA structure is still in its infancy due to high running time and space consumption implied by the dynamic programming formulations of the problem. RESULTS: We introduce sparsification to significantly speed up the dynamic programming approaches for pseudoknotted RNA structure prediction, which also lowers the space requirements. Although sparsification has been applied to a number of RNA-related structure prediction problems in the past few years, we provide the first application of sparsification to pseudoknotted RNA structure prediction specifically and to handling gapped fragments more generally - which has a much more complex recursive structure than other problems to which sparsification has been applied. We analyse how to sparsify four pseudoknot structure prediction algorithms, among them the most general method available (the Rivas-Eddy algorithm) and the fastest one (the Reeder-Giegerich algorithm). In all algorithms the number of "candidate" substructures to be considered is reduced. CONCLUSION: Experimental results on the sparsified Reeder-Giegerich algorithm suggest a linear speedup over the unsparsified implementation.

Available:
pdf (617 KB)   doi:10.1186/1748-7188-5-39   pmid:21194463   BibTeX Entry ( Moehl:Sparsification:AMB2011 )

Partitioning biological data with transitivity clustering

Tobias Wittkop, Dorothea Emig, Sita Lange, Sven Rahmann, Mario Albrecht, John H. Morris, Sebastian Bocker, Jens Stoye, Jan Baumbach

In: Nat Methods, 2010, 7(6), 419-20

No Abstract available

Available:
pdf (195 KB)   med (1 KB)   doi:10.1038/nmeth0610-419   pmid:20508635   BibTeX Entry ( Wittkop:Emig:Lange:Parti_biolo_data:2010 )

Identifikation von Targetmolekülen kleiner regulatorischer RNAs

Rolf Backofen

In: Laborwelt, 2010, 11(1), 29-30

Non-coding small RNAs (sRNAs) play an important role in gene regulation. Studies in prokaryotic and eukaryotic cells show that these sRNAs frequently bind to mRNA target sequences (ta-mRNA) by pairing of complementary bases in order to regulate the expression or translation of the corresponding genes. The specificity of these interactions depends on the stability of the inter- and intramolecular base pairings. New techniques such as deep sequencing identify extremely large numbers of sRNAs. However, because no high-throughput methods are available to identify the corresponding target RNAs and thereby assign a function to the sRNAs, and because the experimental methods are very laborious, computational target prediction followed by experimental analysis of promising candidates is currently the most efficient approach for identifying target RNAs. Even so, computational prediction of target sequences is often challenging because sequence complementarity is frequently incomplete. Here we describe current methods for target sequence prediction.

Available:
BibTeX Entry ( backofen10:_ident_von_klein_regul_rnas )
Links:
External PDF (starts at page 29)

Evolution of Metabolic Networks: A Computational Framework

Christoph Flamm, Alexander Ullrich, Heinz Ekker, Martin Mann, Daniel Hoegerl, Markus Rohrschneider, Sebastian Sauer, Gerik Scheuermann, Konstantin Klemm, Ivo L. Hofacker, Peter F. Stadler

In: Journal of Systems Chemistry, 2010, 1(1), 4

Background: The metabolic architectures of extant organisms share many key pathways such as the citric acid cycle, glycolysis, or the biosynthesis of most amino acids. Several competing hypotheses for the evolutionary mechanisms that shape metabolic networks have been discussed in the literature, each of which finds support from comparative analysis of extant genomes. Alternatively, the principles of metabolic evolution can be studied by direct computer simulation. This requires, however, an explicit implementation of all pertinent components: a universe of chemical reactions upon which the metabolism is built, an explicit representation of the enzymes that implement the metabolism, of a genetic system that encodes these enzymes, and of a fitness function that can be selected for. Results: We describe here a simulation environment that implements all these components in a simplified way so that large-scale evolutionary studies are feasible. We employ an artificial chemistry that views chemical reactions as graph rewriting operations and utilizes a toy version of quantum chemistry to derive thermodynamic parameters. Minimalist organisms with simple string-encoded genomes produce model ribozymes whose catalytic activity is determined by an ad hoc mapping between their secondary structure and the transition state graphs that they stabilize. Fitness is computed utilizing the ideas of metabolic flux analysis. We present an implementation of the complete system and first simulation results. Conclusions: The simulation system presented here allows coherent investigations into the evolutionary mechanisms of the first steps of metabolic evolution using a self-consistent toy universe.

Available:
pdf (974 KB)   doi:10.1186/1759-2208-1-4   BibTeX Entry ( Flamm_etal_10 )

Identification and characterization of NAGNAG alternative splicing in the moss Physcomitrella patens

Rileen Sinha, Andreas D. Zimmer, Kathrin Bolte, Daniel Lang, Ralf Reski, Matthias Platzer, Stefan A. Rensing, Rolf Backofen

In: BMC Plant Biol, 2010, 10, 76

BACKGROUND: Alternative splicing (AS) involving tandem acceptors that are separated by three nucleotides (NAGNAG) is an evolutionarily widespread class of AS, which is well studied in Homo sapiens (human) and Mus musculus (mouse). It has also been shown to be common in the model seed plants Arabidopsis thaliana and Oryza sativa (rice). In one of the first studies involving sequence-based prediction of AS in plants, we performed a genome-wide identification and characterization of NAGNAG AS in the model plant Physcomitrella patens, a moss. RESULTS: Using Sanger data, we found 295 alternatively used NAGNAG acceptors in P. patens. Using 31 features and training and test datasets of constitutive and alternative NAGNAGs, we trained a classifier to predict the splicing outcome at NAGNAG tandem splice sites (alternative splicing, constitutive at the first acceptor, or constitutive at the second acceptor). Our classifier achieved a balanced specificity and sensitivity of ≥ 89%. Subsequently, a classifier trained exclusively on data well supported by transcript evidence was used to make genome-wide predictions of NAGNAG splicing outcomes. By generation of more transcript evidence from a next-generation sequencing platform (Roche 454), we found additional evidence for NAGNAG AS, with altogether 664 alternative NAGNAGs being detected in P. patens using all currently available transcript evidence. The 454 data also enabled us to validate the predictions of the classifier, with 64% (80/125) of the well-supported cases of AS being predicted correctly. CONCLUSION: NAGNAG AS is just as common in the moss P. patens as it is in the seed plants A. thaliana and O. sativa (but not conserved on the level of orthologous introns), and can be predicted with high accuracy. The most informative features are the nucleotides in the NAGNAG and in its immediate vicinity, along with the splice site scores, as found earlier for NAGNAG AS in animals. Our results suggest that the mechanism behind NAGNAG AS in plants is similar to that in animals and is largely dependent on the splice site and its immediate neighborhood.

Available:
pdf (865 KB)   doi:10.1186/1471-2229-10-76   pmid:20426810   BibTeX Entry ( Sinha:Zimmer:Bolte:Ident_and_chara:2010 )

TassDB2 - A comprehensive database of subtle alternative splicing events

Rileen Sinha, Thorsten Lenser, Niels Jahn, Ulrike Gausmann, Swetlana Friedel, Karol Szafranski, Klaus Huse, Philip Rosenstiel, Jochen Hampe, Stefan Schuster, Michael Hiller, Rolf Backofen, Matthias Platzer

In: BMC Bioinformatics, 2010, 11, 216

BACKGROUND: Subtle alternative splicing events involving tandem splice sites separated by a short (2-12 nucleotides) distance are frequent and evolutionarily widespread in eukaryotes, and a major contributor to the complexity of transcriptomes and proteomes. However, these events have been either omitted altogether in databases on alternative splicing, or only the cases of experimentally confirmed alternative splicing have been reported. Thus, a database which covers all confirmed cases of subtle alternative splicing as well as the numerous putative tandem splice sites (which might be confirmed once more transcript data becomes available), and allows users to search for tandem splice sites with specific features and download the results, is a valuable resource for targeted experimental studies and large-scale bioinformatics analyses of tandem splice sites. Towards this goal we recently set up TassDB (Tandem Splice Site DataBase, version 1), which stores data about alternative splicing events at tandem splice sites separated by 3 nt in eight species. DESCRIPTION: We have substantially revised and extended TassDB. The currently available version 2 contains extensive information about tandem splice sites separated by 2-12 nt for the human and mouse transcriptomes including data on the conservation of the tandem motifs in five vertebrates. TassDB2 offers a user-friendly interface to search for specific genes or for genes containing tandem splice sites with specific features as well as the possibility to download result datasets. For example, users can search for cases of alternative splicing where the proportion of EST/mRNA evidence supporting the minor isoform exceeds a specific threshold, or where the difference in splice site scores is specified by the user. The predicted impact of each event on the protein is also reported, along with information about being a putative target for the nonsense-mediated decay (NMD) pathway. Links are provided to the UCSC genome browser and other external resources. CONCLUSION: TassDB2, available via http://www.tassdb.info, provides comprehensive resources for researchers interested in both targeted experimental studies and large-scale bioinformatics analyses of short distance tandem splice sites.

Available:
pdf (1061 KB)   doi:10.1186/1471-2105-11-216   pmid:20429909   BibTeX Entry ( Sinha:Lenser:Jahn:TassD_compr_datab:2010 )

Shape-based barrier estimation for RNAs

Sergiy Bogomolov, Martin Mann, Björn Voss, Andreas Podelski, Rolf Backofen

In: Proceedings of the German Conference on Bioinformatics GCB'10, LNI, 2010, 173, 42-51

The ability of some RNA molecules to switch between different metastable conformations plays an important role in cellular processes. In order to identify such molecules and to predict their conformational changes one has to investigate the refolding pathways. As a qualitative measure of these transitions, the barrier height marks the energy peak along such refolding paths. We introduce a meta-heuristic to estimate such barriers, which is an NP-complete problem. To guide an arbitrary path heuristic, the method uses RNA shape representative structures as intermediate checkpoints for detours. This enables a broad but efficient search for refolding pathways. The resulting Shape Triples meta-heuristic enables a close to optimal estimation of the barrier height that outperforms the precision of the employed path heuristic.
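
Illustrative sketch (not the published Shape Triples code): the barrier height of a refolding path, and the way a detour through an intermediate checkpoint can lower the estimate, can be pictured with a few lines of Python. The energy values and function names are hypothetical; in the paper the checkpoints are RNA shape representative structures and the sub-paths come from an arbitrary path heuristic, neither of which is modelled here.

```python
def barrier_height(path_energies):
    """Barrier of one refolding path: its highest energy relative to the start."""
    return max(path_energies) - path_energies[0]


def best_barrier(direct_path, detour_paths):
    """Lowest barrier among the direct path and all detours via checkpoints.

    Every path is a list of energies from the same start structure to the same
    target structure; a detour is the concatenation start -> checkpoint -> target."""
    return min(barrier_height(p) for p in [direct_path] + list(detour_paths))


# Hypothetical energies (kcal/mol) along two alternative refolding paths.
direct = [-10.0, -6.0, -2.0, -7.0, -12.0]
via_checkpoint = [-10.0, -8.0, -9.0, -5.5, -12.0]
estimate = best_barrier(direct, [via_checkpoint])
```

Here the direct path peaks 8.0 above the start while the detour peaks only 4.5 above it, so trying checkpoints improves the barrier estimate without changing the underlying path heuristic.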

Publication note:
SB and MM contributed equally to this work.

Available:
pdf (516 KB)   BibTeX Entry ( Bogomolov:shapeTriples:2010 )

Sparsification of RNA Structure Prediction Including Pseudoknots

Mathias Möhl, Raheleh Salari, Sebastian Will, Rolf Backofen, S. Cenk Sahinalp

In: Vincent Moulton, Mona Singh, Proc. of the 10th Workshop on Algorithms in Bioinformatics (WABI), Lecture Notes in Computer Science, 2010, 6293, 40-51

Although many RNA molecules contain pseudoknots, computational prediction of pseudoknotted RNA structure is still in its infancy due to high running time and space consumption implied by the dynamic programming formulations of the problem. In this paper, we introduce sparsification to significantly speed up the dynamic programming approaches for pseudoknotted RNA structure prediction, which also lowers the space requirements. Although sparsification has been applied to a number of RNA-related structure prediction problems in the past few years, we provide the first application of sparsification to pseudoknotted RNA structure prediction specifically and to handling gapped fragments more generally - which has a much more complex recursive structure than other problems to which sparsification has been applied. We show that sparsification, when applied to the fastest as well as the most general pseudoknotted structure prediction methods available (the Reeder-Giegerich algorithm and the Rivas-Eddy algorithm, respectively), significantly reduces the number of candidate substructures to be considered. In fact, experimental results on the sparsified Reeder-Giegerich algorithm suggest a linear speedup over the unsparsified implementation.

Available:
pdf (683 KB)   doi:10.1007/978-3-642-15294-8_4   BibTeX Entry ( moehl_wabi10:Sparsification )

A Propagator for Maximum Weight String Alignment with Arbitrary Pairwise Dependencies

Alessandro Dal Palu, Mathias Möhl, Sebastian Will

In: Proceedings of the 16th International Conference on Principles and Practice of Constraint Programming (CP-2010), 2010, 8

The optimization of weighted string alignments is a well studied problem recurring in a number of application domains and can be solved efficiently. The problem becomes MAX-SNP-hard as soon as arbitrary pairwise dependencies among the alignment edges are introduced. We present a global propagator for this problem which is based on efficiently solving a relaxation of it. In the context of bioinformatics, the problem is known as alignment of arc-annotated sequences, which is e.g. used for comparing RNA molecules. For a restricted version of this alignment problem, we show that a constraint program based on our propagator is on par with state of the art methods. For the general problem with unrestricted dependencies, our tool constitutes the first available method with promising applications in this field.

Available:
pdf (160 KB)   BibTeX Entry ( palu_moehl_will_cp2010 )

The small RNA Aar in Acinetobacter baylyi: a putative regulator of amino acid metabolism

Dominik Schilling, Sven Findeiss, Andreas S. Richter, Jennifer A. Taylor, Ulrike Gerischer

In: Arch Microbiol, 2010, 192(9), 691-702

Small non-coding RNAs (sRNAs) are key players in prokaryotic metabolic circuits, allowing the cell to adapt to changing environmental conditions. Regulatory interference by sRNAs in cellular metabolism is often facilitated by the Sm-like protein Hfq. A search for novel sRNAs in A. baylyi intergenic regions was performed by a biocomputational screening. One candidate, Aar, encoded between trpS and sucD showed Hfq dependency in Northern blot analysis. Aar was expressed strongly during stationary growth phase in minimal medium; in contrast, in complex medium, strongest expression was in the exponential growth phase. Whereas over-expression of Aar in trans did not affect bacterial growth, seven mRNA targets predicted by two in silico approaches were upregulated in stationary growth phase. All seven mRNAs are involved in A. baylyi amino acid metabolism. A putative binding site for Lrp, the global regulator of branched-chain amino acids in E. coli, was observed within the aar gene. Both facts imply an Aar participation in amino acid metabolism.

Available:
pdf (726 KB)   doi:10.1007/s00203-010-0592-6   pmid:20559624   BibTeX Entry ( Schilling:Findeiss:Richter:The_small_RNA:2010 )

Hierarchical folding of multiple sequence alignments for the prediction of structures and RNA-RNA interactions

Stefan E. Seemann, Andreas S. Richter, Jan Gorodkin, Rolf Backofen

In: Algorithms Mol Biol, 2010, 5, 22

BACKGROUND: Many regulatory non-coding RNAs (ncRNAs) function through complementary binding with mRNAs or other ncRNAs, e.g., microRNAs, snoRNAs and bacterial sRNAs. Predicting these RNA interactions is essential for functional studies of putative ncRNAs or for the design of artificial RNAs. Many ncRNAs show clear signs of undergoing compensating base changes over evolutionary time. Here, we postulate that a non-negligible part of the existing RNA-RNA interactions contain preserved but covarying patterns of interactions. METHODS: We present a novel method that takes compensating base changes across the binding sites into account. The algorithm works in two steps on two pre-generated multiple alignments. In the first step, individual base pairs with high reliability are found using the PETfold algorithm, which includes evolutionary and thermodynamic properties. In step two (where high-reliability base pairs from step one are constrained as unpaired), the principle of cofolding is combined with hierarchical folding. The final prediction of intra- and inter-molecular base pairs consists of the reliabilities computed from the constrained expected accuracy scoring, which is an extended version of that used for individual multiple alignments. RESULTS: We derived a rather extensive algorithm. One of the advantages of our approach (in contrast to other RNA-RNA interaction prediction methods) is the application of covariance detection and prediction of pseudoknots between intra- and inter-molecular base pairs. As a proof of concept, we show an example and discuss the strengths and weaknesses of the approach.

Available:
supplement.pdf (70 KB)   pdf (6136 KB)   doi:10.1186/1748-7188-5-22   pmid:20492641   BibTeX Entry ( Seemann:Richter:Gorodkin:Hiera_foldi_multi:2010 )

Lattice model refinement of protein structures

Martin Mann, Alessandro Dal Palu

In: Proceedings of the Workshop on Constraint Based Methods for Bioinformatics (WCB 2010), 2010, 7

In this paper we model and implement a Constraint Programming method to refine a lattice fitting of a protein structure produced by a greedy search. We show that the model is able to provide better quality solutions. The prototype is implemented in COLA and it is based on a limited discrepancy approach. Finally, some promising extensions based on local search are discussed.

Available:
pdf (416 KB)   COLAfit-src.tar.gz (53 KB)   arXiv:1005.1853   BibTeX Entry ( MannPalu_LatFitCOLA_10 )

Freiburg RNA Tools: a web server integrating IntaRNA, ExpaRNA and LocARNA

Cameron Smith, Steffen Heyne, Andreas S. Richter, Sebastian Will, Rolf Backofen

In: Nucleic Acids Research, 2010, 38 Suppl, W373-7

The Freiburg RNA tools web server integrates three tools for the advanced analysis of RNA in a common web-based user interface. The tools IntaRNA, ExpaRNA and LocARNA support the prediction of RNA-RNA interaction, exact RNA matching and alignment of RNA, respectively. The Freiburg RNA tools web server and the software packages of the stand-alone tools are freely accessible at http://rna.informatik.uni-freiburg.de.

Available:
pdf (1502 KB)   doi:10.1093/nar/gkq316   pmid:20444875   BibTeX Entry ( Smith:Heyne:Richter:Freib_RNA_Tools:NAR2010 )

Computational prediction of sRNAs and their targets in bacteria

Rolf Backofen, Wolfgang R. Hess

In: RNA Biol, 2010, 7(1), 33-42

There is probably no major adaptive response in bacteria which does not have at least one small RNA (sRNA) as part of its regulatory network controlling gene expression. Thus, prokaryotic genomes encode dozens to hundreds of these riboregulators. Whereas the identification of putative sRNA genes during initial genome annotation is not yet common practice, their prediction can be done subsequently by various methods and with variable efficacy, frequently relying on comparative genome analysis. A large number of these sRNAs interact with their mRNA targets by antisense mechanisms. Yet, the computational identification of these targets appears to be challenging because frequently the partial and incomplete sequence complementarity is difficult to evaluate. Here we review the computational approaches for detecting bacterial sRNA genes and their targets, and discuss the current and future challenges that this exciting field of research is facing.

Available:
pdf (665 KB)   pmid:20061798   BibTeX Entry ( Backofen:Hess:Compu_predi_sRNAs:2010 )

Time and Space Efficient RNA-RNA Interaction Prediction via Sparse Folding

Raheleh Salari, Mathias Möhl, Sebastian Will, S. Cenk Sahinalp, Rolf Backofen

In: Bonnie Berger, Proc. of RECOMB 2010, Lecture Notes in Computer Science, 2010, 6044, 473-490

In the past years, a large set of new regulatory ncRNAs has been identified, but the number of experimentally verified targets is considerably low. Thus, computational target prediction methods are in high demand. Whereas all previous approaches for predicting a general joint structure have a complexity of O(n^6) running time and O(n^4) space, a more time and space efficient interaction prediction that is able to handle complex joint structures is necessary for genome-wide target prediction problems. In this paper we show how to reduce both the time and space complexity of the RNA-RNA interaction prediction problem as described by Alkan et al. via dynamic programming sparsification, which allows discarding large portions of DP tables without losing optimality. Applying sparsification techniques reduces the complexity of the original algorithm from O(n^6) time and O(n^4) space to O(n^4 ψ(n)) time and O(n^2 ψ(n) + n^3) space for some function ψ(n), which turns out to have small values for the range of n that we encounter in practice. Under the assumption that the polymer-zeta property holds for RNA-structures, we demonstrate that ψ(n)=O(n) on average, resulting in a linear time and space complexity improvement over the original algorithm. We evaluate our sparsified algorithm for RNA-RNA interaction prediction by total free energy minimization, based on the energy model of Chitsaz et al., on a set of known interactions. Our results confirm the significant reduction of time and space requirements in practice.

Available:
pdf (3553 KB)   doi:10.1007/978-3-642-12683-3_31   BibTeX Entry ( Salari:etal:_time_and_space_effic_rna:RECOMB2010 )

Lifting prediction to alignment of RNA pseudoknots

Mathias Möhl, Sebastian Will, Rolf Backofen

In: Journal of Computational Biology, 2010, 17(3), 429-42

Prediction and alignment of RNA pseudoknot structures are NP-hard. Nevertheless, several efficient prediction algorithms by dynamic programming have been proposed for restricted classes of pseudoknots. We present a general scheme that yields an efficient alignment algorithm for arbitrary such classes. Moreover, we show that such an alignment algorithm benefits from the class restriction in the same way as the corresponding structure prediction algorithm does. We look at six of these classes in greater detail. The time and space complexity of the alignment algorithm is increased by only a linear factor over the respective prediction algorithm. For five of the classes, no efficient alignment algorithms were known. For the sixth, most general class, we improve the previously best complexity of O(n^5 m^5) time to O(nm^6), where n and m denote sequence lengths. Finally, we apply our fastest algorithm with O(nm^4) time and O(nm^2) space to comparative de-novo pseudoknot prediction.

Available:
pdf (242 KB)   doi:10.1089/cmb.2009.0168   pmid:20377455   BibTeX Entry ( Moehl:Will:Backofen:PKalign:JCB2010 )

COMPUTATIONAL STUDIES OF NON-CODING RNAS - Session Introduction

R. Backofen, H. Chitsaz, I. Hofacker, S. C. Sahinalp, P. F. Stadler

In: Proc. of the Pacific Symposium on Biocomputing 2010 (PSB 2010), 2010, 15, 54-56

No abstract received.

Available:
pdf (537 KB)   pmid:19908357   BibTeX Entry ( Backofen:Chitsaz:Hofacker:COMPU_STUDI_NON:PSB2010 )

Fast prediction of RNA-RNA interaction

Raheleh Salari, Rolf Backofen, S. Cenk Sahinalp

In: Algorithms Mol Biol, 2010, 5, 5

BACKGROUND: Regulatory antisense RNAs are a class of ncRNAs that regulate gene expression by prohibiting the translation of an mRNA by establishing stable interactions with a target sequence. There is great demand for efficient computational methods to predict the specific interaction between an ncRNA and its target mRNA(s). There are a number of algorithms in the literature which can predict a variety of such interactions - unfortunately at a very high computational cost. Although some existing target prediction approaches are much faster, they are specialized for interactions with a single binding site. METHODS: In this paper we present a novel algorithm to accurately predict the minimum free energy structure of RNA-RNA interaction under the most general type of interactions studied in the literature. Moreover, we introduce a fast heuristic method to predict the specific (multiple) binding sites of two interacting RNAs. RESULTS: We verify the performance of our algorithms for joint structure and binding site prediction on a set of known interacting RNA pairs. Experimental results show our algorithms are highly accurate and outperform all competitive approaches.

Available:
pdf (386 KB)   doi:10.1186/1748-7188-5-5   pmid:20047661   BibTeX Entry ( Salari:Backofen:Sahinalp:Fast_predi_RNA:2010 )

Seed-based IntaRNA prediction combined with GFP-reporter system identifies mRNA targets of the small RNA Yfr1

Andreas S. Richter, Christian Schleberger, Rolf Backofen, Claudia Steglich

In: Bioinformatics, 2010, 26(1), 1-5

MOTIVATION: Prochlorococcus possesses the smallest genome of all sequenced photoautotrophs. Although the number of regulatory proteins in the genome is very small, the relative number of small regulatory RNAs is comparable with that of other bacteria. The compact genome size of Prochlorococcus offers an ideal system to search for targets of small RNAs (sRNAs) and to refine existing target prediction algorithms. RESULTS: Target predictions for the cyanobacterial sRNA Yfr1 were carried out with IntaRNA in Prochlorococcus MED4. The ultraconserved Yfr1 sequence motif was defined as the putative interaction seed. To study the impact of Yfr1 on its predicted mRNA targets, a reporter system based on green fluorescent protein (GFP) was applied. We show that Yfr1 inhibits the translation of two predicted targets. We used mutation analysis to confirm that Yfr1 directly regulates its targets by an antisense interaction sequestering the ribosome binding site, and to assess the importance of interaction site accessibility.

Available:
pdf (293 KB)   doi:10.1093/bioinformatics/btp609   pmid:19850757   BibTeX Entry ( Richter:Schleberger:Backofen:Seed_INTAR_predi:2010 )

Qupe--a Rich Internet Application to take a step forward in the analysis of mass spectrometry-based quantitative proteomics experiments

S. P. Albaum, H. Neuweger, B. Franzel, S. Lange, D. Mertens, C. Trotschel, D. Wolters, J. Kalinowski, T. W. Nattkemper, A. Goesmann

In: Bioinformatics, 2009, 25(23), 3128-3134

MOTIVATION: The goal of present -omics sciences is to understand biological systems as a whole in terms of interactions of the individual cellular components. One of the main building blocks in this field of study is proteomics where tandem mass spectrometry (LC-MS/MS) in combination with isotopic labelling techniques provides a common way to obtain a direct insight into regulation at the protein level. Methods to identify and quantify the peptides contained in a sample are well established, and their output usually results in lists of identified proteins and calculated relative abundance values. The next step is to move ahead from these abstract lists and apply statistical inference methods to compare measurements, to identify genes that are significantly up- or down-regulated, or to detect clusters of proteins with similar expression profiles. RESULTS: We introduce the Rich Internet Application (RIA) Qupe providing comprehensive data management and analysis functions for LC-MS/MS experiments. Starting with the import of mass spectra data the system guides the experimenter through the process of protein identification by database search, the calculation of protein abundance ratios, and in particular, the statistical evaluation of the quantification results including multivariate analysis methods such as analysis of variance or hierarchical cluster analysis. While a data model to store these results has been developed, a well-defined programming interface facilitates the integration of novel approaches. A compute cluster is utilized to distribute computationally intensive calculations, and a web service allows to interchange information with other -omics software applications. To demonstrate that Qupe represents a step forward in quantitative proteomics analysis an application study on Corynebacterium glutamicum has been carried out. Availability and Implementation: Qupe is implemented in Java utilizing Hibernate, Echo2, R and the Spring framework. 
We encourage the usage of the RIA in the sense of the 'software as a service' concept, maintained on our servers and accessible at the following location: http://qupe.cebitec.uni-bielefeld.de. CONTACT: stefan.albaum@cebitec.uni-bielefeld.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
pdf (1816 KB)   doi:10.1093/bioinformatics/btp568   pmid:19808875   BibTeX Entry ( Albaum:Neuweger:Franzel:Qupe_Rich_Inter:2009 )

Conserved introns reveal novel transcripts in Drosophila melanogaster

Michael Hiller, Sven Findeiss, Sandro Lein, Manja Marz, Claudia Nickel, Dominic Rose, Christine Schulz, Rolf Backofen, Sonja J. Prohaska, Gunter Reuter, Peter F. Stadler

In: Genome Res, 2009, 19(7), 1289-300

Noncoding RNAs that are, like mRNAs, spliced, capped, and polyadenylated have important functions in cellular processes. The inventory of these mRNA-like noncoding RNAs (mlncRNAs), however, is incomplete even in well-studied organisms, and so far, no computational methods exist to predict such RNAs from genomic sequences only. The subclass of these transcripts that is evolutionarily conserved usually has conserved intron positions. We demonstrate here that a genome-wide comparative genomics approach searching for short conserved introns is capable of identifying conserved transcripts with a high specificity. Our approach requires neither an open reading frame nor substantial sequence or secondary structure conservation in the surrounding exons. Thus it identifies spliced transcripts in an unbiased way. After applying our approach to insect genomes, we predict 369 introns outside annotated coding transcripts, of which 131 are confirmed by expressed sequence tags (ESTs) and/or noncoding FlyBase transcripts. Of the remaining 238 novel introns, about half are associated with protein-coding genes, either extending coding or untranslated regions or likely belonging to unannotated coding genes. The remaining 129 introns belong to novel mlncRNAs that are largely unstructured. Using RT-PCR, we verified seven of 12 tested introns in novel mlncRNAs and 11 of 17 introns in novel coding genes. The expression level of all verified mlncRNA transcripts is low but varies during development, which suggests regulation. As conserved introns indicate both purifying selection on the exon-intron structure and conserved expression of the transcript in related species, the novel mlncRNAs are good candidates for functional transcripts.

Available:
pdf (1382 KB)   doi:10.1101/gr.090050.108   pmid:19458021   BibTeX Entry ( Hiller:Findeiss:Lein:Conse_intro_revea:2009 )

Accurate prediction of NAGNAG alternative splicing

Rileen Sinha, Swetlana Nikolajewa, Karol Szafranski, Michael Hiller, Niels Jahn, Klaus Huse, Matthias Platzer, Rolf Backofen

In: Nucleic Acids Research, 2009, 37(11), 3569-79

Alternative splicing (AS) involving NAGNAG tandem acceptors is an evolutionarily widespread class of AS. Recent predictions of alternative acceptor usage reported better results for acceptors separated by larger distances, than for NAGNAGs. To improve the latter, we aimed at the use of Bayesian networks (BN), and extensive experimental validation of the predictions. Using carefully constructed training and test datasets, a balanced sensitivity and specificity of ≥92% was achieved. A BN trained on the combined dataset was then used to make predictions, and 81% (38/47) of the experimentally tested predictions were verified. Using a BN learned on human data on six other genomes, we show that while the performance for the vertebrate genomes matches that achieved on human data, there is a slight drop for Drosophila and worm. Lastly, using the prediction accuracy according to experimental validation, we estimate the number of yet undiscovered alternative NAGNAGs. State of the art classifiers can produce highly accurate prediction of AS at NAGNAGs, indicating that we have identified the major features of the 'NAGNAG-splicing code' within the splice site and its immediate neighborhood. Our results suggest that the mechanism behind NAGNAG AS is simple, stochastic, and conserved among vertebrates and beyond.

Available:
pdf (673 KB)   doi:10.1093/nar/gkp220   pmid:19359358   BibTeX Entry ( Sinha:Nikolajewa:Szafranski:Accur_predi_NAGNA:NAR2009 )

Equivalence Classes of Optimal Structures in HP Protein Models Including Side Chains

Martin Mann, Rolf Backofen, Sebastian Will

In: Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB09), 2009

Lattice protein models, such as the Hydrophobic-Polar (HP) model, are a common abstraction to enable exhaustive studies on structure, function, or evolution of proteins. A main issue is the high number of optimal structures, resulting from the hydrophobicity-based energy function applied. We introduce an equivalence relation on protein structures that correlates to the energy function. We discuss the efficient enumeration of optimal representatives of the corresponding equivalence classes and the application of the results.

Available:
pdf (273 KB)   arXiv:0910.3848   BibTeX Entry ( Mann:Backofen:Will:_equivalence_classes:WCB09 )

Constraint-based Local Move Definitions for Lattice Protein Models Including Side Chains

Martin Mann, Mohamed Abou Hamra, Kathleen Steinhöfel, Rolf Backofen

In: Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB09), 2009

The simulation of a protein's folding process is often done via stochastic local search, which requires a procedure to apply structural changes onto a given conformation. Here, we introduce a constraint-based approach to enumerate lattice protein structures according to k-local moves in arbitrary lattices. Our declarative description is much more flexible for extensions than standard operational formulations. It enables a generic calculation of k-local neighbors in backbone-only and side chain models. We exemplify the procedure using a simple hierarchical folding scheme.

Available:
pdf (295 KB)   arXiv:0910.3880   BibTeX Entry ( Mann:etal:_constraint-based_local_move:WCB09 )

Fast prediction of RNA-RNA Interaction

Raheleh Salari, Rolf Backofen, S. Cenk Sahinalp

In: Steven Salzberg, Tandy Warnow, Proc. of the 9th Workshop on Algorithms in Bioinformatics (WABI), Lecture Notes in Computer Science, 2009, 5724, 261-272

We present an accurate algorithm to predict the minimum free energy structure of RNA-RNA interaction under the most general type of interaction studied in the literature. Moreover, we introduce a fast heuristic algorithm to predict multiple binding sites of two interacting RNAs incorporating accessibility of target sites. We verify the performance of our algorithms for joint structure and binding site prediction on a set of known interacting RNA pairs. Experimental results show that our methods are highly accurate and outperform the other competitive approaches.

Available:
pdf (208 KB)   doi:10.1007/978-3-642-04241-6   BibTeX Entry ( salari09:_fast_predic_of_rna_rna_inter )

biRNA: Fast RNA-RNA Binding Sites Prediction

Hamidreza Chitsaz, Rolf Backofen, S. Cenk Sahinalp

In: Steven Salzberg, Tandy Warnow, Proc. of the 9th Workshop on Algorithms in Bioinformatics (WABI), Lecture Notes in Computer Science, 2009, 5724, 25-36

We present biRNA, a novel algorithm for prediction of binding sites between two RNAs based on minimization of binding free energy. Similar to the RNAup approach, we assume the binding free energy is the sum of accessibility and the interaction free energies. Our algorithm maintains tractability and speed and also has two important advantages over previous similar approaches: 1) it is able to predict multiple simultaneous binding sites and 2) it computes a more accurate interaction free energy by considering both intramolecular and intermolecular base pairing. Moreover, biRNA can handle crossing interactions as well as hairpins interacting in a zigzag fashion. To deal with simultaneous accessibility of binding sites, our algorithm models their joint probability of being unpaired. Since computing the exact joint probability distribution is intractable, we approximate the joint probability by a polynomially representable graphical model, namely a Chow-Liu tree-structured Markov Random Field. Experimental results show that biRNA outperforms RNAup and also support the accuracy of our approach. Our proposed Bayesian approximation of the Boltzmann joint probability distribution provides a powerful, novel framework that can also be utilized in other applications.

Available:
pdf (102 KB)   doi:10.1007/978-3-642-04241-6   BibTeX Entry ( chitsaz09:_birna )

Protein Folding Simulation by Two-Stage Optimization

Abu Dayem Ullah, Leonidas Kapsokalivas, Martin Mann, Kathleen Steinhöfel

In: Proc. of ISICA'09, CCIS, Oct 2009, 51, 138-145

We propose a two-stage optimization approach for protein folding simulation in the FCC lattice, inspired by the phenomenon of hydrophobic collapse. Given a protein sequence, the first stage of the approach produces compact protein structures with the maximal number of contacts among hydrophobic monomers, using the CPSP tools for optimal structure prediction in the HP model. The second stage uses those compact structures as starting points to further optimize the protein structure for the input sequence by employing simulated annealing local search and a 20 amino acid pairwise interactions energy function. Experimental results with PDB sequences show that compact structures produced by the CPSP tools are up to two orders of magnitude better, in terms of the pairwise energy function, than randomly generated ones. Also, initializing simulated annealing with these compact structures produces better structures in fewer iterations than initializing with random structures. Hence, the proposed two-stage optimization outperforms a local search procedure based on simulated annealing alone.

Available:
pdf (151 KB)   doi:10.1007/978-3-642-04962-0_16   BibTeX Entry ( Ullah:CPSP_LS:09 )

Fast RNA Structure Alignment for Crossing Input Structures

Rolf Backofen, Gad M. Landau, Mathias Möhl, Dekel Tsur, Oren Weimann

In: Gregory Kucherov, Esko Ukkonen, Proc. 20th Symp. Combinatorial Pattern Matching, LNCS, 2009, 5577, 236-248

The complexity of pairwise RNA structure alignment depends on the structural restrictions assumed for both the input structures and the computed consensus structure. For arbitrarily crossing input and consensus structures, the problem is NP-hard. For non-crossing consensus structures, Jiang et al.'s algorithm [1] computes the alignment in O(n^2 m^2) time where n and m denote the lengths of the two input sequences. If also the input structures are non-crossing, the problem corresponds to tree editing which can be solved in O(m^2 n(1 + log(n/m))) time [2]. We present a new algorithm that solves the problem for d-crossing structures in O(dm^2 n log n) time, where d is a parameter that is one for non-crossing structures, bounded by n for crossing structures, and much smaller than n on most practical examples. Crossing input structures allow for applications where the input is not a fixed structure but is given as base-pair probability matrices. Keywords: RNA, sequence structure alignment, simultaneous alignment and folding

Available:
pdf (570 KB)   doi:10.1007/978-3-642-02441-2_21   BibTeX Entry ( backofen09:_fast_rna_struc_align_for )

Sparse RNA Folding: Time and Space Efficient Algorithms

Rolf Backofen, Dekel Tsur, Shay Zakov, Michal Ziv-Ukelson

In: Gregory Kucherov, Esko Ukkonen, Proc. 20th Symp. Combinatorial Pattern Matching, LNCS, 2009, 5577, 249-262

The classical algorithm for RNA single strand folding requires O(nZ) time and O(n^2) space, where n denotes the length of the input sequence and Z is a sparsity parameter that satisfies n <= Z <= n^2. We show how to reduce the space complexity of this algorithm. The space reduction is based on the observation that some solutions for subproblems are not examined after a certain stage of the algorithm, and may be discarded from memory. This yields an O(nZ) time and O(Z) space algorithm, that outputs both the cardinality of the optimal folding as well as a corresponding secondary structure. The space-efficient approach also extends to the related RNA simultaneous alignment with folding problem, and can be applied to reduce the space complexity of the fastest algorithm for this problem from O(n^2 m^2) down to O(nm^2 + Z), where n and m denote the lengths of the input sequences to be aligned, and Z is a sparsity parameter that satisfies nm <= Z <= n^2 m^2. In addition, we also show how to speed up the base-pairing maximization variant of RNA single strand folding. The speed up is achieved by combining two independent existing techniques, which restrict the number of expressions that need to be examined in bottleneck computations of these algorithms. This yields an O(LZ) time and O(Z) space algorithm, where L denotes the maximum cardinality of a folding of the input sequence.

Available:
pdf (216 KB)   doi:10.1007/978-3-642-02441-2_22   BibTeX Entry ( backofen09:_spars_rna_foldin )

A partition function algorithm for interacting nucleic acid strands

Hamidreza Chitsaz, Raheleh Salari, S. Cenk Sahinalp, Rolf Backofen

In: Bioinformatics, 2009, 25(12), i365-73

Recent interests, such as RNA interference and antisense RNA regulation, strongly motivate the problem of predicting whether two nucleic acid strands interact. MOTIVATION: Regulatory non-coding RNAs (ncRNAs) such as microRNAs play an important role in gene regulation. Studies on both prokaryotic and eukaryotic cells show that such ncRNAs usually bind to their target mRNA to regulate the translation of corresponding genes. The specificity of these interactions depends on the stability of intermolecular and intramolecular base pairing. While methods like deep sequencing allow the discovery of an ever increasing set of ncRNAs, there are no high-throughput methods available to detect their associated targets. Hence, there is an increasing need for precise computational target prediction. In order to predict base-pairing probability of any two bases in interacting nucleic acids, it is necessary to compute the interaction partition function over the whole ensemble. The partition function is a scalar value from which various thermodynamic quantities can be derived. For example, the equilibrium concentration of each complex nucleic acid species and also the melting temperature of interacting nucleic acids can be calculated based on the partition function of the complex. RESULTS: We present a model for analyzing the thermodynamics of two interacting nucleic acid strands considering the most general type of interactions studied in the literature. We also present a corresponding dynamic programming algorithm that computes the partition function over (almost) all physically possible joint secondary structures formed by two interacting nucleic acids in O(n^6) time. We verify the predictive power of our algorithm by computing (i) the melting temperature for interacting RNA pairs studied in the literature and (ii) the equilibrium concentration for several variants of the OxyS-fhlA complex. In both experiments, our algorithm shows high accuracy and outperforms competitors.
AVAILABILITY: Software and web server are available at http://compbio.cs.sfu.ca/taverna/pirna/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Available:
pdf (415 KB)   supplementary.pdf (193 KB)   doi:10.1093/bioinformatics/btp212   pmid:19478011   BibTeX Entry ( Chitsaz:Salari:Sahinalp:parti_funct_algor:2009 )

Lightweight Comparison of RNAs Based on Exact Sequence-Structure Matches

Steffen Heyne, Sebastian Will, Michael Beckstette, Rolf Backofen

In: Bioinformatics, 2009, 25(16), 2095-2102

MOTIVATION: Specific functions of ribonucleic acid (RNA) molecules are often associated with different motifs in the RNA structure. The key feature that forms such an RNA motif is the combination of sequence and structure properties. In this article, we introduce a new RNA sequence-structure comparison method which maintains exact matching substructures. Existing common substructures are treated as a whole unit while variability is allowed between such structural motifs. Based on a fast detectable set of overlapping and crossing substructure matches for two nested RNA secondary structures, our method ExpaRNA (exact pattern of alignment of RNA) computes the longest collinear sequence of substructures common to two RNAs in O(H·nm) time and O(nm) space, where H << n·m for real RNA structures. Applied to different RNAs, our method correctly identifies sequence-structure similarities between two RNAs. RESULTS: We have compared ExpaRNA with two other alignment methods that work with given RNA structures, namely RNAforester and RNA_align. The results are in good agreement, but can be obtained in a fraction of running time, in particular for larger RNAs. We have also used ExpaRNA to speed up state-of-the-art Sankoff-style alignment tools like LocARNA, and observe a tradeoff between quality and speed. However, we get a speedup of 4.25 even in the highest quality setting, where the quality of the produced alignment is comparable to that of LocARNA alone. AVAILABILITY: The presented algorithm is implemented in the program ExpaRNA, which is available from our website (http://www.bioinf.uni-freiburg.de/Software).

Available:
supplementary_data.pdf (632 KB)   pdf (425 KB)   doi:10.1093/bioinformatics/btp065   pmid:19189979   BibTeX Entry ( Heyne:Will:Beckstette:Backofen:BI_ExpaRNA_2009 )

CPSP-web-tools: a server for 3D lattice protein studies

Martin Mann, Cameron Smith, Mohamad Rabbath, Marlien Edwards, Sebastian Will, Rolf Backofen

In: Bioinformatics, 2009, 25(5), 676-7

Studies on proteins are often restricted to highly simplified models to face the immense computational complexity of the associated problems. Constraint-based protein structure prediction (CPSP) tools is a package of very fast algorithms for ab initio optimal structure prediction and related problems in 3D HP-models [cubic and face centered cubic (FCC)]. Here, we present CPSP-web-tools, an interactive online interface of these programs for their immediate use. They include the first method for the direct prediction of optimal energies and structures in 3D HP side-chain models. This newest extension of the CPSP approach is described here for the first time. AVAILABILITY AND IMPLEMENTATION: Free access at http://cpsp.informatik.uni-freiburg.de

Available:
pdf (83 KB)   doi:10.1093/bioinformatics/btp034   pmid:19151096   BibTeX Entry ( Mann_CPSPweb_2009 )

Simultaneous Alignment and Folding of Protein Sequences

Jérôme Waldispühl, Charles W. O'Donnell, Sebastian Will, Srinivas Devadas, Rolf Backofen, Bonnie Berger

In: Serafim Batzoglou, Proc. of the 13th Annual International Conference on Computational Molecular Biology (RECOMB'09), LNBI, 2009, 5541, 339-355

One of the central challenges in computational biology is to develop accurate tools for protein structure analysis. Particularly difficult cases of this are sequence alignment and consensus folding of low-homology proteins. In this work, we present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-Align additionally exploits sparsity in the set of likely super-secondary structure pairings and alignment candidates for each amino acid to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane beta-barrel proteins, an important yet difficult class of proteins with very few available three-dimensional structures. In tests on sequence alignments derived from structure alignments, partiFold-Align is significantly more accurate than current best approaches for pairwise sequence alignment in the difficult case of low sequence homology and improves secondary structure prediction when current approaches fail. Importantly, partiFold-Align does not require training on transmembrane beta-barrel proteins. The generality of these techniques should allow them to be applied to a wide variety of protein structures.

Available:
pdf (337 KB)   BibTeX Entry ( Waldispuehl:etal:TMBalign:RECOMB09 )

Lifting Prediction to Alignment of RNA Pseudoknots

Mathias Möhl, Sebastian Will, Rolf Backofen

In: Serafim Batzoglou, Proc. of the 13th Annual International Conference on Computational Molecular Biology (RECOMB'09), LNBI, 2009, 5541, 285-301

Many computational problems concerning RNA pseudoknot structures, most prominently their prediction and alignment, are NP-hard. For structure prediction, several algorithms have been proposed that are restricted to certain classes of pseudoknots in order to run efficiently. We present a general scheme that yields an efficient alignment algorithm for arbitrary classes of pseudoknots that can be predicted efficiently by dynamic programming. Moreover, we show that such an alignment algorithm benefits from restrictions to certain structure classes in the same way as structure prediction algorithms do. We look at five of these classes in greater detail. Compared to the respective structure prediction algorithm, the time and space complexity of the obtained alignment algorithm is increased by only a linear factor. For four of the classes, no alignment algorithms have existed so far. For the fifth, most general class, our algorithm improves the time complexity compared to the best algorithm known so far, from O(n^5 m^5) to O(n m^6), where n and m are the lengths of the two aligned sequences. Finally, we apply the fastest of the generated algorithms, with O(n m^4) time and O(n m^2) space complexity, to comparative de-novo prediction of pseudoknots.
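
As a small illustration of what makes a structure a pseudoknot, the following Python sketch checks whether a set of base pairs contains two crossing pairs (i < k < j < l). It is not taken from the paper; the function name and the example structures are hypothetical, and it only detects crossings rather than aligning or predicting anything.

```python
def has_pseudoknot(pairs):
    """Return True if two base pairs (i, j) and (k, l) cross, i.e. i < k < j < l.

    `pairs` is an iterable of 0-based position pairs; each position is
    assumed to take part in at most one base pair.
    """
    # Normalise every pair to (left, right) and sort by the left end,
    # so that for any later pair (k, l) we already know i <= k.
    normalized = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for index, (i, j) in enumerate(normalized):
        for k, l in normalized[index + 1:]:
            if i < k < j < l:
                return True
    return False


# Hypothetical example structures (not from the paper).
nested = [(0, 9), (1, 8), (2, 7)]    # a plain hairpin stem
knotted = [(0, 5), (1, 4), (3, 8)]   # (0, 5) and (3, 8) cross

print(has_pseudoknot(nested))
print(has_pseudoknot(knotted))
```

The quadratic pairwise check is chosen for readability; it is adequate for single structures of typical RNA length.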

Available:
Proofs.pdf (78 KB)   pdf (290 KB)   BibTeX Entry ( Moehl:Will:Backofen:PKalign:RECOMB2009 )

Fast Feature Subset Selection in Biological Sequence Analysis

Rainer Pudimat, Rolf Backofen, Ernst-Günter Schukat-Talamazzini

In: International Journal of Pattern Recognition and Artificial Intelligence, 2009, 23(2), 191-207

Motivation: Biological research produces a wealth of measured data. It is not easy for biologists to postulate hypotheses about the behaviour or structure of the observed entity, because the relevant properties are lost in the ocean of measurements. For the same reason, it is not easy to design machine learning algorithms to classify or cluster the data items. Algorithms for automatically selecting a highly predictive subset of the measured features can help to overcome these difficulties. Results: We present an efficient feature selection strategy which can be applied to arbitrary feature selection problems. The core technique is a new method for estimating the quality of subsets from previously calculated qualities for smaller subsets by minimising the mean standard error of estimated values with an approach common to support vector machines. This method can be integrated in many feature subset search algorithms. We have applied it with sequential search algorithms and have been able to reduce the number of quality calculations for finding accurate feature subsets by about 70%. We show these improvements by applying our approach to the problem of finding highly predictive feature subsets for transcription factor binding sites.

Available:
pdf (276 KB)   BibTeX Entry ( pudimat:schukat:2008:ijprai )

Improved identification of conserved cassette exons using Bayesian networks

R. Sinha, M. Hiller, R. Pudimat, U. Gausmann, M. Platzer, R. Backofen

In: BMC Bioinformatics, 2008, 9(1), 477

ABSTRACT: BACKGROUND: Alternative splicing is a major contributor to the diversity of eukaryotic transcriptomes and proteomes. Currently, large scale detection of alternative splicing using expressed sequence tags (ESTs) or microarrays does not capture all alternative splicing events. Moreover, for many species genomic data is being produced at a far greater rate than corresponding transcript data, hence in silico methods of predicting alternative splicing have to be improved. RESULTS: Here, we show that the use of Bayesian networks (BNs) allows accurate prediction of evolutionary conserved exon skipping events. At a stringent false positive rate of 0.5%, our BN achieves an improved true positive rate of 61%, compared to a previously reported 50% on the same dataset using support vector machines (SVMs). Incorporating several novel discriminative features such as intronic splicing regulatory elements leads to the improvement. Features related to mRNA secondary structure increase the prediction performance, corroborating previous findings that secondary structures are important for exon recognition. Random labelling tests rule out overfitting. Cross-validation on another dataset confirms the increased performance. When using the same dataset and the same set of features, the BN matches the performance of an SVM in earlier literature. Remarkably, we could show that about half of the exons which are labelled constitutive but receive a high probability of being alternative by the BN, are in fact alternative exons according to the latest EST data. Finally, we predict exon skipping without using conservation-based features, and achieve a true positive rate of 29% at a false positive rate of 0.5%. CONCLUSION: BNs can be used to achieve accurate identification of alternative exons and provide clues about possible dependencies between relevant features. 
The near-identical performance of the BN and SVM when using the same features shows that good classification depends more on features than on the choice of classifier. Conservation-based features continue to be the most informative, and hence distinguishing alternative exons from constitutive ones without using conservation-based features remains a challenging problem.

Available:
pdf (530 KB)   doi:10.1186/1471-2105-9-477   pmid:19014490   BibTeX Entry ( Sinha:Hiller:Pudimat:Impro_ident_conse:2008 )

RNAalifold: improved consensus structure prediction for RNA alignments

Stephan H. Bernhart, Ivo L. Hofacker, Sebastian Will, Andreas R. Gruber, Peter F. Stadler

In: BMC Bioinformatics, 2008, 9, 474

Background: The prediction of a consensus structure for a set of related RNAs is an important first step for subsequent analyses. RNAalifold, which computes the minimum energy structure that is simultaneously formed by a set of aligned sequences, is one of the oldest and most widely used tools for this task. In recent years, several alternative approaches have been advocated, pointing to several shortcomings of the original RNAalifold approach. Results: We show that the accuracy of RNAalifold predictions can be improved substantially by introducing a different, more rational handling of alignment gaps, and by replacing the rather simplistic model of covariance scoring with more sophisticated RIBOSUM-like scoring matrices. These improvements are achieved without compromising the computational efficiency of the algorithm. We show here that the new version of RNAalifold not only outperforms the old one, but also several other tools recently developed, on different datasets. Conclusions: The new version of RNAalifold not only can replace the old one for almost any application but it is also competitive with other approaches including those based on SCFGs, maximum expected accuracy, or hierarchical nearest neighbor classifiers.

Available:
pdf (398 KB)   doi:10.1186/1471-2105-9-474   pmid:19014431   BibTeX Entry ( Bernhart:RNAalifold:BMC2008 )

IntaRNA: efficient prediction of bacterial sRNA targets incorporating target site accessibility and seed regions

Anke Busch, Andreas S. Richter, Rolf Backofen

In: Bioinformatics, 2008, 24(24), 2849-56

MOTIVATION: During the last few years, several new small regulatory RNAs (sRNAs) have been discovered in bacteria. Most of them act as post-transcriptional regulators by base pairing to a target mRNA, causing translational repression or activation, or mRNA degradation. Numerous sRNAs have already been identified, but the number of experimentally verified targets is considerably lower. Consequently, computational target prediction is in great demand. Many existing target prediction programs neglect the accessibility of target sites and the existence of a seed, while other approaches are either specialized to certain types of RNAs or too slow for genome-wide searches. RESULTS: We introduce INTARNA, a new general and fast approach to the prediction of RNA-RNA interactions incorporating accessibility of target sites as well as the existence of a user-definable seed. We successfully applied INTARNA to the prediction of bacterial sRNA targets and determined the exact locations of the interactions with a higher accuracy than competing programs. AVAILABILITY: http://www.bioinf.uni-freiburg.de/Software/

Available:
pdf (189 KB)   supplement.pdf (69 KB)   doi:10.1093/bioinformatics/btn544   pmid:18940824   BibTeX Entry ( Busch:Richter:Backofen:INTAR_effic_predi:2008 )

Widespread and subtle: alternative splicing at short-distance tandem sites

Michael Hiller, Matthias Platzer

In: Trends in Genetics, 2008, 24(5), 246-55

Alternative splicing at donor or acceptor sites located just a few nucleotides apart is widespread in many species. It results in subtle changes in the transcripts and often in the encoded proteins. Several of these tandem splice events contribute to the repertoire of functionally different proteins, whereas many are neutral or deleterious. Remarkably, some of the functional events are differentially spliced in tissues or developmental stages, whereas others exhibit constant splicing ratios, indicating that function is not always associated with differential splicing. Stochastic splice site selection seems to play a major role in these processes. Here, we review recent progress in understanding functional and evolutionary aspects as well as the mechanism of splicing at short-distance tandem sites.

Available:
pdf (859 KB)   doi:10.1016/j.tig.2008.03.003   pmid:18394746   BibTeX Entry ( Hiller:Platzer:Wides_and_subtl:TIG2008 )

Genetic Variants of the Copy Number Polymorphic beta-Defensin Locus Are Associated with Sporadic Prostate Cancer

K. Huse, S. Taudien, M. Groth, P. Rosenstiel, K. Szafranski, M. Hiller, J. Hampe, K. Junker, J. Schubert, S. Schreiber, G. Birkenmeier, M. Krawczak, M. Platzer

In: Tumour Biol, 2008, 29(2), 83-92

Background/Aims: Prostate cancer represents the cancer with the highest worldwide prevalence in men. Chromosome 8p23 has shown suggestive genetic linkage to early-onset familial prostate cancer and is frequently deleted in cancer cells of the urogenital tract. Within this locus some beta-defensin genes (among them DEFB4, DEFB103, DEFB104) are localized, which are arranged in a gene cluster shown to exhibit an extensive copy number variation in the population. This structural variation considerably hampers genetic studies. In a new approach considering both sequence as well as copy number variations we aimed to compare the defensin locus at 8p23 in prostate cancer patients and controls. Methods: We apply PCR/cloning-based haplotyping and high-throughput copy number determination methods which allow assessment of both individual haplotypes and gene copy numbers not accessible to conventional SNP-based genotyping. Results: We demonstrate association of four common DEFB104 haplotypes with the risk of prostate cancer in two independent patient cohorts. Moreover, we show that high copy numbers (>9) of the defensin gene cluster are significantly underrepresented in both patient samples. Conclusions: Our findings imply a role of the antibacterial defensins in prostate cancerogenesis qualifying distinct gene variants and copy numbers as potential tumor markers.

Available:
pdf (412 KB)   doi:10.1159/000135688   pmid:18515986   BibTeX Entry ( Huse:Taudien:Groth:Genet_Varia_the:2008 )

Alternative splicing at NAGNAG acceptors in Arabidopsis thaliana SR and SR-related protein-coding genes

Stefanie Schindler, Karol Szafranski, Michael Hiller, Gul Shad Ali, Saiprasad G. Palusa, Rolf Backofen, Matthias Platzer, Anireddy S. N. Reddy

In: BMC Genomics, 2008, 9, 159

BACKGROUND: Several recent studies indicate that alternative splicing in Arabidopsis and other plants is a common mechanism for post-transcriptional modulation of gene expression. However, few analyses have been done so far to elucidate the functional relevance of alternative splicing in higher plants. Representing a frequent and universal subtle alternative splicing event among eukaryotes, alternative splicing at NAGNAG acceptors contributes to transcriptome diversity and therefore, proteome plasticity. Alternatively spliced NAGNAG acceptors are overrepresented in genes coding for proteins with RNA-recognition motifs (RRMs). As SR proteins, a family of RRM-containing important splicing factors, are known to be extensively alternatively spliced in Arabidopsis, we analyzed alternative splicing at NAGNAG acceptors in SR and SR-related genes. RESULTS: In a comprehensive analysis of the Arabidopsis thaliana genome, we identified 6,772 introns that exhibit a NAGNAG acceptor motif. Alternative splicing at these acceptors was assessed using available EST data, complemented by a sequence-based prediction method. Of the 36 identified introns within 30 SR and SR-related protein-coding genes that have a NAGNAG acceptor, we selected 15 candidates for an experimental analysis of alternative splicing under several conditions. We provide experimental evidence for 8 of these candidates being alternatively spliced. Quantifying the ratio of NAGNAG-derived splice variants under several conditions, we found organ-specific splicing ratios in adult plants and changes in seedlings of different ages. Splicing ratio changes were observed in response to heat shock and most strikingly, cold shock. Interestingly, the patterns of differential splicing ratios are similar for all analyzed genes. CONCLUSION: NAGNAG acceptors frequently occur in the Arabidopsis genome and are particularly prevalent in SR and SR-related protein-coding genes. 
A lack of extensive EST coverage can be compensated by using the proposed sequence-based method to predict alternative splicing at these acceptors. Our findings indicate that the differential effects on NAGNAG alternative splicing in SR and SR-related genes are organ- and condition-specific rather than gene-specific.

Available:
pdf (510 KB)   doi:10.1186/1471-2164-9-159   pmid:18402682   BibTeX Entry ( Schindler:Szafranski:Hiller:Alter_splic_NAGNA:2008 )

Selection against tandem splice sites affecting structured protein regions

Michael Hiller, Karol Szafranski, Klaus Huse, Rolf Backofen, Matthias Platzer

In: BMC Evolutionary Biology, 2008, 8, 89

BACKGROUND: Alternative selection of splice sites in tandem donors and acceptors is a major mode of alternative splicing. Here, we analyzed whether in-frame tandem sites leading to subtle mRNA insertions/deletions of 3, 6, or 9 nucleotides are under natural selection. RESULTS: We found multiple lines of evidence that the human protein coding sequences are under selection against such in-frame tandem splice events, indicating that these events are often deleterious. The strength of selection is not homogeneous within the coding sequence as protein regions that fold into a fixed 3D structure (intrinsically ordered) are under stronger selection, especially against sites with a strong minor splice site. Investigating structures of functional protein domains, we found that tandem acceptors are preferentially located at the domain surface and outside structural elements such as helices and sheets. Using three-species comparisons, we estimate that more than half of all mutations that create NAGNAG acceptors in the coding region have been eliminated by selection. CONCLUSION: We estimate that  2,400 introns are under selection against possessing a tandem site.

Available:
pdf (275 KB)   BibTeX Entry ( Hiller2008Selection )

Assessing the fraction of short-distance tandem splice sites under purifying selection

Michael Hiller, Karol Szafranski, Rileen Sinha, Klaus Huse, Swetlana Nikolajewa, Philip Rosenstiel, Stefan Schreiber, Rolf Backofen, Matthias Platzer

In: RNA, 2008, 14, 616-29

Many alternative splice events result in subtle mRNA changes, and most of them occur at short-distance tandem donor and acceptor sites. The splicing mechanism of such tandem sites likely involves the stochastic selection of either splice site. While tandem splice events are frequent, it is unknown how many are functionally important. Here, we use phylogenetic conservation to address this question, focusing on tandems with a distance of 3-9 nucleotides. We show that previous contradicting results on whether alternative or constitutive tandem motifs are more conserved between species can be explained by a statistical paradox (Simpson's paradox). Applying methods that take biases into account, we found higher conservation of alternative tandems in mouse, dog, and even chicken, zebrafish, and Fugu genomes. We estimated a lower bound for the number of alternative sites that are under purifying (negative) selection. While the absolute number of conserved tandem motifs decreases with the evolutionary distance, the fraction under selection increases. Interestingly, a number of frameshifting tandems are under selection, suggesting a role in regulating mRNA and protein levels via nonsense-mediated decay (NMD). An analysis of the intronic flanks shows that purifying selection also acts on the intronic sequence. We propose that stochastic splice site selection can be an advantageous mechanism that allows constant splice variant ratios in situations where a deviation in this ratio is deleterious.
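
The statistical paradox mentioned in the abstract can be illustrated with a short Python sketch. The counts below are hypothetical and are not data from the paper; the stratum names are placeholders for whatever confounding variable splits the tandem sites into groups. Within each stratum the alternative sites are conserved more often, yet the pooled rates point the other way.

```python
from fractions import Fraction

# Hypothetical counts as (conserved, total); NOT data from the paper.
counts = {
    "stratum_1": {"alternative": (81, 87), "constitutive": (234, 270)},
    "stratum_2": {"alternative": (192, 263), "constitutive": (55, 80)},
}


def rate(conserved, total):
    """Exact conservation rate as a fraction."""
    return Fraction(conserved, total)


def pooled_rate(kind):
    """Conservation rate after naively pooling all strata."""
    conserved = sum(groups[kind][0] for groups in counts.values())
    total = sum(groups[kind][1] for groups in counts.values())
    return rate(conserved, total)


for stratum, groups in counts.items():
    alt = rate(*groups["alternative"])
    con = rate(*groups["constitutive"])
    print(f"{stratum}: alternative {float(alt):.3f} vs constitutive {float(con):.3f}")

print(
    f"pooled: alternative {float(pooled_rate('alternative')):.3f} "
    f"vs constitutive {float(pooled_rate('constitutive')):.3f}"
)
```

The reversal arises because the two kinds of site are distributed unevenly across the strata, which is why a comparison has to account for such biases.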

Available:
pdf (940 KB)   BibTeX Entry ( Hiller2008Assessing )

Fixed Parameter Tractable Alignment of RNA Structures Including Arbitrary Pseudoknots

Mathias Möhl, Sebastian Will, Rolf Backofen

In: Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM 2008), LNCS, 2008, 69-81

We present an algorithm that computes the edit distance of two RNA structures with arbitrary kinds of pseudoknots. A main benefit of the algorithm is that, although the problem is NP-hard, the algorithmic complexity adapts to the complexity of the RNA structures. Due to fixed parameter tractability, we can guarantee polynomial run-time for a parameter which is small in practice. Our algorithm can be considered as a generalization of the algorithm of Jiang et al. (Jiang, 2002) to arbitrary pseudoknots. In their absence, it gracefully degrades to the same polynomial algorithm. A prototypical implementation demonstrates the applicability of the method.

Available:
pdf (257 KB)   BibTeX Entry ( Moehl:Will:Backofen:CPM2008 )

Efficient Sequence Alignment with Side-Constraints by Cluster Tree Elimination

Sebastian Will, Anke Busch, Rolf Backofen

In: Constraints Journal, 2008, 13(1), 110-129

Aligning DNA and protein sequences is a core technique in molecular biology. Often, it is desirable to include partial prior knowledge and conditions in an alignment. Going beyond prior work, we aim at the integration of such side constraints in free combination into alignment algorithms. The most common and successful technique for efficient alignment algorithms is dynamic programming (DP). However, a weakness of DP is that one cannot include additional constraints without specifically tailoring a new DP algorithm. Here, we discuss a declarative approach that is based on constraint techniques and show how it can be extended by formulating additional knowledge as constraints. We take special care to obtain the efficiency of DP for sequence alignment. This is achieved by careful modeling and applying proper solving strategies. Finally, we apply our method to the scanning for RNA motifs in large sequences. This case study demonstrates how the new approach can be used in real biological problems. A prototypic implementation of the method is available at http://www.bioinf.uni-freiburg.de/Software/CTE-Alignment.
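
For readers unfamiliar with the dynamic programming baseline that the abstract refers to, here is a minimal Python sketch of global sequence alignment scoring in the style of Needleman and Wunsch. It is not the constraint-based method of the paper; the function name and the scoring parameters (match, mismatch, linear gap cost) are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (linear gap cost).

    Only two rows of the DP matrix are kept, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(
                max(
                    prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                    prev[j] + gap,               # gap in b
                    curr[j - 1] + gap,           # gap in a
                )
            )
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

Adding a new side condition to such a recurrence generally means redesigning the recurrence itself, which is the weakness of DP that the abstract describes.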

Available:
pdf (397 KB)   doi:10.1007/s10601-007-9032-x   BibTeX Entry ( Will:Busch:Backofen:_effic_sequen_align_side_const:Constraints2008 )

Structure Local Multiple Alignment of RNA

Wolfgang Otto, Sebastian Will, Rolf Backofen

In: Proceedings of German Conference on Bioinformatics (GCB'2008), Lecture Notes in Informatics (LNI), 2008, P-136, 178-188

Today, RNA is well known to perform important regulatory and catalytic functions due to its distinguished structure. Consequently, structure is conserved in evolution and state-of-the-art RNA multiple alignment algorithms consider structure as well as sequence information. However, existing tools neglect the important aspect of locality. Notably, locality in RNA occurs as similarity of subsequences as well as similarity of only substructures.

We present a novel approach for multiple alignment of RNAs that deals with both kinds of locality. The approach extends LocARNA by structural locality and delegates the construction of multiple alignments to T-Coffee.

The paper systematically investigates structural locality in known RNA families. Benchmarking multiple alignment tools on structurally local families shows the need for algorithmic support of this locality. The improvement in accuracy in special cases is achieved while staying competitive with state-of-the-art alignment tools across the whole BRAliBase.

LocARNA and its T-Coffee extended variant LocARNATE are freely available at http://www.bioinf.uni-freiburg.de/Software/LocARNA/.

Available:
pdf (164 KB)   BibTeX Entry ( Otto:Will:Backofen:_struc_local_multip_align_of_rna:CGB08 )

Lightweight comparison of RNAs based on exact sequence-structure matches

Steffen Heyne, Sebastian Will, Michael Beckstette, Rolf Backofen

In: Proceedings of the German Conference on Bioinformatics (GCB'2008), Lecture Notes in Informatics (LNI), 2008, P-136, 189-198

Specific functions of RNA molecules are often associated with different motifs in the RNA structure. The key feature is that the combination of sequence and structure properties forms such an RNA motif. In this paper we introduce a new RNA sequence-structure comparison method which maintains exact matching substructures. Existing common substructures are treated as a whole unit while variability is allowed between such structural motifs. Based on a fast detectable set of overlapping and crossing substructure matches for two nested RNA secondary structures, our method computes the longest colinear sequence of substructures common to two RNAs in O(n^2 m^2) time and O(nm) space. Applied to different RNAs, our method correctly identifies sequence-structure similarities between two RNAs. The results of our experiments are in good agreement with existing alignment-based methods, but can be obtained in a fraction of the running time, in particular for larger RNAs. The proposed algorithm is implemented in the program expaRNA, which is available from our website (www.bioinf.uni-freiburg.de/Software).

Available:
pdf (1263 KB)   BibTeX Entry ( Heyne:Will:Beckstette:Backofen:GCB08 )

CPSP-tools - Exact and Complete Algorithms for High-throughput 3D Lattice Protein Studies

Martin Mann, Sebastian Will, Rolf Backofen

In: BMC Bioinformatics, 2008, 9, 230

Background: The principles of protein folding and evolution pose problems of very high inherent complexity. Often these problems are tackled using simplified protein models, e.g. lattice proteins. The CPSP-tools package provides programs to solve exactly and completely the problems typical of studies using 3D lattice protein models. Among the tasks addressed are the prediction of (all) globally optimal and/or suboptimal structures as well as sequence design and neutral network exploration. Results: In contrast to stochastic approaches, which are not capable of answering many fundamental questions, our methods are based on fast, non-heuristic techniques. The resulting tools are designed for high-throughput studies of 3D-lattice proteins utilizing the Hydrophobic-Polar (HP) model. The source bundle is freely available at http://www.bioinf.uni-freiburg.de/sw/cpsp/ Conclusions: The CPSP-tools package is the first set of exact and complete methods for extensive, high-throughput studies of non-restricted 3D-lattice protein models. In particular, our package deals with cubic and face centered cubic (FCC) lattices.
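
To make the Hydrophobic-Polar (HP) model concrete, the following Python sketch evaluates the energy of a given conformation on the 3D cubic lattice under the usual HP convention of -1 per contact between H monomers that are lattice neighbours but not chain neighbours. It only scores a structure and does not perform the optimal structure prediction that CPSP-tools provides; the function name and the example fold are hypothetical.

```python
def hp_energy(sequence, coords):
    """Energy of an HP sequence folded onto the 3D cubic lattice.

    sequence: string over {'H', 'P'}
    coords:   list of (x, y, z) integer tuples, one per monomer
    Returns minus the number of H-H contacts between non-consecutive monomers.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for first, second in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(first, second)) != 1:
            raise ValueError("consecutive monomers must be lattice neighbours")

    index_at = {position: index for index, position in enumerate(coords)}
    contacts = 0
    for index, (x, y, z) in enumerate(coords):
        if sequence[index] != "H":
            continue
        # Looking only in the positive directions counts each contact once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            other = index_at.get((x + dx, y + dy, z + dz))
            if other is not None and sequence[other] == "H" and abs(other - index) > 1:
                contacts += 1
    return -contacts


# Hypothetical four-monomer fold: a unit square in the z = 0 plane.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square))
```

An exact method searches over all self-avoiding conformations for the minimum of this energy, whereas this sketch evaluates a single one.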

Available:
pdf (438 KB)   doi:10.1186/1471-2105-9-230   BibTeX Entry ( Mann:Will:Backofen:CPSP-tools:BMCB:2008 )

Variations on RNA folding and alignment: lessons from Benasque

Athanasius F. Bompfünewerer, Rolf Backofen, Stephan H. Bernhart, Jana Hertel, Ivo L. Hofacker, Peter F. Stadler, Sebastian Will

In: Journal of Mathematical Biology, 2008, 56(1-2), 129-144

Dynamic programming algorithms solve many standard problems of RNA bioinformatics in polynomial time. In this contribution we discuss a series of variations on these standard methods that implement refined biophysical models, such as a restriction of RNA folding to canonical structures, and an extension of structural alignments to an explicit scoring of stacking propensities. Furthermore, we demonstrate that a local structural alignment can be employed for ncRNA gene finding. In this context we discuss scanning variants for folding and alignment algorithms.

Available:
pdf (494 KB)   doi:10.1007/s00285-007-0107-5   pmid:17611759   BibTeX Entry ( Bompfuenewerer:_variat_rna_foldin_align:JMathB07 )

Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments

Stefan E. Seemann, Jan Gorodkin, Rolf Backofen

In: Nucleic Acids Research, 2008, 36(20), 6355-62

Computational methods for determining the secondary structure of RNA sequences from given alignments are currently either based on thermodynamic folding, compensatory base pair substitutions or both. However, there is currently no approach that combines both sources of information in a single optimization problem. Here, we present a model that formally integrates both the energy-based and evolution-based approaches to predict the folding of multiple aligned RNA sequences. We have implemented an extended version of Pfold that identifies base pairs that have high probabilities of being conserved and of being energetically favorable. The consensus structure is predicted using a maximum expected accuracy scoring scheme to smoothen the effect of incorrectly predicted base pairs. Parameter tuning revealed that the probability of base pairing has a higher impact on the RNA structure prediction than the corresponding probability of being single stranded. Furthermore, we found that structurally conserved RNA motifs are mostly supported by folding energies. Other problems (e.g. RNA-folding kinetics) may also benefit from employing the principles of the model we introduce. Our implementation, PETfold, was tested on a set of 46 well-curated Rfam families and its performance compared favorably to that of Pfold and RNAalifold.

Available:
pdf (471 KB)   doi:10.1093/nar/gkn544   pmid:18836192   BibTeX Entry ( Seemann:Gorodkin:Backofen:Unify_evolu_and:NAR2008 )

Classifying protein-like sequences in arbitrary lattice protein models using LatPack

Martin Mann, Daniel Maticzka, Rhodri Saunders, Rolf Backofen

In: HFSP Journal, 2008, 2(6), 396-404

Knowledge of a protein's 3-dimensional native structure is vital in determining its chemical properties and functionality. However, experimental methods to determine structure are very costly and time-consuming. Computational approaches, such as folding simulations and structure prediction algorithms, are quicker and cheaper but lack consistent accuracy. This currently restricts extensive computational studies to abstract protein models. It is thus essential that simplifications induced by the models do not negate scientific value. Key to this is the use of thoroughly defined protein-like sequences. In such cases abstract models can allow for the investigation of important biological questions. Here we present a procedure to generate and classify protein-like sequence data sets. Our LatPack tools, and the approach in general, are applicable to arbitrary lattice protein models. Identification is based on thermodynamic and kinetic features. Further, LatPack can incorporate the sequential assembly of proteins by addressing co-translational folding. We demonstrate the approach in the widely used, unrestricted 3D-cubic HP-model. The resulting sequence set is the first large data set for this model exhibiting the protein-like properties required. Our data and tools are freely available and can be used to investigate protein-related problems. Furthermore, our data sets can serve as the first benchmark sequence sets for folding algorithms that have traditionally only been tested on random sequences.

Publication note:
Special issue on protein folding: experimental and theoretical approaches

Available:
pdf (456 KB)   data-used.txt (0 KB)   doi:10.2976/1.3027681   BibTeX Entry ( Mann_LatPack_HFSP_08 )

IMS2 -- An integrated medical software system for early lung cancer detection using ion mobility spectrometry data of human breath

Jan Baumbach, Alexander Bunkowski, Sita Lange, Timm Oberwahrenbrock, Nils Kleinbölting, Sven Rahmann, Jörg Ingo Baumbach

In: Journal of Integrative Bioinformatics, 2007, 4(3), 75

IMS2 is an Integrated Medical Software system for the analysis of Ion Mobility Spectrometry (IMS) data. It assists medical staff with the following IMS data processing steps: acquisition, visualization, classification, and annotation. IMS2 provides data analysis and interpretation features on the one hand, and also helps to improve the classification by increasing the number of the pre-classified datasets on the other hand. It is designed to facilitate early detection of lung cancer, one of the most common cancer types with one million deaths each year around the world. After reviewing the IMS technology, we first describe the software architecture of IMS2 and then the integrated classification module, including necessary pre-processing steps and different classification methods. The Lung Hospital Hemer (Germany) provided IMS data of 35 patients suffering from lung cancer and 72 samples of healthy persons. IMS2 correctly classifies 99% of the samples, evaluated using 10-fold cross-validation.

Available:
pdf (669 KB)   doi:10.2390/biecoll-jib-2007-75   BibTeX Entry ( Baumbach:Bunkowski:IMS2:2007 )

Inferring Non-Coding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering

Sebastian Will, Kristin Reiche, Ivo L. Hofacker, Peter F. Stadler, Rolf Backofen

In: PLoS Comput Biol, 2007, 3(4), e65

The RFAM database defines families of ncRNAs by means of sequence similarities that are sufficient to establish homology. In some cases, such as microRNAs, box H/ACA snoRNAs, functional commonalities define classes of RNAs that are characterized by structural similarities, and typically consist of multiple RNA families. Recent advances in high-throughput transcriptomics and comparative genomics have produced very large sets of putative non-coding RNAs and regulatory RNA signals. For many of them, evidence for stabilizing selection acting on their secondary structures has been derived, and at least approximate models of their structures have been computed. The overwhelming majority of these hypothetical RNAs cannot be assigned to established families or classes. We present here a structure-based clustering approach that is capable of extracting putative RNA classes from genome-wide surveys for structured RNAs. The LocARNA tool implements a novel variant of the Sankoff algorithm that is sufficiently fast to deal with several thousand candidate sequences. The method is also robust against false positive predictions, i.e., a contamination of the input data with unstructured or non-conserved sequences. We have successfully tested the LocARNA-based clustering approach on the sequences of the RFAM-seed alignments. Furthermore, we have applied it to a previously published set of 3332 predicted structured elements in the Ciona intestinalis genomes (Missal et al., Bioinformatics 21(S2), i77-i78). In addition to recovering e.g. tRNAs as a structure-based class, the method identifies several RNA families, including microRNA and snoRNA candidates, and suggests several novel classes of ncRNAs for which to-date no representative has been experimentally characterized.

Available:
pdf (4220 KB)   doi:10.1371/journal.pcbi.0030065   pmid:17432929   BibTeX Entry ( Will:etal:_infer_non_codin_rna_famil:PLOS2007 )

Locality and gaps in RNA comparison

Rolf Backofen, Shihyen Chen, Danny Hermelin, Gad M. Landau, Mikhail A. Roytberg, Oren Weimann, Kaizhong Zhang

In: Journal of Computational Biology, 2007, 14(8), 1074-87

Locality is an important and well-studied notion in comparative analysis of biological sequences. Similarly, taking into account affine gap penalties when calculating biological sequence alignments is a well-accepted technique for obtaining better alignments. When dealing with RNA, one has to take into consideration not only sequential features, but also structural features of the inspected molecule. This makes the computation more challenging, and usually restricts the comparison to small RNAs. In this paper we introduce two local metrics for comparing RNAs that extend the Smith-Waterman metric and its normalized version used for string comparison. We also present a global RNA alignment algorithm which handles affine gap penalties. Our global algorithm runs in O(m^2 n(1 + lg(n/m))) time, while our local algorithms run in O(m^2 n(1 + lg(n/m))) and O(n^2 m) time, respectively, where m

Available:
pdf (258 KB)   doi:10.1089/cmb.2007.0062   pmid:17985988   BibTeX Entry ( Backofen:Chen:Hermelin:Local_and_gaps:JCB2007 )

INFO-RNA--a server for fast inverse RNA folding satisfying sequence constraints

Anke Busch, Rolf Backofen

In: Nucleic Acids Research, 2007, 35(Web Server issue), W310-3

INFO-RNA is a new web server for designing RNA sequences that fold into a user-given secondary structure. Furthermore, constraints on the sequence can be specified, e.g. one can restrict sequence positions to a fixed nucleotide or to a set of nucleotides. Moreover, the user can allow violations of the constraints at some positions, which can be advantageous in complicated cases. The INFO-RNA web server allows biologists to design RNA sequences in an automatic manner. It is clearly and intuitively arranged and easy to use. The procedure is fast, as most applications are completed within seconds, and it performs better and faster than other existing tools. The INFO-RNA web server is freely available at http://www.bioinf.uni-freiburg.de/Software/INFO-RNA/

Available:
pdf (91 KB)   doi:10.1093/nar/gkm218   pmid:17452349   BibTeX Entry ( Busch:Backofen:INFO_serve_for:NAR2007 )

Violating the splicing rules: TG dinucleotides function as alternative 3' splice sites in U2-dependent introns

K. Szafranski, S. Schindler, S. Taudien, M. Hiller, K. Huse, N. Jahn, S. Schreiber, R. Backofen, M. Platzer

In: Genome Biol, 2007, 8(8), R154

ABSTRACT: BACKGROUND: Despite some degeneracy of sequence signals that govern splicing of eukaryotic pre-mRNAs, it is an accepted rule that U2-dependent introns exhibit the 3' terminal dinucleotide AG. Intrigued by anecdotal evidence for functional non-AG 3' splice sites, we carried out a human genome-wide screen. RESULTS: We identified TG dinucleotides functioning as alternative 3' splice sites in 36 human genes. The TG-derived splice variants were experimentally validated with a success rate of 92%. Interestingly, ratios of alternative splice variants are tissue-specific for several introns. TG splice sites and their flanking intron sequences are substantially conserved between orthologous vertebrate genes, even between human and frog, indicating functional relevance. Remarkably, TG splice sites are exclusively found as alternative 3' splice sites, never as the sole 3' splice site for an intron, and we observed a distance constraint for TG-AG splice site tandems. CONCLUSION: Since TG splice sites are exclusively found as alternative 3' splice sites, the U2 spliceosome apparently accomplishes perfect specificity for 3' AGs at an early splicing step, but may choose 3' TGs during later steps. Given the tiny fraction of TG 3' splice sites compared to the vast amount of non-viable TGs, cis-acting sequence signals must significantly contribute to splice site definition. Thus, we consider TG-AG 3' splice site tandems as promising subjects for studies on the mechanisms of 3' splice site selection.

Available:
pmid:17672918   BibTeX Entry ( Szafranski:Schindler:Taudien:Viola_the_splic:2007 )

Pre-mRNA Secondary Structures Influence Exon Recognition

Michael Hiller, Zhaiyi Zhang, Rolf Backofen, Stefan Stamm

In: PLoS Genet, 2007, 3(11), e204

The secondary structure of a pre-mRNA influences a number of processing steps including alternative splicing. Since most splicing regulatory proteins bind to single-stranded RNA, the sequestration of RNA into double strands could prevent their binding. Here, we analyzed the secondary structure context of experimentally determined splicing enhancer and silencer motifs in their natural pre-mRNA context. We found that these splicing motifs are significantly more single-stranded than controls. These findings were validated by transfection experiments, where the effect of enhancer or silencer motifs on exon skipping was much more pronounced in single-stranded conformation. We also found that the structural context of predicted splicing motifs is under selection, suggesting a general importance of secondary structures on splicing and adding another level of evolutionary constraints on pre-mRNAs. Our results explain the action of mutations that affect splicing and indicate that the structural context of splicing motifs is part of the mRNA splicing code.

Available:
pdf (482 KB)   doi:10.1371/journal.pgen.0030204   pmid:18020710   BibTeX Entry ( Hiller2007Sec )

RNAs everywhere: genome-wide annotation of structured RNAs

Athanasius F. Bompfünewerer Consortium, Rolf Backofen, Stephan H. Bernhart, Christoph Flamm, Claudia Fried, Guido Fritzsch, Jörg Hackermüller, Jana Hertel, Ivo L. Hofacker, Kristin Missal, Axel Mosig, Sonja J. Prohaska, Dominic Rose, Peter F. Stadler, Andrea Tanzer, Stefan Washietl, Sebastian Will

In: J Exp Zoolog B Mol Dev Evol, 2007, 308B(1), 1-25

Starting with the discovery of microRNAs and the advent of genome-wide transcriptomics, non-protein-coding transcripts have moved from a fringe topic to a central research field in molecular biology. In this contribution we review the state of the art of "computational RNomics", i.e., the bioinformatics approaches to genome-wide RNA annotation. Instead of rehashing results from recently published surveys in detail, we focus here on the open problem in the field, namely (functional) annotation of the plethora of putative RNAs. A series of exploratory studies are used to provide non-trivial examples for the discussion of some of the difficulties.

Available:
pdf (1210 KB)   doi:10.1002/jez.b.21130   pmid:17171697   BibTeX Entry ( BompfuenewererConsortium:RNAs_every_genom:2007 )

The Energy Landscape Library - A Platform for Generic Algorithms

Martin Mann, Sebastian Will, Rolf Backofen

In: BIRD'07 - 1st international Conference on Bioinformatics Research and Development, 2007, 217, 83-86

The study of energy landscapes of biopolymers and their models is an important field in bioinformatics. For instance, the investigation of kinetics or folding simulations is done using methods that are based on sampling or exhaustive enumeration. Most such algorithms are independent of the underlying landscape model. Therefore, frameworks for generic algorithms to investigate the landscape properties are needed. Here, we present the Energy Landscape Library (ELL) that allows such a model-independent formulation of generic algorithms dealing with discrete states. The ELL is a completely object-oriented C++ library that is highly modular, easy to extend, and freely available online. It can be used for a fast and easy implementation of new generic algorithms (possibly based on the provided basic method pool) or as a framework to test their properties for different landscape models, which can be formulated straightforwardly.

Available:
pdf (85 KB)   BibTeX Entry ( Mann_ELL_BIRD07 )

BioBayesNet: a web server for feature extraction and Bayesian network modeling of biological sequence data

Swetlana Nikolajewa, Rainer Pudimat, Michael Hiller, Matthias Platzer, Rolf Backofen

In: Nucleic Acids Research, 2007, 35(Web Server issue), W688-93

BioBayesNet is a new web application that allows the easy modeling and classification of biological data using Bayesian networks. To learn Bayesian networks the user can either upload a set of annotated FASTA sequences or a set of pre-computed feature vectors. In case of FASTA sequences, the server is able to generate a wide range of sequence and structural features from the sequences. These features are used to learn Bayesian networks. An automatic feature selection procedure assists in selecting discriminative features, providing a (locally) optimal set of features. The output includes several quality measures of the overall network and individual features as well as a graphical representation of the network structure, which allows exploring dependencies between features. Finally, the learned Bayesian network or another uploaded network can be used to classify new data. BioBayesNet facilitates the use of Bayesian networks in biological sequence analysis and is flexible enough to support modeling and classification applications in various scientific fields. The BioBayesNet server is available at http://biwww3.informatik.uni-freiburg.de:8080/BioBayesNet/.

Available:
pdf (86 KB)   doi:10.1093/nar/gkm292   pmid:17537825   BibTeX Entry ( biobayesnet2007 )

TassDB: a database of alternative tandem splice sites

Michael Hiller, Swetlana Nikolajewa, Klaus Huse, Karol Szafranski, Philip Rosenstiel, Stefan Schuster, Rolf Backofen, Matthias Platzer

In: Nucleic Acids Research, 2007, 35(Database issue), D188-92

Subtle alternative splice events at tandem splice sites are frequent in eukaryotes and substantially increase the complexity of transcriptomes and proteomes. We have developed a relational database, TassDB (TAndem Splice Site DataBase), which stores extensive data about alternative splice events at GYNGYN donors and NAGNAG acceptors. These splice events are of subtle nature since they mostly result in the insertion/deletion of a single amino acid or the substitution of one amino acid by two others. Currently, TassDB contains 114 554 tandem splice sites of eight species, 5209 of which have EST/mRNA evidence for alternative splicing. In addition, human SNPs that affect NAGNAG acceptors are annotated. The database provides a user-friendly interface to search for specific genes or for genes containing tandem splice sites with specific features as well as the possibility to download large datasets. This database should facilitate further experimental studies and large-scale bioinformatics analyses of tandem splice sites. The database is available at http://helios.informatik.uni-freiburg.de/TassDB/.

Available:
pdf (378 KB)   doi:10.1093/nar/gkl762   pmid:17142241   BibTeX Entry ( Hiller:Nikolajewa:Huse:TassD_datab_alter:NAR2007 )

Fast Detection of Common Sequence Structure Patterns in RNAs

Rolf Backofen, Sven Siebert

In: Journal of Discrete Algorithms, 2007, 5(2), 212-228

We developed a dynamic programming approach for computing common exact sequential and structural patterns between two RNAs, given their sequences and their secondary structures. An RNA consists of a sequence of nucleotides and a secondary structure defined via bonds linking together complementary nucleotides. It is known that secondary structures are more preserved than sequences in the evolution of RNAs. We are able to compute all patterns between two RNAs in time O(nm) and space O(nm), where n and m are the lengths of the RNAs. Our method is useful for describing and detecting local motifs. It is especially suitable for finding similar regions of large RNAs that do not share global similarities. An implementation is available in C++ and can be obtained by contacting one of the authors.

Available:
ps.gz (262 KB)   pdf (507 KB)   BibTeX Entry ( Backofen:Siebert:Fast_Det_Pat:2007 )

Methods for multiple alignment and consensus structure prediction of RNAs implemented in MARNA

Sven Siebert, Rolf Backofen

In: Methods Mol Biol, 2007, 395, 489-502

Multiple alignments of RNAs are an essential prerequisite to further analyses such as homology modeling, motif description, or illustration of conserved or variable binding sites. Beyond the comparison of RNAs on the sequence level, structural conformations determined by basepairs have to be taken into account. Several pairwise sequence-structure alignment methods have been developed. They use extended alignment scores that evaluate secondary structure information in addition to sequence information. However, two problems for the multiple alignment step remain. First, how to combine pairwise sequence-structure alignments into a multiple alignment and, second, how to generate secondary structure information for sequences whose structural information is missing. Here, we describe MARNA, its underlying methods and its usage. MARNA is an approach for multiple alignment of RNAs taking into consideration both the primary sequences and the secondary structures. It relies on the pairwise sequence-structure comparison strategy by generating a set of weighted alignment edges. This set is processed by a consistency-based multiple alignment method. Additionally, MARNA extracts a consensus sequence and structure from this generated multiple alignment. MARNA can be accessed via http://www.bioinf.uni-freiburg.de/Software/MARNA.

Available:
pdf (167 KB)   pmid:17993694   BibTeX Entry ( Siebert:Backofen:Methods:2007 )

A dynamic programming approach for finding common patterns in RNAs

Sven Siebert, Rolf Backofen

In: Journal of Computational Biology, 2007, 14(1), 33-44

We developed a dynamic programming approach for computing common sequence structure patterns among two RNAs given their primary sequences and their secondary structures. Common patterns between two RNAs are defined to share the same local sequential and structural properties. The locality is based on the connections of nucleotides given by their phosphodiester and hydrogen bonds. The idea of interpreting secondary structures as chains of structure elements leads us to develop an efficient dynamic programming approach in time O(nm) and space O(nm), where n and m are the lengths of the RNAs. The biological motivation is given by detecting common, local regions of RNAs, although they do not necessarily share global sequential and structural properties. This might happen if RNAs fold into different structures but share a lot of local, stable regions. Here, we illustrate our algorithm on Hepatitis C virus internal ribosome entry sites. Our method is useful for detecting and describing local motifs as well. An implementation in C++ is available and can be obtained by contacting one of the authors.

Available:
pdf (3642 KB)   doi:10.1089/cmb.2006.0089   pmid:17381344   BibTeX Entry ( SiebertJCB07 )

A Bottom-up approach to Grid-Computing at a University: the Black-Forest-Grid Initiative

R. Backofen, H.-G. Borrmann, W. Deck, A. Dedner, L. De Raedt, K. Desch, M. Diesmann, M. Geier, A. Greiner, W. R. Hess, J. Honerkamp, St. Jamkowski, I. Krossing, A. W. Liehr, A. Karwath, R. Kloefkorn, R. Pesche, T. Potjans, M. C. Roettger, L. Schmiedt-Thieme, G. Schneider, B. Voss, B. Wiebelt, P. Wienemann, V.-H. Winterer

In: PIK, 2006, 29(2), 81-87

Recent years have seen a rapid increase in the need for high-performance computing. These demands come from disciplines such as particle physics traditionally relying on High Performance Computing (HPC) but lately also from the various branches of life science that have matured into quantitative disciplines. The classical infrastructure of university computer centres turns out to be unsuited to cope with the new requirements for a multitude of reasons. Here we discuss the causes of this failure and present a solution developed at the University of Freiburg in a collaborative effort of several faculties. We demonstrate that using state of the art grid computing technology the problem can now be addressed in a bottom-up approach. The organizational, technical, and financial components of our framework, the Black Forest Grid Initiative (BFG), are described and results of its implementation are presented. In the process, a number of new questions have emerged which the next phase of our project needs to address.

Available:
pdf (175 KB)   BibTeX Entry ( backofen06:_bottom_up_approac_to_grid )

A Constraint-Based Approach to Fast and Exact Structure Prediction in Three-Dimensional Protein Models

Rolf Backofen, Sebastian Will

In: Journal of Constraints, January 2006, 11(1), 5-30

Simplified protein models are used for investigating general properties of proteins and principles of protein folding. Furthermore, they are suited for hierarchical approaches to protein structure prediction. A well known protein model is the HP-model of Lau and Dill [33], which models the important aspect of hydrophobicity. One can define the HP-model for various lattices, among them two-dimensional and three-dimensional ones. Here, we investigate the three-dimensional case. The main motivation for studying simplified protein models is to be able to predict model structures much more quickly and more accurately than is possible for real proteins. However, up to now there was a dilemma: the algorithmically tractable, simple protein models cannot model real protein structures with good quality and introduce strong artifacts.

We present a constraint-based method that largely improves this situation. It outperforms all existing approaches for lattice protein folding in HP-models. This approach is the first one that can be applied to two three-dimensional lattices, namely the cubic lattice and the face-centered-cubic (FCC) lattice. Moreover, it is the only exact method for the FCC lattice. The ability to use the FCC lattice is a significant improvement over the cubic lattice. The key to our approach is the ability to compute maximally compact sets of points (used as hydrophobic cores), which we accomplish for the first time for the FCC lattice.

Keywords: protein structure prediction, HP-model, face-centered cubic lattice, constraint programming

Available:
pdf (517 KB)   ps.gz (382 KB)   doi:10.1007/s10601-006-6848-8   BibTeX Entry ( Backofen:Will:Constraints2006 )
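
Illustrative note on the entry above: the abstract refers to the HP-model on three-dimensional lattices. The following minimal Python sketch is not the constraint-based method of the paper; it only evaluates the energy of one given conformation on the cubic lattice, assuming the usual convention that each contact between two non-consecutive H monomers on neighbouring lattice points contributes -1. The function name hp_energy and its input format are hypothetical.

```python
def hp_energy(sequence, coords):
    """Energy of an HP-model conformation on the cubic lattice.

    sequence: string over {"H", "P"}
    coords:   list of (x, y, z) integer tuples, one per monomer
    Assumed convention: -1 for every pair of H monomers that are
    lattice neighbours but not adjacent along the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    def lattice_distance(a, b):
        # Manhattan distance; 1 means neighbouring points on the cubic lattice
        return sum(abs(x - y) for x, y in zip(a, b))

    # Consecutive monomers must occupy neighbouring lattice points
    for a, b in zip(coords, coords[1:]):
        if lattice_distance(a, b) != 1:
            raise ValueError("chain is not connected on the lattice")

    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                if lattice_distance(coords[i], coords[j]) == 1:
                    contacts += 1
    return -contacts
```

For example, the four-monomer chain HPPH folded into a unit square brings its two H monomers into contact, so hp_energy returns -1 for it, while the same chain laid out on a straight line has energy 0.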

Exploring the lower part of discrete polymer model energy landscapes

Michael T. Wolfinger, Sebastian Will, Ivo L. Hofacker, Rolf Backofen, Peter F. Stadler

In: Europhysics Letters, 2006, 74(4), 725-732

We present a generic, problem independent algorithm for exploration of the low-energy portion of the energy landscape of discrete systems and apply it to the energy landscape of lattice proteins. Starting from a set of optimal and near-optimal conformations derived from a constraint-based search technique, we are able to selectively investigate the lower part of lattice protein energy landscapes in two and three dimensions. This novel approach allows, in contrast to exhaustive enumeration, for an efficient calculation of optimal and near-optimal structures below a given energy threshold and is only limited by the available amount of memory. A straightforward application of the algorithm is calculation of barrier trees (representing the energy landscape), which then allows dynamics studies based on landscape theory.

Available:
pdf (233 KB)   ps.gz (143 KB)   doi:10.1209/epl/i2005-10577-0   BibTeX Entry ( Wolfinger:etal:_explor:EPL2006 )

Counting Protein Structures by DFS with Dynamic Decomposition

Sebastian Will, Martin Mann

In: Proc. of the Workshop on Constraint Based Methods for Bioinformatics. http://www.dimi.uniud.it/dovier/WCB06/WCB06_proceedings.pdf, 2006, 83-90

We introduce depth-first search with dynamic decomposition for counting the solutions of a binary CSP completely. In particular, we use the method for computing the number of minimal energy structures for model proteins.

Available:
ps.gz (138 KB)   pdf (208 KB)   BibTeX Entry ( Will:Mann:WCB2006 )
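
Illustrative note on the entry above: the abstract describes depth-first search with dynamic decomposition for counting all solutions of a binary CSP. The Python sketch below is a simplified reading of that idea, not the authors' implementation: after each assignment and a forward-checking step, the unassigned variables are split into connected components of the constraint graph, each component is counted independently, and the counts are multiplied. The function count_solutions and its input format are hypothetical.

```python
def count_solutions(domains, constraints):
    """Count all solutions of a binary CSP.

    domains:     dict variable -> iterable of values
    constraints: dict (u, v) -> predicate(value_u, value_v), with u != v
    """
    neighbours = {v: set() for v in domains}
    checks = {}
    for (u, v), pred in constraints.items():
        neighbours[u].add(v)
        neighbours[v].add(u)
        checks.setdefault((u, v), []).append(pred)
        # Store the mirrored predicate so lookups work in both directions
        checks.setdefault((v, u), []).append(lambda b, a, p=pred: p(a, b))

    def components(variables):
        """Connected components of the constraint graph restricted to variables."""
        remaining, result = set(variables), []
        while remaining:
            stack = [remaining.pop()]
            comp = set(stack)
            while stack:
                x = stack.pop()
                for y in neighbours[x] & remaining:
                    remaining.discard(y)
                    comp.add(y)
                    stack.append(y)
            result.append(comp)
        return result

    def count(doms):
        # Independent components: the solution counts multiply
        total = 1
        for comp in components(doms):
            total *= count_component({v: doms[v] for v in comp})
            if total == 0:
                return 0
        return total

    def count_component(doms):
        # Branch on a variable with the smallest remaining domain
        var = min(doms, key=lambda v: len(doms[v]))
        n = 0
        for value in doms[var]:
            reduced = {}
            for other, vals in doms.items():
                if other == var:
                    continue
                if other in neighbours[var]:
                    # Forward checking: keep only values compatible with var=value
                    vals = [w for w in vals
                            if all(p(value, w) for p in checks[(var, other)])]
                reduced[other] = vals
            n += count(reduced)
        return n

    return count({v: list(d) for v, d in domains.items()})
```

Forward checking is what makes the decomposition sound in this sketch: once the constraints to an assigned variable have been folded into its neighbours' domains, the remaining sub-problems interact only through unassigned variables.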

Bioinformatics and Constraints

Rolf Backofen, David Gilbert

In: Francesca Rossi, Peter van Beek, Toby Walsh, Handbook of Constraint Programming, 2006, 905-944

No Abstract available

Available:
pdf (289 KB)   BibTeX Entry ( backofen06:_bioin_const )

INFO-RNA--a fast approach to inverse RNA folding

Anke Busch, Rolf Backofen

In: Bioinformatics, 2006, 22(15), 1823-31

MOTIVATION: The structure of RNA molecules is often crucial for their function. Therefore, secondary structure prediction has gained much interest. Here, we consider the inverse RNA folding problem, which means designing RNA sequences that fold into a given structure. RESULTS: We introduce a new algorithm for the inverse folding problem (INFO-RNA) that consists of two parts; a dynamic programming method for good initial sequences and a following improved stochastic local search that uses an effective neighbor selection method. During the initialization, we design a sequence that among all sequences adopts the given structure with the lowest possible energy. For the selection of neighbors during the search, we use a kind of look-ahead of one selection step applying an additional energy-based criterion. Afterwards, the pre-ordered neighbors are tested using the actual optimization criterion of minimizing the structure distance between the target structure and the mfe structure of the considered neighbor. We compared our algorithm to RNAinverse and RNA-SSD for artificial and biological test sets. Using INFO-RNA, we performed better than RNAinverse and in most cases, we gained better results than RNA-SSD, the probably best inverse RNA folding tool on the market. AVAILABILITY: www.bioinf.uni-freiburg.de?Subpages/software.html.

Available:
pdf (160 KB)   pmid:16709587   BibTeX Entry ( Busch:Backofen:INFO_fast_appro:2006 )
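
Illustrative note on the entry above: the abstract names the optimization criterion as the structure distance between the target structure and the mfe structure of a candidate sequence. The abstract does not say which distance is meant; the Python sketch below assumes the common base-pair distance on dot-bracket strings (the number of base pairs present in exactly one of the two structures). The function names are hypothetical, and no folding is performed here.

```python
def parse_pairs(dot_bracket):
    """Return the set of base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_distance(structure_a, structure_b):
    """Number of base pairs that occur in exactly one of the two structures."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    return len(parse_pairs(structure_a) ^ parse_pairs(structure_b))
```

A distance of 0 means that the candidate folds exactly into the target, which is the stopping condition one would expect in a local search of this kind.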

Single-Nucleotide Polymorphisms in NAGNAG Acceptors Are Highly Predictive for Variations of Alternative Splicing

Michael Hiller, Klaus Huse, Karol Szafranski, Niels Jahn, Jochen Hampe, Stefan Schreiber, Rolf Backofen, Matthias Platzer

In: Am J Hum Genet, 2006, 78(2), 291-302

Aberrant or modified splicing patterns of genes are causative for many human diseases. Therefore, the identification of genetic variations that cause changes in the splicing pattern of a gene is important. Elsewhere, we described the widespread occurrence of alternative splicing at NAGNAG acceptors. Here, we report a genomewide screen for single-nucleotide polymorphisms (SNPs) that affect such tandem acceptors. From 121 SNPs identified, we extracted 64 SNPs that most likely affect alternative NAGNAG splicing. We demonstrate that the NAGNAG motif is necessary and sufficient for this type of alternative splicing. The evolutionarily young NAGNAG alleles, as determined by the comparison with the chimpanzee genome, exhibit the same biases toward intron phase 1 and single-amino acid insertion/deletions that were already observed for all human NAGNAG acceptors. Since 28% of the NAGNAG SNPs occur in known disease genes, they represent preferable candidates for a more-detailed functional analysis, especially since the splice relevance for some of the coding SNPs is overlooked. Against the background of a general lack of methods for identifying splice-relevant SNPs, the presented approach is highly effective in the prediction of polymorphisms that are causal for variations in alternative splicing.

Available:
pdf (715 KB)   pmid:16400609   BibTeX Entry ( Hiller:Huse:Szafranski:Singl_Polym_NAGNA:2006 )

Phylogenetically widespread alternative splicing at unusual GYNGYN donors

M. Hiller, K. Huse, K. Szafranski, P. Rosenstiel, S. Schreiber, R. Backofen, M. Platzer

In: Genome Biol, 2006, 7(7), R65

ABSTRACT: BACKGROUND: Splice donor sites have a highly conserved GT or GC dinucleotide and an extended intronic consensus sequence GTRAGT that reflects the sequence complementarity to the U1 snRNA. Here, we focus on unusual donor sites with the motif GYNGYN (Y stands for C or T; N stands for A,C,G, or T). RESULTS: While only one GY functions as a splice donor for the majority of these splice sites in human, we provide computational and experimental evidence that 110 (1.3%) allow alternative splicing at both GY donors. The resulting splice forms differ in only three nucleotides which results mostly in the insertion/deletion of one amino acid. However, we also report the insertion of a stop codon in four cases. Investigating what distinguishes alternatively from not alternatively spliced GYNGYN donors, we found differences in the binding to U1 snRNA, a strong correlation between U1 snRNA binding strength and the preferred donor, overrepresented sequence motifs in the adjacent introns, and a higher conservation of the exonic and intronic flanks between human and mouse. Extending our genome-wide analysis to seven other eukaryotic species, we found alternatively spliced GYNGYN donors in all species from mouse to C. elegans and even in A. thaliana. Experimental verification of a conserved GTAGTT donor of the STAT3 gene in human and mouse reveals a remarkably similar ratio of alternatively spliced transcripts in both species. CONCLUSION: In contrast to alternative splicing in general, GYNGYN donors in addition to NAGNAG acceptors enable subtle protein variations.

Available:
pdf (615 KB)   pmid:16869967   BibTeX Entry ( Hiller:Huse:Szafranskzi:Phylo_wides_alter:2006 )
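
Illustrative note on the entry above: the abstract defines the GYNGYN motif with Y standing for C or T and N for any of A, C, G, T. The Python sketch below only locates occurrences of this six-nucleotide pattern in a DNA string, including overlapping ones; it does not decide whether an occurrence is a splice donor or is alternatively spliced. The function name find_gyngyn is hypothetical.

```python
import re

# G, then C/T, then any nucleotide, twice in a row. The lookahead makes
# the search report overlapping occurrences as well.
GYNGYN = re.compile(r"(?=(G[CT][ACGT]G[CT][ACGT]))")


def find_gyngyn(sequence):
    """Return (0-based start, matched hexamer) for every GYNGYN occurrence."""
    return [(m.start(), m.group(1)) for m in GYNGYN.finditer(sequence.upper())]
```

For instance, find_gyngyn("AAGTAGTTCC") reports the hexamer GTAGTT at position 2, the motif the abstract mentions for the STAT3 donor.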

Using RNA secondary structures to guide sequence motif finding towards single-stranded regions

Michael Hiller, Rainer Pudimat, Anke Busch, Rolf Backofen

In: Nucleic Acids Research, 2006, 34(17), e117

RNA binding proteins recognize RNA targets in a sequence specific manner. Apart from the sequence, the secondary structure context of the binding site also affects the binding affinity. Binding sites are often located in single-stranded RNA regions and it was shown that the sequestration of a binding motif in a double-strand abolishes protein binding. Thus, it is desirable to include knowledge about RNA secondary structures when searching for the binding motif of a protein. We present the approach MEMERIS for searching sequence motifs in a set of RNA sequences and simultaneously integrating information about secondary structures. To abstract from specific structural elements, we precompute position-specific values measuring the single-strandedness of all substrings of an RNA sequence. These values are used as prior knowledge about the motif starts to guide the motif search. Extensive tests with artificial and biological data demonstrate that MEMERIS is able to identify motifs in single-stranded regions even if a stronger motif located in double-strand parts exists. The discovered motif occurrences in biological datasets mostly coincide with known protein-binding sites. This algorithm can be used for finding the binding motif of single-stranded RNA-binding proteins in SELEX or other biological sequence data.

Available:
pdf (421 KB)   doi:10.1093/nar/gkl544   pmid:16987907   BibTeX Entry ( Hiller:Pudimat:Busch:Using_RNA_secon:NAR2006 )

Alternative Splicing at NAGNAG Acceptors: Simply Noise or Noise and More?

Michael Hiller, Karol Szafranski, Rolf Backofen, Matthias Platzer

In: PLoS Genet, 2006, 2(11), e207

No Abstract available

Available:
pdf (1226 KB)   pmid:17121470   BibTeX Entry ( Hiller:Szafranski:Backofen:Alter_Splic_NAGNA:2006 )

Sequencing errors or SNPs at splice-acceptor guanines in dbSNP?

Matthias Platzer, Michael Hiller, Karol Szafranski, Niels Jahn, Jochen Hampe, Stefan Schreiber, Rolf Backofen, Klaus Huse

In: Nat Biotechnol, 2006, 24(9), 1068-70

No Abstract available

Available:
pdf (642 KB)   pmid:16964207   BibTeX Entry ( Platzer:Hiller:Szafranski:Seque_error_SNPs:2006 )

Local Alignment of RNA Sequences with Arbitrary Scoring Schemes

Rolf Backofen, Danny Hermelin, Gad M. Landau, Oren Weimann

In: Moshe Lewenstein, Gabriel Valiente, Proc. 17th Symp. Combinatorial Pattern Matching, Lecture Notes in Computer Science, 2006, 4009, 246-257

Local similarity is an important tool in comparative analysis of biological sequences, and is therefore well studied. In particular, the Smith-Waterman technique and its normalized version are two established metrics for measuring local similarity in strings. In RNA sequences however, where one must consider not only sequential but also structural features of the inspected molecules, the concept of local similarity becomes more complicated. First, even in global similarity, computing global sequence-structure alignments is more difficult than computing standard sequence alignments due to the bi-dimensionality of information. Second, one can view locality in two different ways, in the sequential or structural sense, leading to different problem formulations. In this paper we introduce two sequentially-local similarity metrics for comparing RNA sequences. These metrics combine the global RNA alignment metric of Shasha and Zhang [16] with the Smith-Waterman metric [17] and its normalized version [2] used in strings. We generalize the familiar alignment graph used in string comparison to apply also for RNA sequences, and then utilize this generalization to devise two algorithms for computing local similarity according to our two suggested metrics. Our algorithms run in O(m^2 n lg n) and O(m^2 n lg n + n^2 m) time, respectively, where m ≤ n are the lengths of the two given RNAs. Both algorithms can work with any arbitrary scoring scheme.

Available:
pdf (257 KB)   BibTeX Entry ( backofen06:_local_align_rna_sequen_arbit_scorin_schem )
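
Illustrative note on the entry above: the abstract builds on the Smith-Waterman metric for local similarity in strings. The Python sketch below shows only that classical string version, with linear gap costs and without any structural information; it is not the RNA algorithm of the paper. The function name and the default scores (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap cost).

    Uses the standard recurrence in which every cell is clamped at 0,
    so an alignment may start and end anywhere in both strings.
    """
    best = 0
    prev = [0] * (len(b) + 1)          # scores of the previous row
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                     # start a new local alignment
                prev[j - 1] + s,       # align a[i-1] with b[j-1]
                prev[j] + gap,         # gap in b
                curr[j - 1] + gap,     # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Only two rows of the table are kept, so the sketch needs O(min-free) extra bookkeeping of size len(b) and O(len(a) * len(b)) time; recovering the alignment itself would require storing the full table or traceback pointers.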

Efficient Constraint-based Sequence Alignment by Cluster Tree Elimination

Sebastian Will, Anke Busch, Rolf Backofen

In: Proceedings of the Workshop on Constraint Based Methods in Bioinformatics (WCB05), 2005, 66-74

Aligning DNA and protein sequences has become a standard method in molecular biology. Often, it is desirable to include partial prior knowledge and conditions in an alignment. The most common and successful technique for efficient alignment algorithms is dynamic programming (DP). However, a weakness of DP is that one cannot include additional constraints without specifically tailoring a new DP algorithm. Here, we discuss a declarative approach that is based on constraint techniques and show how it can be extended by formulating additional knowledge as constraints. We take special care to obtain the efficiency of DP for sequence alignment. This is achieved by careful modeling and applying proper solving strategies.

Available:
ps.gz (405 KB)   BibTeX Entry ( Will:Busch:Backofen:_effic_const_sequen_align_clust_tree_elimin:WCB05 )

SECISDesign: a server to design SECIS-elements within the coding sequence

Anke Busch, Sebastian Will, Rolf Backofen

In: Bioinformatics, 2005, 21(15), 3312-3

SUMMARY: SECISDesign is a server for the design of SECIS-elements and arbitrary RNA-elements within the coding sequence of an mRNA. The element has to satisfy both structure and sequence constraints. At the same time, a certain amino acid similarity to the original protein has to be kept. The designed sequence can be used for recombinant expression of selenoproteins in Escherichia coli. AVAILABILITY: The server is available at http://www.bio.inf.uni-jena.de/Software/SECISDesign/index.html.

Available:
pdf (53 KB)   pmid:15919727   BibTeX Entry ( Busch:Will:Backofen:Bioinformatics2005 )

Normalized Similarity of RNA Sequences

Rolf Backofen, Danny Hermelin, Gad M. Landau, Oren Weimann

In: Proc. 12th Symposium on String Processing and Information Retrieval (SPIRE 2005), Lecture Notes in Computer Science, 2005, 3772, 360-369

No Abstract available

Available:
pdf (269 KB)   BibTeX Entry ( Backofen:_normal_simil_rna_sequen:SPIRE2005 )

Creation and disruption of protein features by alternative splicing -- a novel mechanism to modulate function

Michael Hiller, Klaus Huse, Matthias Platzer, Rolf Backofen

In: Genome Biol, 2005, 6(7), R58

BACKGROUND: Alternative splicing often occurs in the coding sequence and alters protein structure and function. It is mainly carried out in two ways: by skipping exons that encode a certain protein feature and by introducing a frameshift that changes the downstream protein sequence. These mechanisms are widespread and well investigated. RESULTS: Here, we propose an additional mechanism of alternative splicing to modulate protein function. This mechanism creates a protein feature by putting together two non-consecutive exons or destroys a feature by inserting an exon in its body. In contrast to other mechanisms, the individual parts of the feature are present in both splice variants but the feature is only functional in the splice form where both parts are merged. We provide evidence for this mechanism by performing a genome-wide search with four protein features: transmembrane helices, phosphorylation and glycosylation sites, and Pfam domains. CONCLUSION: We describe a novel type of event that creates or removes a protein feature by alternative splicing. Current data suggest that these events are rare. Besides the four features investigated here, this mechanism is conceivable for many other protein features, especially for small linear protein motifs. It is important for the characterization of functional differences of two splice forms and should be considered in genome-wide annotation efforts. Furthermore, it offers a novel strategy for ab initio prediction of alternative splice events.

Available:
pdf (218 KB)   pmid:15998447   BibTeX Entry ( Hiller:Huse:Platzer:Creat_and_disru:2005 )

Non-EST based prediction of exon skipping and intron retention events using Pfam information

Michael Hiller, Klaus Huse, Matthias Platzer, Rolf Backofen

In: Nucleic Acids Research, 2005, 33(17), 5611-21

Most of the known alternative splice events have been detected by the comparison of expressed sequence tags (ESTs) and cDNAs. However, not all splice events are represented in EST databases since ESTs have several biases. Therefore, non-EST based approaches are needed to extend our view of a transcriptome. Here, we describe a novel method for the ab initio prediction of alternative splice events that is solely based on the annotation of Pfam domains. Furthermore, we applied this approach in a genome-wide manner to all human RefSeq transcripts and predicted a total of 321 exon skipping and intron retention events. We show that this method is very reliable as 78% (250 of 321) of our predictions are confirmed by ESTs or cDNAs. Subsequent analyses of splice events within Pfam domains revealed a significant preference of alternative exon junctions to be located at the protein surface and to avoid secondary structure elements. Thus, splice events within Pfams are likely to alter the structure and function of a domain, which makes them highly interesting for detailed biological investigation. As Pfam domains are annotated in many other species, our strategy to predict exon skipping and intron retention events might be important for species with a lower number of ESTs.

Available:
pdf (144 KB)   pmid:16204458   BibTeX Entry ( Hiller:Huse:Platzer:Non_based_predi:NAR2005 )

A multiple-feature framework for modelling and predicting transcription factor binding sites

Rainer Pudimat, E.G. Schukat-Talamazzini, Rolf Backofen

In: Bioinformatics, 2005, 21(14), 3082-8

Motivation: The identification of transcription factor binding sites in promoter sequences is an important problem, since it reveals information about the transcriptional regulation of genes. For analysing transcriptional regulation, computational approaches for predicting putative binding sites are applied. Commonly used stochastic models for binding sites are position specific score matrices (PSSM), which show weak predictive power. Results: We have developed a probabilistic modelling approach that allows diverse characteristic binding site properties to be considered in order to obtain more accurate representations of binding sites. These properties are modelled as random variables in Bayesian networks, which are capable of dealing with dependencies among binding site properties. Cross validation on several data sets shows improvements in the false positive error rate and the significance (p-value) of true binding sites. Availability: A more extensive description of validation results is available at http://www.bio.inf.uni-jena.de/Software/promapper/

Contact: backofen@inf.uni-jena.de

Available:
pdf (116 KB)   BibTeX Entry ( pudimat:2005:bioinformatics )

A new distance measure of RNA ensembles and its application to phylogenetic tree construction

Sven Siebert, Rolf Backofen

In: Gary B. Fogel, IEEE 2005 Symposium on Computational Intelligence in Bioinformatics and Computational Biology, November 2005, 150-157

A major challenge in RNA structure analysis is to infer common catalytic or regulatory functions based on primary sequences and secondary structures. Some programs have been developed that compare RNAs with such given structures. Nevertheless, the most important problem is that it is hard to determine the adopted structures of RNAs which are a necessary prerequisite to numerous applications; once a structure has been assigned to a sequence (e.g. the minimum free energy structure), it influences the output of the programs and thus affects the scientific result, especially when dealing with a set of multiple RNAs. In this paper, we go one step further and analyze distances between RNA structure ensembles. They reflect structural relationships computed basically on base-pairing probability matrices. We propose a distance measure between two base-pairing probability matrices showing similar or non-similar structural folding behaviour. This includes the detection of shared optimal, suboptimal and local secondary structures. Consequently, our distance measure avoids falling into the trap of fixing specific structures. A pairwise comparison strategy in a set of multiple RNAs leads us to construct a network of structural relationships using the neighbour joining method. Attempts to predict phylogenetic trees are discussed and demonstrated by means of viral RNAs.

Available:
BibTeX Entry ( SiebertBackofen:dist_meas:2005 )

MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons

Sven Siebert, Rolf Backofen

In: Bioinformatics, 2005, 21(16), 3352-9

MOTIVATION: Due to the importance of considering secondary structures in aligning functional RNAs, several pairwise sequence-structure alignment methods have been developed. They use extended alignment scores that evaluate secondary structure information in addition to sequence information. However, two problems for the multiple alignment step remain. First, how to combine pairwise sequence-structure alignments into a multiple alignment and second, how to generate secondary structure information for sequences whose explicit structural information is missing. RESULTS: We describe a novel approach for multiple alignment of RNAs (MARNA) taking into consideration both the primary and the secondary structures. It is based on pairwise sequence-structure comparisons of RNAs. From these sequence-structure alignments, libraries of weighted alignment edges are generated. The weights reflect the sequential and structural conservation. For sequences whose secondary structures are missing, the libraries are generated by sampling low energy conformations. The libraries are then processed by the T-Coffee system, which is a consistency based multiple alignment method. Furthermore, we are able to extract a consensus-sequence and -structure from a multiple alignment. We have successfully tested MARNA on several datasets taken from the Rfam database. AVAILABILITY: MARNA can be used online on our webpage www.bio.inf.uni-jena.de/Software/MARNA/index.html

Available:
pdf (585 KB)   pmid:15972285   BibTeX Entry ( Siebert:Backofen:MARNA_multi_align:2005 )

Local Sequence-Structure Motifs in RNA

Rolf Backofen, Sebastian Will

In: Journal of Bioinformatics and Computational Biology (JBCB), 2004, 2(4), 681-698

Ribonucleic acid (RNA) enjoys increasing interest in molecular biology; despite this interest, fundamental algorithms are lacking, e.g. for identifying local motifs. Like proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, where possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengths n and m, the algorithm has a worst-case time complexity of O(n^2 m^2 max(n,m)) and a space complexity of only O(nm). An implementation of our algorithm is available at http://www.bio.inf.uni-jena.de. Its runtime is competitive with global sequence-structure alignment.

Keywords: RNA; local alignment; local sequence-structure alignment; lssa

Available:
ps.gz (99 KB)   pdf (161 KB)   pmid:15617161   BibTeX Entry ( Backofen:Will:_local_sequence_structure:JBCB2004 )

Constraint Based Protein Structure Prediction Exploiting Secondary Structure Information

Alessandro Dal Palu, Sebastian Will, Rolf Backofen, Agostino Dovier

In: Convegno Italiano di Logica Computazionale 2004 (CILC 2004), 2004, 16-17

The protein structure prediction problem is one of the most studied problems in Computational Biology. It can be reasonably abstracted as a minimization problem. The function to be minimized depends on the distances between the various amino-acids composing the protein and on their types. Even with strong approximations, the problem is shown to be computationally intractable. However, the solution of the problem for an arbitrary input size is not needed. Solutions for proteins of length 100-200 would give a strong contribution to Biotechnology. In this paper, we tackle the problem with constraint-based methods, using additional constraints and heuristics coming from the secondary structure of a protein that can be quickly predicted with acceptable approximation. Our prototypic implementation is written using constraints over finite domains in the Mozart programming system. It improves over any previous constraint-based approach and shows the power and flexibility of the method. Especially, it is well suited for further extensions.

Available:
pdf (304 KB)   BibTeX Entry ( DalPalu:etal:CILC2004 )

Computational Design of New and Recombinant Selenoproteins

Rolf Backofen, Anke Busch

In: Proc. of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM2004), Lecture Notes in Computer Science, 2004, 3109, 270-284

Selenoproteins contain the 21st amino acid selenocysteine, which is encoded by the STOP-codon UGA. For its insertion, it requires a specific mRNA sequence downstream of the UGA-codon that forms a hairpin-like structure (called Sec insertion sequence (SECIS)). Selenoproteins have gained much interest recently since they are very important for human health. In contrast, very little is known about selenoproteins. For example, there is only one solved crystal structure available. One reason for this is that one is not able to produce sufficient amounts of selenoproteins by using recombinant expression in a system like E.coli. The reason is that the insertion mechanisms are different between E.coli and eukaryotes. Thus, one has to redesign the human/mammalian selenoprotein for the expression in E.coli. In this paper, we introduce a polynomial-time algorithm for solving the computational problem involved in this design, and we present results for known selenoproteins.

Available:
ps.gz (175 KB)   BibTeX Entry ( Bac:Bus:CPM2004 )

A Polynomial Time Upper Bound for the Number of Contacts in the HP-Model on the Face-Centered-Cubic Lattice (FCC)

Rolf Backofen

In: Journal of Discrete Algorithms, 2004, 2(2), 161-206

Lattice protein models are a major tool for investigating principles of protein folding. For this purpose, one needs an algorithm that is guaranteed to find the minimal energy conformation in some lattice model (at least for some sequences). So far, there are only algorithms that can find optimal conformations in the cubic lattice. In the more interesting case of the face-centered-cubic lattice (FCC), which is more protein-like, there are no results. One of the reasons is that for finding optimal conformations, one usually applies a branch-and-bound technique, and there are no reasonable bounds known for the FCC. We will give such a bound for Dill's HP-model on the FCC, which can be calculated by a dynamic programming approach.

Available:
ps.gz (249 KB)   pdf (405 KB)   BibTeX Entry ( backofen03:_polyn_time_upper_bound_number )

Fast Detection of Common Sequence Structure Patterns in RNAs

Rolf Backofen, Sven Siebert

In: Symposium on String Processing and Information Retrieval (SPIRE 2004), Lecture Notes in Computer Science, 2004, 3246, 79-92

We developed a dynamic programming approach for computing common sequence/structure patterns between two RNAs given by their sequence and secondary structures. Common patterns between two RNAs are meant to share the same local sequential and structural properties. Nucleotides which are part of an RNA are linked together due to their phosphodiester or hydrogen bonds. These bonds describe how nucleotides are involved in patterns and thus deliver a bond-preserving matching definition. Based on this definition, we are able to compute all patterns between two RNAs in time O(nm) and space O(nm), where n and m are the lengths of the RNAs, respectively. Our method is useful for describing and detecting local motifs and for detecting local regions of large RNAs although they do not share global similarities.

Available:
pdf (345 KB)   ps.gz (154 KB)   BibTeX Entry ( backofensiebert:2004:gcb )

Efficient prediction of alternative splice forms using protein domain homology

Michael Hiller, Rolf Backofen, Stephan Heymann, Anke Busch, Timo Mika Glaesser, Johann-Christoph Freytag

In: In Silico Biol, 2004, 4(2), 0017

Alternative splicing can yield manifold different mature mRNAs from one precursor. New findings indicate that alternative splicing occurs much more often than previously assumed. A major goal of functional genomics lies in elucidating and characterizing the entire spectrum of alternative splice forms. Existing approaches such as EST-alignments focus only on the mRNA sequence to detect alternative splice forms. They do not consider function and characteristics of the resulting proteins. One important example of such functional characterization is homology to a known protein domain family. A powerful description of protein domains are profile Hidden Markov models (HMM) as stored in the Pfam database. In this paper we address the problem of identifying the splice form with the highest similarity to a protein domain family. Therefore, we take into consideration all possible splice forms. As demonstrated here for a number of genes, this homology based approach can be used successfully for predicting partial gene structures. Furthermore, we present some novel splice form predictions with high-scoring protein domain homology and point out that the detection of splice form specific protein domains helps to answer questions concerning hereditary diseases. Simple approaches based on a BLASTP search cannot be applied here, since the number of possible splice forms increases exponentially with the number of exons. To this end, we have developed an efficient polynomial-time algorithm, called ASFPred (Alternative Splice Form Prediction). This algorithm needs only a set of exons as input.

Available:
ps.gz (132 KB)   pdf (314 KB)   pmid:15107023   BibTeX Entry ( Hiller:Backofen:Heymann:Effic_predi_alter:2004 )

Widespread occurrence of alternative splicing at NAGNAG acceptors contributes to proteome plasticity

Michael Hiller, Klaus Huse, Karol Szafranski, Niels Jahn, Jochen Hampe, Stefan Schreiber, Rolf Backofen, Matthias Platzer

In: Nat Genet, 2004, 36(12), 1255-7

Splice acceptors with the genomic NAGNAG motif may cause NAG insertion-deletions in transcripts, occur in 30% of human genes and are functional in at least 5% of human genes. We found five significant biases indicating that their distribution is nonrandom and that they are evolutionarily conserved and tissue-specific. Because of their subtle effects on mRNA and protein structures, these splice acceptors are often overlooked or underestimated, but they may have a great impact on biology and disease.
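As a small illustration of the sequence motif named above, the following Python sketch scans a nucleotide string for NAGNAG occurrences (N is the IUPAC code for any nucleotide). It only locates the motif itself. Deciding whether a hit is an actual splice acceptor requires intron/exon annotation, which is not modelled here. The function name `find_nagnag` is hypothetical and not taken from the paper.

```python
import re

# NAGNAG with N = any of A, C, G, T. The lookahead makes overlapping hits
# (e.g. in CAGCAGCAG) visible, because the match itself consumes no characters.
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(seq):
    """Return (0-based start, motif) for every, possibly overlapping, NAGNAG hit."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    return [(m.start(), m.group(1)) for m in NAGNAG.finditer(seq)]
```

For example, `find_nagnag("TTTCAGCAGGT")` reports a single hit starting at position 3.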

Available:
pdf (160 KB)   pmid:15516930   BibTeX Entry ( Hiller:Huse:Szafranski:Wides_occur_alter:2004 )

Feature Based Representation and Detection of Transcription Factor Binding Sites

Rainer Pudimat, E. G. Schukat-Talamazzini, Rolf Backofen

In: Robert Giegerich, Jens Stoye, German Conference on Bioinformatics, LNI, 2004, 53, 43-52

The prediction of transcription factor binding sites is an important problem, since it reveals information about the transcriptional regulation of genes. A commonly used representation of these sites are position specific weight matrices which show weak predictive power. We introduce a feature-based modelling approach, which is able to deal with various kinds of biological properties of binding sites and models them via Bayesian belief networks. The presented results imply higher model accuracy in contrast to the PSSM approach.

Available:
pdf (305 KB)   BibTeX Entry ( pudimat:2004:gcb )

A Constraint-Based Approach to Structure Prediction for Simplified Protein Models that Outperforms Other Existing Methods

Rolf Backofen, Sebastian Will

In: Proceedings of the 19th International Conference on Logic Programming (ICLP 2003), Lecture Notes in Computer Science, 2003, 2916, 49-71

Lattice protein models are used for hierarchical approaches to protein structure prediction, as well as for investigating general principles of protein folding. So far, one has the problem that either the lattice does not model real protein conformations with good quality, or there is no efficient method known for finding native conformations. We present a constraint-based method that largely improves this situation. It outperforms all existing approaches in lattice protein folding on the type of model we have chosen (namely the HP-model by Lau and Dill [34], which models the important aspect of hydrophobicity). It is the only exact method that has been applied to two different lattices. Furthermore, it is the only exact method for the face-centered cubic lattice. This lattice is important since it has been shown [38] that the FCC lattice can model real protein conformations with coordinate root mean square deviation below 2 Angstrom. Our method uses a constraint-based approach. It works by first calculating maximally compact sets of points (hydrophobic cores), and then threading the given HP-sequence to the hydrophobic cores such that the core is occupied by H-monomers.

Available:
ps.gz (299 KB)   BibTeX Entry ( Backofen:Will:_const_based_approac_struc_predic:ICLP2003 )

Breaking of Partial Symmetries in the Photo and Alignment Problem

Sebastian Will, Rolf Backofen

In: Barbara Smith, Ian Gent, Warwick Harvey, Proceedings of the Third International Workshop on Symmetry in Constraint Satisfaction Problems (SymCon 2003), 2003, 187-194

Symmetry breaking by adding symmetric constraints during search usually assumes that symmetric constraints are simple. We identify symmetries with complex symmetric constraints, where the symmetries nevertheless can be handled by a similar method. For this aim, we introduce partial symmetries. We identify those symmetries in two problems. The photo problem is a well known example problem, while the alignment problem is a real world problem from bioinformatics.

Available:
ps.gz (61 KB)   BibTeX Entry ( Will:Backofen:SymCon2003 )

MARNA: A Server for Multiple Alignment of RNAs

Sven Siebert, Rolf Backofen

In: German Conference on Bioinformatics, October 2003, 135-140

We describe an algorithmic method for multiple alignment of RNAs taking into consideration both the primary sequence and the secondary structure. In a first step, alignment edges between nucleotides are produced by pairwise sequence/structure comparisons of RNAs. They are weighted with similarity scores transformed from edit distances as proposed by Zhang. In a next step, the set of alignment edges is given as input to the multiple alignment program T-COFFEE. Here, edges that are supported by several pairwise alignments are strengthened. An example of tRNA sequences from the Rfam database with their typical cloverleaf structure is given. In addition, we compare this alignment with that of a traditional multiple sequence alignment program, clustalw. The MARNA server is available at http://www.bio.inf.uni-jena.de/Software/MARNA/marna.html

Available:
BibTeX Entry ( SiebertGCB03 )

Excluding symmetries in constraint-based search

Rolf Backofen, Sebastian Will

In: Constraints, 2002, 7(3), 333-349

We introduce a new method for excluding symmetries in constraint based search. To our knowledge, it is the first declarative method that can be applied to arbitrary symmetries. Our method is based on the notion of symmetric constraints, which are used in our modification of a general constraint based search algorithm. The method does not influence the search strategy. Furthermore, it can be used with either the full set of symmetries, or with a subset of all symmetries. We prove correctness, completeness and symmetry exclusion properties of our method. Then, we show how to apply the method in the special case of geometric symmetries (rotations and reflections) and permutation symmetries. Furthermore, we give results from practical applications.

Available:
pdf (282 KB)   ps.gz (110 KB)   BibTeX Entry ( Bac:Wil:Constraints2002 )

Constraint-based Hydrophobic Core Construction for Protein Structure Prediction in the Face-Centered-Cubic Lattice

Sebastian Will

In: Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Teri E. Klein, Proceedings of the Pacific Symposium on Biocomputing 2002 (PSB 2002), 2002, 661-672

We present an algorithm for exact protein structure prediction in the FCC-HP-model. This model is a lattice protein model on the face-centered-cubic lattice that models the main force of protein folding, namely the hydrophobic force. The structure prediction for this model can be based on the construction of hydrophobic cores. The main focus of the paper is on an algorithm for constructing maximally and submaximally compact hydrophobic cores of a given size. This algorithm treats core construction as a constraint satisfaction problem (CSP), and the paper describes its constraint model. The algorithm employs symmetry excluding constraint-based search [Backofen, Will: CP99] and relies heavily on good upper bounds on the number of contacts. Here, we use and strengthen upper bounds presented earlier [Backofen, Will: CPM2001]. The resulting structure prediction algorithm (including previous work [Backofen, Will: CPM2001, Backofen, Will: CP2001]) handles sequences of sizes in the range of real proteins fast, i.e. we predict a first structure often within a few minutes. The algorithm is the first exact one for the FCC, besides full enumeration which is impracticable for chain lengths greater than about 15. We tested the algorithm successfully up to a sequence length of 160, which is far beyond the capabilities even of previous heuristic approaches.
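To make the notion of a compact hydrophobic core more concrete, here is a minimal Python sketch that counts contacts within a set of FCC lattice points. It assumes the common integer representation of the FCC lattice (all points with an even coordinate sum, each with 12 nearest neighbours) and simply evaluates a given core. It is not the constraint-based construction algorithm of the paper, and the names `FCC_NEIGHBOR_VECTORS`, `is_fcc_point` and `count_contacts` are hypothetical.

```python
from itertools import combinations, product

# Assumed representation: FCC points are integer triples with an even
# coordinate sum. The 12 nearest-neighbour offsets are the vectors with
# exactly two non-zero entries, each +1 or -1.
FCC_NEIGHBOR_VECTORS = [
    v for v in product((-1, 0, 1), repeat=3) if sum(abs(c) for c in v) == 2
]


def is_fcc_point(point):
    """True if the integer point lies on the FCC lattice as represented here."""
    return sum(point) % 2 == 0


def count_contacts(core):
    """Count unordered pairs of core points that are FCC nearest neighbours."""
    points = set(core)
    if len(points) != len(core):
        raise ValueError("core points must be distinct")
    if not all(is_fcc_point(p) for p in points):
        raise ValueError("every point needs an even coordinate sum")
    neighbor_set = set(FCC_NEIGHBOR_VECTORS)
    return sum(
        1
        for p, q in combinations(points, 2)
        if tuple(a - b for a, b in zip(p, q)) in neighbor_set
    )
```

For four points forming a tetrahedron, such as `[(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]`, every pair is in contact, so the count is 6. A core of a given size is more compact the more contacts it has.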

Available:
ps.gz (82 KB)   pmid:11928518   BibTeX Entry ( Will:PSB2002 )
Links:
supplementary, there is a Poster (PostScript) on the topic

Protein similarity search under mRNA structural constraints: application to targeted selenocysteine insertion

Rolf Backofen, N. S. Narayanaswamy, Firas Swidan

In: In Silico Biology, 2002, 2(3), 275-90

Selenocysteine is the 21st amino acid, which occurs in all kingdoms of life. Selenocysteine is encoded by the STOP-codon UGA. For its insertion, it requires a specific mRNA sequence downstream of the UGA-codon that forms a hairpin-like structure (called Sec insertion sequence (SECIS)). We consider the computational problem of generating new amino acid sequences containing selenocysteine. This requires finding an mRNA sequence that is similar to the SECIS-consensus, is able to form the secondary structure required for selenocysteine insertion, and whose translation is maximally similar to the original amino acid sequence. We show that the problem can be solved in linear time when considering the hairpin-like SECIS-structure (and, more generally, when considering a structure that does not contain pseudoknots).

Available:
pdf (160 KB)   pmid:12542413   BibTeX Entry ( Backofen:Narayanaswamy:Swidan:Prote_simil_searc:2002 )

Optimally Compact Finite Sphere Packings --- Hydrophobic Cores in the FCC

Rolf Backofen, Sebastian Will

In: Proc. of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM2001), Lecture Notes in Computer Science, 2001, 2089, 257-272

Lattice protein models are used for hierarchical approaches to protein structure prediction, as well as for investigating principles of protein folding. The problem is that there is so far no known lattice that can model real protein conformations with good quality, and for which there is an efficient method to prove whether a conformation found by some heuristic algorithm is optimal. We present such a method for the FCC-HP-Model [Agarwala et al., JCB 97]. For the FCC-HP-Model, we need to find conformations with a maximally compact hydrophobic core. Our method allows us to enumerate maximally compact hydrophobic cores for a sufficiently large number of hydrophobic amino-acids. We have used our method to prove the optimality of heuristically predicted structures for HP-sequences in the FCC-HP-model.

Available:
pdf (274 KB)   ps.gz (96 KB)   BibTeX Entry ( Bac:Wil:CPM2001 )

Fast, Constraint-based Threading of HP-Sequences to Hydrophobic Cores

Rolf Backofen, Sebastian Will

In: Proc. of the 7th International Conference on Principle and Practice of Constraint Programming (CP'2001), Lecture Notes in Computer Science, 2001, 2239, 494-508

Lattice protein models are used for hierarchical approaches to protein structure prediction, as well as for investigating principles of protein folding. So far, one has the problem that there exists no lattice that can model real protein conformations with good quality and for which an efficient method to find native conformations is known. We present the first method for the FCC-HP-Model [Agarwala et al., JCB 97] that is capable of finding native conformations for real-sized HP-sequences. It has been shown [Park and Levitt, JMB 95] that the FCC lattice can model real protein conformations with coordinate root mean square deviation below 2 Å. Our method uses a constraint-based approach. It works by first calculating maximally compact sets of points (hydrophobic cores), and then threading the given HP-sequence to the hydrophobic cores such that the core is occupied by H-monomers.

Available:
ps.gz (89 KB)   pdf (238 KB)   BibTeX Entry ( Bac:Wil:CP2001 )

The Protein Structure Prediction Problem: A Constraint Optimisation Approach using a New Lower Bound

Rolf Backofen

In: Constraints, 2001, 6, 223-255

The protein structure prediction problem is one of the most (if not the most) important problems in computational biology. This problem consists of finding the conformation of a protein with minimal energy. Because of the complexity of this problem, simplified models like Dill's HP-lattice model [15], [16] have become a major tool for investigating general properties of protein folding. Even for this simplified model, the structure prediction problem has been shown to be NP-complete [5], [7]. We describe a constraint formulation of the HP-model structure prediction problem, and present the basic constraints and search strategy. Of course, the simple formulation would not lead to an efficient algorithm. We therefore describe redundant constraints to prune the search tree. Furthermore, we need a bounding function for the energy of an HP-protein. We introduce a new lower bound based on partial knowledge about the final conformation (namely the distribution of H-monomers to layers).

Keywords: structure prediction, protein folding, lattice models, HP-model
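For readers unfamiliar with the HP-model, the quantity being minimized can be sketched in a few lines of Python. In the standard formulation, each pair of H-monomers that are lattice neighbours without being adjacent in the chain contributes -1 to the energy. The sketch below only evaluates a given conformation on the cubic lattice. It does not implement the constraint formulation or the lower bound described in the abstract, and the function name `hp_energy` and the input representation are assumptions made for illustration.

```python
def hp_energy(sequence, coords):
    """Energy of a conformation in the HP-model on the cubic lattice.

    sequence: string over {"H", "P"}
    coords:   list of integer (x, y, z) tuples, one per monomer, in chain order
    """
    if len(sequence) != len(coords):
        raise ValueError("need exactly one coordinate per monomer")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    def manhattan(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    # Consecutive monomers must occupy neighbouring lattice points.
    for a, b in zip(coords, coords[1:]):
        if manhattan(a, b) != 1:
            raise ValueError("chain is not connected on the lattice")

    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                if manhattan(coords[i], coords[j]) == 1:
                    contacts += 1
    return -contacts
```

For the sequence `"HPPH"` folded into a unit square, `[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]`, the two H-monomers at the chain ends touch, giving an energy of -1.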

Available:
Preprint.ps.gz (223 KB)   pdf (229 KB)   BibTeX Entry ( Backofen:Constraints2001 )

RNA-Sequence-Structure Properties and Selenocysteine Insertion

Rolf Backofen

In: Frank Hoffmann, David J. Hand, Niall Adams, Douglas Fisher, Gabriela Guimaraes, Proc. of the Fourth International Symposium on Intelligent Data Analysis, Lecture Notes in Computer Science, 2001, 2189, 187-197

Selenocysteine (Sec) is a recently discovered 21st amino acid. Selenocysteine is encoded by the nucleotide triplet UGA, which is usually interpreted as a STOP signal. The insertion of selenocysteine requires an additional signal in the form of a SECIS-element (Selenocysteine Insertion Sequence). This is an mRNA-motif, which is defined by both sequence-related and structure-related properties. The bioinformatics problem of interest is to design new selenoproteins (i.e., proteins containing selenocysteine), since seleno variants of proteins are very useful in structure analysis. When designing new bacterial selenoproteins, one encounters the problem that the SECIS-element itself is translated (and therefore encodes a subsequence of the complete protein). Hence, changes on the level of mRNA made in order to generate a SECIS-element will also modify the amino acid sequence. Thus, one searches for an mRNA that is maximally similar to a SECIS-element, and for which the encoded amino acid sequence is maximally similar to the original amino acid sequence. In addition, it must satisfy the constraints imposed by the structure. Though the problem is NP-complete if arbitrary structural constraints are allowed, it can be solved efficiently when we consider the structures as used in SECIS-elements. The remaining problem is to generate a description of the SECIS-element (and its diversity) based on the available data.

Available:
ps.gz (99 KB)   BibTeX Entry ( Backofen:IDA2001 )

Protein similarity search under mRNA structural constraints: application to selenocysteine incorporation

Rolf Backofen, N.S. Narayanaswamy, Firas Swidan

In: Edgar Wingender, Proc. of German Conference on Bioinformatics (GCB2001), 2001, 135-140

Selenocysteine is the recently found 21st amino acid, which occurs in all kingdoms of life. Selenocysteine is encoded by the STOP-codon UGA. For its insertion, it requires a specific mRNA sequence downstream of the UGA-codon that forms a hairpin-like structure (called Sec insertion sequence (SECIS)). We consider the computational problem of generating new amino acid sequences containing selenocysteine. This requires finding an mRNA sequence that is similar to the SECIS-consensus, that is able to form the secondary structure required for selenocysteine insertion, and whose translation is maximally similar to the original amino acid sequence. We show that the problem can be solved in linear time when the structure does not contain pseudoknots.

Available:
ps.gz (88 KB)   BibTeX Entry ( Bac:Nar:Swi:GCB01 )

Introduction to the Special Issue on Bioinformatics

David Gilbert, Rolf Backofen, Roland H. C. Yap

In: Constraints, 2001, 6(2/3), 139

No Abstract available

Available:
pdf (25 KB)   BibTeX Entry ( Gilbert:Backofen:Yap:_intro:Constraints2001 )

Algorithmic approach to quantifying the hydrophobic force contribution in protein folding

Rolf Backofen, Sebastian Will, Peter Clote

In: Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Teri E. Klein, Proceedings of the Pacific Symposium on Biocomputing (PSB 2000), 2000, 5, 92-103

Though the electrostatic, ionic, van der Waals, Lennard-Jones, hydrogen bonding, and other forces play an important role in the energy function minimized at a protein's native state, it is widely believed that the hydrophobic force is the dominant term in protein folding. In this paper, we attempt to quantify the extent to which the hydrophobic force determines the positions of the backbone α-carbon atoms in PDB data, by applying Monte-Carlo and genetic algorithms to determine the predicted conformation with minimum energy, where only the hydrophobic force is considered (i.e. Dill's HP-model, and refinements using Woese's polar requirement). This is done by computing the root mean square deviation between the normalized distance matrix $D = (d_{i,j})$ (where $d_{i,j}$ is the normalized Euclidean distance between residues $r_i$ and $r_j$) for PDB data and the matrix obtained from the output of our algorithms. Our program was run on the database of ancient conserved regions drawn from GenBank 101 generously supplied by W. Gilbert's lab, as well as medium-sized proteins (E. coli RecA 2reb, Erythrocruorin 1eca, and Actinidin 2act). The root mean square deviation (RMSD) between distance matrices derived from the PDB data and from our program output is quite small, and by comparison with the RMSD between PDB data and random coils, allows a quantification of the hydrophobic force contribution.

Keywords: lattice, face-centered-cubic, hydrophobic force, automorphism

Available:
ps.gz (107 KB)   BibTeX Entry ( Bac:Wil:Clo:PSB2000 )

Classroom Assignment using Constraint Logic Programming

Slim Abdennadher, Matthias Saft, Sebastian Will

In: Proceedings of the Second International Conference and Exhibition on The Practical Application of Constraint Technologies and Logic Programming (PACLP 2000), 2000

The Classroom Assignment problem consists of scheduling a set of courses into a fixed number of rooms, given a fixed timetable. At the University of Munich, a classroom plan has to be created each semester after collecting the timetables of all departments and the wishes of teachers. This planning is very complex since many constraints have to be met, e.g. room size and equipment. Using constraint-based programming, we developed a prototype, called RoomPlan, that supports automatic creation of classroom plans. With RoomPlan, a schedule can be created interactively within minutes instead of days. RoomPlan was presented at the Systems'99 Computer exhibition in Munich.

Available:
ps.gz (137 KB)   BibTeX Entry ( Abdennadher:Saft:Will:PACLP00 )

An Upper Bound for Number of Contacts in the HP-Model on the Face-Centered-Cubic Lattice (FCC)

Rolf Backofen

In: R. Giancarlo, D. Sankoff, Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching (CPM 2000), Lecture Notes in Computer Science, 2000, 1848, 277-292

Lattice protein models are a major tool for investigating principles of protein folding. For this purpose, one needs an algorithm that is guaranteed to find the minimal energy conformation in some lattice model (at least for some sequences). So far, there are only algorithms that can find optimal conformations in the cubic lattice. In the more interesting case of the face-centered-cubic lattice (FCC), which is more protein-like, there are no results. One of the reasons is that for finding optimal conformations, one usually applies a branch-and-bound technique, and there are no reasonable bounds known for the FCC. We will give such a bound for Dill's HP-model on the FCC.

Available:
ps.gz (99 KB)   BibTeX Entry ( Backofen:CPM2000 )

Excluding Symmetries in Constraint-Based Search

Rolf Backofen, Sebastian Will

In: Joxan Jaffar, Proceedings of 5th International Conference on Principles and Practice of Constraint Programming (CP'99), Lecture Notes in Computer Science, 1999, 1713, 73-87

We introduce a new method for excluding symmetries in constraint-based search. To our knowledge, it is the first declarative method that can be applied to arbitrary symmetries. Our method is based on the notion of symmetric constraints, which are used in our modification of a general constraint-based search algorithm. The method does not influence the search strategy. Furthermore, it can be used with either the full set of symmetries or a subset of all symmetries. We prove correctness, completeness and symmetry exclusion properties of our method. We then show how to apply the method in the special case of geometric symmetries (rotations and reflections) and permutation symmetries. Furthermore, we give results from practical applications.

Available:
ps.gz (97 KB)   BibTeX Entry ( Bac:Wil:CP99 )

Application of Constraint Programming Techniques for Structure Prediction of Lattice Proteins with Extended Alphabets

Rolf Backofen, Sebastian Will, Erich Bornberg-Bauer

In: Bioinformatics, 1999, 15(3), 234-242

Predicting the ground state of biopolymers is a notoriously hard problem in biocomputing. Model systems such as lattice proteins are simple tools that are valuable for testing and improving new methods. Best known are HP-type models with sequences composed from a binary (hydrophobic and polar) alphabet. A major drawback is their degeneracy, i.e. the number of different ground state conformations. Here we show how recently developed constraint programming techniques can be used to solve the structure prediction problem efficiently for a higher order alphabet. To our knowledge it is the first report of an exact and computationally feasible solution to model proteins of length up to 36 and without resorting to maximally compact states. We further show that degeneracy is reduced by more than one order of magnitude and that ground state conformations are not necessarily compact. Therefore, more realistic protein simulations become feasible with our model.

Available:
pdf (296 KB)   ps.gz (210 KB)   BibTeX Entry ( Bac:Wil:Bor:BioInfo99 )

Algorithmic approach to quantifying the hydrophobic force contribution in protein folding

Rolf Backofen, Sebastian Will, Peter Clote

In: Proceedings of the German Conference on Bioinformatics (GCB'99), 1999, 93-106

Though the electrostatic, ionic, van der Waals, Lennard-Jones, hydrogen bonding, and other forces play an important role in the energy function minimized at a protein's native state, it is widely believed that the hydrophobic force is the dominant term in protein folding. In this paper, we attempt to quantify the extent to which the hydrophobic force determines the positions of the backbone α-carbon atoms in PDB data, by applying Monte-Carlo and genetic algorithms to determine the predicted conformation with minimum energy, where only the hydrophobic force is considered (i.e. Dill's HP-model, and refinements using Woese's polar requirement). This is done by computing the root mean square deviation between the normalized distance matrix $D = (d_{i,j})$ (where $d_{i,j}$ is the normalized Euclidean distance between residues $r_i$ and $r_j$) for PDB data and the matrix obtained from the output of our algorithms. Our program was run on the database of ancient conserved regions drawn from GenBank 101 generously supplied by W. Gilbert's lab, as well as medium-sized proteins (E. coli RecA 2reb, Erythrocruorin 1eca, and Actinidin 2act). The root mean square deviation (RMSD) between distance matrices derived from the PDB data and from our program output is quite small, and by comparison with the RMSD between PDB data and random coils, allows a quantification of the hydrophobic force contribution. The final version of this paper will appear in the proceedings of PSB'2000.

Keywords: lattice, face-centered-cubic, hydrophobic force, automorphism

Publication note:
http://www.bioinfo.de/isb/gcb99

Available:
ps.gz (107 KB)   BibTeX Entry ( Bac:Wil:Clo:GCB99 )

Aktuelles Schlagwort/Bioinformatik

Rolf Backofen, François Bry, Peter Clote, Hans-Peter Kriegel, Thomas Seidl, Klaus Schulz

In: Informatik Spektrum, 1999, 22(5), 376-378

Bioinformatics touches on molecular biology, biochemistry, and genetics on the one hand, and on theoretical and practical computer science and computational linguistics on the other. It has a homogeneous and broad stock of open problems. It is becoming increasingly important in biology and genetics and is already being applied in industry.

Available:
BibTeX Entry ( Bac-et-al:Informatik-Spektrum99 )
Links:
PDF (54KB)

Structure Prediction in an HP-type Lattice with an Extended Alphabet

Rolf Backofen, Sebastian Will

In: Proc of German Conference on Bioinformatics (GCB'98), 1998

The protein structure prediction problem is one of the most important problems in computational biology. This problem consists of finding the conformation of a protein (given by a sequence of amino acids) with minimal energy. Because of the complexity of this problem, simplified models like Dill's HP-lattice model have become a major tool for investigating general properties of protein folding. Even for this simplified model, the structure prediction problem has been proven to be NP-complete. A disadvantage of the HP-model is its high degeneracy, i.e., for every sequence there are many conformations having the minimal energy. For this reason, extended alphabets have been used in the literature. One of these alphabets is the HPNX-alphabet, which considers hydrophobic amino acids as well as positively and negatively charged ones. In this paper, we describe an exact algorithm for solving the structure prediction problem for the HPNX-alphabet. To our knowledge, our algorithm is the first exact one for finding the minimal conformation of a lattice protein in a lattice model with an alphabet more complex than HP. We also compare our results with results as given for the HP-model.

Available:
pdf (231 KB)   ps.gz (106 KB)   BibTeX Entry ( Bac:Wil:GCB98 )
Links:
long version as Technical Report 9810, LMU München

Excluding Symmetries in Concurrent Constraint Programming

Rolf Backofen, Sebastian Will

In: Workshop on Modeling and Computing with Concurrent Constraint Programming, 1998

In many problems, non-determinism naturally arises in a concurrent constraint-based formulation in the form of disjunctive constraints. To check consistency of disjunctive constraints, one needs to perform search by splitting the constraint store into local computation stores. An important problem there is to eliminate symmetries, which give rise to many different but "similar" solutions found by the search procedure. Since symmetries will give rise to an (often exponential) amplification of the search space, symmetry exclusion promises to efficiently prune the search tree. Hence, many constraint programmers have constructed mechanisms to exclude at least some of the symmetries of their problem. Unfortunately, symmetry exclusion was often not straightforward to implement and had to be redesigned for every special problem or enumeration strategy, which led to an inflexibility in the program structure once it was introduced. Often symmetry exclusion was not done at all or only for a small set of symmetries. In other cases, where symmetry exclusion was implemented, it distracted the programmer's attention from more important tasks. The contribution of our work is to give a generally applicable method for the exclusion of arbitrary symmetries. To our knowledge, this is the first general and declarative method for excluding symmetries in concurrent constraint programming.

Available:
ps.gz (72 KB)   BibTeX Entry ( Bac:Wil:WS-CP98 )

Constraint Techniques for Solving the Protein Structure Prediction Problem

Rolf Backofen

In: Michael Maher, Jean-Francois Puget, Proceedings of 4th International Conference on Principles and Practice of Constraint Programming (CP'98), Lecture Notes in Computer Science, 1998, 1520, 72-86

The protein structure prediction problem is one of the most (if not the most) important problems in computational biology. This problem consists of finding the conformation of a protein (i.e., a sequence of amino acids) with minimal energy. Because of the complexity of this problem, simplified models like Dill's HP-lattice model have become a major tool for investigating general properties of protein folding. Even for this simplified model, the structure prediction problem has been shown to be NP-complete. We describe a constraint formulation of the HP-model structure prediction problem and present the basic constraints and search strategy. We then introduce a novel, general technique for excluding geometrical symmetries in constraint programming. To our knowledge, this is the first general and declarative technique for excluding symmetries in constraint programming that can be added to an existing implementation. Finally, we describe a new lower bound on the energy of an HP-protein. Both techniques yield an efficient pruning of the search tree.

Available:
ps.gz (108 KB)   BibTeX Entry ( Backofen:CP98 )

Using Constraint Programming for lattice Protein Folding

Rolf Backofen

In: Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Teri E. Klein, Proceedings of the Pacific Symposium on Biocomputing (PSB'98), 1998, 3, 387-398

We present a global search technique for finding the minimal conformation of a sequence in Dill's HP-lattice model [Lau:89a,Lau:90a]. The HP-lattice model is a simplified model of proteins, which has become a major tool for investigating general properties of protein folding. The search technique uses constraint programming for efficiently pruning the search tree. We state the problem of structure prediction in the HP-lattice model and describe our implementation using the Oz-system.

Available:
ps.gz (48 KB)   BibTeX Entry ( Backofen:PSB98 )

How to Win a Game with Features

Rolf Backofen, Ralf Treinen

In: Information and Computation, April 1998, 142(1), 76-101

We show that the axiomatization of rational trees in the language of features given by Smolka & Treinen 1992 is complete. In contrast to other completeness proofs that have been given in this field, we employ the method of Ehrenfeucht-Fraïssé games. The result extends previous results on complete axiomatizations of rational trees in the language of constructor equations (Comon & Lescanne 1988, Maher 1988) or in a weaker feature language as given by Backofen & Smolka 1993.

Available:
ps.gz (91 KB)   BibTeX Entry ( BT:CFTgames94ccl )

Evolution as a Computational Engine

Rolf Backofen, Peter Clote

In: Mogens Nielsen, Wolfgang Thomas, Proc. of Annual Conference of the European Association for Computer Science Logic (CSL'97), Lecture Notes in Computer Science, 1997, 1414, 35-55

Given a finite set K of lethal genes, a starting genome x and a desired target genome y, is there a sequence of base insertions, deletions, and substitutions that produces the desired genome y from x, such that no intermediate genome contains a lethal gene? The main result of this paper is that, appropriately formulated, this problem is undecidable, even when the underlying alphabet has only 2 symbols (e.g. two of the bases A,C,G,T). We prove that asexual evolutionary systems form a universal computation engine, in the sense that for any reversible Turing machine M and input x, there is a finite set K of lethal genes, which can guide the computation, so that all intermediate genomes are prefixes of (encodings of) configurations in the computation of M. Since reversible Turing machines retain the history of their entire computation, and are essentially encoded by evolutionary systems, this suggests an encoding mechanism by which phylogeny recapitulates ontogeny.

Available:
ps.gz (207 KB)   BibTeX Entry ( Bac:Clo:CSL97 )

Using Constraint Programming for lattice Protein Folding

Rolf Backofen

In: Workshop on Constraints and Bioinformatics/Biocomputing, 1997

We present a global search technique for finding the minimal conformation of a sequence in Dill's HP-lattice model [Lau:89a,Lau:90a]. The HP-lattice model is a simplified model of proteins, which has become a major tool for investigating general properties of protein folding. The search technique uses constraint programming for efficiently pruning the search tree. We state the problem of structure prediction in the HP-lattice model and describe our implementation using the Oz-system.

Publication note:
Held in conjunction with Third International Conference on Principles and Practice of Constraint Programming (CP97)

Available:
ps.gz (58 KB)   BibTeX Entry ( Backofen:CP97-WS )

Controlling Functional Uncertainty

Rolf Backofen

In: Wolfgang Wahlster, Proceedings of 12th European Conference on Artificial Intelligence, 1996, 557-561

There have been two different methods for checking the satisfiability of feature descriptions using the functional uncertainty device, namely [Kaplan/Maxwell88] and [Backofen94]. Although only the one in [Backofen94] solves the satisfiability problem completely, both methods have their merits. But it may happen that in one single description, there are parts where the first method is more appropriate, and other parts where the second should be applied. In this paper, we present a common framework that allows both methods to be combined. This is done by presenting a set of rules for simplifying feature descriptions. The different methods are described as different controls on this rule set, where a control specifies the order in which the different rules must be applied.

Available:
ps.gz (64 KB)   BibTeX Entry ( Backofen-Ecai96 )

A Complete Axiomatization of a Theory with Feature and Arity Constraints

Rolf Backofen

In: Journal of Logic Programming, 1995, 24, 37-72

CFT is a recent constraint system providing records as a logical data structure for logic programming and for natural language processing. It combines the rational tree system as defined for logic programming with the feature tree system as used in natural language processing. The formulae considered in this paper are all first-order-logic formulae over a signature of binary and unary predicates called features and arities, respectively. We establish the theory CFT by means of seven axiom schemes and show its completeness. Our completeness proof exhibits a terminating simplification system deciding validity and satisfiability of possibly quantified record descriptions.

Available:
ps.gz (142 KB)   BibTeX Entry ( Backofen:95 )

A First-Order Axiomatization of the Theory of Finite Trees

Rolf Backofen, James Rogers, K. Vijay-Shanker

In: Journal of Logic, Language and Information, 1995, 4(1), 5-39

We provide first-order axioms for the theories of finite trees with bounded branching and finite trees with arbitrary (finite) branching. The signature is chosen to express, in a natural way, those properties of trees most relevant to linguistic theories. These axioms provide a foundation for results in linguistics that are based on reasoning formally about such properties. We include some observations on the expressive power of these theories relative to traditional language complexity classes.

Available:
ps.gz (99 KB)   BibTeX Entry ( Bac:Rog:Vij:95 )

A Complete and Recursive Feature Theory

Rolf Backofen, Gert Smolka

In: Theoretical Computer Science, July 1995, 146(1-2), 243-268

Various feature descriptions are being employed in logic programming languages and constraint-based grammar formalisms. The common notational primitives of these descriptions are functional attributes called features. The descriptions considered in this paper are the possibly quantified first-order formulae obtained from a signature of binary and unary predicates called features and sorts, respectively. We establish a first-order theory FT by means of three axiom schemes, show its completeness, and construct three elementarily equivalent models. One of the models consists of so-called feature graphs, a data structure common in computational linguistics. The other two models consist of so-called feature trees, a record-like data structure generalizing the trees corresponding to first-order terms. Our completeness proof exhibits a terminating simplification system deciding validity and satisfiability of possibly quantified feature descriptions.

Available:
ps.gz (109 KB)   BibTeX Entry ( Bac:Smo:95 )

Regular Path Expressions in Feature Logic

Rolf Backofen

In: Journal of Symbolic Computation, 1994, 17, 412-455

We examine the existential fragment of a feature logic, which is extended by regular path expressions. A regular path expression is a subterm relation, where the set of allowed paths for the subterms is restricted to any given regular language. In the area of computational linguistics, this notion has been introduced as "functional uncertainty". We will prove that satisfiability is decidable by constructing a quasi-terminating rule system.

Available:
ps.gz (107 KB)   BibTeX Entry ( BackofenJSC94 )

How to Win a Game with Features

Rolf Backofen, Ralf Treinen

In: Jean-Pierre Jouannaud, 1st International Conference on Constraints in Computational Logics, Lecture Notes in Computer Science, September 1994, 845, 320-335

We show that the axiomatization of rational trees in the language of features given by Smolka & Treinen 1992 is complete. In contrast to other completeness proofs that have been given in this field, we employ the method of Ehrenfeucht-Fraïssé games. The result extends previous results on complete axiomatizations of rational trees in the language of constructor equations (Comon & Lescanne 1988, Maher 1988) or in a weaker feature language as given by Backofen & Smolka 1993.

Available:
ps.gz (59 KB)   BibTeX Entry ( BackofenTreinenCCL94 )

DISCO -- An HPSG-Based NLP System and its Application for Appointment Scheduling

Hans Uszkoreit, Rolf Backofen, Stephan Busemann, Abdel Kader Diagne, Elizabeth A. Hinkleman, Walter Kasper, Bernd Kiefer, Hans-Ulrich Krieger, Klaus Netter, Günter Neumann, Stephan Oepen, Stephen P. Spackman

In: Proceedings of the 15th International Conference on Computational Linguistics (COLING'94), 1994, 436-440

The natural language system DISCO is described. It combines a powerful and flexible grammar development system; linguistic competence for German including morphology, syntax and semantics; new methods for linguistic performance modeling on the basis of high-level competence grammars; new methods for modeling multi-agent dialogue competence; and an interesting sample application for appointment scheduling and calendar management.

Available:
ps.gz (96 KB)   doi:http://dx.doi.org/10.3115/991886.991963   BibTeX Entry ( Usz:Bac:Bus:94 )

The TDL/UDiNe system

Rolf Backofen, Hans-Ulrich Krieger

In: Rolf Backofen, Hans-Ulrich Krieger, Stephen P. Spackman, Hans Uszkoreit, Report of the EAGLES Workshop on Implemented Formalisms at DFKI, Saarbrücken, 1993, 67-74

No Abstract available

Publication note:
DFKI Document D-93-27

Available:
BibTeX Entry ( Bac:Kri:93 )

Regular Path Expressions in Feature Logic

Rolf Backofen

In: Claude Kirchner, Proc. of the RTA'93, Lecture Notes in Computer Science, 1993, 690, 121-135

We examine the existential fragment of a feature logic, which is extended by regular path expressions. A regular path expression is a subterm relation, where the allowed paths for the subterms are restricted by a regular language. We will prove that satisfiability is decidable. This is achieved by setting up a quasi-terminating rewrite system.

Available:
BibTeX Entry ( Bac:Regular:RTA93 )

On the Decidability of Functional Uncertainty

Rolf Backofen

In: Proc. of the 31st ACL, 1993, 201-208

We show that feature logic extended by functional uncertainty is decidable, even if one admits cyclic descriptions. We present an algorithm that solves feature descriptions containing functional uncertainty in two phases, both phases using a set of deterministic and non-deterministic rewrite rules. We then compare our algorithm with that of Kaplan and Maxwell, which does not cover cyclic feature descriptions.

Available:
ps.gz (91 KB)   BibTeX Entry ( Bac:Uncertainty:ACL93-withoutfull )

A Complete and Recursive Feature Theory

Rolf Backofen, Gert Smolka

In: Proc. of the 31st ACL, 1993, 193-200

Various feature descriptions are being employed in constraint-based grammar formalisms. The common notational primitives of these descriptions are functional attributes called features. The descriptions considered in this paper are the possibly quantified first-order formulae obtained from a signature of features and sorts. We establish a complete first-order theory FT by means of three axiom schemes and construct three elementarily equivalent models.

One of the models consists of so-called feature graphs, a data structure common in computational linguistics. The other two models consist of so-called feature trees, a record-like data structure generalizing the trees corresponding to first-order terms.

Our completeness proof exhibits a terminating simplification system deciding validity and satisfiability of possibly quantified feature descriptions.

Available:
ps.gz (83 KB)   BibTeX Entry ( BS:Compl:ACL93-withoutfull )

Distributed Disjunctions for LIFE

Rolf Backofen, Lutz Euler, Günther Görz

In: H. Boley, M. M. Richter, Proc. of the Inter. Workshop on Processing Declarative Knowledge (PDK'91), Lecture Notes in Computer Science, 1991, 567, 161-170

pclife, a dialect of the LIFE language designed by Aït-Kaci, extends the original design by some features, the most important of which are distributed disjunctions. LIFE integrates the functional and the logic-oriented programming styles, and feature types supporting inheritance. This language is well suited for knowledge representation, in particular for applications in computational linguistics.

Keywords: Knowledge representation, AI software, inferences, natural language processing

Available:
BibTeX Entry ( Backo:Distr )

Linking Typed Feature Formalisms and Terminological Knowledge Representation Languages in Natural Language Front-Ends

Rolf Backofen, Harald Trost, Hans Uszkoreit

In: Wilfried Brauer, Daniel Hernández, Proceedings of the GI Congress, Knowledge-Based Systems 1991, Informatik-Fachberichte, 1991, 291, 375-384

In this paper we describe an interface between typed feature formalisms and terminological languages like KL-ONE. The definition of such an interface is motivated by the needs of natural language front-ends to AI-systems where information must be transmitted from the front-end to the back-end system and vice versa.

We show how some minor extensions to the feature formalism allow for a syntactic description of individual concepts in terms of typed feature structures. Namely, we propose to include intervals and a special kind of sets. Partial consistency checks can be made on these concept descriptions during the unification of feature terms. Type checking on these special types involves calling the classifier of the terminological language. The final consistency check is performed only when transferring these concept descriptions into structures of the A-Box of the terminological language.

Available:
ps.gz (74 KB)   BibTeX Entry ( Bac:Tro:Usz:91a )

Towards the Integration of Functions, Relations and Types in an AI Programming Language

Rolf Backofen, Lutz Euler, Günther Görz

In: Heinz Marburger, Proc. of the 14th German Workshop on Artificial Intelligence, Informatik Fachberichte, 1990, 251, 297-306

This paper describes the design and implementation of the programming language PC-Life. This language integrates the functional and the logic-oriented programming style and feature types supporting inheritance. This combination yields a language particularly suited to knowledge representation, especially for application in computational linguistics.

Keywords: Knowledge representation, AI software, inferences, natural language processing

Available:
BibTeX Entry ( Backofen:90 )