Chair of Bioinformatics
Department of Computer Science
University of Freiburg

ExpaRNA-P - Simultaneous Exact Matching and Folding

Synopsis

ExpaRNA-P enumerates exactly matching local sequence-structure patterns in RNAs of unknown structure, supporting full structural flexibility according to RNA secondary structure energy models (inherited from the Vienna RNA package).

Furthermore, it performs very fast simultaneous alignment and folding of RNAs (think: "like LocARNA, but faster"), internally based on exact matching.

Going far beyond previous matching approaches, including the older tool ExpaRNA, ExpaRNA-P considers the entire ensemble of potential RNA secondary structures. As a consequence, ExpaRNA-P simultaneously matches and folds the input RNA sequences, which enables it to enumerate thermodynamically relevant local sequence-structure motifs. In particular, the approach avoids committing to a single predicted RNA structure, which is generally very unreliable.

Download

ExpaRNA-P is distributed as part of the LocARNA package. Please find the download link to the latest release there. The software is freely available under GPL 3.0.

Installation

The software is tested on recent GNU/Linux systems; it is also reported to work under Mac OS X and under Cygwin on Windows. Please follow the installation instructions for LocARNA.

Usage

The basic tool exparna_p computes Exact Pattern Matchings (EPMs) in the Boltzmann-distributed structure ensemble of two RNA sequences. The wrapper script exploc_p performs "simultaneous matching and folding" (SM&F) of RNA sequences: the EPMs computed by exparna_p are used as anchor constraints to speed up the alignment computation with locarna.

EPM computation using exparna_p

exparna_p expects two input files in FASTA format, containing the first and the second sequence, respectively.

fileA: 
> seqA 
... [Your first RNA sequence] ... 

fileB: 
> seqB 
... [Your second RNA sequence] ...

Call exparna_p by:

exparna_p fileA fileB
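
As a minimal sketch of the steps above, the following shell snippet writes two FASTA files and calls exparna_p on them. The two sequences are made-up toy examples, not real data; the check for exparna_p on the PATH is only a convenience and assumes LocARNA has been installed as described under Installation.

```shell
# Toy sequences for illustration only; replace them with your own RNAs.
printf '>seqA\n%s\n' 'GGGAAACGCUUCGGCGUUUCCC' > fileA
printf '>seqB\n%s\n' 'GGCAAACGCUUCGGCGUUUGCC' > fileB

# Run exparna_p only if it is installed (it is distributed with LocARNA).
if command -v exparna_p >/dev/null 2>&1; then
    exparna_p fileA fileB
else
    echo "exparna_p not found on PATH; install LocARNA first" >&2
fi
```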

In this basic form, exparna_p computes all EPMs that reach a given minimal score, using the heuristic traceback. Several options control the output:

  • --output-ps filename: writes the best EPM chain as a colored PostScript file
  • --output-epm-list filename: writes a list of all traced EPMs
  • --output-chained-epm-list filename: writes a list of all chained EPMs

Help on the options of exparna_p is available by exparna_p --help or exparna_p --man.
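
The output options listed above could be combined in a single call along the following lines. This is a sketch under assumptions: the three output file names are hypothetical, and placing the options before the input files is a conventional choice rather than a documented requirement; exparna_p --help gives the authoritative syntax.

```shell
# Hypothetical output file names; choose any paths you like.
ps_out=epm_chain.ps
epm_out=epms.txt
chain_out=chained_epms.txt

# Assemble the command line from the three documented output options.
cmd="exparna_p --output-ps $ps_out --output-epm-list $epm_out --output-chained-epm-list $chain_out fileA fileB"
echo "$cmd"

# Execute only if the tool and both input files are present.
if command -v exparna_p >/dev/null 2>&1 && [ -f fileA ] && [ -f fileB ]; then
    $cmd
fi
```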

Simultaneous Matching and Folding using exploc_p

This script takes the best chain of EPMs computed by exparna_p and passes it to locarna as anchor constraints. exploc_p expects two input files in FASTA format, containing the first and the second sequence, respectively.

fileA: 
> seqA 
... [Your first RNA sequence] ... 

fileB: 
> seqB 
... [Your second RNA sequence] ...

Call exploc_p by:

exploc_p fileA fileB

By default, the program writes the anchor constraints to the current directory. The output directory can be changed with the option --output. Help on the options of exploc_p is available by exploc_p --help or exploc_p --man.
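
A call that redirects the output to a separate directory might look like the sketch below. The directory name exploc_out is hypothetical, and passing the directory as a separate argument after --output is an assumption; exploc_p --help shows the exact form. Creating the directory beforehand is a precaution that may not be needed.

```shell
# Hypothetical output directory; created up front as a precaution.
outdir=exploc_out
mkdir -p "$outdir"

if command -v exploc_p >/dev/null 2>&1 && [ -f fileA ] && [ -f fileB ]; then
    # Assumption: --output takes the directory as a separate argument.
    exploc_p --output "$outdir" fileA fileB
    # List what was written to the output directory.
    ls "$outdir"
else
    echo "exploc_p or the input files are missing; nothing was run" >&2
fi
```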

Publication

Christina Otto, Mathias Möhl, Steffen Heyne, Mika Amit, Gad M. Landau, Rolf Backofen, and Sebastian Will.
ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs. BMC Bioinformatics, 15 no. 1 pp. 6602, 2014.