Format for single sequences or multiple alignments together with a probabilistic description of the (consensus) RNA secondary structure ensemble, given as probabilities of base pairs, of base pair stackings, and of base pairs and unpaired bases in the loops closed by base pairs.
The LocARNA PP format combines sequence or alignment information with (single or consensus, respectively) ensemble probabilities into a PP 2.0 record. LocARNA uses this format for input (single sequences or alignments that are to be aligned) and output (alignments). Records are composed of one or several sections.
This format is used by tools of the LocARNA package for input and output of sequences and alignments together with their probabilistic ensemble descriptions. Note that, for legacy reasons, LocARNA also supports the deprecated version 1.0 of the PP format.
#PP 2.0

hdrA GGCACCACUC-GAAGGC--UAAGCCAAAGUGGUGCU
vhuD GUUCUCUCGG-GAACCCGUCAAGGGACCGAGAGAAC
vhuU AGCUCACAACCGAACCCAUUUGGGAGGUUGUGAGCU
fwdB AUGUUGGAGGGGAACCCGUAAGGGACCCUCCAAGAU
#A1  ......AA..............BBB...........
#A2  ......12..............123...........
#END

#SECTION BASEPAIRS
#BPCUT 0.2
4 33 1.0
1 36 0.6
9 28 0.98
8 29 1.0
5 32 1.0
7 30 0.9
6 31 1
2 35 0.9
#END

#SECTION INLOOP
#BPILCUT 0.5
#UILCUT 0.5
2 35: 5 32 0.89 ; 3 0.9 4 0.1
8 29: 9 28 0.98 7 30 0.9; 10 0.7 27 0.6
#END
Each record starts with the header
#PP 2.0
followed by the "alignment section". This section specifies the sequence names, the alignment strings, and, optionally, anchor constraints.
It therefore contains lines describing alignment rows
sequence_name alignment_string
or alignment annotations (usually, to specify anchor constraints):
#An constraint-string
The latter lines (for n = 1, 2, ...) each specify the n-th characters of the alignment column names, so that multi-character names can be specified by several lines with consecutive indices n; the characters '.' and ' ' are treated as equivalent. Line breaks are supported: strings given in several lines with the same name are concatenated. Otherwise, the order of lines is arbitrary.
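The rules above for alignment rows and anchor lines can be sketched in Python as follows. This is a minimal illustration, not LocARNA's own parser: the function name `parse_alignment_section` is hypothetical, the 1-based column indexing is a choice made here, and the sketch assumes that '.' marks a column without a name (as the example record suggests) and that a constraint string starts at the first non-blank character after its `#An` tag.

```python
import re

# Tag of an anchor annotation line: "#A" followed by the index n.
ANCHOR_TAG = re.compile(r"#A(\d+)$")

def parse_alignment_section(lines):
    """Return (rows, column_names) for the lines of an alignment section.

    Hypothetical helper; not part of the LocARNA package.
    """
    rows = {}      # sequence name -> concatenated alignment string
    anchors = {}   # n -> concatenated n-th characters of the column names
    for line in lines:
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # skip blank lines and single-token lines such as "#END"
        tag, value = fields[0], fields[1].rstrip()
        match = ANCHOR_TAG.match(tag)
        if match:
            n = int(match.group(1))
            # ' ' and '.' are equivalent, so normalize everything to '.'
            anchors[n] = anchors.get(n, "") + value.replace(" ", ".")
        elif not tag.startswith("#"):
            # repeated names continue the same row (line breaks)
            rows[tag] = rows.get(tag, "") + value

    # Read the column names top to bottom through the lines #A1, #A2, ...
    width = max((len(s) for s in anchors.values()), default=0)
    padded = [anchors[n].ljust(width, ".") for n in sorted(anchors)]
    column_names = {}
    for col in range(width):
        name = "".join(s[col] for s in padded).replace(".", "")
        if name:
            column_names[col + 1] = name  # 1-based column index (assumption)
    return rows, column_names
```

Applied to the two annotation lines of the example record, this yields the column names A1, A2 for columns 7 and 8, and B1, B2, B3 for columns 23 to 25.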
Each section is terminated by the line
#END
All following sections are introduced by a section header
#SECTION section_name
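The overall record layout (header, alignment section, then named sections, each closed by `#END`) can be illustrated with a short Python sketch. The function name `split_sections` and the key `"ALIGNMENT"` for the unnamed first section are assumptions made for this example; they are not defined by the format or by LocARNA.

```python
def split_sections(text):
    """Split a PP 2.0 record into {section name: list of content lines}.

    Hypothetical helper; the unnamed alignment section is stored under
    the key "ALIGNMENT", which is a naming choice of this sketch.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0].split() != ["#PP", "2.0"]:
        raise ValueError("record must start with the header '#PP 2.0'")

    sections = {}
    name, body = "ALIGNMENT", []   # the header is followed by the alignment section
    for line in lines[1:]:
        if line.startswith("#SECTION"):
            if name is not None:
                raise ValueError("section %r is not terminated by #END" % name)
            name, body = line.split()[1], []
        elif line == "#END":
            if name is None:
                raise ValueError("#END without an open section")
            sections[name] = body
            name = None
        else:
            if name is None:
                raise ValueError("content outside of a section: %r" % line)
            body.append(line)
    if name is not None:
        raise ValueError("section %r is not terminated by #END" % name)
    return sections
```

Keywords such as `#BPCUT` are left in the section body here, so that each section can be interpreted by its own parser.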
Base pair probabilities are specified in the section with header
#SECTION BASEPAIRS
The keyword #BPCUT specifies the probability cutoff of the listed base pairs. Base pair probabilities are listed one per line
i j p_ij
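As an illustration, the content lines of a BASEPAIRS section could be read as follows. This is a sketch under assumptions: the function name `parse_basepairs` is hypothetical, and the positions i and j are taken to be 1-based alignment columns, which is inferred from the example record (it lists the pair `1 36` for an alignment of length 36).

```python
def parse_basepairs(lines):
    """Return (cutoff, probs) for the lines of a BASEPAIRS section.

    Hypothetical helper; probs maps (i, j) to the probability p_ij.
    """
    cutoff = None   # stays None if the section has no #BPCUT line
    probs = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if fields[0] == "#BPCUT":
            cutoff = float(fields[1])
        elif fields[0].startswith("#"):
            continue                      # ignore other keywords in this sketch
        else:
            i, j, p = int(fields[0]), int(fields[1]), float(fields[2])
            probs[(i, j)] = p
    return cutoff, probs
```

For the example record, this gives the cutoff 0.2 and eight base pairs, for instance the probability 0.98 for the pair (9, 28).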
In-loop probabilities can be specified in a section with header
#SECTION INLOOP
Here, the cutoffs for the probabilities of base pairs in loops and of unpaired bases in loops are specified by the keywords #BPILCUT and #UILCUT, respectively
#BPILCUT 0.0005
#UILCUT 0.0005
The probabilities in the loop of a base pair (i,j) are specified by lines of the following form, where each group in braces can be repeated
i j: { k l Pr[(k,l) in loop of (i,j)] } ; { k Pr[k in loop of (i,j)] }
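A single in-loop line can be taken apart as in the following Python sketch. The function name `parse_inloop_line` and the returned data layout are assumptions for illustration only. The sketch reads the part before ';' as triples (k, l, probability) and the part after it as pairs (k, probability), as in the example record.

```python
def parse_inloop_line(line):
    """Parse one line 'i j: k l p ... ; k p ...' of an INLOOP section.

    Hypothetical helper; returns ((i, j), pairs_in_loop, unpaired_in_loop).
    """
    head, _, rest = line.partition(":")
    i, j = (int(x) for x in head.split())

    # Left of ';': base pairs in the loop, right of ';': unpaired bases.
    bp_part, _, un_part = rest.partition(";")
    bp = bp_part.split()
    un = un_part.split()
    if len(bp) % 3 != 0 or len(un) % 2 != 0:
        raise ValueError("malformed in-loop line: %r" % line)

    pairs_in_loop = {
        (int(bp[n]), int(bp[n + 1])): float(bp[n + 2])
        for n in range(0, len(bp), 3)
    }
    unpaired_in_loop = {
        int(un[n]): float(un[n + 1])
        for n in range(0, len(un), 2)
    }
    return (i, j), pairs_in_loop, unpaired_in_loop
```

For the line `8 29: 9 28 0.98 7 30 0.9; 10 0.7 27 0.6` this returns the closing pair (8, 29), the in-loop base pairs (9, 28) and (7, 30) with probabilities 0.98 and 0.9, and the unpaired positions 10 and 27 with probabilities 0.7 and 0.6.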