Computes partition function of alignment, arc match and base match probabilities. More...
#include <aligner_p.hh>
Public Types | |
typedef size_t | size_type |
size | |
typedef std::pair< size_type, size_type > | size_pair |
pair of size_type | |
typedef BasePairs__Arc | Arc |
arc | |
Public Member Functions | |
AlignerP (const AlignerParams &ap) | |
Construct from parameters. | |
AlignerP (const AlignerP &p) | |
copy constructor p AlignerP object to be copied | |
~AlignerP () | |
Destructor. | |
pf_score_t | align_inside () |
void | align_outside () |
void | compute_basematch_probabilities (bool basematch_probs_include_arcmatch) |
void | compute_arcmatch_probabilities () |
void | write_arcmatch_probabilities (std::ostream &out) |
write the arc match probabilities to a stream | |
void | write_basematch_probabilities (std::ostream &out) |
write the base match probabilities to a stream | |
pf_score_t | virtual_Mprime (size_type al, size_type bl, size_type i, size_type j, size_type max_ar, size_type max_br) const |
Access virtual Mprime matrix. | |
double | compute_fragment_match_prob (size_type i, size_type j, size_type k, size_type l) |
Fragment match probability. | |
void | freeD () |
free the space of D, take care! | |
void | freeMprime () |
free the space of D, take care! | |
Static Public Member Functions | |
static AlignerPParams | create () |
create with named parameters | |
Protected Member Functions | |
void | init_M (size_type al, size_type ar, size_type bl, size_type br) |
initialize first column and row of M, for inside recursion | |
void | init_E (size_type al, size_type ar, size_type bl, size_type br) |
initialize E | |
void | init_Mrev (size_type al, size_type ar, size_type bl, size_type br) |
void | init_Erev (size_type al, size_type ar, size_type bl, size_type br) |
pf_score_t | comp_E_entry (size_type al, size_type bl, size_type i, size_type j) |
initialize first column and row of M' for outside recursion | |
pf_score_t | comp_F_entry (size_type al, size_type bl, size_type i, size_type j) |
compute one entry in F (inside recursion cases) | |
pf_score_t | comp_M_entry (size_type al, size_type bl, size_type i, size_type j) |
compute one entry in M (inside recursion cases) | |
pf_score_t | comp_Mprime_entry (size_type al, size_type bl, size_type i, size_type j, size_type max_ar, size_type max_br) |
compute one entry in Mprime (outside recursion cases) | |
pf_score_t | comp_Eprime_entry (size_type al, size_type bl, size_type i, size_type j) |
compute one entry in Eprime (outside recursion cases) | |
pf_score_t | comp_Fprime_entry (size_type al, size_type bl, size_type i, size_type j) |
compute one entry in Fprime (outside recursion cases) | |
pf_score_t | comp_Erev_entry (size_type i, size_type j) |
compute one entry in Erev | |
pf_score_t | comp_Frev_entry (size_type i, size_type j) |
compute one entry in Frev | |
pf_score_t | comp_Mrev_entry (size_type i, size_type j, size_type ar, size_type br) |
void | align_inside_arcmatch (size_type al, size_type ar, size_type bl, size_type br) |
void | align_outside_arcmatch (size_type al, size_type ar, size_type max_ar, size_type bl, size_type br, size_type max_br) |
void | align_reverse (size_type al, size_type ar, size_type bl, size_type br, bool copy=false) |
void | align_D () |
void | align_Dprime () |
void | fill_D (size_type al, size_type bl, size_type max_ar, size_type max_br) |
void | fill_Dprime (size_type al, size_type bl, size_type min_ar, size_type min_br, size_type max_ar, size_type max_br) |
pf_score_t & | D (const ArcMatch &am) |
returns lvalue of matrix D | |
pf_score_t & | D (const Arc &arcA, const Arc &arcB) |
returns lvalue of matrix D | |
pf_score_t & | Dprime (const ArcMatch &am) |
returns lvalue of matrix D' | |
pf_score_t & | Dprime (const Arc &arcA, const Arc &arcB) |
returns lvalue of matrix D' | |
size_type | leftmost_covering_arc (size_type s, const BasePairs &bps, size_type l, size_type r) const |
std::pair< size_type, size_type > | leftmost_covering_arcmatch (size_type al, size_type bl, size_type ar, size_type br) const |
size_type | rightmost_covering_arc (const BasePairs &bps, size_type l, size_type r, size_type s) const |
std::pair< size_type, size_type > | rightmost_covering_arcmatch (size_type al, size_type bl, size_type ar, size_type br) const |
void | alloc_inside_matrices () |
allocate space for the inside matrices | |
void | alloc_outside_matrices () |
allocate space for the outside matrices | |
Protected Attributes | |
const AlignerPParams * | params |
the parameter for the alignment | |
const Scoring * | scoring |
the scores | |
const Sequence & | seqA |
sequence A | |
const BasePairs & | bpsA |
base pairs A | |
const Sequence & | seqB |
sequence B | |
const BasePairs & | bpsB |
base pairs B | |
const ArcMatches & | arc_matches |
(potential) arc matches of A and B | |
AlignerPRestriction | r |
restriction of alignment | |
pf_score_t | pf_scale |
pf_score_t | partFunc |
PFScoreMatrix | Dmat |
PFScoreVector | E |
pf_score_t | F |
PFScoreMatrix | M |
PFScoreMatrix | Mrev |
PFScoreVector | Erev |
pf_score_t | Frev |
PFScoreMatrix | Erev_mat |
PFScoreMatrix | Frev_mat |
PFScoreMatrix | Dmatprime |
PFScoreVector | Eprime |
pf_score_t | Fprime |
PFScoreMatrix | Mprime |
SparseProbMatrix | am_prob |
probabilities of arc matchs, as computed by the algo | |
SparsePFScoreMatrix | bm_prob |
bool | D_created |
flag, is D already created? | |
bool | Dprime_created |
flag, is Dprime already created? |
Computes partition function of alignment, arc match and base match probabilities.
Performs partition function computation over alignment consensus structure pairs of two sequences and their associated sets of weighted basepairs
Implements indise and outside algorithm.
Implements calculation of match probabilities.
The class knows about the two sequences and the two weighted base pair sets.
LocARNA::AlignerP::AlignerP | ( | const AlignerParams & | ap | ) |
Construct from parameters.
ap | parameter for aligner |
void LocARNA::AlignerP::align_D | ( | ) | [protected] |
create the entries in the D matrix. This function is called by align() (unless D_created)
void LocARNA::AlignerP::align_Dprime | ( | ) | [protected] |
create the entries in the Dprime matrix This function is called by align() (unless Dprime_created) uses inside recursion
pf_score_t LocARNA::AlignerP::align_inside | ( | ) |
compute the partition function by the inside algorithm and fill the D matrix
void LocARNA::AlignerP::align_inside_arcmatch | ( | size_type | al, |
size_type | ar, | ||
size_type | bl, | ||
size_type | br | ||
) | [protected] |
align subsequences enclosed by two arcs
al | left end of arc in seqA |
ar | right end of arc in seqA |
bl | left end of arc in seqB |
br | right end of arc in seqB Align subsequences seqA(al+1,ar-1) to seqB(bl+1,br-1). Computes matrix entries in M, E, F. post: entries (i,j) are valid in the range al<i<ar, bl<j<br |
void LocARNA::AlignerP::align_outside | ( | ) |
perform the outside algorithm and fill the Dprime matrix, assumes that D matrix is computed already
void LocARNA::AlignerP::align_outside_arcmatch | ( | size_type | al, |
size_type | ar, | ||
size_type | max_ar, | ||
size_type | bl, | ||
size_type | br, | ||
size_type | max_br | ||
) | [protected] |
align outside of an arc-match
al | left end of arc in seqA |
ar | right end of arc in seqA |
bl | left end of arc in seqB |
br | right end of arc in seqB |
max_ar | leftmost right end in seqA, for which the score can simply be composed from M and Mrev. |
max_br | leftmost right end in seqB, for which the score can simply be composed from M and Mrev. |
void LocARNA::AlignerP::align_reverse | ( | size_type | al, |
size_type | ar, | ||
size_type | bl, | ||
size_type | br, | ||
bool | copy = false |
||
) | [protected] |
align reversed. fills matrices Mrev, Erev, Frev, such that Mrev(i,j) codes for subsequences seqA(i+1..ar) and seqB(j+1..br) and is valid for al-1<=i<=ar and bl-1<=j<=br
al | left position delimiting range of positions in seqA |
ar | right position delimiting range of positions in seqA |
bl | left position delimiting range of positions in seqB |
br | right position delimiting range of positions in seqB pre: matrix Mrev has size 0..lenA x 0..lenB |
copy | if true, make a copy of Erev/Frev in Erev/Frev_mat |
Note that al, ar, bl, br denote the actual limits of the subsequences, which differs from their use in align_inside_arcmatch. Otherwise the two methods correspond to each other by respectively performing forward and backward computation!
pf_score_t LocARNA::AlignerP::comp_E_entry | ( | size_type | al, |
size_type | bl, | ||
size_type | i, | ||
size_type | j | ||
) | [inline, protected] |
initialize first column and row of M' for outside recursion
initialize first row of E' for outside recursion compute one entry in E (inside recursion cases)
pf_score_t LocARNA::AlignerP::comp_Mrev_entry | ( | size_type | i, |
size_type | j, | ||
size_type | ar, | ||
size_type | br | ||
) | [inline, protected] |
compute one entry in Mrev, where Mrev(i,j) codes for subsequences seqA(i+1..ar) and seqB(j+1..br)
ar | right position delimiting range of positions in seqA |
br | right position delimiting range of positions in seqB |
i | position in seqA |
j | position in seqB assert i<=ar and j<=br. pre: matrix entries Mrev(i',j') and Erev(j') computed/initialised for i<=i'<=ar, j<=j'<=br, (i,j)!=(i',j') |
computes the probabilitites of all arc matches and stores them internally (in a sparse matrix), no probability filtering
void LocARNA::AlignerP::compute_basematch_probabilities | ( | bool | basematch_probs_include_arcmatch | ) |
computes the probabilitites of all base matches and stores them internally (in a 2D-matrix), no probability filtering
double LocARNA::AlignerP::compute_fragment_match_prob | ( | size_type | i, |
size_type | j, | ||
size_type | k, | ||
size_type | l | ||
) |
Fragment match probability.
Computes the probability that two fragments [i..j] and [k..l] are matched in an alignment.
i | start of first fragment |
j | end of first fragment |
k | start of second fragment |
l | end of second fragment |
static AlignerPParams LocARNA::AlignerP::create | ( | ) | [inline, static] |
create with named parameters
void LocARNA::AlignerP::fill_D | ( | size_type | al, |
size_type | bl, | ||
size_type | max_ar, | ||
size_type | max_br | ||
) | [protected] |
fill in D the entries with left ends al,bl, where adjlA, adjlB are adjanceny lists of the arcs uses inside recursion
void LocARNA::AlignerP::fill_Dprime | ( | size_type | al, |
size_type | bl, | ||
size_type | min_ar, | ||
size_type | min_br, | ||
size_type | max_ar, | ||
size_type | max_br | ||
) | [protected] |
fill in D the entries with right ends ar,br, where adjrA, adjrB are adjanceny lists of the arcs uses outside recursion
void LocARNA::AlignerP::init_Erev | ( | size_type | al, |
size_type | ar, | ||
size_type | bl, | ||
size_type | br | ||
) | [protected] |
initialize the reversed E matrix/vector Erev[j] codes for subsequences seqA(i+1..ar) and seqB(j+1..br) and is valid for bl-1<=j<=br
al | left position delimiting range of positions in seqA |
ar | right position delimiting range of positions in seqA |
bl | left position delimiting range of positions in seqB |
br | right position delimiting range of positions in seqB pre: Erev has size 0..lenB |
void LocARNA::AlignerP::init_Mrev | ( | size_type | al, |
size_type | ar, | ||
size_type | bl, | ||
size_type | br | ||
) | [protected] |
initialize the reversed M matrix, such that Mrev(i,j) codes for subsequences seqA(i+1..ar) and seqB(j+1..br) and is valid for al-1<=i<=ar and bl-1<=j<=br
al | left position delimiting range of positions in seqA |
ar | right position delimiting range of positions in seqA |
bl | left position delimiting range of positions in seqB |
br | right position delimiting range of positions in seqB pre: matrix Mrev has size 0..lenA x 0..lenB |
AlignerP::size_type LocARNA::AlignerP::leftmost_covering_arc | ( | size_type | s, |
const BasePairs & | bps, | ||
size_type | l, | ||
size_type | r | ||
) | const [protected] |
determine leftmost end of an arc that covers the range l..r
s | sequence position, limit to base pairs right of s or equal |
bps | base pairs |
l | sequence position, left end of range |
r | sequence position, right end of range |
AlignerP::size_pair LocARNA::AlignerP::leftmost_covering_arcmatch | ( | size_type | al, |
size_type | bl, | ||
size_type | ar, | ||
size_type | br | ||
) | const [protected] |
compute the leftmost left ends of an arc match that covers (al,ar);(bl,br) (or smaller positions).
al | left end of base pair in seqA |
ar | right end of base pair in seqA |
bl | left end of base pair in seqB |
br | right end of base pair in seqB |
AlignerP::size_type LocARNA::AlignerP::rightmost_covering_arc | ( | const BasePairs & | bps, |
size_type | l, | ||
size_type | r, | ||
size_type | s | ||
) | const [protected] |
bps | base pairs |
l | sequence position, left end of range |
r | sequence position, right end of range |
s | sequence position, limit to base pairs left of s or equal |
AlignerP::size_pair LocARNA::AlignerP::rightmost_covering_arcmatch | ( | size_type | al, |
size_type | bl, | ||
size_type | ar, | ||
size_type | br | ||
) | const [protected] |
al | left end of base pair in seqA |
ar | right end of base pair in seqA |
bl | left end of base pair in seqB |
br | right end of base pair in seqB returns the rightmost left ends of an arc match that covers (al,ar);(bl,br) (or smaller positions). |
pf_score_t LocARNA::AlignerP::virtual_Mprime | ( | size_type | al, |
size_type | bl, | ||
size_type | i, | ||
size_type | j, | ||
size_type | max_ar, | ||
size_type | max_br | ||
) | const |
Access virtual Mprime matrix.
tests whether Mprime value can be composed from M and Mrev and thus is not materialized (i.e. only virtual)
al | |
bl | |
i | |
j | |
max_ar | |
max_br |
void LocARNA::AlignerP::write_arcmatch_probabilities | ( | std::ostream & | out | ) |
write the arc match probabilities to a stream
probabilities are filtered by threshold params->min_am_prob
out | output stream |
void LocARNA::AlignerP::write_basematch_probabilities | ( | std::ostream & | out | ) |
write the base match probabilities to a stream
probabilities are filtered by threshold params->min_bm_prob
out | output stream |
SparsePFScoreMatrix LocARNA::AlignerP::bm_prob [protected] |
probabilities of base matchs, as computed by the algo. Because we use bm_prob to accumulate conditional partition functions before dividing by the total partition function to obtain probabilities, use PFScoreMatrix. We assume that the type is more general than ProbMatrix
PFScoreMatrix LocARNA::AlignerP::Dmat [protected] |
D(a,b) is the partition function of the subsequences seqA(al..ar) and seqB(bl..br), where the arcs a and b match
PFScoreMatrix LocARNA::AlignerP::Dmatprime [protected] |
D'(a,b) is the partition function of the subsequences seqA(1..al-1,ar+1..lenA) and seqB(1..bl-1,br+1..lenB) times the contribution of the arc match (al,ar);(bl,br)
PFScoreVector LocARNA::AlignerP::E [protected] |
For the current pair of left arc ends (al,bl) and a current line i E(j) is the partition function of the subsequences seqA(al+1..i) and seqB(bl+1..j) covering only alignments that gap the last position of seqA
In the algorithm, this is constantly overwritten, i.e. for a current j all entries E(j') j'<j are for the current line i and all entries j'>j are for the line i-1
PFScoreVector LocARNA::AlignerP::Eprime [protected] |
For the current pair of left arc ends (al,bl) and line i, E'(j) is the partition function of the subsequences seqA(1..al-1,i+1..lenA) and seqB(1..bl-1,j+1..lenB) where i+1 is aligned to a gap
PFScoreVector LocARNA::AlignerP::Erev [protected] |
reverse E "matrix"
PFScoreMatrix LocARNA::AlignerP::Erev_mat [protected] |
for outside optimization, store a complete copy of Erev and Frev
pf_score_t LocARNA::AlignerP::F [protected] |
For the current pair of left arc ends (al,bl) and current indices (i,j), F is the the partition function of the subsequences seqA(al+1..i) and seqB(bl+1..j) covering only alignments that gap the last position of seqB
In the algorithm, this is constantly overwritten, i.e. when we compute the entries for (i,j) it will still contain the value of (i,j-1) and is then updated to the value for (i,j)
pf_score_t LocARNA::AlignerP::Fprime [protected] |
For the current pair of left arc ends (al,bl) and (i,j), F' is the partition function of the subsequences seqA(1..al-1,i+1..lenA) and seqB(1..bl-1,j+1..lenB) where j+1 is aligned to a gap
pf_score_t LocARNA::AlignerP::Frev [protected] |
reverse F "matrix"
PFScoreMatrix LocARNA::AlignerP::Frev_mat [protected] |
complete copy of Frev
PFScoreMatrix LocARNA::AlignerP::M [protected] |
For the current pair of left arc ends (al,bl), M(i,j) is the partition function of the subsequences seqA(al+1..i) and seqB(bl+1..j)
PFScoreMatrix LocARNA::AlignerP::Mprime [protected] |
For the current pair of left arc ends (al,bl), M'(i,j) is the partition function of the subsequences seqA(1..al-1,i+1..lenA) and seqB(1..bl-1,j+1..lenB)
PFScoreMatrix LocARNA::AlignerP::Mrev [protected] |
For the current pair of left arc ends (al,bl), Mrev(i,j) is the partition function of the subsequences seqA(i+1..al-1) and seqB(j+1..bl-1)
pf_score_t LocARNA::AlignerP::partFunc [protected] |
the total partition function (only defined after call of align_inside())
pf_score_t LocARNA::AlignerP::pf_scale [protected] |
scales the partition function. We compute partition functions divided by pf_scale in order to avoid overflow of the double floating point range.
AlignerPRestriction LocARNA::AlignerP::r [protected] |