Computes partition function of alignment, arc match and base match probabilities. More...

#include <aligner_p.hh>

Public Types
typedef size_t	size_type
	size
typedef std::pair< size_type, size_type >	size_pair
	pair of size_type
typedef BasePairs__Arc	Arc
	arc
Public Member Functions
	AlignerP (const AlignerParams &ap)
	Construct from parameters.
	AlignerP (const AlignerP &p)
	copy constructor p AlignerP object to be copied
	~AlignerP ()
	Destructor.
pf_score_t	align_inside ()
void	align_outside ()
void	compute_basematch_probabilities (bool basematch_probs_include_arcmatch)
void	compute_arcmatch_probabilities ()
void	write_arcmatch_probabilities (std::ostream &out)
	write the arc match probabilities to a stream
void	write_basematch_probabilities (std::ostream &out)
	write the base match probabilities to a stream
pf_score_t	virtual_Mprime (size_type al, size_type bl, size_type i, size_type j, size_type max_ar, size_type max_br) const
	Access virtual Mprime matrix.
double	compute_fragment_match_prob (size_type i, size_type j, size_type k, size_type l)
	Fragment match probability.
void	freeD ()
	free the space of D, take care!
void	freeMprime ()
	free the space of D, take care!
Static Public Member Functions
static AlignerPParams	create ()
	create with named parameters
Protected Member Functions
void	init_M (size_type al, size_type ar, size_type bl, size_type br)
	initialize first column and row of M, for inside recursion
void	init_E (size_type al, size_type ar, size_type bl, size_type br)
	initialize E
void	init_Mrev (size_type al, size_type ar, size_type bl, size_type br)
void	init_Erev (size_type al, size_type ar, size_type bl, size_type br)
pf_score_t	comp_E_entry (size_type al, size_type bl, size_type i, size_type j)
	initialize first column and row of M' for outside recursion
pf_score_t	comp_F_entry (size_type al, size_type bl, size_type i, size_type j)
	compute one entry in F (inside recursion cases)
pf_score_t	comp_M_entry (size_type al, size_type bl, size_type i, size_type j)
	compute one entry in M (inside recursion cases)
pf_score_t	comp_Mprime_entry (size_type al, size_type bl, size_type i, size_type j, size_type max_ar, size_type max_br)
	compute one entry in Mprime (outside recursion cases)
pf_score_t	comp_Eprime_entry (size_type al, size_type bl, size_type i, size_type j)
	compute one entry in Eprime (outside recursion cases)
pf_score_t	comp_Fprime_entry (size_type al, size_type bl, size_type i, size_type j)
	compute one entry in Fprime (outside recursion cases)
pf_score_t	comp_Erev_entry (size_type i, size_type j)
	compute one entry in Erev
pf_score_t	comp_Frev_entry (size_type i, size_type j)
	compute one entry in Frev
pf_score_t	comp_Mrev_entry (size_type i, size_type j, size_type ar, size_type br)
void	align_inside_arcmatch (size_type al, size_type ar, size_type bl, size_type br)
void	align_outside_arcmatch (size_type al, size_type ar, size_type max_ar, size_type bl, size_type br, size_type max_br)
void	align_reverse (size_type al, size_type ar, size_type bl, size_type br, bool copy=false)
void	align_D ()
void	align_Dprime ()
void	fill_D (size_type al, size_type bl, size_type max_ar, size_type max_br)
void	fill_Dprime (size_type al, size_type bl, size_type min_ar, size_type min_br, size_type max_ar, size_type max_br)
pf_score_t &	D (const ArcMatch &am)
	returns lvalue of matrix D
pf_score_t &	D (const Arc &arcA, const Arc &arcB)
	returns lvalue of matrix D
pf_score_t &	Dprime (const ArcMatch &am)
	returns lvalue of matrix D'
pf_score_t &	Dprime (const Arc &arcA, const Arc &arcB)
	returns lvalue of matrix D'
size_type	leftmost_covering_arc (size_type s, const BasePairs &bps, size_type l, size_type r) const
std::pair< size_type, size_type >	leftmost_covering_arcmatch (size_type al, size_type bl, size_type ar, size_type br) const
size_type	rightmost_covering_arc (const BasePairs &bps, size_type l, size_type r, size_type s) const
std::pair< size_type, size_type >	rightmost_covering_arcmatch (size_type al, size_type bl, size_type ar, size_type br) const
void	alloc_inside_matrices ()
	allocate space for the inside matrices
void	alloc_outside_matrices ()
	allocate space for the outside matrices
Protected Attributes
const AlignerPParams *	params
	the parameter for the alignment
const Scoring *	scoring
	the scores
const Sequence &	seqA
	sequence A
const BasePairs &	bpsA
	base pairs A
const Sequence &	seqB
	sequence B
const BasePairs &	bpsB
	base pairs B
const ArcMatches &	arc_matches
	(potential) arc matches of A and B
AlignerPRestriction	r
	restriction of alignment
pf_score_t	pf_scale
pf_score_t	partFunc
PFScoreMatrix	Dmat
PFScoreVector	E
pf_score_t	F
PFScoreMatrix	M
PFScoreMatrix	Mrev
PFScoreVector	Erev
pf_score_t	Frev
PFScoreMatrix	Erev_mat
PFScoreMatrix	Frev_mat
PFScoreMatrix	Dmatprime
PFScoreVector	Eprime
pf_score_t	Fprime
PFScoreMatrix	Mprime
SparseProbMatrix	am_prob
	probabilities of arc matchs, as computed by the algo
SparsePFScoreMatrix	bm_prob
bool	D_created
	flag, is D already created?
bool	Dprime_created
	flag, is Dprime already created?

Detailed Description

Computes partition function of alignment, arc match and base match probabilities.

Performs partition function computation over alignment consensus structure pairs of two sequences and their associated sets of weighted basepairs

Implements indise and outside algorithm.

Implements calculation of match probabilities.

The class knows about the two sequences and the two weighted base pair sets.

Constructor & Destructor Documentation

LocARNA::AlignerP::AlignerP ( const AlignerParams & ap )

Construct from parameters.

Parameters:

ap	parameter for aligner

Note:: ap is copied to allow reference to a temporary; allow implicit conversion for named parameter idiom

Member Function Documentation

void LocARNA::AlignerP::align_D ( ) [protected]

create the entries in the D matrix. This function is called by align() (unless D_created)

void LocARNA::AlignerP::align_Dprime ( ) [protected]

create the entries in the Dprime matrix This function is called by align() (unless Dprime_created) uses inside recursion

pf_score_t LocARNA::AlignerP::align_inside ( )

compute the partition function by the inside algorithm and fill the D matrix

Returns:: partition function

void LocARNA::AlignerP::align_inside_arcmatch	(	size_type	al,
		size_type	ar,
		size_type	bl,
		size_type	br
	)		`[protected]`

align subsequences enclosed by two arcs

Parameters:

al	left end of arc in seqA
ar	right end of arc in seqA
bl	left end of arc in seqB
br	right end of arc in seqB Align subsequences seqA(al+1,ar-1) to seqB(bl+1,br-1). Computes matrix entries in M, E, F. post: entries (i,j) are valid in the range al<i<ar, bl<j<br

void LocARNA::AlignerP::align_outside ( )

perform the outside algorithm and fill the Dprime matrix, assumes that D matrix is computed already

void LocARNA::AlignerP::align_outside_arcmatch	(	size_type	al,
		size_type	ar,
		size_type	max_ar,
		size_type	bl,
		size_type	br,
		size_type	max_br
	)		`[protected]`

align outside of an arc-match

Parameters:

al	left end of arc in seqA
ar	right end of arc in seqA
bl	left end of arc in seqB
br	right end of arc in seqB
max_ar	leftmost right end in seqA, for which the score can simply be composed from M and Mrev.
max_br	leftmost right end in seqB, for which the score can simply be composed from M and Mrev.

void LocARNA::AlignerP::align_reverse	(	size_type	al,
		size_type	ar,
		size_type	bl,
		size_type	br,
		bool	copy = `false`
	)		`[protected]`

align reversed. fills matrices Mrev, Erev, Frev, such that Mrev(i,j) codes for subsequences seqA(i+1..ar) and seqB(j+1..br) and is valid for al-1<=i<=ar and bl-1<=j<=br

Parameters:

al	left position delimiting range of positions in seqA
ar	right position delimiting range of positions in seqA
bl	left position delimiting range of positions in seqB
br	right position delimiting range of positions in seqB pre: matrix Mrev has size 0..lenA x 0..lenB
copy	if true, make a copy of Erev/Frev in Erev/Frev_mat

Note that al, ar, bl, br denote the actual limits of the subsequences, which differs from their use in align_inside_arcmatch. Otherwise the two methods correspond to each other by respectively performing forward and backward computation!

pf_score_t LocARNA::AlignerP::comp_E_entry	(	size_type	al,
		size_type	bl,
		size_type	i,
		size_type	j
	)		`[inline, protected]`

initialize first column and row of M' for outside recursion

initialize first row of E' for outside recursion compute one entry in E (inside recursion cases)

pf_score_t LocARNA::AlignerP::comp_Mrev_entry	(	size_type	i,
		size_type	j,
		size_type	ar,
		size_type	br
	)		`[inline, protected]`

compute one entry in Mrev, where Mrev(i,j) codes for subsequences seqA(i+1..ar) and seqB(j+1..br)

Parameters:

ar	right position delimiting range of positions in seqA
br	right position delimiting range of positions in seqB
i	position in seqA
j	position in seqB assert i<=ar and j<=br. pre: matrix entries Mrev(i',j') and Erev(j') computed/initialised for i<=i'<=ar, j<=j'<=br, (i,j)!=(i',j')

Returns:: score of entry M(i,j)

void LocARNA::AlignerP::compute_arcmatch_probabilities ( )

computes the probabilitites of all arc matches and stores them internally (in a sparse matrix), no probability filtering

Todo:: check: why is that long double? do we need it? should we rather use pf_t?

void LocARNA::AlignerP::compute_basematch_probabilities ( bool basematch_probs_include_arcmatch )

computes the probabilitites of all base matches and stores them internally (in a 2D-matrix), no probability filtering

double LocARNA::AlignerP::compute_fragment_match_prob	(	size_type	i,
		size_type	j,
		size_type	k,
		size_type	l
	)

Fragment match probability.

Computes the probability that two fragments [i..j] and [k..l] are matched in an alignment.

Parameters:

i	start of first fragment
j	end of first fragment
k	start of second fragment
l	end of second fragment

Returns:: fragment match probability

static AlignerPParams LocARNA::AlignerP::create ( ) [inline, static]

create with named parameters

Returns:: parameter object

void LocARNA::AlignerP::fill_D	(	size_type	al,
		size_type	bl,
		size_type	max_ar,
		size_type	max_br
	)		`[protected]`

fill in D the entries with left ends al,bl, where adjlA, adjlB are adjanceny lists of the arcs uses inside recursion

void LocARNA::AlignerP::fill_Dprime	(	size_type	al,
		size_type	bl,
		size_type	min_ar,
		size_type	min_br,
		size_type	max_ar,
		size_type	max_br
	)		`[protected]`

fill in D the entries with right ends ar,br, where adjrA, adjrB are adjanceny lists of the arcs uses outside recursion

void LocARNA::AlignerP::init_Erev	(	size_type	al,
		size_type	ar,
		size_type	bl,
		size_type	br
	)		`[protected]`

initialize the reversed E matrix/vector Erev[j] codes for subsequences seqA(i+1..ar) and seqB(j+1..br) and is valid for bl-1<=j<=br

Parameters:

al	left position delimiting range of positions in seqA
ar	right position delimiting range of positions in seqA
bl	left position delimiting range of positions in seqB
br	right position delimiting range of positions in seqB pre: Erev has size 0..lenB

void LocARNA::AlignerP::init_Mrev	(	size_type	al,
		size_type	ar,
		size_type	bl,
		size_type	br
	)		`[protected]`

initialize the reversed M matrix, such that Mrev(i,j) codes for subsequences seqA(i+1..ar) and seqB(j+1..br) and is valid for al-1<=i<=ar and bl-1<=j<=br

Parameters:

al	left position delimiting range of positions in seqA
ar	right position delimiting range of positions in seqA
bl	left position delimiting range of positions in seqB
br	right position delimiting range of positions in seqB pre: matrix Mrev has size 0..lenA x 0..lenB

AlignerP::size_type LocARNA::AlignerP::leftmost_covering_arc	(	size_type	s,
		const BasePairs &	bps,
		size_type	l,
		size_type	r
	)		const `[protected]`

determine leftmost end of an arc that covers the range l..r

Parameters:

s	sequence position, limit to base pairs right of s or equal
bps	base pairs
l	sequence position, left end of range
r	sequence position, right end of range

Returns:: leftmost end l'>=s of any arc in bps that covers (l,r). Return l if there is no such arc An arc (l',r') covers (l,r) iff l'<l and r'>r.

AlignerP::size_pair LocARNA::AlignerP::leftmost_covering_arcmatch	(	size_type	al,
		size_type	bl,
		size_type	ar,
		size_type	br
	)		const `[protected]`

compute the leftmost left ends of an arc match that covers (al,ar);(bl,br) (or smaller positions).

Parameters:

al	left end of base pair in seqA
ar	right end of base pair in seqA
bl	left end of base pair in seqB
br	right end of base pair in seqB

AlignerP::size_type LocARNA::AlignerP::rightmost_covering_arc	(	const BasePairs &	bps,
		size_type	l,
		size_type	r,
		size_type	s
	)		const `[protected]`

Parameters:

bps	base pairs
l	sequence position, left end of range
r	sequence position, right end of range
s	sequence position, limit to base pairs left of s or equal

Returns:: rightmost end r'<=s of an arc in bps that covers (l,r). Return r if there is no such arc An arc (l',r') covers (l,r) iff l'<l and r'>r.

AlignerP::size_pair LocARNA::AlignerP::rightmost_covering_arcmatch	(	size_type	al,
		size_type	bl,
		size_type	ar,
		size_type	br
	)		const `[protected]`

Parameters:

al	left end of base pair in seqA
ar	right end of base pair in seqA
bl	left end of base pair in seqB
br	right end of base pair in seqB returns the rightmost left ends of an arc match that covers (al,ar);(bl,br) (or smaller positions).

pf_score_t LocARNA::AlignerP::virtual_Mprime	(	size_type	al,
		size_type	bl,
		size_type	i,
		size_type	j,
		size_type	max_ar,
		size_type	max_br
	)		const

Access virtual Mprime matrix.

tests whether Mprime value can be composed from M and Mrev and thus is not materialized (i.e. only virtual)

Parameters:

al
bl
i
j
max_ar
max_br

Returns:: entry of virtual Mprime matrix

void LocARNA::AlignerP::write_arcmatch_probabilities ( std::ostream & out )

write the arc match probabilities to a stream

probabilities are filtered by threshold params->min_am_prob

Parameters:

out	output stream

void LocARNA::AlignerP::write_basematch_probabilities ( std::ostream & out )

write the base match probabilities to a stream

probabilities are filtered by threshold params->min_bm_prob

Parameters:

out	output stream

Member Data Documentation

SparsePFScoreMatrix LocARNA::AlignerP::bm_prob [protected]

probabilities of base matchs, as computed by the algo. Because we use bm_prob to accumulate conditional partition functions before dividing by the total partition function to obtain probabilities, use PFScoreMatrix. We assume that the type is more general than ProbMatrix

PFScoreMatrix LocARNA::AlignerP::Dmat [protected]

D(a,b) is the partition function of the subsequences seqA(al..ar) and seqB(bl..br), where the arcs a and b match

PFScoreMatrix LocARNA::AlignerP::Dmatprime [protected]

D'(a,b) is the partition function of the subsequences seqA(1..al-1,ar+1..lenA) and seqB(1..bl-1,br+1..lenB) times the contribution of the arc match (al,ar);(bl,br)

PFScoreVector LocARNA::AlignerP::E [protected]

For the current pair of left arc ends (al,bl) and a current line i E(j) is the partition function of the subsequences seqA(al+1..i) and seqB(bl+1..j) covering only alignments that gap the last position of seqA

In the algorithm, this is constantly overwritten, i.e. for a current j all entries E(j') j'<j are for the current line i and all entries j'>j are for the line i-1

PFScoreVector LocARNA::AlignerP::Eprime [protected]

For the current pair of left arc ends (al,bl) and line i, E'(j) is the partition function of the subsequences seqA(1..al-1,i+1..lenA) and seqB(1..bl-1,j+1..lenB) where i+1 is aligned to a gap

PFScoreVector LocARNA::AlignerP::Erev [protected]

reverse E "matrix"

See also:: Mrev

PFScoreMatrix LocARNA::AlignerP::Erev_mat [protected]

for outside optimization, store a complete copy of Erev and Frev

pf_score_t LocARNA::AlignerP::F [protected]

For the current pair of left arc ends (al,bl) and current indices (i,j), F is the the partition function of the subsequences seqA(al+1..i) and seqB(bl+1..j) covering only alignments that gap the last position of seqB

In the algorithm, this is constantly overwritten, i.e. when we compute the entries for (i,j) it will still contain the value of (i,j-1) and is then updated to the value for (i,j)

pf_score_t LocARNA::AlignerP::Fprime [protected]

For the current pair of left arc ends (al,bl) and (i,j), F' is the partition function of the subsequences seqA(1..al-1,i+1..lenA) and seqB(1..bl-1,j+1..lenB) where j+1 is aligned to a gap

pf_score_t LocARNA::AlignerP::Frev [protected]

reverse F "matrix"

See also:: Mrev

PFScoreMatrix LocARNA::AlignerP::Frev_mat [protected]

complete copy of Frev

See also:: Erev_mat

PFScoreMatrix LocARNA::AlignerP::M [protected]

For the current pair of left arc ends (al,bl), M(i,j) is the partition function of the subsequences seqA(al+1..i) and seqB(bl+1..j)

PFScoreMatrix LocARNA::AlignerP::Mprime [protected]

For the current pair of left arc ends (al,bl), M'(i,j) is the partition function of the subsequences seqA(1..al-1,i+1..lenA) and seqB(1..bl-1,j+1..lenB)

PFScoreMatrix LocARNA::AlignerP::Mrev [protected]

For the current pair of left arc ends (al,bl), Mrev(i,j) is the partition function of the subsequences seqA(i+1..al-1) and seqB(j+1..bl-1)

pf_score_t LocARNA::AlignerP::partFunc [protected]

the total partition function (only defined after call of align_inside())

pf_score_t LocARNA::AlignerP::pf_scale [protected]

scales the partition function. We compute partition functions divided by pf_scale in order to avoid overflow of the double floating point range.

AlignerPRestriction LocARNA::AlignerP::r [protected]

restriction of alignment

See also:: corrsespoding member r of Aligner

Note:: Currently, this is not used for an actual purpose in AlignerP, since we do not implement local alignment and/or k-best for partition functions.

The documentation for this class was generated from the following files:

src/LocARNA/aligner_p.hh
src/LocARNA/aligner_p.cc

Public Types

Public Member Functions

Static Public Member Functions

Protected Member Functions

Protected Attributes

Detailed Description

Constructor & Destructor Documentation

Member Function Documentation

Member Data Documentation