Represents anchor constraints between two sequences.
#include <anchor_constraints.hh>
Public Types | |
typedef size_t | size_type |
size type | |
typedef std::pair< size_type, size_type > | size_pair_t |
size pair | |
typedef size_pair_t | range_t |
type of range | |
Public Member Functions | |
AnchorConstraints (size_type lenA, const std::vector< std::string > &seqCA, size_type lenB, const std::vector< std::string > &seqCB, bool strict) | |
Construct from sequence lengths and anchor names. | |
AnchorConstraints (size_type lenA, const std::string &seqCA, size_type lenB, const std::string &seqCB, bool strict) | |
Construct from sequence lengths and anchor names. | |
bool | allowed_match (size_type i, size_type j) const |
is match allowed? | |
bool | allowed_del_unopt (size_type i, size_type j) const |
is deletion allowed? (unoptimized) | |
bool | allowed_del (size_type i, size_type j) const |
is deletion allowed? | |
bool | allowed_ins_unopt (size_type i, size_type j) const |
is insertion allowed? (unoptimized) | |
bool | allowed_ins (size_type i, size_type j) const |
is insertion allowed? | |
std::string | get_name_a (size_type i) const |
get the name of position i in A | |
std::string | get_name_b (size_type j) const |
get the name of position j in B | |
size_type | name_size () const |
returns length/size of the names | |
bool | empty () const |
is the constraint declaration empty? | |
size_pair_t | rightmost_anchor () const |
Get rightmost anchor. | |
size_pair_t | leftmost_anchor () const |
Get leftmost anchor. | |
bool | is_anchored_a (size_type i) const |
Is position in A anchored? | |
bool | is_anchored_b (size_type i) const |
Is position in B anchored? | |
bool | is_named_a (size_type i) const |
Is position in A named? | |
bool | is_named_b (size_type i) const |
Is position in B named? | |
void | print_debug () |
write some debug information to stderr |
Represents anchor constraints between two sequences.
Maintains the constraints on (non-structural) alignment edges that have to be satisfied during the alignment.
Alignment algorithms can test whether a match, deletion, or insertion is allowed at a pair of positions (i,j),
and ask for information about sequence names.
SEMANTICS OF ANCHOR CONSTRAINTS
Generally, an anchor constraint (i,j) enforces that positions i in A and j in B are matched; neither i nor j is deleted (for local alignment, this implies that both positions occur in the local alignment). The class offers a choice between two semantics of anchor constraints. The relaxed semantics can drop constraints and produce inconsistencies during multiple alignment when some names occur only in a subset of the sequences. The strict semantics avoids such problems by introducing additional (order) dependencies between different names; consequently, the constraint specification is somewhat less flexible.
Relaxed semantics (originally, the only implemented semantics):
a) Positions with equal names must be matched (aligned to each other). Consequently, positions with names that also occur in the other sequence cannot be deleted.
b) Names that occur in only one sequence do not impose any constraints. Therefore, names can occur in arbitrary order.
Strict (ordered) semantics:
a) Names must be in strictly increasing lexicographic order in the annotation of each sequence.
b) Positions with equal names must be matched.
c) Alignment columns must not violate the lexicographic order, in the following sense: each alignment column in which at least one position is named receives this name; the names of alignment columns must be lexicographically ordered (in the order of the columns).
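Condition a) of the strict semantics can be checked per sequence before any alignment is computed. The following is a minimal sketch of such a check, not LocARNA code: the function `names_strictly_ordered` and the 1-based name vector (empty string for unnamed positions, slot 0 unused) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of LocARNA).
// names[i] holds the name of position i (1-based); "" marks an unnamed
// position and slot 0 is unused.
// Returns true iff the non-empty names are strictly increasing in
// lexicographic order, as required by condition a) of the strict semantics.
bool names_strictly_ordered(const std::vector<std::string> &names) {
    assert(!names.empty() && names[0].empty()); // slot 0 must exist and be unused
    const std::string *prev = nullptr;          // last name seen so far
    for (std::size_t i = 1; i < names.size(); ++i) {
        if (names[i].empty()) continue;         // unnamed positions are ignored
        if (prev != nullptr && !(*prev < names[i])) return false;
        prev = &names[i];
    }
    return true;
}
```

Note that equal names within one sequence are rejected as well, because the order has to be strict.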
LocARNA::AnchorConstraints::AnchorConstraints | ( | size_type | lenA, |
const std::vector< std::string > & | seqCA, | ||
size_type | lenB, | ||
const std::vector< std::string > & | seqCB, | ||
bool | strict | ||
) |
Construct from sequence lengths and anchor names.
lenA | length of sequence A |
seqCA | vector of anchor strings for sequence A |
lenB | length of sequence B |
seqCB | vector of anchor strings for sequence B |
strict | use strict semantics |
The constraints (= alignment edges that have to be satisfied) are encoded as follows: equal symbols in the sequences for A and B form an edge.
In order to specify an arbitrary number of sequences, the strings can consist of several lines; a symbol then consists of all characters of the column. '.' and ' ' are neutral characters, in the sense that columns consisting only of neutral characters do not specify names that have to match. However, neutral characters are not identified in names that contain at least one non-neutral character.
Example 1: seqCA={"..123...."} seqCB={"...12.3...."} specifies the edges (3,4), (4,5), and (5,7).
Example 2: seqCA={"..AAB....", "..121...."} seqCB={"...AA.B....", "...12.1...."} specifies the same constraints, allowing a larger name space for constraints.
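The encoding described above can be illustrated with a small stand-alone sketch that derives the edges from the anchor strings. This is not the class's implementation: `column_names` and `anchor_edges` are hypothetical helpers, positions are taken as 1-based (as in the examples), every anchor string is assumed to be exactly as long as its sequence, and names are assumed to be unique within each sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the name of position i is the column of characters at
// that position, read over all rows. Columns of only '.' and ' ' stay unnamed
// (empty string). The result is 1-based; slot 0 is unused.
std::vector<std::string> column_names(const std::vector<std::string> &rows,
                                      std::size_t len) {
    std::vector<std::string> names(len + 1);
    for (std::size_t col = 0; col < len; ++col) {
        std::string name;
        bool has_non_neutral = false;
        for (const std::string &row : rows) {
            assert(row.size() == len); // assumption: rows cover the whole sequence
            const char c = row[col];
            name += c;
            if (c != '.' && c != ' ') has_non_neutral = true;
        }
        if (has_non_neutral) names[col + 1] = name;
    }
    return names;
}

// Hypothetical helper: collect the edges (i,j) whose names occur in both
// sequences, ordered by i. Names found in only one sequence yield no edge.
std::vector<std::pair<std::size_t, std::size_t>> anchor_edges(
    const std::vector<std::string> &rowsA, std::size_t lenA,
    const std::vector<std::string> &rowsB, std::size_t lenB) {
    const std::vector<std::string> namesA = column_names(rowsA, lenA);
    const std::vector<std::string> namesB = column_names(rowsB, lenB);

    std::map<std::string, std::size_t> posB; // name -> position in B
    for (std::size_t j = 1; j <= lenB; ++j)
        if (!namesB[j].empty()) posB[namesB[j]] = j;

    std::vector<std::pair<std::size_t, std::size_t>> edges;
    for (std::size_t i = 1; i <= lenA; ++i) {
        if (namesA[i].empty()) continue;
        const auto it = posB.find(namesA[i]);
        if (it != posB.end()) edges.emplace_back(i, it->second);
    }
    return edges;
}
```

Under these assumptions, both examples above produce the same three edges; in Example 2 the column names are the two-character strings "A1", "A2", and "B1".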
LocARNA::AnchorConstraints::AnchorConstraints | ( | size_type | lenA, |
const std::string & | seqCA, | ||
size_type | lenB, | ||
const std::string & | seqCB, | ||
bool | strict | ||
) |
Construct from sequence lengths and anchor names.
lenA | length of sequence A |
seqCA | concatenated anchor strings for sequence A (separated by '#') |
lenB | length of sequence B |
seqCB | concatenated anchor strings for sequence B (separated by '#') |
strict | use strict semantics |
For the semantics of anchor strings, see the first constructor.
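The relation between the two constructors can be pictured as splitting the concatenated string at each '#'. The sketch below shows one way to do this; `split_anchor_rows` is a hypothetical helper, and whether the class splits its input in exactly this way is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of LocARNA): split a '#'-separated anchor
// string into its rows, e.g. "..AAB....#..121...." into two rows.
std::vector<std::string> split_anchor_rows(const std::string &concatenated) {
    std::vector<std::string> rows;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = concatenated.find('#', start);
        if (end == std::string::npos) {
            rows.push_back(concatenated.substr(start)); // last (or only) row
            break;
        }
        rows.push_back(concatenated.substr(start, end - start));
        start = end + 1; // continue after the separator
    }
    assert(!rows.empty()); // a split always yields at least one row
    return rows;
}
```

With this reading, the string "..AAB....#..121...." corresponds to the vector {"..AAB....", "..121...."} of Example 2 in the first constructor.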
bool LocARNA::AnchorConstraints::allowed_del | ( | size_type | i, |
size_type | j | ||
) | const [inline] |
is deletion allowed?
i | position/matrix index of first sequence |
j | position/matrix index of second sequence |
bool LocARNA::AnchorConstraints::allowed_del_unopt | ( | size_type | i, |
size_type | j | ||
) | const [inline] |
is deletion allowed? (unoptimized)
i | position/matrix index of first sequence |
j | position/matrix index of second sequence |
Definition (strict semantics): allowed_del(i,j) iff (! is_anchored(i) && names_a_[ max { i'<=i | named(i') } ] < names_b_[ min { j'>=j+1 | named(j') } ] && names_a_[ min { i'>=i | named(i') } ] > names_b_[ max { j'<=j | named(j') } ])
Definition (relaxed semantics): allowed_del(i,j) iff i~"j+0.5" does not cross (or touch) any edge i'~j', where names_a_[i']=names_b_[j']
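The relaxed definition can be read as a direct test against the list of anchor edges. The following sketch spells this out; it is an illustration of the definition, not the class's (optimized or unoptimized) code. The function `allowed_del_relaxed`, the explicit edge list, and the convention that j = 0 denotes a deletion before the first position of B are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>; // anchor edge i'~j' (1-based)

// Hypothetical helper: deleting position i of A after position j of B is
// modelled as the pseudo-edge i ~ "j+0.5". It is allowed iff this pseudo-edge
// neither touches nor crosses any anchor edge.
bool allowed_del_relaxed(const std::vector<Edge> &edges,
                         std::size_t i, std::size_t j) {
    assert(i >= 1); // positions in A are 1-based
    for (const Edge &e : edges) {
        if (e.first == i) return false;          // touches: i itself is anchored
        const bool left_in_a = e.first < i;      // i' lies left of i
        const bool left_in_b = e.second <= j;    // j' lies left of "j+0.5"
        if (left_in_a != left_in_b) return false; // the two edges cross
    }
    return true;
}
```

For the edges (3,4), (4,5), and (5,7), this test permits deleting position 6 after position 7 of B but forbids deleting it after position 6, since the latter would cross the edge (5,7).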
bool LocARNA::AnchorConstraints::allowed_ins | ( | size_type | i, |
size_type | j | ||
) | const [inline] |
is insertion allowed?
i | position/matrix index of first sequence |
j | position/matrix index of second sequence |
bool LocARNA::AnchorConstraints::allowed_ins_unopt | ( | size_type | i, |
size_type | j | ||
) | const [inline] |
is insertion allowed? (unoptimized)
i | position/matrix index of first sequence |
j | position/matrix index of second sequence |
bool LocARNA::AnchorConstraints::allowed_match | ( | size_type | i, |
size_type | j | ||
) | const [inline] |
is match allowed?
i | position/matrix index of first sequence |
j | position/matrix index of second sequence |
Test whether the alignment edge i~j (i.e. the match of i and j) is allowed. An alignment edge is allowed iff it is not in conflict with any anchor constraint.
Definition (strict semantics): allowed_match(i,j) iff (names_a_[ max { i'<=i | named(i') } ] <= names_b_[ min { j'>=j | named(j') } ] && names_a_[ min { i'>=i | named(i') } ] >= names_b_[ max { j'<=j | named(j') } ])
Definition (relaxed semantics): allowed_match(i,j) iff i~j does not cross (or touch) any edge i'~j' != i~j, where names_a_[i']=names_b_[j']
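As an illustration of the relaxed definition, the sketch below tests a candidate match against an explicit list of anchor edges. It is a literal reading of the definition rather than the class's implementation; `allowed_match_relaxed` and the edge list are assumptions introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>; // anchor edge i'~j' (1-based)

// Hypothetical helper: the match i~j is allowed iff it neither touches nor
// crosses any anchor edge other than i~j itself.
bool allowed_match_relaxed(const std::vector<Edge> &edges,
                           std::size_t i, std::size_t j) {
    assert(i >= 1 && j >= 1); // positions are 1-based
    for (const Edge &e : edges) {
        if (e.first == i && e.second == j) continue;          // the edge itself
        if (e.first == i || e.second == j) return false;      // touches an anchor
        if ((e.first < i) != (e.second < j)) return false;    // crosses an anchor
    }
    return true;
}
```

With the edges (3,4), (4,5), and (5,7), the match 3~4 is allowed because it is an anchor edge itself, whereas 3~5 and 1~4 are rejected because each touches the edge (3,4).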
bool LocARNA::AnchorConstraints::is_anchored_a | ( | size_type | i | ) | const [inline] |
Is position in A anchored?
i | position in A |
bool LocARNA::AnchorConstraints::is_anchored_b | ( | size_type | i | ) | const [inline] |
Is position in B anchored?
i | position in B |
bool LocARNA::AnchorConstraints::is_named_a | ( | size_type | i | ) | const [inline] |
Is position in A named?
i | position in A |
bool LocARNA::AnchorConstraints::is_named_b | ( | size_type | i | ) | const [inline] |
Is position in B named?
i | position in B |
size_pair_t LocARNA::AnchorConstraints::leftmost_anchor | ( | ) | const [inline] |
Get leftmost anchor.
size_pair_t LocARNA::AnchorConstraints::rightmost_anchor | ( | ) | const [inline] |
Get rightmost anchor.