Represents a multiple alignment.
#include <multiple_alignment.hh>
Classes | |
class | AliColumn |
Read-only proxy class representing a column of the alignment. | |
struct | AnnoType |
Type of sequence annotation; enumerates the legal annotation types. | |
struct | FormatType |
File format type for multiple alignments. | |
class | SeqEntry |
A row in a multiple alignment. | |
Public Types | |
typedef size_t | size_type |
size type | |
typedef std::vector< SeqEntry > ::const_iterator | const_iterator |
const iterator of sequence entries | |
Public Member Functions | |
MultipleAlignment () | |
Construct empty. | |
MultipleAlignment (const std::string &file, FormatType::type format=FormatType::CLUSTAL) | |
Construct from file. | |
MultipleAlignment (std::istream &in, FormatType::type format=FormatType::CLUSTAL) | |
Construct from stream. | |
MultipleAlignment (const std::string &name, const std::string &sequence) | |
Construct as degenerate alignment of one sequence. | |
MultipleAlignment (const std::string &nameA, const std::string &nameB, const std::string &alistringA, const std::string &alistringB) | |
Construct as pairwise alignment from names and alignment strings. | |
MultipleAlignment (const Alignment &alignment, bool only_local=false, bool special_gap_symbols=false) | |
Construct from Alignment object. | |
MultipleAlignment (const AlignmentEdges &edges, const Sequence &seqA, const Sequence &seqB) | |
Construct from alignment edges and sequences. | |
virtual | ~MultipleAlignment () |
virtual destructor | |
const Sequence & | as_sequence () const |
"cast" multiple alignment to sequence | |
void | normalize_rna_symbols () |
normalize rna symbols | |
size_type | num_of_rows () const |
Number of rows of the multiple alignment. | |
bool | empty () const |
Emptiness check. | |
const SequenceAnnotation & | annotation (const AnnoType::type &annotype) const |
Read access to an annotation by annotation type. | |
void | set_annotation (const AnnoType::type &annotype, const SequenceAnnotation &annotation) |
Write access to annotation. | |
bool | has_annotation (const AnnoType::type &annotype) const |
Test whether an annotation of the given type is available. | |
bool | is_proper () const |
Test whether alignment is proper. | |
pos_type | length () const |
Length of the multiple alignment. | |
const_iterator | begin () const |
Begin for read-only traversal of name/sequence pairs. | |
const_iterator | end () const |
End for read-only traversal of name/sequence pairs. | |
bool | contains (const std::string &name) const |
Test whether name exists. | |
size_type | index (const std::string &name) const |
Access index by name. | |
const SeqEntry & | seqentry (size_type index) const |
Access name/sequence pair by index. | |
const SeqEntry & | seqentry (const std::string &name) const |
Access name/sequence pair by name. | |
size_type | deviation (const MultipleAlignment &ma) const |
Deviation of a multiple alignment from a reference alignment. | |
double | sps (const MultipleAlignment &ma, bool compalign=true) const |
Sum-of-pairs score between a multiple alignment and a reference alignment. | |
double | cmfinder_realignment_score (const MultipleAlignment &ma) const |
Cmfinder realignment score of a multiple alignment to a reference alignment. | |
double | avg_deviation_score (const MultipleAlignment &ma) const |
Average deviation score. | |
std::string | consensus_sequence () const |
Consensus sequence of multiple alignment. | |
AliColumn | column (size_type col_index) const |
Access alignment column. | |
void | append (const SeqEntry &seqentry) |
Append sequence entry. | |
void | prepend (const SeqEntry &seqentry) |
Prepend sequence entry. | |
void | operator+= (const AliColumn &c) |
Append a column. | |
void | operator+= (char c) |
Append the same character to each row. | |
void | reverse () |
reverse the multiple alignment | |
std::ostream & | write (std::ostream &out, FormatType::type format=MultipleAlignment::FormatType::CLUSTAL) const |
Write alignment to stream. | |
std::ostream & | write (std::ostream &out, size_t width, FormatType::type format=MultipleAlignment::FormatType::CLUSTAL) const |
Write alignment to stream (wrapped) | |
std::ostream & | write_name_sequence_line (std::ostream &out, const std::string &name, const std::string &sequence, size_t namewidth) const |
Write formatted line of name and sequence. | |
std::ostream & | write (std::ostream &out, size_type start, size_type end, FormatType::type format=MultipleAlignment::FormatType::CLUSTAL) const |
Write sub-alignment to stream. | |
bool | checkAlphabet (const Alphabet< char > &alphabet) const |
check character constraints | |
void | write_debug (std::ostream &out=std::cout) const |
Print contents of object to stream. | |
Static Public Member Functions | |
static size_t | num_of_annotypes () |
number of annotation types | |
Protected Member Functions | |
void | init (const AlignmentEdges &edges, const Sequence &seqA, const Sequence &seqB, bool special_gap_symbols) |
Initialize from alignment edges and sequences. |
Represents a multiple alignment.
The multiple alignment is implemented as a vector of name/sequence pairs.
Supports traversal of name/sequence pairs. The sequence entries support mapping from columns to positions and back.
Names are unique within a multiple alignment object.
Sequence positions and column indices are 1-based (1..len).
A MultipleAlignment can carry anchor and structure annotations and can read and write them.
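The mapping between columns and sequence positions mentioned above can be illustrated with a small standalone sketch. It does not use the LocARNA API: the helper names `is_gap_symbol`, `positions_to_columns`, `columns_to_positions` and `column_of_position` are hypothetical, and both the set of gap characters and the convention that a gap column maps to the position of the nearest symbol on its left are assumptions made for illustration. Only the 1-based indexing is taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: which characters count as gaps (an assumption).
inline bool is_gap_symbol(char c) {
    return c == '-' || c == '_' || c == '~';
}

// Table t with t[p] = 1-based column of the p-th non-gap symbol of the row.
// Entry t[0] is unused and set to 0 so that positions are 1-based.
std::vector<std::size_t> positions_to_columns(const std::string &row) {
    std::vector<std::size_t> pos_to_col(1, 0);
    for (std::size_t col = 1; col <= row.size(); ++col) {
        if (!is_gap_symbol(row[col - 1])) {
            pos_to_col.push_back(col);
        }
    }
    return pos_to_col;
}

// Table t with t[c] = number of non-gap symbols in columns 1..c.
// For a gap column this is the position of the nearest symbol to its left
// (0 if there is none); that convention is an assumption of this sketch.
std::vector<std::size_t> columns_to_positions(const std::string &row) {
    std::vector<std::size_t> col_to_pos(row.size() + 1, 0);
    std::size_t pos = 0;
    for (std::size_t col = 1; col <= row.size(); ++col) {
        if (!is_gap_symbol(row[col - 1])) {
            ++pos;
        }
        col_to_pos[col] = pos;
    }
    return col_to_pos;
}

// Convenience lookup for a single 1-based position.
std::size_t column_of_position(const std::string &row, std::size_t pos) {
    const std::vector<std::size_t> table = positions_to_columns(row);
    assert(pos >= 1 && pos < table.size()); // positions are 1..len
    return table[pos];
}
```

For the row `A-CG--U`, the fourth sequence position lies in column 7, and column 5 (a gap) maps back to position 3 under the convention chosen here.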
LocARNA::MultipleAlignment::MultipleAlignment | ( | const std::string & | file, |
FormatType::type | format = FormatType::CLUSTAL |
||
) |
Construct from file.
file | name of input file |
format | file format |
failure | thrown on read problems |
LocARNA::MultipleAlignment::MultipleAlignment | ( | std::istream & | in, |
FormatType::type | format = FormatType::CLUSTAL |
||
) |
Construct from stream.
in | input stream with alignment in clustalW-like format |
format | file format |
failure | thrown on read errors |
LocARNA::MultipleAlignment::MultipleAlignment | ( | const std::string & | name, |
const std::string & | sequence | ||
) |
Construct as degenerate alignment of one sequence.
name | name of sequence |
sequence | sequence string |
LocARNA::MultipleAlignment::MultipleAlignment | ( | const std::string & | nameA, |
const std::string & | nameB, | ||
const std::string & | alistringA, | ||
const std::string & | alistringB | ||
) |
Construct as pairwise alignment from names and alignment strings.
nameA | name of sequence A |
nameB | name of sequence B |
alistringA | alignment string of sequence A |
alistringB | alignment string of sequence B |
LocARNA::MultipleAlignment::MultipleAlignment | ( | const Alignment & | alignment, |
bool | only_local = false , |
||
bool | special_gap_symbols = false |
||
) |
Construct from Alignment object.
alignment | object of type Alignment |
only_local | if true, construct only local alignment |
special_gap_symbols | if true, use special distinct gap symbols for gaps due to loop deletion '_' or sparsification '~' |
LocARNA::MultipleAlignment::MultipleAlignment | ( | const AlignmentEdges & | edges, |
const Sequence & | seqA, | ||
const Sequence & | seqB | ||
) |
Construct from alignment edges and sequences.
edges | alignment edges |
seqA | sequence A |
seqB | sequence B |
const SequenceAnnotation & LocARNA::MultipleAlignment::annotation | ( | const AnnoType::type & | annotype | ) | const |
Read access to an annotation by annotation type.
annotype | type of annotation |
void LocARNA::MultipleAlignment::append | ( | const SeqEntry & | seqentry | ) |
Append sequence entry.
seqentry | new sequence entry |
const Sequence & LocARNA::MultipleAlignment::as_sequence | ( | ) | const |
"cast" multiple alignment to sequence
double LocARNA::MultipleAlignment::avg_deviation_score | ( | const MultipleAlignment & | ma | ) | const |
Average deviation score.
ma | multiple alignment |
const_iterator LocARNA::MultipleAlignment::begin | ( | ) | const [inline] |
Begin for read-only traversal of name/sequence pairs.
bool LocARNA::MultipleAlignment::checkAlphabet | ( | const Alphabet< char > & | alphabet | ) | const |
check character constraints
Checks whether the alignment contains only characters from the given alphabet; warnings may be printed for characters outside it.
alphabet | alphabet of admissible characters |
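The kind of check described here can be sketched independently of the library. The function `uses_only_alphabet` below is a hypothetical stand-in that takes plain strings rather than `MultipleAlignment` and `Alphabet< char >` objects, and it assumes that gap symbols must be listed in the alphabet explicitly if they are to be accepted.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an alphabet check: returns true if every
// character of every row occurs in `alphabet`.
// Assumption: gap symbols are ordinary characters here, so they must be
// part of `alphabet` to be accepted.
bool uses_only_alphabet(const std::vector<std::string> &rows,
                        const std::string &alphabet) {
    assert(!alphabet.empty()); // an empty alphabet would reject everything
    return std::all_of(rows.begin(), rows.end(), [&](const std::string &row) {
        return std::all_of(row.begin(), row.end(), [&](char c) {
            return alphabet.find(c) != std::string::npos;
        });
    });
}
```

With the rows `ACGU` and `A-GU`, the check succeeds for the alphabet `ACGU-` and fails for `ACGU`, because the gap character is then not admissible.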
double LocARNA::MultipleAlignment::cmfinder_realignment_score | ( | const MultipleAlignment & | ma | ) | const |
Cmfinder realignment score of a multiple alignment to a reference alignment.
ma | multiple alignment |
AliColumn LocARNA::MultipleAlignment::column | ( | size_type | col_index | ) | const [inline] |
Access alignment column.
col_index | column index |
std::string LocARNA::MultipleAlignment::consensus_sequence | ( | ) | const |
Consensus sequence of multiple alignment.
Computes the consensus sequence by simple majority in each column. Assumes that only ASCII characters with codes below 127 occur.
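A per-column majority vote can be sketched as follows. This is a standalone illustration and not the library's implementation: `majority_consensus` is a hypothetical function on plain strings, gap characters are counted like any other symbol, and ties are broken in favor of the character with the lowest code, which is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of a simple-majority consensus over aligned rows.
// All rows must have the same length and contain only 7-bit ASCII.
std::string majority_consensus(const std::vector<std::string> &rows) {
    if (rows.empty()) {
        return std::string();
    }
    const std::size_t len = rows.front().size();
    std::string consensus;
    consensus.reserve(len);
    for (std::size_t col = 0; col < len; ++col) {
        std::array<std::size_t, 128> counts{}; // zero-initialized counters
        for (const std::string &row : rows) {
            assert(row.size() == len); // rows must be aligned
            const unsigned char c = static_cast<unsigned char>(row[col]);
            assert(c < 128); // 7-bit ASCII only
            ++counts[c];
        }
        // Pick the most frequent character; on ties the lowest character
        // code wins (an assumption, not documented behavior).
        std::size_t best = 0;
        for (std::size_t c = 1; c < counts.size(); ++c) {
            if (counts[c] > counts[best]) {
                best = c;
            }
        }
        consensus.push_back(static_cast<char>(best));
    }
    return consensus;
}
```

For the rows `ACGU`, `ACGA` and `AGGU`, each column has a clear majority and the result is `ACGU`.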
bool LocARNA::MultipleAlignment::contains | ( | const std::string & | name | ) | const |
Test whether name exists.
name | name of a sequence |
size_type LocARNA::MultipleAlignment::deviation | ( | const MultipleAlignment & | ma | ) | const |
Deviation of a multiple alignment from a reference alignment.
ma | multiple alignment |
bool LocARNA::MultipleAlignment::empty | ( | ) | const [inline] |
Emptiness check.
const_iterator LocARNA::MultipleAlignment::end | ( | ) | const [inline] |
End for read-only traversal of name/sequence pairs.
bool LocARNA::MultipleAlignment::has_annotation | ( | const AnnoType::type & | annotype | ) | const [inline] |
Test whether an annotation of the given type is available.
annotype | annotation type |
size_type LocARNA::MultipleAlignment::index | ( | const std::string & | name | ) | const [inline] |
Access index by name.
name | name of a sequence |
void LocARNA::MultipleAlignment::init | ( | const AlignmentEdges & | edges, |
const Sequence & | seqA, | ||
const Sequence & | seqB, | ||
bool | special_gap_symbols | ||
) | [protected] |
Initialize from alignment edges and sequences.
edges | alignment edges |
seqA | sequence A |
seqB | sequence B |
special_gap_symbols | if true, use special distinct gap symbols for gaps due to loop deletion '_' or sparsification '~' |
bool LocARNA::MultipleAlignment::is_proper | ( | ) | const |
Test whether alignment is proper.
pos_type LocARNA::MultipleAlignment::length | ( | ) | const [inline] |
Length of the multiple alignment.
void LocARNA::MultipleAlignment::normalize_rna_symbols | ( | ) |
Normalize RNA symbols.
Normalizes the symbols in all aligned sequences, assuming that they code for RNA.
static size_t LocARNA::MultipleAlignment::num_of_annotypes | ( | ) | [inline, static] |
number of annotation types
size_type LocARNA::MultipleAlignment::num_of_rows | ( | ) | const [inline] |
Number of rows of the multiple alignment.
void LocARNA::MultipleAlignment::operator+= | ( | const AliColumn & | c | ) |
Append a column.
c | column that is appended |
void LocARNA::MultipleAlignment::operator+= | ( | char | c | ) |
Append the same character to each row.
c | character that is appended |
void LocARNA::MultipleAlignment::prepend | ( | const SeqEntry & | seqentry | ) |
Prepend sequence entry.
seqentry | new sequence entry |
const SeqEntry& LocARNA::MultipleAlignment::seqentry | ( | size_type | index | ) | const [inline] |
Access name/sequence pair by index.
index | index of name/sequence pair (0-based) |
const SeqEntry& LocARNA::MultipleAlignment::seqentry | ( | const std::string & | name | ) | const [inline] |
Access name/sequence pair by name.
name | name of name/sequence pair |
void LocARNA::MultipleAlignment::set_annotation | ( | const AnnoType::type & | annotype, |
const SequenceAnnotation & | annotation | ||
) | [inline] |
Write access to annotation.
annotype | annotation type |
annotation | sequence annotation |
double LocARNA::MultipleAlignment::sps | ( | const MultipleAlignment & | ma, |
bool | compalign = true |
||
) | const |
Sum-of-pairs score between a multiple alignment and a reference alignment.
ma | multiple alignment |
compalign | whether to compute score like compalign |
std::ostream & LocARNA::MultipleAlignment::write | ( | std::ostream & | out, |
FormatType::type | format = MultipleAlignment::FormatType::CLUSTAL |
||
) | const |
Write alignment to stream.
out | output stream |
format | alignment format; only CLUSTAL or STOCKHOLM; default: CLUSTAL |
Writes one line "<name> <seq>" for each sequence; moreover, writes annotations.
std::ostream & LocARNA::MultipleAlignment::write | ( | std::ostream & | out, |
size_t | width, | ||
FormatType::type | format = MultipleAlignment::FormatType::CLUSTAL |
||
) | const |
Write alignment to stream (wrapped)
out | output stream |
width | line width at which the output is wrapped |
format | alignment format; only CLUSTAL or STOCKHOLM; default: CLUSTAL |
Writes lines "<name> <seq>" per sequence and wraps lines at width.
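The wrapping behavior can be pictured with a standalone sketch. It does not reproduce the library's CLUSTAL or STOCKHOLM output: `write_wrapped` and `to_wrapped_string` are hypothetical helpers on plain name/sequence pairs, and the layout details (names padded to the longest name plus one space, a blank line between blocks, `width` counted in alignment columns) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Row = std::pair<std::string, std::string>; // (name, aligned sequence)

// Hypothetical sketch: write the rows in blocks of at most `width` columns,
// one "<name> <seq>" line per row and block.
std::ostream &write_wrapped(std::ostream &out, const std::vector<Row> &rows,
                            std::size_t width) {
    assert(width > 0);
    const std::size_t len = rows.empty() ? 0 : rows.front().second.size();
    std::size_t namewidth = 0;
    for (const Row &row : rows) {
        assert(row.second.size() == len); // rows must be aligned
        namewidth = std::max(namewidth, row.first.size());
    }
    for (std::size_t start = 0; start < len; start += width) {
        if (start > 0) {
            out << '\n'; // blank line between blocks (assumed layout)
        }
        for (const Row &row : rows) {
            // Pad each name so that all sequence chunks start in one column.
            out << row.first
                << std::string(namewidth - row.first.size() + 1, ' ')
                << row.second.substr(start, width) << '\n';
        }
    }
    return out;
}

// Convenience wrapper that returns the wrapped output as a string.
std::string to_wrapped_string(const std::vector<Row> &rows, std::size_t width) {
    std::ostringstream out;
    write_wrapped(out, rows, width);
    return out.str();
}
```

Returning the stream from `write_wrapped` mirrors the `std::ostream &` return type of the documented write methods, so calls can be chained in the usual way.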
std::ostream & LocARNA::MultipleAlignment::write | ( | std::ostream & | out, |
size_type | start, | ||
size_type | end, | ||
FormatType::type | format = MultipleAlignment::FormatType::CLUSTAL |
||
) | const |
Write sub-alignment to stream.
Writes the alignment columns from start to end to the output stream out, as lines "<name> <seq>".
out | output stream |
start | start column (1-based) |
end | end column (1-based) |
format | alignment format; default: CLUSTAL |
void LocARNA::MultipleAlignment::write_debug | ( | std::ostream & | out = std::cout | ) | const |
Print contents of object to stream.
out | output stream |
std::ostream & LocARNA::MultipleAlignment::write_name_sequence_line | ( | std::ostream & | out, |
const std::string & | name, | ||
const std::string & | sequence, | ||
size_t | namewidth | ||
) | const |
Write formatted line of name and sequence.
The line is formatted such that it fits the output of the write methods.
out | output stream |
name | name string |
sequence | sequence string |
namewidth | width of the name field |