Rileen Sinha, Thorsten Lenser, Niels Jahn, Ulrike Gausmann, Swetlana Friedel, Karol Szafranski, Klaus Huse, Philip Rosenstiel, Jochen Hampe, Stefan Schuster, Michael Hiller, Rolf Backofen, Matthias Platzer
In: BMC Bioinformatics, 2010, 11, 216
BACKGROUND: Subtle alternative splicing events involving tandem splice sites separated by a short (2-12 nucleotides) distance are frequent and evolutionarily widespread in eukaryotes, and a major contributor to the complexity of transcriptomes and proteomes. However, these events have been either omitted altogether in databases on alternative splicing, or only the cases of experimentally confirmed alternative splicing have been reported. Thus, a database which covers all confirmed cases of subtle alternative splicing as well as the numerous putative tandem splice sites (which might be confirmed once more transcript data becomes available), and allows to search for tandem splice sites with specific features and download the results, is a valuable resource for targeted experimental studies and large-scale bioinformatics analyses of tandem splice sites. Towards this goal we recently set up TassDB (Tandem Splice Site DataBase, version 1), which stores data about alternative splicing events at tandem splice sites separated by 3 nt in eight species. DESCRIPTION: We have substantially revised and extended TassDB. The currently available version 2 contains extensive information about tandem splice sites separated by 2-12 nt for the human and mouse transcriptomes including data on the conservation of the tandem motifs in five vertebrates. TassDB2 offers a user-friendly interface to search for specific genes or for genes containing tandem splice sites with specific features as well as the possibility to download result datasets. For example, users can search for cases of alternative splicing where the proportion of EST/mRNA evidence supporting the minor isoform exceeds a specific threshold, or where the difference in splice site scores is specified by the user. The predicted impact of each event on the protein is also reported, along with information about being a putative target for the nonsense-mediated decay (NMD) pathway. Links are provided to the UCSC genome browser and other external resources. CONCLUSION: TassDB2, available via http://www.tassdb.info, provides comprehensive resources for researchers interested in both targeted experimental studies and large-scale bioinformatics analyses of short distance tandem splice sites.
Michael Hiller, Matthias Platzer
In: Trends in Genetics, 2008, 24(5), 246-55
Alternative splicing at donor or acceptor sites located just a few nucleotides apart is widespread in many species. It results in subtle changes in the transcripts and often in the encoded proteins. Several of these tandem splice events contribute to the repertoire of functionally different proteins, whereas many are neutral or deleterious. Remarkably, some of the functional events are differentially spliced in tissues or developmental stages, whereas others exhibit constant splicing ratios, indicating that function is not always associated with differential splicing. Stochastic splice site selection seems to play a major role in these processes. Here, we review recent progress in understanding functional and evolutionary aspects as well as the mechanism of splicing at short-distance tandem sites.
K. Huse, S. Taudien, M. Groth, P. Rosenstiel, K. Szafranski, M. Hiller, J. Hampe, K. Junker, J. Schubert, S. Schreiber, G. Birkenmeier, M. Krawczak, M. Platzer
In: Tumour Biol, 2008, 29(2), 83-92
Background/Aims: Prostate cancer represents the cancer with the highest worldwide prevalence in men. Chromosome 8p23 has shown suggestive genetic linkage to early-onset familial prostate cancer and is frequently deleted in cancer cells of the urogenital tract. Within this locus some beta-defensin genes (among them DEFB4, DEFB103, DEFB104) are localized, which are arranged in a gene cluster shown to exhibit an extensive copy number variation in the population. This structural variation considerably hampers genetic studies. In a new approach considering both sequence as well as copy number variations we aimed to compare the defensin locus at 8p23 in prostate cancer patients and controls. Methods: We apply PCR/cloning-based haplotyping and high-throughput copy number determination methods which allow assessment of both individual haplotypes and gene copy numbers not accessible to conventional SNP-based genotyping. Results: We demonstrate association of four common DEFB104 haplotypes with the risk of prostate cancer in two independent patient cohorts. Moreover, we show that high copy numbers (>9) of the defensin gene cluster are significantly underrepresented in both patient samples. Conclusions: Our findings imply a role of the antibacterial defensins in prostate cancerogenesis qualifying distinct gene variants and copy numbers as potential tumor markers.
Stefanie Schindler, Karol Szafranski, Michael Hiller, Gul Shad Ali, Saiprasad G. Palusa, Rolf Backofen, Matthias Platzer, Anireddy S. N. Reddy
In: BMC Genomics, 2008, 9, 159
BACKGROUND: Several recent studies indicate that alternative splicing in Arabidopsis and other plants is a common mechanism for post-transcriptional modulation of gene expression. However, few analyses have been done so far to elucidate the functional relevance of alternative splicing in higher plants. Representing a frequent and universal subtle alternative splicing event among eukaryotes, alternative splicing at NAGNAG acceptors contributes to transcriptome diversity and therefore, proteome plasticity. Alternatively spliced NAGNAG acceptors are overrepresented in genes coding for proteins with RNA-recognition motifs (RRMs). As SR proteins, a family of RRM-containing important splicing factors, are known to be extensively alternatively spliced in Arabidopsis, we analyzed alternative splicing at NAGNAG acceptors in SR and SR-related genes. RESULTS: In a comprehensive analysis of the Arabidopsis thaliana genome, we identified 6,772 introns that exhibit a NAGNAG acceptor motif. Alternative splicing at these acceptors was assessed using available EST data, complemented by a sequence-based prediction method. Of the 36 identified introns within 30 SR and SR-related protein-coding genes that have a NAGNAG acceptor, we selected 15 candidates for an experimental analysis of alternative splicing under several conditions. We provide experimental evidence for 8 of these candidates being alternatively spliced. Quantifying the ratio of NAGNAG-derived splice variants under several conditions, we found organ-specific splicing ratios in adult plants and changes in seedlings of different ages. Splicing ratio changes were observed in response to heat shock and most strikingly, cold shock. Interestingly, the patterns of differential splicing ratios are similar for all analyzed genes. CONCLUSION: NAGNAG acceptors frequently occur in the Arabidopsis genome and are particularly prevalent in SR and SR-related protein-coding genes. A lack of extensive EST coverage can be compensated by using the proposed sequence-based method to predict alternative splicing at these acceptors. Our findings indicate that the differential effects on NAGNAG alternative splicing in SR and SR-related genes are organ- and condition-specific rather than gene-specific.
Michael Hiller, Karol Szafranski, Klaus Huse, Rolf Backofen, Matthias Platzer
In: BMC Evolutionary Biology, 2008, 8, 89
BACKGROUND: Alternative selection of splice sites in tandem donors and acceptors is a major mode of alternative splicing. Here, we analyzed whether in-frame tandem sites leading to subtle mRNA insertions/deletions of 3, 6, or 9 nucleotides are under natural selection. RESULTS: We found multiple lines of evidence that the human protein coding sequences are under selection against such in-frame tandem splice events, indicating that these events are often deleterious. The strength of selection is not homogeneous within the coding sequence as protein regions that fold into a fixed 3D structure (intrinsically ordered) are under stronger selection, especially against sites with a strong minor splice site. Investigating structures of functional protein domains, we found that tandem acceptors are preferentially located at the domain surface and outside structural elements such as helices and sheets. Using three-species comparisons, we estimate that more than half of all mutations that create NAGNAG acceptors in the coding region have been eliminated by selection. CONCLUSION: We estimate that 2,400 introns are under selection against possessing a tandem site.
Michael Hiller, Karol Szafranski, Rileen Sinha, Klaus Huse, Swetlana Nikolajewa, Philip Rosenstiel, Stefan Schreiber, Rolf Backofen, Matthias Platzer
In: RNA, 2008, 14, 616-29
Many alternative splice events result in subtle mRNA changes, and most of them occur at short-distance tandem donor and acceptor sites. The splicing mechanism of such tandem sites likely involves the stochastic selection of either splice site. While tandem splice events are frequent, it is unknown how many are functionally important. Here, we use phylogenetic conservation to address this question, focusing on tandems with a distance of 3-9 nucleotides. We show that previous contradicting results on whether alternative or constitutive tandem motifs are more conserved between species can be explained by a statistical paradox (Simpson's paradox). Applying methods that take biases into account, we found higher conservation of alternative tandems in mouse, dog, and even chicken, zebrafish, and Fugu genomes. We estimated a lower bound for the number of alternative sites that are under purifying (negative) selection. While the absolute number of conserved tandem motifs decreases with the evolutionary distance, the fraction under selection increases. Interestingly, a number of frameshifting tandems are under selection, suggesting a role in regulating mRNA and protein levels via nonsense-mediated decay (NMD). An analysis of the intronic flanks shows that purifying selection also acts on the intronic sequence. We propose that stochastic splice site selection can be an advantageous mechanism that allows constant splice variant ratios in situations where a deviation in this ratio is deleterious.
Michael Hiller, Zhaiyi Zhang, Rolf Backofen, Stefan Stamm
In: PLoS Genet, 2007, 3(11), e204
The secondary structure of a pre-mRNA influences a number of processing steps including alternative splicing. Since most splicing regulatory proteins bind to single-stranded RNA, the sequestration of RNA into double strands could prevent their binding. Here, we analyzed the secondary structure context of experimentally determined splicing enhancer and silencer motifs in their natural pre-mRNA context. We found that these splicing motifs are significantly more single-stranded than controls. These findings were validated by transfection experiments, where the effect of enhancer or silencer motifs on exon skipping was much more pronounced in single-stranded conformation. We also found that the structural context of predicted splicing motifs is under selection, suggesting a general importance of secondary structures on splicing and adding another level of evolutionary constraints on pre-mRNAs. Our results explain the action of mutations that affect splicing and indicate that the structural context of splicing motifs is part of the mRNA splicing code.
Michael Hiller, Swetlana Nikolajewa, Klaus Huse, Karol Szafranski, Philip Rosenstiel, Stefan Schuster, Rolf Backofen, Matthias Platzer
In: Nucleic Acids Research, 2007, 35(Database issue), D188-92
Subtle alternative splice events at tandem splice sites are frequent in eukaryotes and substantially increase the complexity of transcriptomes and proteomes. We have developed a relational database, TassDB (TAndem Splice Site DataBase), which stores extensive data about alternative splice events at GYNGYN donors and NAGNAG acceptors. These splice events are of subtle nature since they mostly result in the insertion/deletion of a single amino acid or the substitution of one amino acid by two others. Currently, TassDB contains 114 554 tandem splice sites of eight species, 5209 of which have EST/mRNA evidence for alternative splicing. In addition, human SNPs that affect NAGNAG acceptors are annotated. The database provides a user-friendly interface to search for specific genes or for genes containing tandem splice sites with specific features as well as the possibility to download large datasets. This database should facilitate further experimental studies and large-scale bioinformatics analyses of tandem splice sites. The database is available at http://helios.informatik.uni-freiburg.de/TassDB/.
K. Szafranski, S. Schindler, S. Taudien, M. Hiller, K. Huse, N. Jahn, S. Schreiber, R. Backofen, M. Platzer
In: Genome Biol, 2007, 8(8), R154
BACKGROUND: Despite some degeneracy of sequence signals that govern splicing of eukaryotic pre-mRNAs, it is an accepted rule that U2-dependent introns exhibit the 3' terminal dinucleotide AG. Intrigued by anecdotal evidence for functional non-AG 3' splice sites, we carried out a human genome-wide screen. RESULTS: We identified TG dinucleotides functioning as alternative 3' splice sites in 36 human genes. The TG-derived splice variants were experimentally validated with a success rate of 92%. Interestingly, ratios of alternative splice variants are tissue-specific for several introns. TG splice sites and their flanking intron sequences are substantially conserved between orthologous vertebrate genes, even between human and frog, indicating functional relevance. Remarkably, TG splice sites are exclusively found as alternative 3' splice sites, never as the sole 3' splice site for an intron, and we observed a distance constraint for TG-AG splice site tandems. CONCLUSION: Since TG splice sites are exclusively found as alternative 3' splice sites, the U2 spliceosome apparently accomplishes perfect specificity for 3' AGs at an early splicing step, but may choose 3' TGs during later steps. Given the tiny fraction of TG 3' splice sites compared to the vast amount of non-viable TGs, cis-acting sequence signals must significantly contribute to splice site definition. Thus, we consider TG-AG 3' splice site tandems as promising subjects for studies on the mechanisms of 3' splice site selection.
Swetlana Nikolajewa, Rainer Pudimat, Michael Hiller, Matthias Platzer, Rolf Backofen
In: Nucleic Acids Research, 2007, 35(Web Server issue), W688-93
BioBayesNet is a new web application that allows the easy modeling and classification of biological data using Bayesian networks. To learn Bayesian networks the user can either upload a set of annotated FASTA sequences or a set of pre-computed feature vectors. In case of FASTA sequences, the server is able to generate a wide range of sequence and structural features from the sequences. These features are used to learn Bayesian networks. An automatic feature selection procedure assists in selecting discriminative features, providing an (locally) optimal set of features. The output includes several quality measures of the overall network and individual features as well as a graphical representation of the network structure, which allows to explore dependencies between features. Finally, the learned Bayesian network or another uploaded network can be used to classify new data. BioBayesNet facilitates the use of Bayesian networks in biological sequences analysis and is flexible to support modeling and classification applications in various scientific fields. The BioBayesNet server is available at http://biwww3.informatik.uni-freiburg.de:8080/BioBayesNet/.
M. Hiller, K. Huse, K. Szafranski, P. Rosenstiel, S. Schreiber, R. Backofen, M. Platzer
In: Genome Biol, 2006, 7(7), R65
BACKGROUND: Splice donor sites have a highly conserved GT or GC dinucleotide and an extended intronic consensus sequence GTRAGT that reflects the sequence complementarity to the U1 snRNA. Here, we focus on unusual donor sites with the motif GYNGYN (Y stands for C or T; N stands for A, C, G, or T). RESULTS: While only one GY functions as a splice donor for the majority of these splice sites in human, we provide computational and experimental evidence that 110 (1.3%) allow alternative splicing at both GY donors. The resulting splice forms differ in only three nucleotides which results mostly in the insertion/deletion of one amino acid. However, we also report the insertion of a stop codon in four cases. Investigating what distinguishes alternatively from not alternatively spliced GYNGYN donors, we found differences in the binding to U1 snRNA, a strong correlation between U1 snRNA binding strength and the preferred donor, overrepresented sequence motifs in the adjacent introns, and a higher conservation of the exonic and intronic flanks between human and mouse. Extending our genome-wide analysis to seven other eukaryotic species, we found alternatively spliced GYNGYN donors in all species from mouse to C. elegans and even in A. thaliana. Experimental verification of a conserved GTAGTT donor of the STAT3 gene in human and mouse reveals a remarkably similar ratio of alternatively spliced transcripts in both species. CONCLUSION: In contrast to alternative splicing in general, GYNGYN donors in addition to NAGNAG acceptors enable subtle protein variations.
Matthias Platzer, Michael Hiller, Karol Szafranski, Niels Jahn, Jochen Hampe, Stefan Schreiber, Rolf Backofen, Klaus Huse
In: Nat Biotechnol, 2006, 24(9), 1068-70
No Abstract available
Michael Hiller, Klaus Huse, Karol Szafranski, Niels Jahn, Jochen Hampe, Stefan Schreiber, Rolf Backofen, Matthias Platzer
In: Am J Hum Genet, 2006, 78(2), 291-302
Aberrant or modified splicing patterns of genes are causative for many human diseases. Therefore, the identification of genetic variations that cause changes in the splicing pattern of a gene is important. Elsewhere, we described the widespread occurrence of alternative splicing at NAGNAG acceptors. Here, we report a genomewide screen for single-nucleotide polymorphisms (SNPs) that affect such tandem acceptors. From 121 SNPs identified, we extracted 64 SNPs that most likely affect alternative NAGNAG splicing. We demonstrate that the NAGNAG motif is necessary and sufficient for this type of alternative splicing. The evolutionarily young NAGNAG alleles, as determined by the comparison with the chimpanzee genome, exhibit the same biases toward intron phase 1 and single-amino acid insertion/deletions that were already observed for all human NAGNAG acceptors. Since 28% of the NAGNAG SNPs occur in known disease genes, they represent preferable candidates for a more-detailed functional analysis, especially since the splice relevance for some of the coding SNPs is overlooked. Against the background of a general lack of methods for identifying splice-relevant SNPs, the presented approach is highly effective in the prediction of polymorphisms that are causal for variations in alternative splicing.
Michael Hiller, Rainer Pudimat, Anke Busch, Rolf Backofen
In: Nucleic Acids Research, 2006, 34(17), e117
RNA binding proteins recognize RNA targets in a sequence specific manner. Apart from the sequence, the secondary structure context of the binding site also affects the binding affinity. Binding sites are often located in single-stranded RNA regions and it was shown that the sequestration of a binding motif in a double-strand abolishes protein binding. Thus, it is desirable to include knowledge about RNA secondary structures when searching for the binding motif of a protein. We present the approach MEMERIS for searching sequence motifs in a set of RNA sequences and simultaneously integrating information about secondary structures. To abstract from specific structural elements, we precompute position-specific values measuring the single-strandedness of all substrings of an RNA sequence. These values are used as prior knowledge about the motif starts to guide the motif search. Extensive tests with artificial and biological data demonstrate that MEMERIS is able to identify motifs in single-stranded regions even if a stronger motif located in double-strand parts exists. The discovered motif occurrences in biological datasets mostly coincide with known protein-binding sites. This algorithm can be used for finding the binding motif of single-stranded RNA-binding proteins in SELEX or other biological sequence data.
Michael Hiller, Karol Szafranski, Rolf Backofen, Matthias Platzer
In: PLoS Genet, 2006, 2(11), e207
No Abstract available
Michael Hiller, Klaus Huse, Matthias Platzer, Rolf Backofen
In: Genome Biol, 2005, 6(7), R58
BACKGROUND: Alternative splicing often occurs in the coding sequence and alters protein structure and function. It is mainly carried out in two ways: by skipping exons that encode a certain protein feature and by introducing a frameshift that changes the downstream protein sequence. These mechanisms are widespread and well investigated. RESULTS: Here, we propose an additional mechanism of alternative splicing to modulate protein function. This mechanism creates a protein feature by putting together two non-consecutive exons or destroys a feature by inserting an exon in its body. In contrast to other mechanisms, the individual parts of the feature are present in both splice variants but the feature is only functional in the splice form where both parts are merged. We provide evidence for this mechanism by performing a genome-wide search with four protein features: transmembrane helices, phosphorylation and glycosylation sites, and Pfam domains. CONCLUSION: We describe a novel type of event that creates or removes a protein feature by alternative splicing. Current data suggest that these events are rare. Besides the four features investigated here, this mechanism is conceivable for many other protein features, especially for small linear protein motifs. It is important for the characterization of functional differences of two splice forms and should be considered in genome-wide annotation efforts. Furthermore, it offers a novel strategy for ab initio prediction of alternative splice events.
Michael Hiller, Klaus Huse, Matthias Platzer, Rolf Backofen
In: Nucleic Acids Research, 2005, 33(17), 5611-21
Most of the known alternative splice events have been detected by the comparison of expressed sequence tags (ESTs) and cDNAs. However, not all splice events are represented in EST databases since ESTs have several biases. Therefore, non-EST based approaches are needed to extend our view of a transcriptome. Here, we describe a novel method for the ab initio prediction of alternative splice events that is solely based on the annotation of Pfam domains. Furthermore, we applied this approach in a genome-wide manner to all human RefSeq transcripts and predicted a total of 321 exon skipping and intron retention events. We show that this method is very reliable as 78% (250 of 321) of our predictions are confirmed by ESTs or cDNAs. Subsequent analyses of splice events within Pfam domains revealed a significant preference of alternative exon junctions to be located at the protein surface and to avoid secondary structure elements. Thus, splice events within Pfams are probable to alter the structure and function of a domain which makes them highly interesting for detailed biological investigation. As Pfam domains are annotated in many other species, our strategy to predict exon skipping and intron retention events might be important for species with a lower number of ESTs.
Michael Hiller, Rolf Backofen, Stephan Heymann, Anke Busch, Timo Mika Glaesser, Johann-Christoph Freytag
In: In Silico Biol, 2004, 4(2), 0017
Alternative splicing can yield manifold different mature mRNAs from one precursor. New findings indicate that alternative splicing occurs much more often than previously assumed. A major goal of functional genomics lies in elucidating and characterizing the entire spectrum of alternative splice forms. Existing approaches such as EST-alignments focus only on the mRNA sequence to detect alternative splice forms. They do not consider function and characteristics of the resulting proteins. One important example of such functional characterization is homology to a known protein domain family. A powerful description of protein domains are profile Hidden Markov models (HMM) as stored in the Pfam database. In this paper we address the problem of identifying the splice form with the highest similarity to a protein domain family. Therefore, we take into consideration all possible splice forms. As demonstrated here for a number of genes, this homology based approach can be used successfully for predicting partial gene structures. Furthermore, we present some novel splice form predictions with high-scoring protein domain homology and point out that the detection of splice form specific protein domains helps to answer questions concerning hereditary diseases. Simple approaches based on a BLASTP search cannot be applied here, since the number of possible splice forms increases exponentially with the number of exons. To this end, we have developed an efficient polynomial-time algorithm, called ASFPred (Alternative Splice Form Prediction). This algorithm needs only a set of exons as input.
Michael Hiller, Klaus Huse, Karol Szafranski, Niels Jahn, Jochen Hampe, Stefan Schreiber, Rolf Backofen, Matthias Platzer
In: Nat Genet, 2004, 36(12), 1255-7
Splice acceptors with the genomic NAGNAG motif may cause NAG insertion-deletions in transcripts, occur in 30% of human genes and are functional in at least 5% of human genes. We found five significant biases indicating that their distribution is nonrandom and that they are evolutionarily conserved and tissue-specific. Because of their subtle effects on mRNA and protein structures, these splice acceptors are often overlooked or underestimated, but they may have a great impact on biology and disease.