Lydia Hadjeras, Jurgen Bartel, Lisa-Katharina Maier, Sandra Maass, Verena Vogel, Sarah L. Svensson, Florian Eggenhofer, Rick Gelhausen, Teresa Muller, Omer S. Alkhnbashi, Rolf Backofen, Dorte Becher, Cynthia M. Sharma, Anita Marchfelder
In: Microlife, 2023, 4, uqad001
In contrast to extensively studied prokaryotic 'small' transcriptomes (encompassing all small noncoding RNAs), small proteomes (here defined as including proteins ≤70 aa) are only now entering the limelight. The absence of a complete small protein catalogue in most prokaryotes precludes our understanding of how these molecules affect physiology. So far, archaeal genomes have not yet been analyzed broadly with a dedicated focus on small proteins. Here, we present a combinatorial approach, integrating experimental data from small protein-optimized mass spectrometry (MS) and ribosome profiling (Ribo-seq), to generate a high confidence inventory of small proteins in the model archaeon Haloferax volcanii. We demonstrate by MS and Ribo-seq that 67% of the 317 annotated small open reading frames (sORFs) are translated under standard growth conditions. Furthermore, annotation-independent analysis of Ribo-seq data showed ribosomal engagement for 47 novel sORFs in intergenic regions. A total of seven of these were also detected by proteomics, in addition to an eighth novel small protein solely identified by MS. We also provide independent experimental evidence in vivo for the translation of 12 sORFs (annotated and novel) using epitope tagging and western blotting, underlining the validity of our identification scheme. Several novel sORFs are conserved in Haloferax species and might have important functions. Based on our findings, we conclude that the small proteome of H. volcanii is larger than previously appreciated, and that combining MS with Ribo-seq is a powerful approach for the discovery of novel small protein coding genes in archaea.
Rick Gelhausen, Teresa Müller, Sarah L Svensson, Omer S Alkhnbashi, Cynthia M Sharma, Florian Eggenhofer, Rolf Backofen
In: Briefings in Bioinformatics, 01 2022
Small proteins encoded by short open reading frames (ORFs) with 50 codons or fewer are emerging as an important class of cellular macromolecules in diverse organisms. However, they often evade detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation in genomic regions previously thought to be non-coding, driving the development of ORF detection tools using Ribo-seq data. However, only a handful of tools have been designed for bacteria, and these have not yet been systematically compared. Here, we aimed to identify tools that use Ribo-seq data to correctly determine the translational status of annotated bacterial ORFs and also discover novel translated regions with high sensitivity. To this end, we generated a large set of annotated ORFs from four diverse bacterial organisms, manually labeled for their translation status based on Ribo-seq data, which are available for future benchmarking studies. This set was used to investigate the predictive performance of seven Ribo-seq-based ORF detection tools (REPARATION_blast, DeepRibo, Ribo-TISH, PRICE, smORFer, ribotricer and SPECtre), as well as IRSOM, which uses coding potential and RNA-seq coverage only. DeepRibo and REPARATION_blast robustly predicted translated ORFs, including sORFs, with no significant difference for ORFs in close proximity to other genes versus stand-alone genes. However, no tool predicted a set of novel, experimentally verified sORFs with high sensitivity. Start codon predictions with smORFer show the value of initiation site profiling data to further improve the sensitivity of ORF prediction tools in bacteria. Overall, we find that bacterial tools perform well for sORF detection, although there is potential for improving their performance, applicability, usability and reproducibility.
Omer S. Alkhnbashi, Alexander Mitrofanov, Robson Bonidia, Martin Raden, Van Dinh Tran, Florian Eggenhofer, Shiraz A. Shah, Ekrem Öztürk, Victor A. Padilha, Danilo S. Sanches, Andre C.P.L.F. de Carvalho, Rolf Backofen
In: Nucleic Acids Research, 2021
CRISPR-Cas systems are adaptive immune systems in prokaryotes, providing resistance against invading viruses and plasmids. The identification of CRISPR loci is currently a non-standardized, ambiguous process, requiring the manual combination of multiple tools, where existing tools detect only parts of the CRISPR-systems, and lack quality control, annotation and assessment capabilities of the detected CRISPR loci. Our CRISPRloci server provides the first resource for the prediction and assessment of all possible CRISPR loci. The server integrates a series of advanced Machine Learning tools within a seamless web interface featuring: (i) prediction of all CRISPR arrays in the correct orientation; (ii) definition of CRISPR leaders for each locus; and (iii) annotation of cas genes and their unambiguous classification. As a result, CRISPRloci is able to accurately determine the CRISPR array and associated information, such as: the Cas subtypes; cassette boundaries; accuracy of the repeat structure, orientation and leader sequence; virus-host interactions; self-targeting; as well as the annotation of cas genes, all of which have been missing from existing tools. This annotation is presented in an interactive interface, making it easy for scientists to gain an overview of the CRISPR system in their organism of interest. Predictions are also rendered in GFF format, enabling in-depth genome browser inspection. In summary, CRISPRloci constitutes a full suite for CRISPR-Cas system characterization that offers annotation quality previously available only after manual inspection.
Alexander Mitrofanov, Omer S. Alkhnbashi, Sergey A. Shmakov, Kira S. Makarova, Eugene V. Koonin, Rolf Backofen
In: Nucleic Acids Res, 2020
CRISPR-Cas are adaptive immune systems that degrade foreign genetic elements in archaea and bacteria. In carrying out their immune functions, CRISPR-Cas systems heavily rely on RNA components. These CRISPR (cr) RNAs are repeat-spacer units that are produced by processing of pre-crRNA, the transcript of CRISPR arrays, and guide Cas protein(s) to the cognate invading nucleic acids, enabling their destruction. Several bioinformatics tools have been developed to detect CRISPR arrays based solely on DNA sequences, but all these tools employ the same strategy of looking for repetitive patterns, which might correspond to CRISPR array repeats. The identified patterns are evaluated using a fixed, built-in scoring function, and arrays exceeding a cut-off value are reported. Here, we instead introduce a data-driven approach that uses machine learning to detect and differentiate true CRISPR arrays from false ones based on several features. Our CRISPR detection tool, CRISPRidentify, performs three steps: detection, feature extraction and classification based on manually curated sets of positive and negative examples of CRISPR arrays. The identified CRISPR arrays are then reported to the user accompanied by detailed annotation. We demonstrate that our approach identifies not only previously detected CRISPR arrays, but also CRISPR array candidates not detected by other tools. Compared to other methods, our tool has a drastically reduced false positive rate. In contrast to the existing tools, our approach not only provides the user with the basic statistics on the identified CRISPR arrays but also produces a certainty score as a practical measure of the likelihood that a given genomic region is a CRISPR array.
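The repeat-pattern strategy that this abstract attributes to earlier tools can be illustrated with a toy sketch. This is not the CRISPRidentify algorithm and not the code of any published tool: the function name `find_repeat_candidates` and all thresholds (`k`, `min_copies`, `min_gap`, `max_gap`) are hypothetical choices made only for illustration. It reports exact k-mers that recur at spacer-like distances, which is roughly the kind of candidate a later scoring or classification step would have to evaluate.

```python
from collections import defaultdict


def find_repeat_candidates(seq, k=8, min_copies=3, min_gap=20, max_gap=60):
    """Toy repeat finder (illustrative only, thresholds are assumptions).

    Returns a dict mapping each k-mer to the start positions of its longest
    run of occurrences whose start-to-start distances all lie in
    [min_gap, max_gap], keeping only runs with at least min_copies members.
    """
    # Index every k-mer by the positions at which it starts.
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    candidates = {}
    for kmer, pos in positions.items():
        run, best = [pos[0]], [pos[0]]
        for p in pos[1:]:
            # Extend the current run if the spacing looks array-like,
            # otherwise start a new run at this occurrence.
            run = run + [p] if min_gap <= p - run[-1] <= max_gap else [p]
            if len(run) > len(best):
                best = run
        if len(best) >= min_copies:
            candidates[kmer] = best
    return candidates
```

A fixed cut-off such as `min_copies` plays the role of the built-in threshold mentioned in the abstract; the data-driven approach described there replaces that kind of hard rule with a classifier trained on curated positive and negative examples.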
Victor A Padilha, Omer S Alkhnbashi, Van Dinh Tran, Shiraz A Shah, André C P L F Carvalho, Rolf Backofen
In: Bioinformatics, 11 2020
CRISPR-Cas are important systems found in most archaeal and many bacterial genomes, providing adaptive immunity against mobile genetic elements in prokaryotes. The CRISPR-Cas systems are encoded by a set of consecutive cas genes, here termed cassette. The identification of cassette boundaries is key for finding cassettes in the CRISPR research field. This is often carried out by using Hidden Markov Models and manual annotation. In this article, we propose the first method able to automatically define the cassette boundaries. In addition, we present a Cas-type predictive model used by the method to assign each gene located in the region defined by a cassette’s boundaries a Cas label from a set of pre-defined Cas types. Furthermore, the proposed method can detect potentially new cas genes and decompose a cassette into its modules. We evaluate the predictive performance of our proposed method on data collected from the two most recent CRISPR classification studies. In our experiments, we obtain an average similarity of 0.86 between the predicted and expected cassettes. Besides, we achieve F-scores above 0.9 for the classification of cas genes of known types and 0.73 for the unknown ones. Finally, we conduct two additional study cases, where we investigate the occurrence of potentially new cas genes and the occurrence of module exchange between different genomes. AVAILABILITY AND IMPLEMENTATION: https://github.com/BackofenLab/Casboundary. CONTACT: alkhanbo@informatik.uni-freiburg.de or backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
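For readers unfamiliar with the F-score reported in this abstract, the standard definition can be sketched as follows. The abstract does not state which averaging scheme (per class, micro or macro) was used, so this shows only the basic formula for a single class; the function name `f_score` and its count-based interface are assumptions for illustration.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true-positive, false-positive and false-negative counts.

    With beta = 1 this is the harmonic mean of precision and recall.
    Returns 0.0 when precision and recall are both zero or undefined.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, a classifier that labels 9 cas genes correctly, adds 1 spurious label and misses 1 gene has precision and recall of 0.9 each, and therefore an F-score of 0.9.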
Aris-Edda Stachler, Julia Wortz, Omer S. Alkhnbashi, Israela Turgeman-Grott, Rachel Smith, Thorsten Allers, Rolf Backofen, Uri Gophna, Anita Marchfelder
In: Journal of Biological Chemistry, 2020, 295(39), 13502-13515
Haloferax volcanii is, to our knowledge, the only prokaryote known to tolerate CRISPR-Cas-mediated damage to its genome in the WT background; the resulting cleavage of the genome is repaired by homologous recombination restoring the WT version. In mutant Haloferax strains with enhanced self-targeting, cell fitness decreases and microhomology-mediated end joining becomes active, generating deletions in the targeted gene. Here we use self-targeting to investigate adaptation in H. volcanii CRISPR-Cas type I-B. We show that self-targeting and genome breakage events that are induced by self-targeting, such as those catalyzed by active transposases, can generate DNA fragments that are used by the CRISPR-Cas adaptation machinery for integration into the CRISPR loci. Low cellular concentrations of self-targeting crRNAs resulted in acquisition of large numbers of spacers originating from the entire genomic DNA. In contrast, high concentrations of self-targeting crRNAs resulted in lower acquisition that was mostly centered on the targeting site. Furthermore, we observed naive spacer acquisition at a low level in WT Haloferax cells and with higher efficiency upon overexpression of the Cas proteins Cas1, Cas2, and Cas4. Taken together, these findings indicate that naive adaptation is a regulated process in H. volcanii that operates at low basal levels and is induced by DNA breaks.
Victor A. Padilha, Omer S. Alkhnbashi, Shiraz A. Shah, Andre C. P. L. F. de Carvalho, Rolf Backofen
In: Gigascience, 2020, 9(6)
BACKGROUND: CRISPR-Cas genes are extraordinarily diverse and evolve rapidly when compared to other prokaryotic genes. With the rapid increase in newly sequenced archaeal and bacterial genomes, manual identification of CRISPR-Cas systems is no longer viable. Thus, an automated approach is required for advancing our understanding of the evolution and diversity of these systems and for finding new candidates for genome engineering in eukaryotic models. RESULTS: We introduce CRISPRcasIdentifier, a new machine learning-based tool that combines regression and classification models for the prediction of potentially missing proteins in instances of CRISPR-Cas systems and the prediction of their respective subtypes. In contrast to other available tools, CRISPRcasIdentifier can both detect cas genes and extract potential association rules that reveal functional modules for CRISPR-Cas systems. In our experimental benchmark on the most recently published and comprehensive CRISPR-Cas system dataset, CRISPRcasIdentifier was compared with recent and state-of-the-art tools. According to the experimental results, CRISPRcasIdentifier presented the best Cas protein identification and subtype classification performance. CONCLUSIONS: Overall, our tool greatly extends the classification of CRISPR cassettes and, for the first time, predicts missing Cas proteins and association rules between Cas proteins. Additionally, we investigated the properties of CRISPR subtypes. The proposed tool relies not only on the knowledge of manual CRISPR annotation but also on models trained using machine learning.
Kira S. Makarova, Yuri I. Wolf, Jaime Iranzo, Sergey A. Shmakov, Omer S. Alkhnbashi, Stan J. J. Brouns, Emmanuelle Charpentier, David Cheng, Daniel H. Haft, Philippe Horvath, Sylvain Moineau, Francisco J. M. Mojica, David Scott, Shiraz A. Shah, Virginijus Siksnys, Michael P. Terns, Ceslovas Venclovas, Malcolm F. White, Alexander F. Yakunin, Winston Yan, Feng Zhang, Roger A. Garrett, Rolf Backofen, John van der Oost, Rodolphe Barrangou, Eugene V. Koonin
In: Nat Rev Microbiol, 2020, 18(2), 67-83
The number and diversity of known CRISPR-Cas systems have substantially increased in recent years. Here, we provide an updated evolutionary classification of CRISPR-Cas systems and cas genes, with an emphasis on the major developments that have occurred since the publication of the latest classification, in 2015. The new classification includes 2 classes, 6 types and 33 subtypes, compared with 5 types and 16 subtypes in 2015. A key development is the ongoing discovery of multiple, novel class 2 CRISPR-Cas systems, which now include 3 types and 17 subtypes. A second major novelty is the discovery of numerous derived CRISPR-Cas variants, often associated with mobile genetic elements that lack the nucleases required for interference. Some of these variants are involved in RNA-guided transposition, whereas others are predicted to perform functions distinct from adaptive immunity that remain to be characterized experimentally. The third highlight is the discovery of numerous families of ancillary CRISPR-linked genes, often implicated in signal transduction. Together, these findings substantially clarify the functional diversity and evolutionary history of CRISPR-Cas.
Omer S. Alkhnbashi, Tobias Meier, Alexander Mitrofanov, Rolf Backofen, Bjorn Voss
In: Methods, 2019
Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated proteins (Cas) are essential genetic elements in many archaeal and bacterial genomes, playing a key role in a prokaryote adaptive immune system against invasive foreign elements. In recent years, the CRISPR-Cas system has also been engineered to facilitate target gene editing in eukaryotic genomes. Bioinformatics played an essential role in the detection and analysis of CRISPR systems and here we review the bioinformatics-based efforts that pushed the field of CRISPR-Cas research further. We discuss the bioinformatics tools that have been published over the last few years and, finally, present the most popular tools for the design of CRISPR-Cas9 guides.
Lisa Nickel, Andrea Ulbricht, Omer S. Alkhnbashi, Konrad U. Forstner, Liam Cassidy, Katrin Weidenbach, Rolf Backofen, Ruth A. Schmitz
In: RNA Biol, 2019, 16(4), 492-503
The clustered regularly interspaced short palindromic repeat (CRISPR) system is a prokaryotic adaptive defense system against foreign nucleic acids. In the methanoarchaeon Methanosarcina mazei Go1, two types of CRISPR-Cas systems are present (type I-B and type III-C). Both loci encode a Cas6 endonuclease, Cas6b-IB and Cas6b-IIIC, typically responsible for maturation of functional short CRISPR RNAs (crRNAs). To evaluate potential cross cleavage activity, we biochemically characterized both Cas6b proteins regarding their crRNA binding behavior and their ability to process pre-crRNA from the respective CRISPR array in vivo. Maturation of crRNA was studied in the respective single deletion mutants by northern blot and RNA-Seq analysis demonstrating that in vivo primarily Cas6b-IB is responsible for crRNA processing of both CRISPR arrays. Tentative protein level evidence for the translation of both Cas6b proteins under standard growth conditions was detected, arguing for different activities or a potential non-redundant role of Cas6b-IIIC within the cell. Conservation of both Cas6 endonucleases was observed in several other M. mazei isolates, though a wide variety was displayed. In general, repeat and leader sequence conservation revealed a close correlation in the M. mazei strains. The repeat sequences from both CRISPR arrays from M. mazei Go1 contain the same sequence motif with differences only in two nucleotides. These data stand in contrast to all other analyzed M. mazei isolates, which have at least one additional CRISPR array with repeats belonging to another sequence motif. This conforms to the finding that Cas6b-IB is the crucial and functional endonuclease in M. mazei Go1. Abbreviations: sRNA: small RNA; crRNA: CRISPR RNA; pre-crRNAs: Precursor CRISPR RNA; CRISPR: clustered regularly interspaced short palindromic repeats; Cas: CRISPR associated; nt: nucleotide; RNP: ribonucleoprotein; RBS: ribosome binding site.
Shengwei Hou, Manuel Brenes-Alvarez, Viktoria Reimann, Omer S. Alkhnbashi, Rolf Backofen, Alicia M. Muro-Pastor, Wolfgang R. Hess
In: RNA Biol, 2019, 16(4), 518-529
Novel CRISPR-Cas systems possess substantial potential for genome editing and manipulation of gene expression. The types and numbers of CRISPR-Cas systems vary substantially between different organisms. Some filamentous cyanobacteria harbor > 40 different putative CRISPR repeat-spacer cassettes, while the number of cas gene instances is much lower. Here we addressed the types and diversity of CRISPR-Cas systems and of CRISPR-like repeat-spacer arrays in 171 publicly available genomes of multicellular cyanobacteria. The number of 1328 repeat-spacer arrays exceeded the total of 391 encoded Cas1 proteins suggesting a tendency for fragmentation or the involvement of alternative adaptation factors. The model cyanobacterium Anabaena sp. PCC 7120 contains only three cas1 genes but hosts three Class 1, possibly one Class 2 and five orphan repeat-spacer arrays, all of which exhibit crRNA-typical expression patterns suggesting active transcription, maturation and incorporation into CRISPR complexes. The CRISPR-Cas system within the element interrupting the Anabaena sp. PCC 7120 fdxN gene, as well as analogous arrangements in other strains, occupy the genetic elements that become excised during the differentiation-related programmed site-specific recombination. This fact indicates the propensity of these elements for the integration of CRISPR-cas systems and points to a previously not recognized connection. The gene all3613 resembling a possible Class 2 effector protein is linked to a short repeat-spacer array and a single tRNA gene, similar to its homologs in other cyanobacteria. The diversity and presence of numerous CRISPR-Cas systems in DNA elements that are programmed for homologous recombination make filamentous cyanobacteria a prolific resource for their study. 
Abbreviations: Cas: CRISPR associated sequences; CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats; C2c: Class 2 candidate; SDR: small dispersed repeat; TSS: transcriptional start site; UTR: untranslated region.
Shiraz A. Shah, Omer S. Alkhnbashi, Juliane Behler, Wenyuan Han, Qunxin She, Wolfgang R. Hess, Roger A. Garrett, Rolf Backofen
In: RNA Biol, 2019, 16(4), 530-542
A study was undertaken to identify conserved proteins that are encoded adjacent to cas gene cassettes of Type III CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats - CRISPR associated) interference modules. Type III modules have been shown to target and degrade dsDNA, ssDNA and ssRNA and are frequently intertwined with cofunctional accessory genes, including genes encoding CRISPR-associated Rossman Fold (CARF) domains. Using a comparative genomics approach, and defining a Type III association score accounting for coevolution and specificity of flanking genes, we identified and classified 39 new Type III associated gene families. Most archaeal and bacterial Type III modules were seen to be flanked by several accessory genes, around half of which did not encode CARF domains and remain of unknown function. Northern blotting and interference assays in Synechocystis confirmed that one particular non-CARF accessory protein family was involved in crRNA maturation. Non-CARF accessory genes were generally diverse, encoding nuclease, helicase, protease, ATPase, transporter and transmembrane domains with some encoding no known domains. We infer that additional families of non-CARF accessory proteins remain to be found. The method employed is scalable for potential application to metagenomic data once automated pipelines for annotation of CRISPR-Cas systems have been developed. All accessory genes found in this study are presented online in a readily accessible and searchable format for researchers to audit their model organism of choice: http://accessory.crispr.dk .
Lisa-Katharina Maier, Aris-Edda Stachler, Jutta Brendel, Britta Stoll, Susan Fischer, Karina A. Haas, Thandi S. Schwarz, Omer S. Alkhnbashi, Kundan Sharma, Henning Urlaub, Rolf Backofen, Uri Gophna, Anita Marchfelder
In: RNA Biol, 2019, 16(4), 469-480
Invading genetic elements pose a constant threat to prokaryotic survival, requiring an effective defence. Eleven years ago, the arsenal of known defence mechanisms was expanded by the discovery of the CRISPR-Cas system. Although CRISPR-Cas is present in the majority of archaea, research often focuses on bacterial models. Here, we provide a perspective based on insights gained studying CRISPR-Cas system I-B of the archaeon Haloferax volcanii. The system relies on more than 50 different crRNAs, whose stability and maintenance critically depend on the proteins Cas5 and Cas7, which bind the crRNA and form the Cascade complex. The interference machinery requires a seed sequence and can interact with multiple PAM sequences. H. volcanii stands out as the first example of an organism that can tolerate autoimmunity via the CRISPR-Cas system while maintaining a constitutively active system. In addition, the H. volcanii system was successfully developed into a tool for gene regulation.
Martin Raden, Syed M Ali, Omer S Alkhnbashi, Anke Busch, Fabrizio Costa, Jason A Davis, Florian Eggenhofer, Rick Gelhausen, Jens Georg, Steffen Heyne, Michael Hiller, Kousik Kundu, Robert Kleinkauf, Steffen C Lott, Mostafa M Mohamed, Alexander Mattheis, Milad Miladi, Andreas S Richter, Sebastian Will, Joachim Wolff, Patrick R Wright, Rolf Backofen
In: Nucleic Acids Research, 2018, 46(W1), W25-W29
The Freiburg RNA tools webserver is a well established online resource for RNA-focused research. It provides a unified user interface and comprehensive result visualization for efficient command line tools. The webserver includes RNA-RNA interaction prediction (IntaRNA, CopraRNA, metaMIR), sRNA homology search (GLASSgo), sequence-structure alignments (LocARNA, MARNA, CARNA, ExpaRNA), CRISPR repeat classification (CRISPRmap), sequence design (antaRNA, INFO-RNA, SECISDesign), structure aberration evaluation of point mutations (RaSE), and RNA/protein-family models visualization (CMV), and other methods. Open education resources offer interactive visualizations of RNA structure and RNA-RNA interaction prediction as well as basic and advanced sequence alignment algorithms. The services are freely available at http://rna.informatik.uni-freiburg.de.
Katrin Weidenbach, Lisa Nickel, Horst Neve, Omer S. Alkhnbashi, Sven Kunzel, Anne Kupczok, Thorsten Bauersachs, Liam Cassidy, Andreas Tholey, Rolf Backofen, Ruth A. Schmitz
In: J Virol, 2017
A novel archaeal lytic virus targeting species of the genus Methanosarcina was isolated using Methanosarcina mazei strain Go1 as host. Due to its spherical morphology the virus was designated Methanosarcina spherical virus (MetSV). Molecular analysis demonstrated that MetSV contains double stranded linear DNA with a genome size of 10,567 bp containing 22 open reading frames (ORFs) all oriented in the same direction. Functions were predicted for some of these ORFs, e.g. DNA polymerase, ATPase, DNA-binding protein, as well as envelope (structural) protein. MetSV-derived spacers in CRISPR loci were detected in several published Methanosarcina draft genomes using bioinformatic tools, revealing the potential PAM motif (TTA/T). Transcription and expression of several predicted viral ORFs were validated by RT-PCR, PAGE analysis and LC-MS based proteomics. Analysis of core lipids by APCI mass spectrometry showed that MetSV and M. mazei both contain archaeol and glycerol dialkyl glycerol tetraether without cyclopentane moiety (GDGT-0). The MetSV host range is limited to Methanosarcina strains growing as single cells (M. mazei, M. barkeri and M. soligelidi). In contrast, strains growing as sarcina-like aggregates were apparently protected from infection. Heterogeneity related to morphology phases in M. mazei cultures allowed acquisition of resistance to MetSV after challenge by growing as sarcina-like aggregates. CRISPR/Cas mediated resistance was excluded since neither of the two CRISPR arrays showed MetSV-derived spacer acquisition. Based on these findings, we propose that changing the morphology from single cells to sarcina-like aggregates upon rearrangement of the envelope structure prevents infection and subsequent lysis by MetSV. IMPORTANCE: Methanoarchaea are among the most abundant organisms on the planet since they are present in high numbers in major anaerobic environments. They convert various carbon sources, e.g. acetate, methylamines or methanol, to methane and carbon dioxide; thus they have a significant impact on the emission of major greenhouse gases. Today very little is known about viruses specifically infecting methanoarchaea, which most probably impact the abundance of methanoarchaea in microbial consortia. Here we characterize the first identified Methanosarcina infecting virus (MetSV) and show a mechanism for acquiring resistance against MetSV. Based on our results we propose that growth as sarcina-like aggregates prevents infection and subsequent lysis. These findings allow new insights into virus-host relationship in methanogenic community structures, their dynamics and their phase heterogeneity. Moreover, the availability of a specific virus provides new possibilities to deepen our knowledge on defence mechanisms of potential hosts and offers tools for genetic manipulation.
Lisa-Katharina Maier, Omer S. Alkhnbashi, Rolf Backofen, Anita Marchfelder
In: Béatrice Clouet-d'Orval, RNA Metabolism and Gene Expression in Archaea, 2017, 243-269
CRISPR-Cas (CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats and Cas: CRISPR associated) systems are unique defence mechanisms since they are able to adapt to new invaders and are heritable. CRISPR-Cas systems facilitate the sequence-specific elimination of invading genetic elements in prokaryotes; they are found in 45% of bacteria and 85% of archaea. Their general features have been studied in detail, but subtype- and species-specific variations await investigation. Haloarchaea is one of few archaeal classes in which CRISPR-Cas systems have been investigated in more than one genus. Here, we summarize the available information on CRISPR-Cas defence in three Haloarchaea: Haloferax volcanii, Haloferax mediterranei and Haloarcula hispanica. Haloarchaea share type I CRISPR-Cas systems, with subtype I-B being dominant. Type I-B systems rely on Cas proteins Cas5, Cas7, and Cas8b for the interference reaction and these proteins have been shown to form a Cascade (CRISPR-associated complex for antiviral defence)-like complex in Hfx. (Haloferax) volcanii. Cas6b is the endonuclease for crRNA (CRISPR RNA) maturation in type I-B systems but the protein is dispensable for interference in Hfx. volcanii. Haloarchaea share a common repeat sequence and crRNA-processing pattern. A prerequisite for successful invader recognition in Hfx. volcanii is base pairing over a ten-nucleotide-long non-contiguous seed sequence. Moreover, Hfx. volcanii and Har. (Haloarcula) hispanica each rely on certain specific PAM (protospacer adjacent motif) sequences to elicit interference, but they share only one PAM sequence. Primed adaptation in Har. hispanica relies on another set of PAM sequences.
Vanessa Tripp, Roman Martin, Alvaro Orell, Omer S. Alkhnbashi, Rolf Backofen, Lennart Randau
In: Mol Microbiol, 2016, 103(1), 151-164
Archaeal and eukaryotic organisms contain sets of C/D box s(no)RNAs with guide sequences that determine ribose 2'-O-methylation sites of target RNAs. The composition of these C/D box sRNA sets is highly variable between organisms and results in varying RNA modification patterns which are important for ribosomal RNA folding and stability. Little is known about the genomic organization of C/D box sRNA genes in archaea. Here, we aimed to obtain first insights into the biogenesis of these archaeal C/D box sRNAs and analyzed the genetic context of more than 300 archaeal sRNA genes. We found that the majority of these genes do not possess independent promoters but are rather located at positions that allow for co-transcription with neighboring genes and their start or stop codons were frequently incorporated into the conserved boxC and D motifs. The biogenesis of plasmid-encoded C/D box sRNA variants was analyzed in vivo in Sulfolobus acidocaldarius. It was found that C/D box sRNA maturation occurs independent of their genetic context and relies solely on the presence of intact RNA kink-turn structures. The observed plasticity of C/D box sRNA biogenesis is suggested to enable their accelerated evolution and, consequently, allow for adjustments of the RNA modification landscape.
Viktoria Reimann, Omer S. Alkhnbashi, Sita J. Saunders, Ingeborg Scholz, Stephanie Hein, Rolf Backofen, Wolfgang R. Hess
In: Nucleic Acids Res, 2016
A hallmark of defense mechanisms based on clustered regularly interspaced short palindromic repeats (CRISPR) and associated sequences (Cas) are the crRNAs that guide these complexes in the destruction of invading DNA or RNA. Three separate CRISPR-Cas systems exist in the cyanobacterium Synechocystis sp. PCC 6803. Based on genetic and transcriptomic evidence, two associated endoribonucleases, Cas6-1 and Cas6-2a, were postulated to be involved in crRNA maturation from CRISPR1 or CRISPR2, respectively. Here, we report a promiscuity of both enzymes to process in vitro not only their cognate transcripts, but also the respective non-cognate precursors, whereas they are specific in vivo. Moreover, while most of the repeats serving as substrates were cleaved in vitro, some were not. RNA structure predictions suggested that the context sequence surrounding a repeat can interfere with its stable folding. Indeed, structure accuracy calculations of the hairpin motifs within the repeat sequences explained the majority of analyzed cleavage reactions, making this a good measure for predicting successful cleavage events. We conclude that the cleavage of CRISPR1 and CRISPR2 repeat instances requires a stable formation of the characteristic hairpin motif, which is similar between the two types of repeats. The influence of surrounding sequences might partially explain variations in crRNA abundances and should be considered when designing artificial CRISPR arrays.
Omer S. Alkhnbashi, Shiraz A. Shah, Roger A. Garrett, Sita J. Saunders, Fabrizio Costa, Rolf Backofen
In: Bioinformatics, 2016, 32(17), i576-i585
MOTIVATION: The CRISPR-Cas system is an adaptive immune system in many archaea and bacteria, which provides resistance against invading genetic elements. The first phase of CRISPR-Cas immunity is called adaptation, in which small DNA fragments are excised from genetic elements and are inserted into a CRISPR array generally adjacent to its so called leader sequence at one end of the array. It has been shown that transcription initiation and adaptation signals of the CRISPR array are located within the leader. However, apart from promoters, there is very little knowledge of sequence or structural motifs or their possible functions. Leader properties have mainly been characterized through transcriptional initiation data from single organisms but large-scale characterization of leaders has remained challenging due to their low level of sequence conservation. RESULTS: We developed a method to successfully detect leader sequences by focusing on the consensus repeat of the adjacent CRISPR array and weak upstream conservation signals. We applied our tool to the analysis of a comprehensive genomic database and identified several characteristic properties of leader sequences specific to archaea and bacteria, ranging from distinctive sizes to preferential indel localization. CRISPRleader provides a full annotation of the CRISPR array, its strand orientation as well as conserved core leader boundaries that can be uploaded to any genome browser. In addition, it outputs reader-friendly HTML pages for conserved leader clusters from our database. AVAILABILITY AND IMPLEMENTATION: CRISPRleader and multiple sequence alignments for all 195 leader clusters are available at http://www.bioinf.uni-freiburg.de/Software/CRISPRleader/ CONTACT: costa@informatik.uni-freiburg.de or backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Kira S. Makarova, Yuri I. Wolf, Omer S. Alkhnbashi, Fabrizio Costa, Shiraz A. Shah, Sita J. Saunders, Rodolphe Barrangou, Stan J. J. Brouns, Emmanuelle Charpentier, Daniel H. Haft, Philippe Horvath, Sylvain Moineau, Francisco J. M. Mojica, Rebecca M. Terns, Michael P. Terns, Malcolm F. White, Alexander F. Yakunin, Roger A. Garrett, John van der Oost, Rolf Backofen, Eugene V. Koonin
In: Nat Rev Microbiol, 2015, 13(11), 722-736
The evolution of CRISPR-cas loci, which encode adaptive immune systems in archaea and bacteria, involves rapid changes, in particular numerous rearrangements of the locus architecture and horizontal transfer of complete loci or individual modules. These dynamics complicate straightforward phylogenetic classification, but here we present an approach combining the analysis of signature protein families and features of the architecture of cas loci that unambiguously partitions most CRISPR-cas loci into distinct classes, types and subtypes. The new classification retains the overall structure of the previous version but is expanded to now encompass two classes, five types and 16 subtypes. The relative stability of the classification suggests that the most prevalent variants of CRISPR-Cas systems are already known. However, the existence of rare, currently unclassifiable variants implies that additional types and subtypes remain to be characterized.
Simon D. B. Cass, Karina A. Haas, Britta Stoll, Omer Alkhnbashi, Kundan Sharma, Henning Urlaub, Rolf Backofen, Anita Marchfelder, Edward L. Bolt
In: Biosci Rep, 2015, 35(4), e00197
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat) systems provide bacteria and archaea with adaptive immunity to repel invasive genetic elements. Type I systems use "Cascade" ribonucleoprotein complexes to target invader DNA, by base pairing CRISPR RNA (crRNA) to protospacers. Cascade identifies PAMs (Protospacer Adjacent Motifs) on invader DNA, triggering R-loop formation and subsequent DNA degradation by Cas3. Cas8 is a candidate PAM recognition factor in some Cascades. We analysed Cas8 homologues from type I-B CRISPR systems in the archaea Haloferax volcanii (Hvo) and Methanothermobacter thermautotrophicus (Mth). Cas8 was essential for CRISPR interference in Hvo, and purified Mth Cas8 protein responded to PAM sequence when binding to nucleic acids. Cas8 interacted physically with the Cas5-Cas7-crRNA complex, stimulating binding to PAM-containing substrates. Mutation of conserved Cas8 amino acid residues abolished interference in vivo, and altered catalytic activity of Cas8 protein in vitro. This is experimental evidence that Cas8 is important for targeting Cascade to invader DNA.
Omer S. Alkhnbashi, Fabrizio Costa, Shiraz A. Shah, Roger A. Garrett, Sita J. Saunders, Rolf Backofen
In: Bioinformatics, 2014, 30(17), i489-i496
MOTIVATION: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, interspaced by stretches of variable-length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs. RESULTS: We present a fast and accurate tool to determine the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats based on an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising >4500 CRISPRs and yielded a remarkable performance of 0.95 AUC ROC (area under the curve of the receiver operating characteristic). In addition, we show that accurate orientation information greatly improved detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0. AVAILABILITY: CRISPRmap and CRISPRstrand are available at http://rna.informatik.uni-freiburg.de/CRISPRmap. CONTACT: backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Sita J. Lange, Omer S. Alkhnbashi, Dominic Rose, Sebastian Will, Rolf Backofen
In: Nucleic Acids Res, 2013, 41(17), 8034-8044
Central to Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas systems are repeated RNA sequences that serve as Cas-protein-binding templates. Although classification is based on the architectural composition of associated Cas proteins, considering repeat evolution is essential to complete the picture. We compiled the largest data set of CRISPRs to date, performed comprehensive, independent clustering analyses and identified a novel set of 40 conserved sequence families and 33 potential structure motifs for Cas-endoribonucleases with some distinct conservation patterns. Evolutionary relationships are presented as a hierarchical map of sequence and structure similarities for both quick and detailed insight into the diversity of CRISPR-Cas systems. In a comparison with Cas-subtypes, I-C, I-E, I-F and type II were strongly coupled and the remaining type I and type III subtypes were loosely coupled to repeat and Cas1 evolution, respectively. Subtypes with a strong link to CRISPR evolution were almost exclusive to bacteria; nevertheless, we identified rare examples of potential horizontal transfer of I-C and I-E systems into archaeal organisms. Our easy-to-use web server provides an automated assignment of newly sequenced CRISPRs to our classification system and enables more informed choices on future hypotheses in CRISPR-Cas research: http://rna.informatik.uni-freiburg.de/CRISPRmap.