Ongoing Projects


Deep Learning for Gene Discovery in Uncharted Genomes

In Progress
Supervisor(s):
Degree:
Bachelor's Project, Master's Project, Bachelor's Thesis, Master's Thesis
Prerequisites:
Python
Linux command line
Git (GitHub)
Ideally, some practical experience working with deep learning models (PyTorch) and compute clusters, plus a keen interest in analysing biological data
Description:

Predicting gene regions in newly sequenced or poorly annotated genomes is a vital task in bioinformatics. Such predictions help us learn about relationships between organisms and about related functions of genes and proteins, and they can reveal new proteins with possible applications in medicine and industry. Recently, various deep learning methods (often large language models) have been devised to deliver improved predictions. The goal of this project is to run and compare a selection of these methods, including fine-tuning and possibly additional pre-training of the models. A powerful GPU cluster is available to handle the computations. The project can be extended into a thesis, which is encouraged.

Clustering SARS-CoV-2 spike protein sequences using an autoencoder neural network

In Progress
Supervisor(s):
Student:
Dilpreet Singh
Degree:
Master's Project
Description:

The aim of this project is to create a low-dimensional representation of SARS-CoV-2 spike protein sequences using an autoencoder neural network. The low-dimensional representations should then be projected with dimensionality-reduction methods such as t-SNE and UMAP and clustered, to explore whether the differences between sequences belonging to different clades (categories of sequences) are preserved in the lower-dimensional space. Related reading
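The encoding step described above can be illustrated with a minimal sketch. It is not the project's actual model: it assumes the sequences are already aligned to equal length and contain only the 20 standard amino acids, and it uses a tiny NumPy autoencoder with one tanh layer as a stand-in for a real PyTorch network. The function names (`one_hot`, `train_autoencoder`, `encode`) and all hyperparameters are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: standard residues only


def one_hot(seqs):
    """Encode equal-length protein sequences as flat one-hot vectors."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    length = len(seqs[0])
    x = np.zeros((len(seqs), length * len(AMINO_ACIDS)))
    for row, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            x[row, pos * len(AMINO_ACIDS) + index[aa]] = 1.0
    return x


def train_autoencoder(x, latent_dim=2, epochs=1000, lr=0.02, seed=0):
    """Fit a one-layer tanh encoder and a linear decoder by gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(0.0, 0.1, (d, latent_dim))
    w_dec = rng.normal(0.0, 0.1, (latent_dim, d))
    losses = []
    for _ in range(epochs):
        z = np.tanh(x @ w_enc)        # low-dimensional code
        err = z @ w_dec - x           # reconstruction error
        # Loss: half the squared error, averaged over sequences.
        losses.append(float(0.5 * np.sum(err ** 2) / n))
        grad_dec = z.T @ err / n
        grad_z = err @ w_dec.T / n
        grad_enc = x.T @ (grad_z * (1.0 - z ** 2))  # tanh derivative
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, losses


def encode(x, w_enc):
    """Map one-hot sequences to their low-dimensional representation."""
    return np.tanh(x @ w_enc)
```

The rows returned by `encode` are what would then be passed on to t-SNE or UMAP and to a clustering algorithm, with clade labels used only to check whether sequences from the same clade end up close together.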

Learning and predicting nucleotide evolution in SARS-CoV-2 sequences using a generative adversarial network

In Progress
Supervisor(s):
Student:
Saiprasad Barke
Degree:
Master's Project
Description:

SARS-CoV-2 sequences mutate into multiple variants, which are categorized into lineages and clades; some mutations alter the pathogenicity of the virus and can make it more virulent. Using a generative adversarial network, artificial sequences can be generated from knowledge of how SARS-CoV-2 sequences evolved in the past. Ideally, the neural network should learn the ‘edit’ mechanism by which past sequences evolved and generate new sequences based on it. The generated sequences should then be compared with true sequences to assess how well the neural network performs.
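One simple way to compare generated and true sequences is sketched below. It is an illustration under assumptions, not the project's evaluation protocol: the sequences are assumed to be aligned to the same length, only substitutions are considered, and the helper names (`identity`, `edits`, `edit_recall`) are hypothetical.

```python
def identity(generated, reference):
    """Fraction of positions at which two aligned sequences agree."""
    if not reference or len(generated) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(g == r for g, r in zip(generated, reference))
    return matches / len(reference)


def edits(parent, child):
    """Substitutions from parent to child as (1-based position, old, new)."""
    return [
        (i + 1, p, c)
        for i, (p, c) in enumerate(zip(parent, child))
        if p != c
    ]


def edit_recall(parent, true_child, generated_child):
    """Share of the true substitutions that the generated sequence reproduces."""
    true_edits = set(edits(parent, true_child))
    if not true_edits:
        return 1.0  # nothing to recover
    found = true_edits & set(edits(parent, generated_child))
    return len(found) / len(true_edits)
```

For example, with parent `ACGTACGT`, true descendant `ACGTTCGA` and generated sequence `ACGTTCGT`, `identity` of the generated against the true sequence is 0.875 and `edit_recall` is 0.5, because one of the two true substitutions was reproduced.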