PROJECT UGent-0bbdfd62-12f1-428d-a2be-0d5dbe676c0a

Source DB	nl
Institution	UGent
Code	0bbdfd62-12f1-428d-a2be-0d5dbe676c0a
Unit	16a28d37-6005-49e4-be1b-46b8c76f6640
Begin	10/1/2018
End	9/30/2022
title fr
title nl	Probabilistisch grafische modellen voor een accurate identificatie van sequentiëringfouten
title en	Accurate Identification of Sequencing Errors using Probabilistic Graphical Models
Description fr
Description nl	Veel toepassingen in de moleculaire biologie zijn afhankelijk van de analyse van next-generation-sequencinggegevens. De aanwezigheid van sequentiefouten in onbewerkte sequentiegegevens maakt het voor deze toepassingen echter moeilijk om op juiste wijze onderscheid te maken tussen echt biologisch signaal en sequentieruis. Wij geloven dat de huidige methodologie kan worden verbeterd. De onderzoeksvraag van dit voorstel is dus hoe maximaal gebruik kan worden gemaakt van alle informatie die aanwezig is in onbewerkte sequentiegegevens om sequentiefouten te detecteren en te corrigeren. We stellen een methodologie voor om sequentiefouten te identificeren door niet alleen naar elke individuele positie te kijken (bijv. met behulp van leesdekking en kwaliteitsscores), maar ook naar de context waarin een vermoedelijke sequentiefout optreedt. Onbewerkte sequentiegegevens worden vaak weergegeven in een graafstructuur die de Bruijn-graaf wordt genoemd. We zullen een graaftheoretische eigenschap van deze de Bruijn-grafen benutten en meerdere de Bruijn-graafrepresentaties integreren in één kader om volledig gebruik te maken van de contextuele informatie. Deze aanvullende contextuele informatie zal resulteren in een hoogdimensionale dataset, maar we stellen dat probabilistische grafische modellen bij uitstek geschikt zijn om hier op een statistisch verantwoorde manier mee om te gaan. We geloven dat onze methodologie verschillende bio-informaticatoepassingen zal verbeteren, zoals read-correctie, genoomassemblage en variant calling.
Description en	Many applications in molecular biology rely on the analysis of next-generation sequencing data. However, the presence of sequencing errors in raw sequencing data challenges these applications to properly discriminate between true biological signal and sequencing noise. We believe that current methodology can be improved. The research question of this proposal is thus how to maximally exploit all information present in raw sequencing data to detect and correct sequencing errors. We propose a methodology to identify sequencing errors by not only looking at each individual position (e.g. using read coverage support, quality scores) but also at the context in which a putative sequencing error occurs. Raw sequencing data is often represented in a graph structure called a de Bruijn graph. We will exploit a graph-theoretical property of these de Bruijn graphs and integrate multiple de Bruijn graph representations in a single framework to make full use of the contextual information. This additional contextual information will result in a high-dimensional dataset, but we posit that probabilistic graphical models are ideally suited to deal with this in a statistically sound manner. We believe our methodology will improve various bioinformatics applications such as read correction, genome assembly and variant calling.
Qualifiers	- Sequencing Errors - sequentiëringfouten -
Personal	Fostier Jan, Steyaert Aranka, Audenaert Pieter
Collaborations