Aligning NGS Readsets Against Incomplete Targets - Meeting the Challenges
Speaker(s): Dr Stuart Stephen, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
When: 01 June 2017 (2:00 - 3:00pm)
Where: SBS CR2 (Level 1)
Type: Seminars


Most bioinformatics workflows include the alignment of Next-Generation Sequencing (NGS) readsets against a target as one of their initial steps. Each workflow step has an associated error rate, which compounds the errors arising from the preceding steps. Unless overall workflow error rates are minimised, the final result set is likely to have little biological relevance. In alignments of NGS datasets there is no ground truth against which alignment error rates can be reliably quantified. The problem is compounded when the target is large and polyploid (wheat has 3 sub-genomes with an estimated total haploid size of around 17 Gbp), especially when the target assembly is incomplete and of survey quality only. However, alignment practices can be adopted that are likely to improve confidence in alignments against these types of targets, implying lower error rates. This seminar outlines how to use informative characterisations of both the NGS readsets and the alignment target to improve alignment rates whilst maintaining or increasing confidence in these alignments.


Dr Stuart Stephen was awarded a PhD in Molecular Bioscience by the University of Queensland in 2008, after a previous career spanning many decades in commercial software engineering. He studied for his PhD as a member of Professor John Mattick’s group, whose objective was to elucidate the many functional roles that ncRNA plays in biological systems. His primary contribution to the group's objectives, and the basis of his thesis, was the mining of various datasets for evidence of ncRNA sequence conservation. Dr Stephen has since been employed in the Plant Bioinformatics Group at CSIRO, where he is responsible for developing and implementing analytic algorithms by which sequencing datasets can be efficiently mined for biologically meaningful result sets, especially when the targeted plant assemblies and transcriptomes are incomplete and only of survey quality. He has co-authored 15 peer-reviewed papers accruing more than 3,000 citations.