Modelling Gene Structure with Deep Neural Networks
Deep learning has achieved extraordinary performance in many computational fields, such as image recognition, language translation and strategy games. Here, we apply this promising tool to better understand biological sequence data. Currently we are focusing on developing highly generalizable models to predict gene structure (e.g. which parts of the genomic sequence become introns and exons). With these models we hope to 1) easily annotate genes in newly sequenced species, and 2) improve the quality and consistency of existing gene annotations, and thereby facilitate comparative genomics. Accomplishing these goals involves a variety of tasks, starting with the collection, examination, and quality control of the input genomes, annotations, and supporting data. The model architecture is then coded, the curated data is used to train the models, and finally the models are iteratively evaluated and optimized. Beyond making accurate predictions, we want to reverse-engineer the models and understand what drives their predictions, in order to improve our understanding of the underlying biology. In the long run, we hope to expand our work to support endeavours in synthetic biology, for instance by predicting gene expression or even how genomic changes will affect the phenotype.
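To make the gene structure task more concrete, here is a minimal sketch of one way sequence and annotation data could be turned into model inputs and targets. It is an illustration under stated assumptions, not a description of the actual pipeline: the three-class label scheme (intergenic, exon, intron), the function names `one_hot` and `per_base_labels`, and the toy coordinates are all hypothetical.

```python
# Illustrative sketch only: the label scheme, function names and coordinates
# are assumptions made for this example, not part of the project itself.

BASES = "ACGT"


def one_hot(sequence):
    """Encode each base as a 4-element vector; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]


def per_base_labels(length, exons):
    """Label every position, given exon intervals as half-open (start, end) pairs.

    Positions between the first exon start and the last exon end that fall
    outside every exon are labelled "intron"; all others are "intergenic".
    Assumes all intervals lie within [0, length).
    """
    labels = ["intergenic"] * length
    if not exons:
        return labels
    gene_start = min(start for start, _ in exons)
    gene_end = max(end for _, end in exons)
    # Mark the whole gene span as intron first, then overwrite the exons.
    for i in range(gene_start, gene_end):
        labels[i] = "intron"
    for start, end in exons:
        for i in range(start, end):
            labels[i] = "exon"
    return labels


sequence = "ACGTNACGTACG"
exons = [(2, 5), (8, 10)]  # toy coordinates, made up for illustration
features = one_hot(sequence)                      # model input, one vector per base
targets = per_base_labels(len(sequence), exons)   # training target, one label per base
```

Under this framing, a model would take `features` as input and be trained to reproduce `targets` at each position. Real annotations carry more detail (strand, untranslated regions, alternative transcripts) that this sketch deliberately ignores.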