November 1, 2018
Wednesday, October 24, 2018
2:30 pm
North Campus Building (NCB)
Room: 114

Speaker: Lila Kari (Waterloo)

Title: Machine Learning and the Mathematics of Genomes

In the same way we use the twenty-six letters of the alphabet to write text, and the two bits 0 and 1 to write computer code, the four basic DNA units (Adenine, Cytosine, Guanine, Thymine) are used by Nature to encode information as DNA strands. Theoretically, a DNA strand can be viewed as a "word" over the four-letter alphabet {A, C, G, T}, and the mathematical structure of such words has implications for their biological structure and function.
This talk describes our research into the mathematical properties of genomic DNA sequences by exploring the connection between word frequencies in a genome and the type of organism that the genome belongs to. In particular, I describe our investigation into the Chaos Game Representation of a DNA sequence as a potential "genomic signature" of its species. Moreover, I describe how we combine supervised machine learning techniques with such genomic signatures for ultrafast, accurate, and scalable algorithms for species identification and classification. The potential impact of such alignment-free universal classification algorithms could be significant, given that 86% of existing species on Earth and 91% of species in the oceans still await classification.
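
For readers unfamiliar with the terms in the abstract, here is a minimal sketch of a Chaos Game Representation and of word (k-mer) frequency counting. It is an illustration only and not the speaker's implementation: the corner assignment for A, C, G, T is one common convention and is assumed here, and the function names cgr_points and kmer_counts are hypothetical.

```python
from collections import Counter

# Assumed convention: each nucleotide is assigned one corner of the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the Chaos Game Representation points of a DNA sequence.

    Starting from the centre of the unit square, each nucleotide moves the
    current point halfway toward that nucleotide's corner.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N (a simplification)
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_counts(seq, k):
    """Count the length-k words over {A, C, G, T} occurring in the sequence."""
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= alphabet:
            counts[word] += 1
    return counts


if __name__ == "__main__":
    print(cgr_points("ACGT"))
    print(kmer_counts("ACGTAC", 2))
```

Binning the points into a 2^k by 2^k grid gives an image whose cell counts correspond to k-mer frequencies; a vector or image of this kind is what could be passed to a supervised classifier as a candidate genomic signature.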

All are welcome. Coffee and cookies will be served.

Rick Jardine
