CSE Speaker Series – Dr. Trilce Estrada

On Friday, November 3 at 11:00 am in Cramer 221, Dr. Trilce Estrada of UNM will give a talk on “Graphic Encoding of Macromolecules for Efficient High-Throughput Analysis”.

Abstract

The function of a protein depends on its three-dimensional structure. Computational approaches for protein function prediction, and more generally for macromolecular analysis, are limited by the expressiveness and complexity of protein representation formats. Partial structural representations and representations that rely on homology alignments are computationally expensive and do not scale with the number of molecules, as three-dimensional matching is an NP-hard problem. Being able to represent heterogeneous macromolecules in a homogeneous, easy-to-compare, and easy-to-analyze format has the potential to disrupt the way and the scale at which molecular analysis is done today. In this talk I will introduce a generalizable and homogeneous representation of macromolecules that explicitly encodes tertiary structural motifs and their relative distance as a proxy for their interaction. The final goal of this encoding is to expose intra- and inter-molecular structural patterns in a scalable way, i.e., one that does not require performing alignments, homology calculations, or other expensive operations between pairs, or sets, of proteins. To demonstrate the effectiveness of this encoding, we also present an image processing system based on deep convolutional neural networks that use our graphic representation to perform high-throughput protein function prediction.