Blog: RECOMB 2019 Day Two Highlights: Proteins and Posters
Today is the second day of RECOMB 2019, a conference focusing on computational molecular biology. There were two really interesting presentations today that show just how far deep learning has come in one of the hardest research areas in computational biology: proteins.
Proteins are one of the major types of molecules that make up living organisms. They do an incredible diversity of things, from enabling the chemical reactions that sustain life to providing structure and support at the molecular level. Today's talks focused on two different challenges in computational protein science: identifying ligands and predicting protein structure.
Imagine this challenge: given a lock, identify which keys will unlock it just from looking. This is basically the challenge of protein binding prediction. Proteins have unique 3D shapes and are able to attach to other, smaller signaling molecules called ligands to control biological processes. In the lock analogy, a protein is the lock and a ligand is the key. So, given a protein, how can you figure out which ligands bind to it? That's the question that this paper and talk by Yunan Luo et al. seek to answer through model-agnostic meta-learning. One challenge is that the data is not evenly distributed across protein families. Some families have a lot of data available regarding which ligands their members bind. Others, not so much. To work around that, the researchers trained their model using protein families whose binding is well understood and then fine-tuned the model on new families using few-shot learning. They used model-agnostic meta-learning to make their pre-trained model as transferable as possible.
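To give a flavor of the idea: meta-learning trains shared parameters so that a handful of gradient steps on a new task's few examples already gives a good fit. Here is a toy first-order MAML loop on synthetic linear-regression "tasks" standing in for protein families. This is purely illustrative, not the authors' model or data, and it uses the first-order approximation (the meta-gradient is taken at the adapted parameters) to keep the code short:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, X, y):
    """Gradient of mean squared error for a linear model y_hat = X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

def sample_task():
    """Toy 'protein family': a random linear map, split into a few-shot
    support set (for adaptation) and a query set (for the meta-update)."""
    true_w = rng.normal(size=2)
    X = rng.normal(size=(10, 2))
    y = X @ true_w
    return X[:5], y[:5], X[5:], y[5:]

w = np.zeros(2)            # meta-parameters shared across tasks
alpha, beta = 0.1, 0.05    # inner (adaptation) and outer (meta) step sizes

for step in range(500):
    Xs, ys, Xq, yq = sample_task()
    # inner loop: one few-shot gradient step on the new task's support set
    w_adapted = w - alpha * loss_grad(w, Xs, ys)
    # outer loop (first-order MAML): update the shared initialization using
    # the query-set gradient evaluated at the adapted parameters
    w = w - beta * loss_grad(w_adapted, Xq, yq)
```

The key design point is that the outer update optimizes the *initialization* for post-adaptation performance, which is what makes the pre-trained model transfer well to data-poor families.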
The other highlight of the proteins section was on protein structure prediction, one of the holy grails of computational biology. The basic idea is that you're given a sequence of amino acids (the molecular building blocks of proteins) and your task is to figure out how the chain will fold into a 3D shape. As you might have heard on the news, DeepMind's AlphaFold recently achieved state-of-the-art performance in a protein folding prediction competition held every two years. Today's talk by Jinbo Xu was about a very similar competing approach (also based on deep learning). This approach is interesting because it is able to predict folding without simulating it. Simulation is a hard problem (see yesterday's post for more on simulation), so approaches that skip that step are much more computationally feasible. The method also doesn't rely on comparing a given sequence to known sequences, making it capable of working on sequences unlike anything anyone has seen before. I really enjoyed getting to learn about how many different approaches there are to solving this important problem and am eagerly looking forward to seeing what happens in two years at the next competition. Either way, it looks like deep learning is here to stay in protein folding prediction.
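One way to see why predicting geometry lets you skip simulation: if a network predicts the pairwise distances between residues, 3D coordinates follow from pure linear algebra rather than from a physical folding simulation. As a minimal illustration of that last step (classical multidimensional scaling on a toy distance matrix, not the actual reconstruction procedure from the talk):

```python
import numpy as np

def coords_from_distances(D, dim=3):
    """Classical multidimensional scaling: recover coordinates (up to
    rotation, reflection, and translation) from pairwise distances."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix of centered coords
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]       # keep the `dim` largest
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

# toy check: distances between random 3D points are reproduced exactly
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
X_rec = coords_from_distances(D)
D_rec = np.linalg.norm(X_rec[:, None] - X_rec[None, :], axis=-1)
```

In practice predicted distances are noisy and incomplete, so real pipelines solve a more robust optimization problem, but the principle is the same: geometry, not simulation.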
At the end of the day, the conference had a poster session, in which we presented our research on alignment-free DNA sequence analysis (blog post coming soon). I also got to meet some of the Deep Rules contributors with whom I’ve been working virtually.
As for tomorrow, the day is starting off with a keynote on data science for precision medicine, followed by a session on genomic privacy. I can’t wait!