Protein Structure Prediction by Protein Alignments

19 Oct 2015 · Jianzhu Ma

Proteins are the basic building blocks of life. They usually perform their functions by folding into a particular structure. Understanding the folding process could help researchers understand protein function, develop supplemental proteins for people with deficiencies, and gain insight into diseases associated with misfolding proteins. Experimental methods for determining structure are both expensive and time-consuming. In this thesis I introduce a new machine-learning-based method for predicting protein structure. The method improves performance in two directions: creating accurate protein alignments and predicting accurate protein contacts. First, I present an alignment framework, MRFalign, which goes beyond state-of-the-art methods by using Markov Random Fields (MRFs) to model a protein family and aligning two proteins by aligning their two MRFs. Compared to other methods, which can model only local-range residue correlation, MRFs can model long-range residue interactions and thus encode global information about a protein. Second, I present a Group Graphical Lasso method for contact prediction that integrates joint multi-family Evolutionary Coupling (EC) analysis and supervised learning to improve accuracy on proteins without many sequence homologs. Unlike single-family EC analysis, which uses residue co-evolution information from the target protein family only, our joint EC analysis uses residue co-evolution in both the target family and its related families, which may have divergent sequences but similar folds. Our method can also integrate supervised learning methods to further improve accuracy. We evaluate both methods, including each of their components, on large public benchmarks. Experiments show that our methods achieve better accuracy than existing state-of-the-art methods on every evaluation measure for most protein classes.
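To make the joint multi-family idea concrete, here is a minimal sketch of how couplings from several related families might be pooled per residue pair using a group (L2) norm, the quantity that a group-lasso penalty acts on. It is an illustration under stated assumptions, not the thesis's implementation: the function name `group_coupling_scores`, the `min_separation` parameter, and the simplification to one scalar coupling per residue pair per family (already mapped to a common set of aligned columns) are all hypothetical.

```python
import numpy as np

def group_coupling_scores(precisions, min_separation=6):
    """Rank residue pairs by the L2 norm of their couplings across families.

    precisions: array of shape (K, L, L), one symmetric coupling matrix per
    family. Assumption: all K matrices share the same L aligned columns and
    hold a single scalar coupling per residue pair.
    """
    precisions = np.asarray(precisions, dtype=float)
    # Group norm per pair (i, j): sqrt of the sum over families of squared couplings.
    group_norm = np.sqrt((precisions ** 2).sum(axis=0))
    length = group_norm.shape[0]
    # Keep only pairs with j - i >= min_separation (skip near-diagonal pairs).
    i_idx, j_idx = np.triu_indices(length, k=min_separation)
    scores = group_norm[i_idx, j_idx]
    # Sort from strongest to weakest pooled coupling.
    order = np.argsort(-scores, kind="stable")
    return [(int(i_idx[o]), int(j_idx[o]), float(scores[o])) for o in order]
```

A pair that is weakly coupled in each family can still rank highly if the signal is consistent across families, which matches the motivation given above for proteins with few sequence homologs.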
