Protein Structure Prediction

Protein Structure Prediction Using Physical-Based Global Optimization and Knowledge-Guided Fragment Packing


We describe a new method for predicting the tertiary structure of “new-fold” proteins. It is one of the few attempts to use an all-atom physics-based energy function throughout all stages of the optimization, while still drawing on knowledge from known proteins to guide the search through the vast conformational space. The method has two phases. Phase I creates a variety of initial configurations by incorporating knowledge from known proteins in two ways: (1) using secondary structure predictions and structural templates of known proteins (Ginalski et al., 2003), and (2) using probability results on both protein fold topology (Ruczinski et al., 2002) and sequence matching specificity (Zhu & Braun, 1999) to automatically produce a high-probability collection of possible sheet conformations. In Phase II, the initial configurations are improved by an optimization algorithm that optimizes selected subspaces of the predicted coil regions in parallel. This phase combines filtering techniques with the physics-based energy function to improve discrimination between candidate structures. We tested the method in CASP6, where it produced the best prediction for one of the “new-fold” targets, T238.

Method Description

Phase I: knowledge-guided fragment packing
Phase II: physical-based global optimization

Phase I: Set Up Initial Models

• Fragments from "scratch"

• Fragment packing

Phase II: Global Optimization

• Energy function: AMBER, an all-atom physics-based energy function
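
As a rough illustration of the subspace idea in Phase II, the sketch below lowers a toy energy over a vector of dihedral angles by repeatedly picking a small subset of coordinates (standing in for part of a predicted coil region) and optimizing only those while the rest stay fixed. Everything here is assumed for illustration: the energy is a simple cosine sum rather than AMBER, the names `toy_energy`, `optimize_subspace`, and `optimize_subspaces` are hypothetical, the subspaces are processed one after another rather than in parallel, and a random-perturbation search stands in for the actual global optimization algorithm, which this summary does not specify.

```python
import math
import random


def toy_energy(angles):
    """Toy stand-in for a physics-based energy (NOT AMBER): per-angle
    cosine wells centred at 1.0 rad plus a weak neighbour coupling."""
    local = sum(1.0 - math.cos(a - 1.0) for a in angles)
    coupling = sum(0.1 * (1.0 - math.cos(a - b))
                   for a, b in zip(angles, angles[1:]))
    return local + coupling


def optimize_subspace(angles, indices, energy, rng, trials=200, step=0.5):
    """Perturb only the angles listed in `indices`; keep a move only if
    it lowers the energy. All other angles stay fixed."""
    best = list(angles)
    best_e = energy(best)
    for _ in range(trials):
        cand = list(best)
        for i in indices:
            cand[i] += rng.uniform(-step, step)
        e = energy(cand)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e


def optimize_subspaces(angles, coil_regions, energy, rounds=20, seed=0):
    """Cycle over the coil regions, optimizing a small random subset of
    each region's angles per visit (serial here, for simplicity)."""
    rng = random.Random(seed)
    current = list(angles)
    current_e = energy(current)
    for _ in range(rounds):
        for region in coil_regions:
            subset = rng.sample(region, min(3, len(region)))
            current, current_e = optimize_subspace(current, subset, energy, rng)
    return current, current_e


if __name__ == "__main__":
    start = [0.0] * 12
    coils = [[0, 1, 2, 3], [8, 9, 10, 11]]  # hypothetical coil positions
    final, e = optimize_subspaces(start, coils, toy_energy)
    print(f"energy: {toy_energy(start):.3f} -> {e:.3f}")
```

Restricting each step to a few coordinates keeps every local search low-dimensional, and because the subsets are chosen independently, separate workers could in principle explore different subspaces at the same time.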

CASP6 Test

GDT plot for CASP6 target T238. Blue: the first model submitted by our group. Cyan: other models submitted by our group. Brown: the models submitted by all other groups (119 groups submitted 434 models for T238). The y axis shows the Cα RMSD cutoff under which the model is fitted to the native structure, and the x axis shows the percentage of the model that fits below that cutoff. This figure is from the CASP6 website at http://predictioncenter.llnl.gov/casp6/Casp6.html
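
To make the axes of such a plot concrete, the sketch below computes, for a range of distance cutoffs, the percentage of residues whose Cα atom lies within the cutoff of its native position. It assumes the model has already been superimposed on the native structure and takes the per-residue distances as input; the actual GDT analysis searches over many superpositions for each cutoff, so this is a simplification. The function name `gdt_style_curve` and the sample distances are made up for illustration.

```python
def gdt_style_curve(distances, cutoffs):
    """Return (cutoff, percent of residues with distance <= cutoff) pairs.

    `distances` are per-residue Calpha deviations (in angstroms) between
    a model and the native structure after a single, fixed superposition.
    """
    if not distances:
        raise ValueError("need at least one per-residue distance")
    n = len(distances)
    return [(c, 100.0 * sum(d <= c for d in distances) / n) for c in cutoffs]


if __name__ == "__main__":
    # Hypothetical Calpha deviations for a 10-residue model.
    sample = [0.4, 0.9, 1.3, 1.8, 2.5, 3.1, 4.2, 5.0, 7.5, 9.8]
    cutoffs = [0.5 * k for k in range(1, 21)]  # 0.5, 1.0, ..., 10.0
    for cutoff, pct in gdt_style_curve(sample, cutoffs):
        print(f"{cutoff:4.1f} A  {pct:5.1f}%")
```

A model whose curve stays low on the cutoff axis while extending far along the percentage axis fits more of the structure at tighter tolerances, which is how the blue and cyan curves can be compared with the brown ones.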