One of the grand challenges in computational biology is the protein structure prediction problem: determining the tertiary structure of a protein from its primary sequence of amino acids. Experimental approaches, such as X-ray crystallography and NMR spectroscopy, are very time-consuming. Computer simulations that predict the protein fold are a promising alternative. Computational techniques that determine tertiary structure will apply not only to the folding of existing proteins but also to the folding of "engineered" proteins, including those designed for drug purposes. Two kinds of methods have been identified for the protein structure prediction problem: knowledge-based methods, which rely on the presence of homologous proteins (in sequence or structure) in databases, and physics-based methods, which rely primarily on physical principles. Although physics-based methods are essential for finding genuinely new folds, they are also extremely expensive computationally. Knowledge-based methods are therefore a valid alternative when some homology exists. However, because in most cases only fragments of a protein are similar to known structures, the results these methods achieve may be limited.
During the first year of our LDRD effort, we designed a software infrastructure for improving an existing physics-based protein structure prediction method developed by researchers at LBNL and the University of Colorado. The second-year effort focuses on enhancing and expanding this tool so that it integrates knowledge-based approaches into the physics-based method, with the goals of further reducing computational time, improving prediction quality, and tackling realistic-sized proteins with complex folds. The software tool produced by this LDRD project, called ProteinShop, supports a combined approach that uses knowledge-based methods to identify the parts of a protein that have homologues.
Our proposed global optimization method for protein structure prediction will retain the capability of forming initial structures using the geometry generation and interactive manipulation tools. In addition, the new setup phase will allow the use of homologous fragments obtained from fold recognition servers. Starting configurations will be assembled from fragments containing tertiary structure in the regions where homology was found, and from pieces of secondary structure, formed according to secondary structure predictions, in the remaining regions. Thus, the higher the percentage of structural homology found, the more correct tertiary structure the initial configurations will contain. The new global optimization phase will use the interactive steering tool to focus on coil regions that, according to the database of known proteins, are more likely to be involved in a turn. Unlike the previous global optimization phase, which worked on a fixed, predetermined coil region that depended exclusively on the secondary structure prediction, the new phase will use a variety of configurations resulting from different alignments. Different coil regions will therefore be tried, and the interactive steering tool will keep track of the most successful alignments and subspaces chosen so far.
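To make the setup phase more concrete, here is a minimal Python sketch under stated assumptions; it is not ProteinShop's implementation. The `Fragment` record, the one-letter secondary structure codes (H = helix, E = strand, C = coil), the "T:<id>" template labels, and the `SteeringLog` class are hypothetical illustrations. The sketch only decides, per residue, whether initial geometry would come from a homologous template or from the predicted secondary structure, reports the resulting homology coverage, lists the uncovered coil regions an optimizer could focus on, and keeps the lowest energy seen for each (alignment, coil region) trial.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """Hypothetical homologous fragment: query residues [start, end) aligned to a template."""
    start: int
    end: int
    template_id: str


def assemble_sources(ss_prediction, fragments):
    """Choose a geometry source for each residue of one candidate alignment.

    Residues covered by a fragment inherit tertiary structure from the
    template ("T:<id>"); the rest keep their predicted secondary structure
    code. If fragments overlap, later fragments overwrite earlier ones.
    """
    sources = list(ss_prediction)
    for frag in fragments:
        if not 0 <= frag.start < frag.end <= len(sources):
            raise ValueError(f"fragment {frag} lies outside the sequence")
        for i in range(frag.start, frag.end):
            sources[i] = "T:" + frag.template_id
    return sources


def homology_coverage(sources):
    """Fraction of residues whose initial geometry comes from a template."""
    return sum(s.startswith("T:") for s in sources) / len(sources)


def coil_regions(sources):
    """Maximal runs [start, end) of predicted coil not covered by any template."""
    regions, start = [], None
    for i, s in enumerate(sources + ["<end>"]):  # sentinel closes a trailing run
        if s == "C" and start is None:
            start = i
        elif s != "C" and start is not None:
            regions.append((start, i))
            start = None
    return regions


@dataclass
class SteeringLog:
    """Hypothetical tracker: lowest energy reached per (alignment, coil region) trial."""
    best: dict = field(default_factory=dict)

    def record(self, alignment_id, region, energy):
        key = (alignment_id, region)
        if energy < self.best.get(key, float("inf")):
            self.best[key] = energy

    def top(self, k):
        """The k most successful trials so far, lowest energy first."""
        return sorted(self.best.items(), key=lambda kv: kv[1])[:k]
```

For example, with the prediction "CCHHHHHHCCCEEEEECCCC" and a single fragment covering residues 8 to 15 (`Fragment(8, 16, "1abc")`), 8 of 20 residues come from the template, so the coverage is 0.4, and the coil regions left for the optimizer are (0, 2) and (16, 20). Keying the log by alignment and region mirrors the idea that different alignments expose different coil regions to the search.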
Successful completion of the proposed work will result in: (1) a new methodology for protein structure prediction that combines the power of our physics-based approach to determine new folds with the ability of knowledge-based methods to identify homologous fragments; (2) a high-performance parallel code for computing the tertiary structure of a protein that will be significantly faster than our current code; (3) a visual front end that can be used to manipulate protein structures and study energy functions; and (4) an interactive steering tool that can be applied to other computationally intensive, dynamic problems to implement an adaptive approach that reacts to changes during the computation.
More information:
Self-guided, web-based SC04 ProteinShop demonstration.
ProteinShop is a standalone application that can be used to interactively manipulate protein models and study how the energy function changes in response to changes in tertiary structure. The ProteinShop software is freely available as a binary download to qualified research organizations and is available for license to commercial organizations.