Posts Tagged ‘homology modeling of proteins’

Functional Annotation of Hypothetical proteins

April 30, 2014

Cloning, expression and purification of difficult to clone, express and purify proteins in E. coli 

Experimental work is time-consuming, but it is the most direct approach to functional annotation of hypothetical proteins. At times, however, it is difficult to decide on an experimental design for a relatively new class of protein. As protein databases grow in size and quality, it is becoming easier to work out an experimental design from the probable function of a protein. The following steps can help in choosing the type of experimental analysis to perform and the substrate to use in laboratory tests.

If the protein is predicted to be an enzyme, BLAST results normally point to closely related proteins. The experimental procedures used for those matching hits are a good starting point (look for papers on those proteins, which may indicate the kind of function your protein performs).
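As a rough illustration of shortlisting hits before going to the literature, the sketch below filters BLAST tabular output (the default 12-column `-outfmt 6` layout of BLAST+) by E-value and percent identity. The sample rows, the hit names and the cut-offs are made up for illustration; pick thresholds that suit your own protein family.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def parse_hits(text):
    """Turn tab-separated BLAST rows into a list of dicts."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        hits.append(row)
    return hits

def shortlist(hits, max_evalue=1e-5, min_pident=30.0):
    """Keep confident hits and sort them by bit score, best first.

    The default cut-offs are illustrative assumptions, not recommendations.
    """
    keep = [h for h in hits
            if h["evalue"] <= max_evalue and h["pident"] >= min_pident]
    return sorted(keep, key=lambda h: h["bitscore"], reverse=True)

# Hypothetical example rows (not real BLAST output).
SAMPLE = "\n".join([
    "query1\thitA\t62.5\t200\t75\t0\t1\t200\t5\t204\t1e-80\t250",
    "query1\thitB\t28.0\t150\t108\t2\t10\t159\t20\t169\t2e-04\t45.1",
    "query1\thitC\t41.3\t180\t105\t1\t3\t182\t1\t180\t3e-30\t120",
])

best = shortlist(parse_hits(SAMPLE))
print([h["sseqid"] for h in best])
```

The hits that survive the filter are the ones whose papers are most worth reading for assay conditions and substrates.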

With the growth of domain databases, it is possible to analyze a protein domain by domain, which can indicate whether it is able to carry out a particular kind of biochemical reaction. NCBI’s Conserved Domain Database (CDD), Pfam and InterPro (searched with InterProScan) hold a large number of conserved domains that define functional classes. The presence of a particular domain is indicative of the possible activity of the protein, and therefore of the type of substrate to use when testing its chemical activity in the laboratory.


Composition-based analysis of the protein: various online bioinformatics tools analyze the amino acid composition of a protein and report properties that later help with functional annotation, e.g. ProtParam, SPAAN, MP3 and many more.
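To give a feel for what a composition-based analysis computes, here is a minimal sketch that derives the amino acid composition and the grand average of hydropathy (GRAVY, one of the values ProtParam reports) from a sequence. It uses the Kyte-Doolittle hydropathy scale; the example sequence is hypothetical, and the sketch assumes the sequence contains only the 20 standard residues.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def composition(seq):
    """Fraction of each residue type present in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}

def gravy(seq):
    """Mean Kyte-Doolittle hydropathy over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

example = "MKTAYIAKQR"  # hypothetical sequence, for illustration only
print(composition(example))
print(round(gravy(example), 3))
```

A positive GRAVY suggests a more hydrophobic protein and a negative one a more hydrophilic protein, which can hint at, for example, membrane association versus solubility.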

Homology-based modeling: this is an important step in functional annotation based on the structure of the protein, though it may be difficult for proteins with low identity (<30%) to known crystal structures. A good homology model can nonetheless be an important step towards a functional annotation. Secondary and tertiary structure predictions likewise point to similar functional categories and so help in designing relevant experimental assays. Some of the commonly used homology-based modeling tools are listed here.
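The 30% figure refers to sequence identity between your protein and the template. As a small sketch, the function below computes percent identity from two rows of a pairwise alignment and flags templates that fall under that cut-off. Note that identity can be defined in several ways; this version counts identical residues over columns where neither sequence has a gap, which is an assumption of the sketch, and the aligned strings are made up.

```python
def percent_identity(aln_query, aln_template):
    """Percent identity over columns where both sequences have a residue."""
    if len(aln_query) != len(aln_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    matches = 0
    for q, t in zip(aln_query, aln_template):
        if q == "-" or t == "-":
            continue  # gap column: not counted in this definition
        aligned += 1
        if q.upper() == t.upper():
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0

def is_difficult_template(identity, cutoff=30.0):
    """True when identity falls below the cut-off mentioned in the text."""
    return identity < cutoff

# Hypothetical alignment rows, for illustration only.
pid = percent_identity("ACDE-F", "ACDQGF")
print(pid, is_difficult_template(pid))
```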


Phylogenetic analysis: phylogenetic analysis not only shows the evolutionary divergence of the protein but is also an important step in assessing its functional conservation. It helps determine the degree of functional similarity with related homologous proteins, and thus the appropriate experimental assays for functional annotation. Combined with molecular dynamics simulation, it also supports in silico assessment of the ability of a substrate to bind to the protein. In fact, this can cut a large number of candidate substrate molecules down to the top hits, which helps prioritize the experimental analysis and saves time and resources.


Working with novel proteins for which almost no relevant data exists anywhere can be difficult; in that case you may have to wait until more information becomes available.

Let me know if you have more suggestions to add on.

Homology modeling of proteins

April 10, 2012


CPHmodels: CPHmodels-3.0 is a web server that predicts protein 3D structure using single-template homology modeling. The server employs a hybrid of the scoring functions of CPHmodels-2.0 and a novel remote homology-modeling algorithm. The server first attempts to model a query sequence using the fast CPHmodels-2.0 profile-profile scoring function, which is suitable for close homology modeling. The new, computationally costly remote homology-modeling algorithm is engaged only if no suitable PDB template is identified in the initial search. CPHmodels-3.0 was benchmarked in the CASP8 competition and produced models for 94% of the targets (117 out of 128); 74% of these were predicted as high-reliability models (87 out of 117) and achieved an average RMSD of 4.6 Å when superimposed on the 3D structure. The remaining 26% low-reliability models (30 out of 117) could be superimposed on the true 3D structure with an average RMSD of 9.3 Å. These performance values place the CPHmodels-3.0 method in the group of high-performing 3D prediction tools. Besides its accuracy, one of the important features of the method is its speed. For most queries, the response time of the server is less than 20 minutes. The web server is available at
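Since RMSD comes up again and again when models are benchmarked, here is a minimal sketch of how it is calculated: the square root of the mean squared distance between matched atoms. The sketch assumes the two structures are already superimposed and that the atoms are listed in the same order; it does not perform the optimal superposition that benchmarking tools apply first. The coordinates are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z).

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha coordinates in angstroms, for illustration only.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 2.0, 0.0)]
print(rmsd(model, reference))
```

Lower values mean the model sits closer to the reference structure.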


MODELLER: MODELLER is used for homology or comparative modeling of protein three-dimensional structures (1,2). The user provides an alignment of a sequence to be modeled with known related structures and MODELLER automatically calculates a model containing all non-hydrogen atoms. MODELLER implements comparative protein structure modeling by satisfaction of spatial restraints (3,4), and can perform many additional tasks, including de novo modeling of loops in protein structures, optimization of various models of protein structure with respect to a flexibly defined objective function, multiple alignment of protein sequences and/or structures, clustering, searching of sequence databases, comparison of protein structures, etc. MODELLER is available for download for most Unix/Linux systems, Windows, and Mac.


SWISS-MODEL: SWISS-MODEL is a fully automated protein structure homology-modeling server, accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer). The purpose of this server is to make Protein Modeling accessible to all biochemists and molecular biologists worldwide.


Phyre2: Phyre and Phyre2 (Protein Homology/AnalogY Recognition Engine; pronounced ‘fire’) are web-based services for protein structure prediction that are free for non-commercial use. Phyre is among the most popular methods for protein structure prediction, having been cited over 1000 times. Like other remote homology recognition techniques (see protein threading), it is able to regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot. Phyre2 has been designed (funded by the BBSRC) to provide a user-friendly interface for users who are not experts in protein structure prediction methods.


HHpred: The primary aim in developing HHpred was to provide biologists with a method for sequence database searching and structure prediction that is as easy to use as BLAST or PSI-BLAST and that is at the same time much more sensitive in finding remote homologs. In fact, HHpred’s sensitivity is competitive with the most powerful servers for structure prediction currently available. HHpred is the first server that is based on the pairwise comparison of profile hidden Markov models (HMMs). Whereas most conventional sequence search methods search sequence databases such as UniProt or the NR, HHpred searches alignment databases, like Pfam or SMART. This greatly simplifies the list of hits to a number of sequence families instead of a clutter of single sequences. All major publicly available profile and alignment databases are available through HHpred. HHpred accepts a single query sequence or a multiple alignment as input. Within only a few minutes it returns the search results in an easy-to-read format similar to that of PSI-BLAST. Search options include local or global alignment and scoring secondary structure similarity. HHpred can produce pairwise query-template sequence alignments, merged query-template multiple alignments (e.g. for transitive searches), as well as 3D structural models calculated by the MODELLER software from HHpred alignments.


LOMETS: LOMETS (Local Meta-Threading-Server) is an online web service for protein structure prediction. It generates 3D models by collecting high-scoring target-to-template alignments from 8 locally installed threading programs (FUGUE, HHsearch, MUSTER, PPA, PROSPECT2, SAM-T02, SPARKS, SP3). A detailed description of the server can be seen in the Readme file.


MODBASE: MODBASE is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment. MODBASE currently contains 5 152 695 reliable models for domains in 1 593 209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server. Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal.

Robetta:   Robetta provides both ab initio and comparative models of protein domains. It uses the ROSETTA fragment insertion method (Simons et al. (1997) J Mol Biol. 268:209-225). Domains without a detectable PDB homolog are modeled with the Rosetta de novo protocol (Bonneau et al. (2002) J Mol Biol. 322:65-78). Comparative models are built from Parent PDBs detected by UW-PDB-BLAST or HHSEARCH and aligned by various methods which include HHSEARCH, Compass, and Promals. Loop regions are assembled from fragments and optimized to fit the aligned template structure (Rohl et al. (2004) Proteins 55:656-677). The procedure is fully automated. Robetta is evaluated in the blind benchmarking experiment CASP. Robetta uses ROSETTA software which is developed and maintained by the Rosetta Commons.


chunk-TASSER: A protein structure prediction method that combines threading templates from SP3 and ab initio folded chunk structures (three consecutive segments of regular secondary structures). It performs better on extremely hard targets.


PSiFR (Protein Structure and Function prediction Resource) provides integrated tools for protein tertiary structure prediction and for structure- and sequence-based function annotation. The details of the various methods used are described below:


Protein structure prediction methods

  1. TASSER (Threading/ASSembly/Refinement) is an automated protein structure prediction and modeling method. TASSER employs a hierarchical approach consisting of template identification by threading, followed by tertiary structure assembly by rearranging continuous template fragments (Zhang, Y. and Skolnick, J., 2004, PNAS).
  2. TASSER-Lite is a comparative protein tertiary structure modeling tool. It is presently optimized for the modeling of single-domain (41-200 residues) homologous protein sequences; that is, proteins with a sequence identity greater than 25% with respect to the best threading template (Pandit et al., 2006, Biophysical Journal). The templates for the modeling of the query sequence are identified using the threading program PROSPECTOR_3 (Skolnick et al., 2004, Proteins). Subsequently, the structure is refined using the TASSER program with optimized parameters.
  3. METATASSER is a protein tertiary structure prediction method that employs the 3D-Jury approach to select threading templates from SPARKS2 (Zhou H. and Zhou Y., 2004, Proteins), SP3 (Zhou H. and Zhou Y., 2005, Proteins) and PROSPECTOR_3 (Skolnick et al., 2004, Proteins), which provide aligned fragments and tertiary restraints as input to the TASSER procedure to generate full-length models. In the CASP7 and CASP8 assessments of server performance, METATASSER was among the top-performing servers (Zhou et al., 2007, Proteins; Zhou et al., 2009, Proteins (in press)).


ESyPred3D: ESyPred3D is a new automated homology modeling program. The method benefits from the improved alignment performance of a new alignment strategy using neural networks. Alignments are obtained by combining, weighting and screening the results of several multiple alignment programs. The final three-dimensional structure is built using the modeling package MODELLER.


Protein Model Portal (PMP): gives access to various models computed by comparative modeling methods provided by different partner sites, and provides access to various interactive services for model building, and quality assessment.


ProModel: ProModel is a complete package for modeling proteins whose crystal structure is unknown, based on the amino acid sequence of a close homologue. ProModel allows homology modeling from either a selected template or a user-defined template. Users can perform automated homology modeling simply by reading in the template file, or can perform knowledge-based manual modeling by specific loop insertions or by changing specific amino acid residues. A local BLAST speeds up the process of modeling. ProModel enables an exhaustive analysis of the target protein structure, active site and channels. The user can conveniently view, edit and superimpose proteins with ProModel. Facilities to distribute the secondary structure elements, distribute the Phi-Psi angles of residues in a Ramachandran plot, and identify and visualize cavities and channels make it a very useful product. ProModel is available for both Linux and Windows® operating systems.


SCWRL4: SCWRL4 is based on a new algorithm and new potential function that results in improved accuracy at reasonable speed. This has been achieved through: 1) a new backbone-dependent rotamer library based on kernel density estimates; 2) averaging over samples of conformations about the positions in the rotamer library; 3) a fast anisotropic hydrogen bonding function; 4) a short-range, soft van der Waals atom-atom interaction potential; 5) fast collision detection using k-discrete oriented polytopes; 6) a tree decomposition algorithm to solve the combinatorial problem; and 7) optimization of all parameters by determining the interaction graph within the crystal environment using symmetry operators of the crystallographic space group. Accuracies as a function of electron density of the side chains demonstrate that side chains with higher electron density are easier to predict than those with low electron density and presumed conformational disorder. For a testing set of 379 proteins, 86% of chi1 angles and 75% of chi1+2 are predicted correctly within 40 degrees of the X-ray positions. Among side chains with higher electron density (25th-100th percentile), these numbers rise to 89% and 80%. The new program maintains its simple command-line interface, designed for homology modeling. To achieve higher accuracy, SCWRL4 is somewhat slower than SCWRL3 when run in the default flexible rotamer model (FRM) by a factor of 3-6, depending on the protein. When run in the rigid rotamer model (RRM), SCWRL4 is about the same speed as SCWRL3. In both cases, SCWRL4 will converge on very large proteins or protein complexes or those with very dense interaction graphs, while SCWRL3 sometimes would not. The SCWRL4 paper has been published in Proteins: Structure, Function, Bioinformatics. A reprint is available. Please cite the paper: G. G. Krivov, M. V. Shapovalov, and R. L. Dunbrack, Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins (2009).
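To make the "within 40 degrees" criterion concrete, here is a small sketch of how such an accuracy figure can be computed. Dihedral angles wrap around at ±180°, so the difference has to be taken on the circle. The angle values are made up, and the sketch ignores special cases such as symmetric side chains, which a real evaluation would have to handle.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def fraction_correct(predicted, observed, tolerance=40.0):
    """Fraction of dihedral angles predicted within the given tolerance."""
    pairs = list(zip(predicted, observed))
    ok = sum(1 for p, o in pairs if angle_diff(p, o) <= tolerance)
    return ok / len(pairs)

# Hypothetical chi1 angles in degrees, for illustration only.
predicted_chi1 = [60.0, 170.0, -60.0, 180.0]
observed_chi1 = [65.0, -170.0, 60.0, 100.0]
print(fraction_correct(predicted_chi1, observed_chi1))
```

Note that 170° and -170° are only 20° apart, which a plain subtraction would miss.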


VADAR: VADAR (Volume, Area, Dihedral Angle Reporter) is a compilation of more than 15 different algorithms and programs for analyzing and assessing peptide and protein structures from their PDB coordinate data. The results have been validated through extensive comparison to published data and careful visual inspection. The VADAR web server supports the submission of either PDB formatted files or PDB accession numbers. VADAR produces extensive tables and high quality graphs for quantitatively and qualitatively assessing protein structures determined by X-ray crystallography, NMR spectroscopy, 3D-threading or homology modelling. Please cite the following: Leigh Willard, Anuj Ranjan, Haiyan Zhang, Hassan Monzavi, Robert F. Boyko, Brian D. Sykes, and David S. Wishart “VADAR: a web server for quantitative evaluation of protein structure quality” Nucleic Acids Res. 2003 July 1; 31 (13): 3316-3319.


IntFOLD: The IntFOLD server provides a unified interface for tertiary structure prediction/3D modeling, 3D model quality assessment, intrinsic disorder prediction, domain prediction and prediction of protein-ligand binding residues.


PEPstr: The PEPstr server predicts the tertiary structure of small peptides with sequence lengths between 7 and 25 residues. The prediction strategy is based on the realization that the β-turn is an important and consistent feature of small peptides in addition to regular structures. Thus, the method uses both the regular secondary structure information predicted by PSIPRED and the β-turn information predicted by BetaTurns. The side-chain angles are placed using a standard backbone-dependent rotamer library. The structure is further refined with energy minimization and molecular dynamics simulations using Amber version 6.


BSR:  Binding Site Refinement employs a new template-based method for the local refinement of ligand-binding regions in protein models using closely as well as distantly related templates identified by threading. A Support Vector Regression (SVR) model is used to select likely correct binding site geometries in a large ensemble of multiple receptor conformations. The SVR model employs several scoring functions that impose geometrical restraints on the Cα positions, account for a specific chemical environment within a binding site and optimize the interactions with putative ligands.


KeyRecep: KeyRecep is the best-suited solution for rational molecular design when the 3D structure of the target protein is unknown. Users can estimate the characteristics of the binding site of the target protein by superposing multiple active compounds in 3D space so that the physicochemical properties of the compounds match maximally with each other. (Estimation of virtual receptor model) Users can also examine relationship between chemical structures and the activities based on the multiple regression analysis with indices of conformity of each compound to the virtual receptor model and the activity values. (3D-SAR function) For compounds whose activities are unknown, users can estimate the activities based on the indices of conformity to the virtual receptor model and can perform virtual screening. (DB search function) KeyRecep rationally and strategically accelerates the molecular design projects based on hit compounds discovered by high throughput screening (HTS) or based on information on compounds from literature or patents. KeyRecep facilitates the structural expansion of such compounds to obtain lead compounds and further drug candidates.


PROTEUS2:  PROTEUS2 is a web server designed to support comprehensive protein structure prediction and structure-based annotation. PROTEUS2 accepts either single sequences (for directed studies) or multiple sequences (for whole proteome annotation) and predicts the secondary and, if possible, tertiary structure of the query protein(s). Unlike most other tools or servers, PROTEUS2 bundles signal peptide identification, transmembrane helix prediction, transmembrane β-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline. Using a combination of progressive multi-sequence alignment, structure-based mapping, hidden Markov models, multi-component neural nets and up-to-date databases of known secondary structure assignments, PROTEUS2 is able to achieve among the highest reported levels of predictive accuracy for signal peptides (Q2=94%), membrane spanning helices (Q2=87%) and secondary structure (Q3 score of 81.3% ). PROTEUS2’s homology modeling services also provide high quality 3D models that compare favorably with those generated by SWISS-MODEL (within 0.2 Å RMSD). The average PROTEUS2 prediction takes ~2 minutes per query sequence. Source code is also freely available here.


PSIPRED: PSIPRED is a simple and accurate secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST (Position Specific Iterated BLAST). Using a very stringent cross-validation method to evaluate the method’s performance, PSIPRED 2.6 achieves an average Q3 score of 80.7%. Predictions produced by PSIPRED were also submitted to the CASP4 evaluation and assessed during the CASP4 meeting, which took place in December 2000 at Asilomar. PSIPRED 2.0 achieved an average Q3 score of 80.6% across all 40 submitted target domains with no obvious sequence similarity to structures present in PDB, which ranked PSIPRED top out of 20 evaluated methods (an earlier version of PSIPRED was also ranked top in CASP3, held in 1998). It is important to realize, however, that due to the small sample sizes, the results from CASP are not statistically significant, although they do give a rough guide as to the current “state of the art”. For a more reliable evaluation, the EVA web site at Columbia University provides a continuous evaluation. Also see the EVA servlet to visualize a breakdown of specific types of errors made by PSIPRED and other secondary structure prediction methods. Note that at the time of writing, the EVA site is no longer being updated. The PSIPRED V2.6 software can be downloaded here. Please note that you should read the license terms given in the README file if you wish to incorporate PSIPRED in another program or Web server. Older releases of PSIPRED can also be downloaded here.
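For readers new to the Q3 score: it is the percentage of residues whose three-state secondary structure (helix H, strand E, coil C) is predicted correctly. A minimal sketch, with made-up prediction and reference strings:

```python
def q3(predicted, observed):
    """Percentage of residues with the correct three-state assignment."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)

# Hypothetical three-state strings (H = helix, E = strand, C = coil).
predicted_ss = "HHHEECCC"
observed_ss = "HHHEEECC"
print(q3(predicted_ss, observed_ss))
```

Here seven of eight residues agree, so the score is 87.5%; the 80.7% quoted for PSIPRED is this kind of value averaged over a test set.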


I-TASSER: The I-TASSER server is an Internet service for protein structure and function predictions. 3D models are built based on multiple-threading alignments by LOMETS and iterative TASSER assembly simulations; function insights are then derived by matching the predicted models with protein function databases. I-TASSER (as ‘Zhang-Server’) was ranked as the No. 1 server for protein structure prediction in the CASP7, CASP8 and CASP9 experiments. It was also ranked as the best for function prediction in CASP9. The server is in active development with the goal of providing the most accurate structural and functional predictions using state-of-the-art algorithms.


JPred: Jpred is a protein secondary structure prediction server and has been in operation since approximately 1998. Jpred incorporates the Jnet algorithm in order to make more accurate predictions. In addition to protein secondary structure, Jpred also makes predictions of solvent accessibility and coiled-coil regions (Lupas method). The current version of Jpred (v3) follows on from previous versions of Jpred developed and maintained by James Cuff and Jonathan Barber.


Verifying your modeled protein with online servers:


Structural Analysis and Verification Server (SAVS): SAVS uses the following programs to check the quality of protein structures:

  1. Procheck: Checks the stereochemical quality of a protein structure by analyzing residue-by-residue geometry and overall structure geometry.
  2. What_Check: Derived from a subset of protein verification tools from the WHATIF program (Vriend, 1990), this does extensive checking of many stereochemical parameters of the residues in the model.
  3. ERRAT: Analyzes the statistics of non-bonded interactions between different atom types and plots the value of the error function versus position of a 9-residue sliding window, calculated by a comparison with statistics from highly refined structures.
  4. Verify3D: Determines the compatibility of an atomic model (3D) with its own amino acid sequence (1D) by assigning a structural class based on its location and environment (alpha, beta, loop, polar, nonpolar etc.) and comparing the results to good structures.
  5. Prove: Calculates the volumes of atoms in macromolecules using an algorithm which treats the atoms like hard spheres and calculates a statistical Z-score deviation for the model from highly resolved (2.0 Å or better) and refined (R-factor of 0.2 or better) PDB-deposited structures.


COLORADO-3D: COLORADO-3D is a www-tool that greatly facilitates the visual analysis of various features in three-dimensional protein structures, directly at the level of the protein structure, with the aid of commonly used viewers such as RASMOL or SWISSPDBVIEWER. Among the features most important for the structural biologist that our server allows to visualize in color are potential errors in protein structure (detected by ANOLEA, PROSA, PROVE,VERIFY3D), regions buried in the protein core and inaccessible to the solvent, and regions of high or low sequence conservation (e.g. detected by RATE4SITE). In particular COLORADO3D may serve to visualize the results of assessment of the protein structure’s quality at various stages of the model building and refinement (both in the case of experimental structure determination and homology modeling).
