Predicting Beta Barrel Outer Membrane Proteins (OMPs)

June 27, 2017

PRED-TMBB: a web server for predicting the topology of beta-barrel outer membrane proteins. The beta-barrel outer membrane proteins constitute one of the two known structural classes of membrane proteins. Whereas there are several different web-based predictors for alpha-helical membrane proteins, currently there is no freely available prediction method for beta-barrel membrane proteins, at least with an acceptable level of accuracy. We present here a web server (PRED-TMBB, http://bioinformatics.biol.uoa.gr/PRED-TMBB) which is capable of predicting the transmembrane strands and the topology of beta-barrel outer membrane proteins of Gram-negative bacteria. The method is based on a Hidden Markov Model, trained according to the Conditional Maximum Likelihood criterion. The model was retrained and the training set now includes 16 non-homologous outer membrane proteins with structures known at atomic resolution. The user may submit one sequence at a time and has the option of choosing between three different decoding methods. The server reports the predicted topology of a given protein, a score indicating the probability of the protein being an outer membrane beta-barrel protein, posterior probabilities for the transmembrane strand prediction and a graphical representation of the assumed position of the transmembrane strands with respect to the lipid bilayer. http://nar.oxfordjournals.org/content/32/suppl_2/W400.long

  1. BOCTOPUS (2012): http://boctopus.cbr.su.se/

BOCTOPUS: improved topology prediction of transmembrane β barrel proteins

Transmembrane β barrel proteins (TMBs) are found in the outer membrane of Gram-negative bacteria, chloroplast and mitochondria. They play a major role in the translocation machinery, pore formation, membrane anchoring and ion exchange. TMBs are also promising targets for antimicrobial drugs and vaccines. Given the difficulty in membrane protein structure determination, computational methods to identify TMBs and predict the topology of TMBs are important. Results: Here, we present BOCTOPUS; an improved method for the topology prediction of TMBs by employing a combination of support vector machines (SVMs) and Hidden Markov Models (HMMs). The SVMs and HMMs account for local and global residue preferences, respectively. Based on a 10-fold cross-validation test, BOCTOPUS performs better than all existing methods, reaching a Q3 accuracy of 87%. Further, BOCTOPUS predicted the correct number of strands for 83% proteins in the dataset. BOCTOPUS might also help in reliable identification of TMBs by using it as an additional filter to methods specialized in this task. http://bioinformatics.oxfordjournals.org/content/28/4/516.long

  1. TBBpred (2004): http://www.imtech.res.in/raghava/tbbpred/

Prediction of transmembrane regions of β-barrel proteins using ANN- and SVM-based methods. This article describes a method developed for predicting transmembrane β-barrel regions in membrane proteins using machine learning techniques: artificial neural network (ANN) and support vector machine (SVM). The ANN used in this study is a feed-forward neural network with a standard back-propagation training algorithm. The accuracy of the ANN-based method improved significantly, from 70.4% to 80.5%, when evolutionary information was added to a single sequence as a multiple sequence alignment obtained from PSI-BLAST. We have also developed an SVM-based method using a primary sequence as input and achieved an accuracy of 77.4%. The SVM model was modified by adding 36 physicochemical parameters to the amino acid sequence information. Finally, ANN- and SVM-based methods were combined to utilize the full potential of both techniques. The accuracy and Matthews correlation coefficient (MCC) value of SVM, ANN, and combined method are 78.5%, 80.5%, and 81.8%, and 0.55, 0.63, and 0.64, respectively. These methods were trained and tested on a nonredundant data set of 16 proteins, and performance was evaluated using “leave one out cross-validation” (LOOCV). http://onlinelibrary.wiley.com/doi/10.1002/prot.20092/abstract;jsessionid=F041C3CA2F5E53B83924D0D73D2832C7.f03t02

  1. BETAWARE (2013): http://www.biocomp.unibo.it/~savojard/betawarecl/

BETAWARE: a machine-learning tool to detect and predict transmembrane beta-barrel proteins in prokaryotes. The annotation of membrane proteins in proteomes is an important problem of Computational Biology, especially after the development of high-throughput techniques that allow fast and efficient genome sequencing. Among membrane proteins, transmembrane β-barrels (TMBBs) are poorly represented in the database of protein structures (PDB) and difficult to identify with experimental approaches. They are, however, extremely important, playing key roles in several cell functions and bacterial pathogenicity. TMBBs are included in the lipid bilayer with a β-barrel structure and are presently found in the outer membranes of Gram-negative bacteria, mitochondria and chloroplasts. Recently, we developed two top-performing methods based on machine-learning approaches to tackle both the detection of TMBBs in sets of proteins and the prediction of their topology. Here, we present our BETAWARE program that includes both approaches and can run as a standalone program on a linux-based computer to easily address in-home massive protein annotation or filtering. http://bioinformatics.oxfordjournals.org/content/29/4/504.abstract

  1. ConBBPRED (2005): http://bioinformatics.biol.uoa.gr/ConBBPRED/index.jsp

Prediction of the transmembrane strands and topology of β-barrel outer membrane proteins is of interest in current bioinformatics research. Several methods have been applied so far for this task, utilizing different algorithmic techniques and a number of freely available predictors exist. The methods can be grossly divided to those based on Hidden Markov Models (HMMs), on Neural Networks (NNs) and on Support Vector Machines (SVMs). In this work, we compare the different available methods for topology prediction of β-barrel outer membrane proteins. We evaluate their performance on a non-redundant dataset of 20 β-barrel outer membrane proteins of gram-negative bacteria, with structures known at atomic resolution. Also, we describe, for the first time, an effective way to combine the individual predictors, at will, to a single consensus prediction method. We assess the statistical significance of the performance of each prediction scheme and conclude that Hidden Markov Model based methods, HMM-B2TMR, ProfTMB and PRED-TMBB, are currently the best predictors, according to either the per-residue accuracy, the segments overlap measure (SOV) or the total number of proteins with correctly predicted topologies in the test set. Furthermore, we show that the available predictors perform better when only transmembrane β-barrel domains are used for prediction, rather than the precursor full-length sequences, even though the HMM-based predictors are not influenced significantly. The consensus prediction method performs significantly better than each individual available predictor, since it increases the accuracy up to 4% regarding SOV and up to 15% in correctly predicted topologies.

http://www.biomedcentral.com/1471-2105/6/7

  1. TMBETA-RBF (2008): http://rbf.bioinfo.tw/~sachen/OMPpredict/TMBETADISC-RBF.php

  1. TMB-HUNT (2005): http://www.bioinformatics.leeds.ac.uk/betaBarrel/

TMB-Hunt: a web server to screen sequence sets for transmembrane β-barrel proteins. TMB-Hunt is a program that uses a modified k-nearest neighbour (k-NN) algorithm to classify protein sequences as transmembrane β-barrel (TMB) or non-TMB on the basis of whole sequence amino acid composition. By including differentially weighted amino acids, evolutionary information and by calibrating the scoring, a discrimination accuracy of 92.5% was achieved, as tested using a rigorous cross-validation procedure. The TMB-Hunt web server, available at www.bioinformatics.leeds.ac.uk/betaBarrel, allows screening of up to 10 000 sequences in a single query and provides results and key statistics in a simple colour coded format. http://nar.oxfordjournals.org/content/33/suppl_2/W188.long

  1. TMBPro (2008): suite of specialized predictors for predicting secondary structure, beta-contacts, and tertiary structure of Transmembrane Beta-Barrel (TMB) proteins. http://tmbpro.ics.uci.edu/

TMBpro: secondary structure, β-contact and tertiary structure prediction of transmembrane β-barrel proteins. Transmembrane β-barrel (TMB) proteins are embedded in the outer membranes of mitochondria, Gram-negative bacteria and chloroplasts. These proteins perform critical functions, including active ion-transport and passive nutrient intake. Therefore, there is a need for accurate prediction of secondary and tertiary structure of TMB proteins. Traditional homology modeling methods, however, fail on most TMB proteins since very few non-homologous TMB structures have been determined. Yet, because TMB structures conform to specific construction rules that restrict the conformational space drastically, it should be possible for methods that do not depend on target-template homology to be applied successfully. Results: We develop a suite (TMBpro) of specialized predictors for predicting secondary structure (TMBpro-SS), β-contacts (TMBpro-CON) and tertiary structure (TMBpro-3D) of transmembrane β-barrel proteins. We compare our results to the recent state-of-the-art predictors transFold and PRED-TMBB using their respective benchmark datasets, and leave-one-out cross-validation. Using the transFold dataset TMBpro predicts secondary structure with per-residue accuracy (Q2) of 77.8%, a correlation coefficient of 0.54, and TMBpro predicts β-contacts with precision of 0.65 and recall of 0.67. Using the PRED-TMBB dataset, TMBpro predicts secondary structure with Q2 of 88.3% and a correlation coefficient of 0.75. All of these performance results exceed previously published results by 4% or more. Working with the PRED-TMBB dataset, TMBpro predicts the tertiary structure of transmembrane segments with RMSD <6.0 Å for 9 of 14 proteins. For 6 of 14 predictions, the RMSD is <5.0 Å, with a GDT_TS score greater than 60.0. http://bioinformatics.oxfordjournals.org/content/24/4/513.long

  1. MCMBB Markov Chain Model Beta Barrels (2004): http://athina.biol.uoa.gr/bioinformatics/mcmbb/

The task of finding β-barrel outer membrane proteins of the gram-negative bacteria is of great importance in current Bioinformatics research. We developed a computational method, which discriminates β-barrel outer membrane proteins from globular ones and, also, from α-helical membrane proteins. The method is based on a 1st order Markov Chain model, which captures the alternating pattern of hydrophilic-hydrophobic residues occurring in the membrane-spanning beta-strands of beta-barrel outer membrane proteins. The model achieves high accuracy in discriminating outer membrane proteins, and could be used alone, or in conjunction with other more sophisticated methods, already available. http://www.academia.edu/316959/Finding_Beta-Barrel_Outer_Membrane_Proteins_With_a_Markov_Chain_Model
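The idea of discriminating with a first-order Markov chain can be sketched roughly as below. This is only an illustration of the general technique, not the MCMBB implementation: the training sequences are made-up toy strings, and the function names (`train_chain`, `log_odds`), the pseudocount and the zero threshold are all assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_chain(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = {a: {b: pseudocount for b in AMINO_ACIDS} for a in AMINO_ACIDS}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a in counts and b in counts:
                counts[a][b] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in AMINO_ACIDS}
        for a in AMINO_ACIDS
    }


def log_odds(seq, positive, negative):
    """Mean per-transition log-odds of seq under the two chains."""
    terms = [
        math.log(positive[a][b] / negative[a][b])
        for a, b in zip(seq, seq[1:])
        if a in positive and b in positive
    ]
    return sum(terms) / len(terms) if terms else 0.0


# Hypothetical toy training sets, for illustration only (not real proteins).
barrel_like = ["TVSVGYRFNDAWSLGAGYTY", "AVTLGYRYDLSKNVSLYVEY"]
globular_like = ["MKKLLEEAAKRLGGDPEELLK", "AEELLKKAGGSPEDLARRMLE"]

omp_chain = train_chain(barrel_like)
glob_chain = train_chain(globular_like)

query = "TVSVGYRFNDAWSLGAGYTY"
score = log_odds(query, omp_chain, glob_chain)
# A positive score means the query looks more like the barrel-like set.
print("barrel-like" if score > 0 else "globular-like")
```

A real model would be trained on curated sets of outer membrane and globular proteins, and the decision threshold would be calibrated on held-out data rather than fixed at zero.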

  1. TMB-KNN (2008): http://cs.ndsu.nodak.edu/~chayan/Server/TMB_KNN.html

  1. transFold (2006): super-secondary structure prediction of transmembrane β-barrel proteins http://bioinformatics.bc.edu/clotelab/transFold/

transFold: a web server for predicting the structure and residue contacts of transmembrane beta-barrels. Transmembrane β-barrel (TMB) proteins are embedded in the outer membrane of Gram-negative bacteria, mitochondria and chloroplasts. The cellular location and functional diversity of β-barrel outer membrane proteins makes them an important protein class. At the present time, very few non-homologous TMB structures have been determined by X-ray diffraction because of the experimental difficulty encountered in crystallizing transmembrane (TM) proteins. The transFold web server uses pairwise inter-strand residue statistical potentials derived from globular (non-outer-membrane) proteins to predict the supersecondary structure of TMB. Unlike all previous approaches, transFold does not use machine learning methods such as hidden Markov models or neural networks; instead, transFold employs multi-tape S-attribute grammars to describe all potential conformations, and then applies dynamic programming to determine the global minimum energy supersecondary structure. The transFold web server not only predicts secondary structure and TMB topology, but is the only method which additionally predicts the side-chain orientation of transmembrane β-strand residues, inter-strand residue contacts and TM β-strand inclination with respect to the membrane. The program transFold currently outperforms all other methods for accuracy of β-barrel structure prediction. Available at http://bioinformatics.bc.edu/clotelab/transFold. http://nar.oxfordjournals.org/content/34/suppl_2/W189.full

  1. BOMP (2004): http://services.cbu.uib.no/tools/bomp

BOMP: a program to predict integral β-barrel outer membrane proteins encoded within genomes of Gram-negative bacteria. This work describes the development of a program that predicts whether or not a polypeptide sequence from a Gram-negative bacterium is an integral β-barrel outer membrane protein. The program, called the β-barrel Outer Membrane protein Predictor (BOMP), is based on two separate components to recognize integral β-barrel proteins. The first component is a C-terminal pattern typical of many integral β-barrel proteins. The second component calculates an integral β-barrel score of the sequence based on the extent to which the sequence contains stretches of amino acids typical of transmembrane β-strands. The precision of the predictions was found to be 80% with a recall of 88% when tested on the proteins with SwissProt annotated subcellular localization in Escherichia coli K 12 (788 sequences) and Salmonella typhimurium (366 sequences). When tested on the predicted proteome of E.coli, BOMP found 103 of a total of 4346 polypeptide sequences to be possible integral β-barrel proteins. Of these, 36 were found by BLAST to lack similarity (E-value score < 1e−10) to proteins with annotated subcellular localization in SwissProt. BOMP predicted the content of integral β-barrels per predicted proteome of 10 different bacteria to range from 1.8 to 3%. BOMP is available at http://www.bioinfo.no/tools/bomp http://nar.oxfordjournals.org/content/32/suppl_2/W394.full

  1. TMBETA-net (2004): http://psfs.cbrc.jp/tmbeta-net/

TMBETA-NET: discrimination and prediction of membrane spanning beta-strands in outer membrane proteins. We have developed a web-server, TMBETA-NET for discriminating outer membrane proteins and predicting their membrane spanning beta-strand segments. The amino acid compositions of globular and outer membrane proteins have been systematically analyzed and a statistical method has been proposed for discriminating outer membrane proteins. The prediction of membrane spanning segments is mainly based on feed forward neural network and refined with beta-strand length. Our program takes the amino acid sequence as input and displays the type of the protein along with membrane-spanning beta-strand segments as a stretch of highlighted amino acid residues. Further, the probability of residues to be in transmembrane beta-strand has been provided with a coloring scheme. We observed that outer membrane proteins were discriminated with an accuracy of 89% and their membrane spanning beta-strand segments at an accuracy of 73% just from amino acid sequence information. The prediction server is available at http://psfs.cbrc.jp/tmbeta-net/. http://nar.oxfordjournals.org/content/33/suppl_2/W164.long

  1. TMBB-DB (2012): http://beta-barrel.tulane.edu/index.html

TMBB-DB: a transmembrane β-barrel proteome database. We previously reported the development of a highly accurate statistical algorithm for identifying β-barrel outer membrane proteins or transmembrane β-barrels (TMBBs), from genomic sequence data of Gram-negative bacteria (Freeman, T.C. and Wimley, W.C. (2010) Bioinformatics 26, 1965–1974). We have now applied this identification algorithm to all available Gram-negative bacterial genomes (over 600 chromosomes) and have constructed a publicly available, searchable, up-to-date, database of all proteins in these genomes. For each protein in the database, there is information on (i) β-barrel membrane protein probability for identification of β-barrels, (ii) β-strand and β-hairpin propensity for structure and topology prediction, (iii) signal sequence score because most TMBBs are secreted through the inner membrane translocon and, thus, have a signal sequence, and (iv) transmembrane α-helix predictions, for reducing false positive predictions. This information is sufficient for the accurate identification of most β-barrel membrane proteins in these genomes. In the database there are nearly 50 000 predicted TMBBs (out of 1.9 million total putative proteins). Of those, more than 15 000 are ‘hypothetical’ or ‘putative’ proteins, not previously identified as TMBBs. This wealth of genomic information is not available anywhere else. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3463127/

How to make a protein soluble?

April 30, 2014

Cloning, expression and purification of difficult to clone, express and purify proteins in E. coli 

I have received some emails about expressing difficult-to-purify proteins, so I thought of putting together a short list of do’s and don’ts. Pure bioinformatics people, please bear with me for a couple of posts. First of all, it is important to know your protein: gather as much information about it as you can. All those small pieces of information help a lot when designing the strategy for cloning, expression and purification. Also know the source of the protein: eukaryotic, prokaryotic or any other. Basic parameters such as the size of the protein, pI and amino acid composition play a vital role in designing the strategy. Here are some tools to look up such information.

I have compiled some on this blog before: http://bioinformatictools.blogspot.in/2014/04/functional-annotation-of-hypothetical.html and http://bioinformatictools.blogspot.in/2011/11/in-silico-characterization-of-proteins.html. Look for other sources too. The main idea is to find out as much about the protein as you can. I am not a big fan of purifying protein under denaturing conditions. If the protein has to be refolded from denaturing conditions, there are many questions that are difficult to answer, such as whether the protein has folded properly and whether the result is its native fold rather than some random refolding. These are hard to demonstrate experimentally unless you already have an assay in mind. Since I have tried that too, I will end by sharing what I learned on that front.

Downstream experimental procedures: Before designing a strategy for cloning, expression and purification, it is wise to decide which downstream experiments you are going to perform, because the strategy mainly depends on them. At times it is possible to purify a very small amount of soluble protein from a very large culture (which is fine if you need only a little protein downstream), in which case there is no need to go through all the standardization experiments with trials in different vectors and host cells. However, if a large amount of protein is required (as in crystallization experiments), it is advisable to optimize the whole purification process.

Read as much as you can: There are various resources with suggestions for cloning, expression and purification of protein in the soluble fraction (e.g. the QIAexpress handbook). But please keep in mind that suggestions are easy to make, while wet lab work takes a lot of time and energy, so try what you think is logical and, more importantly, easily available to you (do-able).

Membrane or membrane-associated protein: Check whether the selected protein is a membrane or membrane-associated protein. This can be done with subcellular localization tools, some of which are listed here: http://bioinformatictools.blogspot.in/2007/09/predicting-subcellular-localization-of.html. Also check whether the protein has a transmembrane domain (TMHMM, http://www.cbs.dtu.dk/services/TMHMM/) or a signal peptide (SignalP, http://www.cbs.dtu.dk/services/SignalP/). These are hydrophobic regions that tend to aggregate when overexpressed outside a membrane. Membrane proteins are a bit tough to get in soluble form until one removes the transmembrane or signal peptide part. It is logical to remove the initial (normally N-terminal) transmembrane or signal peptide part to get the functional domain or domains in soluble form. (I had a similar problem with a protein I was working on: removing the signal peptide and transmembrane domain solved everything. The protein went into the soluble fraction, purified like a charm, and crystallized too.)

Check for functional domains in the protein, if any: This will help in determining the probable function of the protein. It will also point to other proteins with a similar domain and how they behaved during cloning, expression and purification in E. coli. If you can find a protein with a similar domain, use its cloning, expression and purification protocol for your target protein. Also, for some proteins the sequence-based analysis results change with the addition of a tag (the pI may shift, for example), so keep this in mind too.

Optimize the temperature: Try different temperatures for growth and induction. The induction temperature is the more crucial of the two.

  1. Try growing cells at 37 °C and inducing at 37 °C.
  2. Try growing cells at 37 °C and inducing at 25 °C for a long time.
  3. Try growing cells at 37 °C and inducing at 16 °C for a long time.
  4. Try growing cells at 25 °C and inducing at 16 °C for a long time.
  5. Try growing cells at 37 °C, then chilling them at 16 °C for at least one hour before induction.

Low temperature decreases the rate of protein synthesis, and usually more soluble protein is obtained. Also, if the temperature is reduced before the cells are induced, the protein is more likely to end up in the soluble fraction; it somehow diverts the protein away from the pathway into inclusion bodies (sorry, I do not know how).

Optimize the IPTG concentration: It is a good idea to test a gradient of IPTG at small scale (0.1, 0.2, 0.3 … mM) to find the amount required for optimal expression of the protein. Normally very low levels of IPTG are enough for optimal expression; higher concentrations are not only costly but also do not improve the expression level much.
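For setting up such a gradient, the pipetting volumes follow from C1 × V1 = C2 × V2. Here is a small sketch of that arithmetic; the 100 mM stock, the 5 ml test cultures and the 0.5 and 1.0 mM points are assumptions chosen for illustration, and the function name `iptg_volume_ul` is hypothetical.

```python
def iptg_volume_ul(final_mm, culture_ml, stock_mm):
    """Microlitres of IPTG stock needed for a target final concentration.

    Uses C1 * V1 = C2 * V2 and ignores the small volume the stock itself adds.
    """
    if final_mm < 0 or culture_ml <= 0 or stock_mm <= 0:
        raise ValueError("concentrations and volumes must be positive")
    return final_mm * culture_ml * 1000.0 / stock_mm


# Assumed setup: 100 mM stock, 5 ml small-scale cultures.
STOCK_MM = 100.0
CULTURE_ML = 5.0
gradient_mm = [0.1, 0.2, 0.3, 0.5, 1.0]

for conc in gradient_mm:
    volume = iptg_volume_ul(conc, CULTURE_ML, STOCK_MM)
    print(f"{conc:.1f} mM final: add {volume:.1f} ul of stock")
```

Keeping the stock concentration as a parameter makes it easy to redo the table when you scale up to a large culture.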

Use a large tag, but make sure you have an arrangement to remove it once you have the protein: Larger tags such as an intein tag, His-SUMO, GST or MBP (maltose binding protein) are known to increase the solubility of proteins; use them if you have the corresponding vectors readily available.

Change the vector: Using a weaker promoter (e.g. trc instead of T7) and a lower copy number plasmid normally increases the chance of purifying the protein in the soluble fraction. Also, N- and/or C-terminal tags (in various vectors) affect the solubility of the protein, especially for proteins whose folding depends on one of these termini.

Change the host cells: Some E. coli strains handle toxic or membrane proteins better than others. I had a very good experience with the C41 and C43 strains, which I came to know through this paper: http://www.ncbi.nlm.nih.gov/pubmed/15294299. There are also pLysS versions of these strains; I did not try them, but you can read up and try. Other strains such as Rosetta might also be worth trying, depending on which strains you can get your hands on (so beg, borrow or steal ;)). For a new protein I usually test as many changes as I can, one at a time, at small scale and then move the promising ones to large scale. Also check whether your protein uses codons that are rarely used in E. coli. You can check ‘rare codon usage’ with the various software tools available.
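If you just want a quick look before turning to a dedicated tool, a rare codon check can be sketched in a few lines. The set of rare codons below is a commonly cited list for E. coli, but it is an assumption here (different sources use different lists and thresholds), and the function name `rare_codon_report` is hypothetical.

```python
# Commonly cited rare codons in E. coli (assumed list; sources differ).
RARE_CODONS = {"AGG", "AGA", "CGA", "CGG", "ATA", "CTA", "CCC", "GGA"}


def rare_codon_report(cds):
    """Count rare codons in an in-frame coding sequence (DNA or RNA)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    positions = [i + 1 for i, codon in enumerate(codons) if codon in RARE_CODONS]
    fraction = len(positions) / len(codons) if codons else 0.0
    return {
        "n_codons": len(codons),
        "rare_positions": positions,  # 1-based codon positions
        "rare_fraction": fraction,
    }


# Made-up example sequence: ATG AGG AGA CTG TAA
report = rare_codon_report("ATGAGGAGACTGTAA")
print(report)
```

Clusters of rare codons close together, especially near the start of the gene, are usually more of a concern than the same number spread evenly, so the positions are reported along with the overall fraction.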

Change the culture media: After changing and optimizing as many parameters as I could, I was still getting a low level of protein in the soluble fraction in LB media. I read somewhere that someone had a good yield with Terrific Broth, so I tried it and it gave far more protein in the soluble fraction. I was happy to use it thereafter for any protein I had to purify.

Use auto-induction media: It is worthwhile trying auto-induction. The idea is that instead of adding an inducing agent like IPTG, one relies on the cells' native lactose regulation. If you grow the cells in media containing both glucose and lactose, they use the glucose first; as the glucose is depleted, they slowly start taking up lactose, which relieves lac repression. This in turn induces the lac-controlled promoter on your expression vector and leads to a much more gradual expression than with IPTG.

To be continued on

Purify the protein under denaturing condition and refold: 

In-silico characterization of proteins

March 27, 2012

BLAST: In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold. Different types of BLAST are available depending on the query sequence. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence. The BLAST program was designed by Eugene Myers, Stephen Altschul, Warren Gish, David J. Lipman, and Webb Miller at the NIH and was published in the Journal of Molecular Biology in 1990.

CDD search: Conserved Domain Database (CDD) is a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins. These are available as position-specific score matrices (PSSMs) for fast identification of conserved domains in protein sequences via RPS-BLAST. CDD content includes NCBI-curated domains, which use 3D-structure information to explicitly define domain boundaries and provide insights into sequence/structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM).

PFAM: The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. There are two components to Pfam: Pfam-A and Pfam-B. Pfam-A entries are high quality, manually curated families. Although these Pfam-A entries cover a large proportion of the sequences in the underlying sequence database, in order to give a more comprehensive coverage of known proteins we also generate a supplement using the ADDA database. These automatically generated entries are called Pfam-B. Although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found. Pfam also generates higher-level groupings of related families, known as clans. A clan is a collection of Pfam-A entries which are related by similarity of sequence, structure or profile-HMM.

TMHMM: TMHMM predicts transmembrane helices in proteins using a hidden Markov model. A variety of tools are available to predict the topology of transmembrane proteins. To date no independent evaluation of the performance of these tools has been published. A better understanding of the strengths and weaknesses of the different tools would guide both the biologist and the bioinformatician to make better predictions of membrane protein topology.

SignalP: SignalP 4.0 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.

STRING: STRING is a database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources: genomic context, high-throughput experiments, coexpression and previous knowledge. STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable. The database currently covers 5’214’234 proteins from 1133 organisms.

PROTPARAM: ProtParam is a tool which allows the computation of various physical and chemical parameters for a given protein stored in Swiss-Prot or TrEMBL or for a user entered sequence. The computed parameters include the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY).
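Two of these parameters are simple enough to compute by hand, which helps in understanding what the numbers mean. The sketch below is not ProtParam's code; it only assumes the standard definitions: GRAVY as the mean Kyte-Doolittle hydropathy per residue, and the aliphatic index as X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), with X in mole percent. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _clean(seq):
    """Upper-case the sequence and reject non-standard residues."""
    seq = seq.strip().upper()
    if not seq or any(res not in KYTE_DOOLITTLE for res in seq):
        raise ValueError("sequence must contain only the 20 standard residues")
    return seq


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = _clean(seq)
    return sum(KYTE_DOOLITTLE[res] for res in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = _clean(seq)
    pct = {res: 100.0 * seq.count(res) / len(seq) for res in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


example = "MKKLLVAIGAVLLSA"  # made-up sequence, for illustration only
print(round(gravy(example), 3), round(aliphatic_index(example), 1))
```

A positive GRAVY value indicates an overall hydrophobic sequence, which is one of the hints worth keeping in mind when planning expression in the soluble fraction.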

PROSITE: Search your query sequence for protein motifs, rapidly compare your query protein sequence against all patterns stored in the PROSITE pattern database and determine what the function of an uncharacterised protein is. This tool requires a protein sequence as input, but DNA/RNA may be translated into a protein sequence using transeq and then queried.
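
To show what a pattern scan does, here is a rough Python sketch that turns a PROSITE-style pattern into a regular expression and searches a sequence with it. It only handles the common syntax (x, [..], {..}, repeat counts, and the < and > anchors). The function name is hypothetical and this is not how the PROSITE server itself is implemented.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') to a regex."""
    pattern = pattern.rstrip(".")          # patterns end with a period
    parts = []
    for element in pattern.split("-"):     # elements are separated by '-'
        anchor_start = element.startswith("<")   # '<' = N-terminal anchor
        anchor_end = element.endswith(">")       # '>' = C-terminal anchor
        element = element.strip("<>")
        # Split off an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                    # x = any amino acid
            core = "."
        elif core.startswith("{"):         # {ABC} = any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        # [ABC] and single letters are already valid regex syntax.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchor_start else "") + core
                     + ("$" if anchor_end else ""))
    return "".join(parts)


if __name__ == "__main__":
    # N-glycosylation motif written in PROSITE syntax.
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(regex)
    sequence = "MKANGSAALNPSW"  # made-up example sequence
    for hit in re.finditer(regex, sequence):
        print(hit.start() + 1, hit.group())  # 1-based start position
```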

InterPro: InterPro is an integrated database of predictive protein “signatures” used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures.

Predicting Subcellular Localization of Proteins

March 19, 2012 2 comments

Studying the subcellular localization of proteins is of interest for several reasons. Here is a collection of online software for predicting the subcellular localization of proteins. The predictions come from programs trained for this purpose, and they help a great deal when selecting a protein to work on. There are more tools than these; I have listed some commonly used ones.

 

CELLO: CELLO is a multi-class SVM classification system. CELLO uses 4 types of sequence coding schemes: the amino acid composition, the di-peptide composition, the partitioned amino acid composition and the sequence composition based on the physico-chemical properties of amino acids. We combine votes from these classifiers and use the jury votes to determine the final assignment. Reference: Yu CS, Lin CJ, Hwang JK. Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Science 2004, 13:1402-1406.
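
As an illustration of the first two coding schemes, here is a small Python sketch that turns a sequence into an amino acid composition vector (20 values) and a di-peptide composition vector (400 values). The function names and the residue ordering are my own assumptions; CELLO's actual feature code and the SVM step are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def amino_acid_composition(seq):
    """Fraction of each of the 20 amino acids in the sequence."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each of the 400 possible adjacent residue pairs.

    Assumes the sequence has at least two residues.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]  # overlapping pairs
    total = len(pairs)
    return [pairs.count(a + b) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]


if __name__ == "__main__":
    protein = "MKKTAIAIAVALAGFATVAQA"  # made-up example sequence
    aac = amino_acid_composition(protein)
    dpc = dipeptide_composition(protein)
    print(len(aac), len(dpc))
```

Vectors like these would then be passed to the classifiers, one classifier per coding scheme, before the votes are combined.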

 

PSORTb: According to a study last performed in 2010, PSORTb v3.0.2 is the most precise bacterial localization prediction tool available. PSORTb v3.0.2 has a number of improvements over PSORTb v2.0.4, and version 2 of PSORTb is still maintained. You can currently submit one or more Gram-positive or Gram-negative bacterial sequences, or archaeal sequences, in FASTA format, either by pasting the FASTA-formatted sequences into the submission form or by uploading a file containing your sequences.

 

TMHMM Server: This server is for prediction of transmembrane helices in proteins. You can submit many proteins at once in one FASTA file. Please limit each submission to at most 4000 proteins. Please tick the ‘One line per protein’ option. Please leave time between each large submission. Reference: S. Moller, M.D.R. Croning, R. Apweiler. Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics, 17(7):646-653, July 2001.
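
If you have more sequences than the limit allows, the FASTA file can be split before submission. Below is a minimal Python sketch of one way to do that. The 4000 limit comes from the server note above, while the function names and the commented-out file names are hypothetical.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def batches(records, max_per_batch=4000):
    """Group records into lists of at most max_per_batch entries."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == max_per_batch:
            yield batch
            batch = []
    if batch:
        yield batch


def to_fasta(batch):
    """Write records back out as FASTA, one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in batch)


# Example usage (file names are placeholders):
# with open("proteome.fasta") as handle:
#     records = read_fasta(handle.read())
#     for i, batch in enumerate(batches(records), start=1):
#         with open(f"proteome_part{i}.fasta", "w") as out:
#             out.write(to_fasta(batch))
```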

 

SignalP 3.0 Server: SignalP 3.0 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks and hidden Markov models. Reference: Olof Emanuelsson, Søren Brunak, Gunnar von Heijne, Henrik Nielsen. Locating proteins in the cell using TargetP, SignalP, and related tools. Nature Protocols 2, 953-971 (2007).

 

LOCtree: LOCtree can predict the subcellular localization and DNA-binding propensity of non-membrane proteins in non-plant and plant eukaryotes as well as prokaryotes. LOCtree classifies eukaryotic animal proteins into one of five subcellular classes, while plant proteins are classified into one of six classes and prokaryotic proteins are classified into one of three classes. The novel feature of using a hierarchical architecture is the ability to make intermediate localization class predictions at much higher accuracies. Another source of improvement is the use of ‘noisy’ training data. ‘Noisy’ predictions from LOCKey (SWISS-PROT keyword based annotations) and LOCHom (annotations using sequence homology) are used to train the hierarchical SVMs.

 

PredictProtein: PredictProtein integrates feature prediction for secondary structure, solvent accessibility, transmembrane helices, globular regions, coiled-coil regions, structural switch regions, B-values, disorder regions, intra-residue contacts, protein-protein and protein-DNA binding sites, sub-cellular localization, domain boundaries, beta-barrels, cysteine bonds, metal binding sites and disulphide bridges.

Protein Blast against another set of proteins

March 17, 2012

This tool is part of the NCBI BLAST blastp suite. BLASTP programs search protein databases using a protein query. This option runs BLAST for a query protein against a set of other proteins. I found it useful when you don’t wish to BLAST your query against the whole protein database, but only against a set of proteins supplied by the user. The tool is located here: Protein Blast against another set of proteins
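
If NCBI BLAST+ is installed locally, a similar comparison can be run from the command line, because blastp accepts a FASTA file of subject sequences through its -subject option in place of a database. The Python sketch below only builds that command. The helper name, the file names and the e-value cutoff are my own assumptions, and this is not what the web page runs behind the scenes.

```python
import subprocess


def blastp_subject_command(query_fasta, subject_fasta, evalue=1e-5):
    """Build a blastp command comparing a query against a set of proteins.

    -subject takes a FASTA file instead of a formatted database;
    -outfmt 6 requests tab-separated tabular output.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


if __name__ == "__main__":
    # Placeholder file names; requires BLAST+ on the PATH to actually run.
    command = blastp_subject_command("my_protein.fasta", "protein_set.fasta")
    print(" ".join(command))
    # result = subprocess.run(command, capture_output=True, text=True, check=True)
    # print(result.stdout)
```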
