PIP: Potential Interactions of Proteins


What is PIP?

PIP is a tool that uses available experimental data to predict protein-protein interactions. At present it covers three genomes: human, rat and fission yeast.

PIP infers putative interaction partners for a given protein from interaction data for homologous proteins, even when no experimental data are available for the query protein itself.

How does it work?

PIP searches for homologues of proteins that have been found experimentally to interact. The known interactions come from different species and were detected by a variety of experimental methods, such as yeast two-hybrid, X-ray crystallography, mass spectrometry and affinity purification.
Once lists of homologues to each of the experimentally determined proteins have been constructed, PIP tries to identify interactions between the homologues. These putative interactions are then given confidence scores based on two factors: the level of homology to proteins found experimentally to interact, and the amount of experimental data available (see figure below for an illustration of the approach).
 
Each interaction is inferred from homology to experimentally observed interactions. In this schematic, proteins a1 and b1 have been shown experimentally to interact in one organism, here labelled 'species X', and proteins a2 and b2 in another, 'species Y'.
Lists of homologues are generated for each of the proteins, ranked by their bit score (s_ai, s_bi, etc.). A protein from one list may interact with a protein from the other (shown by the red arrow), and potential pairwise interactions are scored according to Equation 1, based on homology to the proteins involved in the known interaction.
Furthermore, interactions receive a higher score if they are derived from multiple experimental sources (n>1). The score is additive. In the example here, the blue and green sequences are predicted to interact on the basis of the interactions in both 'species X' and 'species Y', and the overall score is the sum of the two pairwise scores. This additive process continues over all N experimentally determined protein pairs (e.g. through 'species Z') that have the rat sequences, labelled blue and green, among their homologues.
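The inference step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not PIP's actual implementation: the function name, the data layout and all protein identifiers and bit scores below are hypothetical.

```python
from itertools import product

def infer_interactions(known_interactions, homologues):
    """Transfer each known interaction (a, b) to every pair of homologues.

    known_interactions: iterable of (a, b) identifiers of proteins that
        have been observed experimentally to interact.
    homologues: dict mapping a protein identifier to a list of
        (target_protein, bit_score) tuples in the genome of interest.
    Returns a dict mapping a sorted target pair to the list of
    (a, b, s_a, s_b) records that support the putative interaction.
    """
    support = {}
    for a, b in known_interactions:
        hits_a = homologues.get(a, [])
        hits_b = homologues.get(b, [])
        for (target_a, s_a), (target_b, s_b) in product(hits_a, hits_b):
            key = tuple(sorted((target_a, target_b)))
            support.setdefault(key, []).append((a, b, s_a, s_b))
    return support

# Hypothetical data mirroring the schematic: a1-b1 interact in species X,
# a2-b2 interact in species Y, and both pairs have rat homologues.
known = [("a1", "b1"), ("a2", "b2")]
homologues = {
    "a1": [("rat_blue", 210.0), ("rat_other", 95.0)],
    "b1": [("rat_green", 180.0)],
    "a2": [("rat_blue", 150.0)],
    "b2": [("rat_green", 120.0)],
}
predicted = infer_interactions(known, homologues)
for pair, records in sorted(predicted.items()):
    print(pair, "supported by", len(records), "known interaction(s)")
```

In this toy example the pair (rat_blue, rat_green) is supported by both known interactions, so it would accumulate score from two sources, whereas (rat_green, rat_other) is supported by one only.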


How is the confidence score calculated?

The score, S, is calculated for each putative interaction according to the following:

where s_ai and s_bi are the sequence similarity bit scores to proteins a_i and b_i, respectively, which have been shown experimentally to interact; n is the number of experiments linking protein a_i to protein b_i; and N is the total number of instances in which the same pair of proteins is identified as interacting through different homologues.
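The additive structure of the score can be illustrated as follows. The exact per-interaction term is not shown on this page, so the sketch takes it as a replaceable function. The default used here, n multiplied by the natural logarithm of the product of the two bit scores, is purely an assumed stand-in chosen because it grows with both homology and the number of experiments; it should not be read as PIP's scoring function. All numerical values are hypothetical.

```python
import math

def placeholder_pair_score(s_a, s_b, n):
    """Assumed stand-in for the per-interaction term; NOT PIP's formula."""
    return n * math.log(s_a * s_b)

def total_score(evidence, pair_score=placeholder_pair_score):
    """Sum the per-interaction term over all N supporting known interactions.

    evidence: list of (s_a, s_b, n) tuples, one per known interaction,
        holding the two bit scores and the number of experiments.
    """
    return sum(pair_score(s_a, s_b, n) for s_a, s_b, n in evidence)

# Hypothetical evidence: one known interaction seen in two experiments,
# and a second known interaction seen in one experiment (N = 2).
evidence = [(210.0, 180.0, 2), (150.0, 120.0, 1)]
S = total_score(evidence)
print("total score:", S)
```

Whatever form the per-interaction term takes, adding a further supporting known interaction can only raise the total as long as that term is positive, which is the behaviour described for the blue and green sequences.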

What does it mean if PIP does not find a binding partner I know should exist?

Interactions that score highly are more probable than lower-scoring ones, but a low score does not rule out an interaction. A low-scoring interaction may share only distant homology with an experimentally determined interaction, or there may be only limited experimental data for that interaction, for example a single two-hybrid study.

The following chart shows the distribution of scores for rat protein interactions that are highly reliable (detected by X-ray crystallography) alongside a random sampling of scores for the whole rat genome:

The scoring function was assessed by examining the scores of binary complexes with X-ray crystal structures and comparing them with the expected distribution of scores for the whole rat genome. The crystal complexes tended to score higher than expected from the genome-wide distribution, with median scores of 128 and 16, respectively, indicating that higher-scoring interactions are less likely to be false. The difference was significant according to a chi-square test with 25 degrees of freedom (p ≪ 0.0001).


What should I do if PIP does not display a known binding partner to my protein?

PIP starts by looking for close homologues to the protein you enter into the query form. It is possible that the level of homology between your protein and the data in PIP is too low for it to pick up the interaction. A possible solution would be to lower the detection threshold by ticking the relevant box on the front page. An alternative approach is to enter the other known binding partner, and then expand the network by clicking on the proteins in the graph.
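The effect of lowering the detection threshold can be pictured with a small Python sketch. The threshold values, protein names and bit scores are all hypothetical, and PIP's real cut-offs may be defined differently.

```python
def filter_homologues(hits, min_bit_score):
    """Keep only the hits whose bit score reaches the threshold."""
    return [(protein, score) for protein, score in hits if score >= min_bit_score]

# Hypothetical hits for a query protein: (database protein, bit score).
hits = [("P1", 240.0), ("P2", 85.0), ("P3", 42.0)]

DEFAULT_THRESHOLD = 80.0   # hypothetical default cut-off
RELAXED_THRESHOLD = 40.0   # hypothetical cut-off with the box ticked

default_hits = filter_homologues(hits, DEFAULT_THRESHOLD)
relaxed_hits = filter_homologues(hits, RELAXED_THRESHOLD)
print("default:", default_hits)
print("relaxed:", relaxed_hits)
```

With the relaxed cut-off the distant homologue P3 is retained, so interactions transferred through it become visible, at the price of lower confidence.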

Structural coverage

Each protein on the interaction map is shown in a colour from a gradient ranging from white to dark green. The colour indicates the extent to which the protein sequence can be structurally modelled by 3D-JIGSAW homology modelling, from white (0%) to dark green (100%). Furthermore, to aid the interpretation of the interaction map, the domain structure is tabulated alongside each protein.
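One simple way to produce such a gradient is linear interpolation between the two end colours. The sketch below assumes a linear mapping and an RGB value of (0, 100, 0) for dark green; both are assumptions for illustration and may differ from the colours PIP actually uses.

```python
def coverage_to_hex(coverage_percent, dark_green=(0, 100, 0)):
    """Map structural coverage (0-100%) to a hex colour between white and dark green."""
    if not 0 <= coverage_percent <= 100:
        raise ValueError("coverage must be between 0 and 100")
    white = (255, 255, 255)
    fraction = coverage_percent / 100
    # Interpolate each RGB channel linearly from white towards dark green.
    rgb = tuple(round(w + (g - w) * fraction) for w, g in zip(white, dark_green))
    return "#{:02x}{:02x}{:02x}".format(*rgb)

for coverage in (0, 25, 75, 100):
    print(coverage, coverage_to_hex(coverage))
```

The returned string can be used directly as a fill colour for a node in a graph drawing.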

References

If you find PIP useful, please cite:
Pall F Jonsson, Tamara Cavanna, Daniel Zicha, Paul A Bates. (2006). Cluster analysis of networks generated through homology: automatic identification of important protein communities involved in cancer metastasis. BMC Bioinformatics, 7:2.

PIP uses protein-protein interaction data from the Database of Interacting Proteins (http://dip.doe-mbi.ucla.edu/) and the MIPS Mammalian Protein-Protein Interaction Database (http://mips.gsf.de/proj/ppi/). The protein interaction graphs are created with the Graphviz software (http://www.graphviz.org/). The domain information is obtained from Domain Fishing (http://bmm.cancerresearchuk.org/servers/3djigsaw/dom_fish/), which is part of the 3D-JIGSAW modelling package (http://bmm.cancerresearchuk.org/servers/3djigsaw/).

Contreras-Moreira,B. and Bates,P.A. (2002) Domain fishing: a first step in protein comparative modelling. Bioinformatics, 18, 1141-1142.
North,S., Gansner,E. and Ellson,J. (1998) http://www.graphviz.org.
Pagel,P., Kovac,S., Oesterheld,M., Brauner,B., Dunger-Kaltenbach,I., Frishman,G., Montrone,C., Mark,P., Stumpflen,V., Mewes,H-W. et al. (2004) The MIPS mammalian protein-protein interaction database. Bioinformatics, 21, 832-834.
Pruitt,K.D., Tatusova,T. and Maglott,D.R. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 33, D501-D504.
Salwinski,L., Miller,C.S., Smith,A.J., Pettit,F.K., Bowie,J.U. and Eisenberg,D. (2004) The Database of Interacting Proteins: 2004 update. Nucleic Acids Res., 32, D449-D451.

