Informatics Institute, UMDNJ
Bioinformatics
 

EMBOSS-Explorer: Patterns and Profiles

We'll continue to explore BLAST, the powerful database sequence search program, as the course progresses. Other database searches are also an important part of the bioinformatician's arsenal. When we screen a new sequence against a database of known sequences, we are trying to answer the following questions:

  • Is there any protein of known structure that has sufficient similarity to the sequence of the unknown protein to suggest a familial relationship?
  • If not, which known protein has the sequence most similar to that of the unknown protein?

If we can identify a relationship to a protein of known structure, it is possible to infer that the new protein shares a common structure with its relative and to assign its general fold. However, what if the homologue has no known structure? If its function has been identified, we might expect our unknown protein to have a similar or related function. Exceptions do exist, though. A classic example is lysozyme, which shares around 50% sequence identity and 70% sequence similarity with alpha-lactalbumin. The two proteins also have similar folds, but their functions are entirely different: the two key catalytic residues of lysozyme are not conserved in alpha-lactalbumin, and the acidic calcium-binding motif important to the function of alpha-lactalbumin is not present in most lysozymes. It is essential that you confirm any computer-based prediction experimentally at the bench.
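
Percent identity and percent similarity are easy to confuse. The sketch below shows how the two figures can be computed from a pairwise alignment: identity counts columns holding the same residue, while similarity also counts conservative substitutions. The residue groups used to decide "similar" are a hypothetical simplification; alignment tools such as EMBOSS needle typically count a pair as similar when its substitution-matrix score is positive. Both percentages here are taken over the full alignment length, gap columns included, which is one common convention.

```python
# Hypothetical grouping of "similar" amino acids, for illustration only.
# Real tools usually decide similarity from a substitution matrix instead.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def _same_group(a: str, b: str) -> bool:
    """True if residues a and b fall in the same hypothetical group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Return (% identity, % similarity) for two gapped, aligned sequences.

    Percentages are over the full alignment length, gap columns included.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    if not aln1:
        raise ValueError("alignment is empty")
    identical = similar = 0
    for a, b in zip(aln1.upper(), aln2.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length but never match
        if a == b:
            identical += 1
            similar += 1  # every identical pair is also counted as similar
        elif _same_group(a, b):
            similar += 1
    n = len(aln1)
    return 100 * identical / n, 100 * similar / n


print(identity_and_similarity("KVFGR", "KIFGK"))  # (60.0, 100.0)
```

With the made-up alignment above, three of five columns are identical, and the other two (V/I and R/K) are conservative substitutions, so the pair is 60% identical but 100% similar.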

What can you do if sequence similarity alone does not satisfactorily identify a relative? We will show you a few more applications within EMBOSS that can help you predict the function of your sequence.

In a number of cases, the active site of a protein can be recognized by a specific "fingerprint" or "template": a fairly small set of residues that is unique to a family of proteins. An example is the sequence GXGXXG (where G = glycine and X = any amino acid), which marks a nucleotide-binding site (for example, for NAD or ATP). Searching a sequence for a (rather loose) predefined string of characters is called pattern matching; this should be familiar to you from a previous class.
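
The GXGXXG template translates directly into a regular expression, with "." playing the role of X. A minimal Python sketch follows; the function name and the test sequence are made up for illustration. A lookahead is used so that overlapping hits are all reported.

```python
import re

# GXGXXG: G = glycine, X = any amino acid ("." in a regular expression).
# The lookahead (?=...) lets overlapping occurrences be found as well.
MOTIF = re.compile(r"(?=(G.G..G))")


def find_motif(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(seq.upper())]


print(find_motif("AGSGKTGAGAAGL"))  # [(2, 'GSGKTG'), (7, 'GAGAAG')]
```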

The EMBOSS program patmatmotifs uses a pattern-matching algorithm to search a protein sequence for the motifs defined in the PROSITE database. PROSITE is a database of protein families and domains. It is based on the observation that, although there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.
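
PROSITE writes its patterns in a compact syntax: elements are separated by "-", x means any residue, [ST] means any one of the listed residues, {P} means any residue except those listed, x(2) or x(2,4) gives a repeat count, and < or > anchors the pattern to the N- or C-terminus. For example, the N-glycosylation site pattern is N-{P}-[ST]-{P}. The sketch below translates such patterns into Python regular expressions and scans a sequence. It only illustrates the idea and is not how patmatmotifs is implemented; the function names and the test sequence are hypothetical, and only the syntax listed above is handled.

```python
import re

# One PROSITE element: optional "<", a residue / x / [..] / {..},
# an optional (n) or (n,m) repeat count, and an optional ">".
_ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Not handled (hypothetical sketch): '>' inside brackets, e.g. [G>].
    """
    pattern = pattern.strip().rstrip(".")  # PROSITE entries end with '.'
    regex = ""
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, residue, lo, hi, c_term = m.groups()
        if residue == "x":
            core = "."  # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"  # any residue except these
        else:
            core = residue  # single residue or [..] class is valid regex as-is
        if lo and hi:
            core += "{" + lo + "," + hi + "}"
        elif lo:
            core += "{" + lo + "}"
        regex += ("^" if n_term else "") + core + ("$" if c_term else "")
    return regex


def scan(sequence: str, pattern: str) -> list[tuple[int, int, str]]:
    """Return (start, end, match) with 1-based inclusive coordinates.

    The lookahead wrapper reports overlapping matches too.
    """
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for m in rx.finditer(sequence.upper()):
        text = m.group(1)
        hits.append((m.start() + 1, m.start() + len(text), text))
    return hits


print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N[^P][ST][^P]
print(scan("MNGTANLSNKTK", "N-{P}-[ST]-{P}"))
# [(2, 5, 'NGTA'), (6, 9, 'NLSN'), (9, 12, 'NKTK')]
```

Note that the last two hits share position 9; reporting overlapping matches is a design choice made here so that no candidate site is silently skipped.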

PRINTS is a database of functional protein families. It characterizes each family by a "fingerprint": a set of short, particularly well-conserved sequence motifs. A full match to a fingerprint matches all of its motifs in the correct order; a partial match is recorded if some motifs are missing or occur in the wrong order. The PRINTS database can be searched with the EMBOSS program pscan.
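
The full versus partial distinction can be written as a small decision rule. The sketch below assumes that the individual motif hits have already been located (how PRINTS and pscan actually score each motif is not reproduced here) and uses a hypothetical function name; it only classifies the outcome.

```python
from typing import Optional, Sequence


def classify_fingerprint(hits: Sequence[Optional[int]]) -> str:
    """Classify a fingerprint match from per-motif hit positions.

    hits[i] is the sequence position where motif i of the fingerprint
    matched, or None if it did not match. Motifs are listed in their
    expected N- to C-terminal order.
    """
    found = [pos for pos in hits if pos is not None]
    if not found:
        return "none"
    all_present = len(found) == len(hits)
    in_order = all(a < b for a, b in zip(found, found[1:]))
    # Full match: every motif present, in the expected order.
    # Anything else with at least one hit is a partial match.
    return "full" if all_present and in_order else "partial"


print(classify_fingerprint([10, 45, 80]))    # full
print(classify_fingerprint([10, None, 80]))  # partial (motif missing)
print(classify_fingerprint([45, 10, 80]))    # partial (wrong order)
```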


Exercise 9 - patmatmotifs and pscan

A) Use patmatmotifs and pscan to try to uncover functional domains. Frankly, the globin protein is of limited interest here, so let's use the membrane protein platelet-derived growth factor receptor (PDGF-R) instead. Search for known motifs in that protein.

Which motif(s) are found, and at what location(s)?

B) Sometimes the patterns sought in proteins are stochastic: rather than requiring exact residues, they describe the probability that particular residues occur together at given positions to produce a particular function. Use pscan to scan our PDGF-R. Look at the documentation linked from the pscan page. How would you interpret the signatures that are found?
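
One common way to represent such a probabilistic pattern is a position-specific scoring matrix (a profile): each position lists how likely each residue is, and a stretch of sequence is scored by summing log-odds ratios against a background frequency. The sketch below uses an invented three-position profile, a uniform background and a fixed pseudo-probability for unlisted residues (not renormalized, for simplicity); it illustrates the general idea only and is not the scoring method used by pscan or PRINTS.

```python
import math

# Hypothetical 3-position profile: probability of each residue per position.
PROFILE = [
    {"S": 0.6, "T": 0.4},
    {"P": 0.5, "A": 0.3, "G": 0.2},
    {"K": 0.7, "R": 0.3},
]
BACKGROUND = 1 / 20  # assumed uniform background over 20 amino acids
PSEUDO = 0.001       # assumed probability for residues not listed at a position


def window_score(window: str) -> float:
    """Log-odds score (base 2) of one window against the profile."""
    return sum(
        math.log2(position.get(residue, PSEUDO) / BACKGROUND)
        for position, residue in zip(PROFILE, window.upper())
    )


def best_hit(seq: str) -> tuple[int, float]:
    """Return (1-based start, score) of the highest-scoring window."""
    width = len(PROFILE)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the profile")
    scores = [(i + 1, window_score(seq[i:i + width]))
              for i in range(len(seq) - width + 1)]
    return max(scores, key=lambda item: item[1])


print(best_hit("AASPKAA")[0])  # 3 (the window "SPK")
```

Unlike a strict pattern, the profile never simply accepts or rejects a window; every window gets a score, and a threshold (not shown here) decides what counts as a hit.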


Page last modified September 29, 2008
