Copyright 2003, Gale
Rhodes, adapted with permission of the author.
Introduction
This tutorial allows you to explore opsins -- the proteins that
catch light for our eyes -- and the genes that code for opsins. But
the real subject of this exercise is bioinformatics -- the use
of computers to search for, explore, and use information about genes,
nucleic acids, and proteins. While learning about the human opsins,
you will use some of today's most powerful bioinformatics tools. You
can follow up this tutorial with a study of opsins from other
organisms, or by exploring any class of biomolecules that interest
you.
I assume that you are conversant with biochemistry and molecular
biology. If you see unfamiliar terms pertaining to the genes, mRNAs,
and proteins used as examples here, break out your biochemistry text,
head for the index, and review, review, review.
For more information about each database or tool, go to its home
page and read, read, read.
Cast of Characters
I. The Databases (and their acronyms!)
II. The Tools
- NCBI Map Viewer
For finding genes and gene products (RNAs and proteins) that
interest you
- BLAST
For finding genes or proteins with sequences similar to yours
- ClustalW
For comparing your sequence with others, and lots of sequences
with each other
- Phylip
For making phylogenetic trees, which show how sequences are
related to each other.
- Treeprint
For printing phylogenetic trees
- PSIPRED
For predicting the location of helices, pleated sheets, and
transmembrane elements of proteins of unknown structure
- Swiss-Model
For automated building of theoretical structural models of your
sequence based on known structures (homology modeling)
- Deep View (also known as Swiss-PdbViewer)
For seeing and exploring macromolecular models in three
dimensions, and for manual and semiautomated homology
modeling
- PubMed
For searching ALL the literature of the life sciences
- ExPASy (Expert Protein Analysis System)
Not so much a tool as a tool box -- a very complete set of protein
analysis tools
Here We Go
Our subject is human opsins, those proteins, found in the cells of
your retina, that catch light and begin the process of vision. We
will proceed by asking questions about opsins and opsin genes, and
then using bioinformatics to answer them.
When I provide a web address, I'll also make it a link -- just
click it to go to the site. Then make it a bookmark so you can find
it again.
Where are the opsin genes in the human genome?
Point your browser to http://www.ncbi.nlm.nih.gov/mapview/.
Read the instructions. Note that you can look at a genome by
clicking on the NAME of the species, not the B beside it. The
species name takes you to a viewer for the genome of that organism.
The B takes you to a BLAST search tool (later).
Click Homo sapiens (human).
You see a diagram of the human chromosomes, and a search box at
the top. Enter "opsin" in the box next to Search for.
Click Find.
You see the diagram again, with red marks at your "hits", the
locations of genes whose entries contain "opsin" as a whole or
partial word. Below the diagram is a list of the indicated genes.
Among them are the rhodopsin gene (RHO), and three cone pigments,
short-, medium-, and long-wavelength sensitive opsins (for blue,
green, and red light detection). Four hits look like visual pigments,
which probably does not surprise you. To the left of each entry is
the chromosome number, allowing you to tell which red mark
corresponds to each entry. Note that two opsins are on the X
chromosome, one of the sex-determining chromosomes. You can pursue
multiple hits on the same chromosome with the all matches link
for that chromosome.
Click all matches next to X.
You see a very complicated display (don't sweat -- we're going to
use only a part of this now). On the left is a diagram of the X
chromosome, with red marks at the positions of the gene(s) you've
followed to this page -- in our case, the two opsins, medium- and
long-wave, which are located near the bottom tip of the X chromosome.
To the right are various representations of the X chromosome, with
listings of annotated areas. The two opsin genes are highlighted in
pink. If you pass your cursor over this page without clicking, you
will find that some symbols provide brief information, most about
regions that are not yet characterized well enough to have a full
entry.
As you can see, there is a tremendous amount of information on
this page, with links to much more. If you want full information
about the meanings of abbreviations and symbols on this page, as well
as the kinds of information linked to the page, you can use Map
Viewer Help at the top of the page. You will find abundant
information about the Map Viewer, explanations of all symbols and
links, and even tutorials about how to ask and answer all kinds of
questions about the genome.
For now, note the information provided for the first of the two
highlighted opsin genes, OPN1LW (this is called the gene
symbol). You see that this is the long-wavelength-sensitive (red)
opsin, and that it's a gene involved in color blindness (a sex-linked
trait -- no surprise).
What do scientists know about the opsins?
Click OPN1LW.
You have entered LocusLink, which is a sort of highway
interchange with routing to all sorts of information about this gene.
Scan down the page. Some of the information is very plain and
understandable, while some is very cryptic. One of the most
accessible links is to OMIM (for Online Mendelian Inheritance
in Man), a catalog of human genes and genetic disorders. Despite the
name, the database includes genes of women, too.
Click the little orange rectangle labeled OMIM, at the top
of the page.
This OMIM entry tells you about this gene and
colorblindness, a genetic disorder associated with mutations in this
gene. Read as much as your interest dictates. Follow links to other
information. For more information about OMIM itself, click the
OMIM logo at the top of the page. Once you've satisfied your
appetite, return to the LocusLink page (use the Back
button of your browser or your browser's history list -- if you're
lost, click HERE).
Click the red PUB rectangle at the top of the page.
You have entered PubMed, a free database of scientific
literature, to a list of articles directly associated with this gene
locus. By clicking on the authors of each article, you can see
abstracts of the article. If you are on a university campus where
there is online access to specific journals, you might also see links
to full articles. PubMed is your entry point to a wide variety
of scientific literature in the life sciences. On the left side of any
PubMed page, you will find links to a description of the
database, help, and tutorials on searching. Read the abstract of the
article by Nathans and co-workers before returning to
LocusLink.
NB to GR: Add some guided searches in PubMed.
What is the nucleotide sequence of this gene?
Remember that we are looking at the gene for the red-sensitive
opsin in human vision, and it's located near the bottom tip of the X
chromosome. Scroll down to NCBI Reference Sequences
(RefSeq). You see that mRNA (messenger RNA) and protein
sequences are available, along with a GenBank sequence.
Click the entry number beside mRNA.
This is a typical GenBank nucleotide file, and a lot
of it is hard to read, but a few things are clear. First note, under
references, a citation to the publication of this sequence in the
scientific literature. To see an abstract of the article in which
this gene was described, click the PubMed link below the
reference. As you see, you've been here before. There are many ways
to move from one database to another, which is both a blessing and a
curse. You have to keep your eyes open for useful links, and when
you find a path that you think you might use again, make a note of it
and bookmark the web pages. It is frustrating to know there's an
easier way to do something, and not remember how you did it.
NB to GR: point back to this abstract
when you get the phylogenetic tree.
Scroll to the bottom of this long page. The last thing is the
sequence of this messenger RNA. You are seeing the actual list of As,
Ts, Gs, and Cs that make up the message for synthesis of this opsin.
But wait! You know that RNA contains no T. In most nucleotide
databases, U from RNA is represented as T, to make for easy
comparison of DNA and RNA sequences. This sequence information is
not in the form that is most useful for searching in
databases, say, searching for related genes. Let's display this entry
in a form more useful for searching.
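The T-for-U convention is easy to undo if you ever want to see the message written as RNA. Here is a minimal Python sketch, assuming the sequence is a plain string of letters; the function name to_rna and the short sample fragment are made up for illustration and are not taken from the opsin entry.

```python
def to_rna(dna_style_sequence: str) -> str:
    """Return the sequence with every T replaced by U (case is preserved)."""
    return dna_style_sequence.replace("T", "U").replace("t", "u")


# Made-up fragment for illustration only.
print(to_rna("ATGGCCCAGTGA"))
```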
At the top of the page, beside the Display button, pull
down the menu that says default (we are looking at the default
entry display), and select FASTA (note that several other
display options are available). Then click the Display button.
You see one descriptive or "comment" line that begins with ">",
followed by the nucleotide sequence. This little file is just what
you need to search nucleotide databases for similar sequences. Let's
keep it for future use.
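If you would like to handle FASTA data with a script instead of a word processor, the format is simple enough to read in a few lines. The following Python sketch assumes only what is described above: a comment line that begins with ">", followed by one or more lines of sequence. The function name parse_fasta and the sample record are hypothetical.

```python
def parse_fasta(text: str) -> list:
    """Split FASTA text into a list of (comment, sequence) pairs."""
    records = []
    comment = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new comment line closes the previous record, if any.
            if comment is not None:
                records.append((comment, "".join(chunks)))
            comment = line[1:].strip()
            chunks = []
        elif comment is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if comment is not None:
        records.append((comment, "".join(chunks)))
    return records


# Made-up record for illustration only.
example = """>demo made-up record
ATGGCC
CAGTGA
"""
for comment, sequence in parse_fasta(example):
    print(comment, len(sequence))
```

The same function works on a file that holds several FASTA records one after the other, which is the form you will need later for multiple sequence alignment.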
Click and drag on the web page to select everything from the
">" through the last nucleotide. Be careful not to select anything
else. From your browser's Edit menu, select Copy to
make a copy of this information on your clipboard, for pasting
elsewhere. Now start your favorite word processor, make a new
document, and paste. The FASTA comment and sequence should appear.
Select all of the text and change the font to Courier or Monaco --
these "typewriter" fonts make it easy to align letters into columns,
because all letters are the same width. Save this file, choosing
text or plain text as the file type. Call it
mrnared.txt. Save it to a convenient location for the files
you'll be making later. Click your browser's Back button until
you return to LocusLink.
What is the amino-acid sequence of this gene?
Click the entry number beside Protein.
Things look a lot like before, but this is a protein entry,
containing the amino-acid sequence in one-letter abbreviations. Just
as with the mRNA entry, turn this into a FASTA display, and copy it
into a new word-processor document. Save it in text format as
protred.txt. Return to LocusLink.
What does the neighborhood of this gene look like?
Click the entry number beside GenBank.
Display the entry in default format. This entry shows the
sequence of the specific DNA clone that contains the opsin gene,
along with information about how this clone was produced. This entry
thus shows the gene in the slightly larger context of the cloned
fragment in which this gene was found. This sequence would allow you
to see flanking regions around the gene, and perhaps to design PCR
primers for making useful quantities of the nucleotide sequence so
you could express this gene in a cloning vector. From this page, you
could also find neighboring sequences if you wanted to look farther
afield. As before, display this entry in FASTA format, and save it as
a word processor text document entitled GBred.txt.
What proteins in humans are similar to the red opsin?
Now return to the NCBI
Map Viewer. We're going to search the human genome for
sequences similar to that of the red opsin.
Click the B next to Homo sapiens (human).
This is the NCBI's BLAST search tool. BLAST is a widely used
program for finding sequences similar to a "query" sequence that
you're interested in. Pick these options from the various menus:
- Database: Protein (Search the database of protein
sequences.)
- Program: blastp (Use the version of BLAST that compares
protein sequences, unlike blastn, which compares
nucleotide sequences.)
- Other Parameters, Expect: 10 (The higher the number, the less
stringent the matching, and the more hits you'll get)
Next, copy the FASTA data from your file protred.txt to
your clipboard, and paste it into the BLAST search box, above which
it says, "Enter an accession..." Check to be sure that the first
character in the box is the ">" at the beginning of the FASTA
data. Then click Begin Search.
The next page is for formatting your search results. Just click
that enthusiastic Format! button. When your results are ready,
the results of BLAST page appears. Look down the page to the
graphical display, a box containing lots of colored lines. Each line
represents a hit from your blast search. If you pass your mouse
cursor over a red line, the narrow box just above the box gives a
brief description of the hit. You'll find that the first hit is your
red opsin. That's encouraging, because the best match should be to
the query sequence itself, and you got this sequence from that gene
entry. The second hit is the green opsin -- remember that the PubMed
entry reported that the red and green pigments are the most similar.
The third and fourth hits are the blue opsin and the rod-cell pigment
rhodopsin. Other hits have lower numbers of matching residues, and
are color coded according to a score of matches. If you click on any
of the colored lines, you'll skip down to more information about that
hit, and you can see how much similarity each one has to the red
opsin, your original query sequence. As you go down the list, each
succeeding sequence has less in common with red opsin. Each sequence
is shown in comparison with red opsin in what is called a pairwise
sequence alignment. Later, you'll make multiple sequence
alignments from which you can discern relationships among
genes.
See what you can figure out about what the scores mean.
Identities are residues that are identical in the hit and the
query (red opsin), when the two are optimally aligned.
Positives are residues that are very similar to each other
(see residue number 1 in the blue opsin -- it's threonine in red
opsin, and the very similar serine in the blue). Gaps are
sometimes introduced into a hit to improve its alignment with the
query. The more identities and positives, and the fewer gaps, the
higher the score. Note that blue opsin and rhodopsin are only about
45% identical to the red opsin. Other proteins, which are apparently
not visual pigments, have even lower scores. Now let's take a look at
where all these hits are in the human genome.
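To make the scores less mysterious, you can count identities and gaps yourself from any pairwise alignment. This Python sketch assumes the two sequences are already aligned, have equal length, and use "-" for gaps. Dividing by the full alignment length is an assumption about the denominator, and positives are left out because they depend on a substitution matrix. The function name alignment_stats and the two short sequences are invented for illustration.

```python
def alignment_stats(query: str, hit: str) -> dict:
    """Count identities and gaps in two aligned sequences of equal length."""
    if len(query) != len(hit):
        raise ValueError("aligned sequences must have the same length")
    # An identity is a column where both sequences show the same residue.
    identities = sum(1 for q, h in zip(query, hit) if q == h and q != "-")
    # A gap column has "-" in either sequence.
    gaps = sum(1 for q, h in zip(query, hit) if q == "-" or h == "-")
    length = len(query)
    percent = 100.0 * identities / length if length else 0.0
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": percent,
    }


# Made-up aligned pair for illustration only.
print(alignment_stats("MTSQ-LKV", "MSSQALRV"))
```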
Where are all the genes for these other proteins?
Click the Genome View button just below the
introductory information at the top of this result page.
You have come full circle. You are back at the human chromosome
diagram, and all the hits of your search, in the colors that signify
their BLAST scores, are located for you on the diagram. Notice that
there are about 100 proteins (discovered so far, that is) that have
40% or more positives in alignment with red opsin. The opsins are
members of the very large family of G
protein-coupled receptors, key players in signal
transduction.
How are the opsin genes related to each other?
Answering this question requires making a multiple sequence
alignment and then using it to make a phylogenetic tree.
For these tasks, we move to another database where it's a little
easier to gather a bunch of sequences into a single FASTA file.
Point your browser to http://us.expasy.org.
ExPASy is mirrored at several locations including the following: http://www.expasy.org,
http://ca.expasy.org. If one does not work
or responds slowly, try a different one.
You see the home page of ExPASy, the Expert Protein
Analysis System. As I said earlier, ExPASy is a complete protein tool
box. With ExPASy, you can do almost any imaginable analysis or
comparison of protein sequences and structures.
Click Swiss-Prot and TrEMBL under Databases.
Read the introduction to these databases. They are high quality
protein sequence databases with abundant annotation, minimal
redundancy, and many connections to other databases.
Click Advanced search in Swiss-Prot and TrEMBL.
With advanced searching, you can limit your search to specific
genes and organisms, and you can search on descriptive information in
the entries.
Set up a search for human opsins, as follows:
- Search Swiss-Prot only.
- Enter Description: opsin
- Organism: Choose "Human" from the pull-down menu
- Check "Append and prefix * to query terms". The * is a "wild
card". You are searching for all entries that contain "opsin" as a
whole or partial word.
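The wild card behaves like the * in file-name patterns. As a rough illustration (not a description of how the Swiss-Prot search engine is implemented), this Python sketch uses the standard library's fnmatch module to test whether any word in a description contains "opsin" as a whole or partial word. The function name matches_wildcard and the list of descriptions are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def matches_wildcard(term: str, description: str) -> bool:
    """True if any word of the description matches *term*, ignoring case."""
    pattern = "*" + term.lower() + "*"
    return any(fnmatchcase(word, pattern) for word in description.lower().split())


# Illustrative descriptions; the last one should not match.
descriptions = [
    "Rhodopsin",
    "Blue-sensitive opsin",
    "Visual pigment-like receptor peropsin",
    "Hemoglobin alpha chain",
]
for text in descriptions:
    print(text, matches_wildcard("opsin", text))
```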
Click Submit.
The page Swiss-Prot description is your search result
page.
Look over the results. On 9/8/2003, this search gave 14 hits. Among
them are the rod pigment rhodopsin (OPSD) and the three cone
pigments (OPSB, OPSG, OPSR). There is also a
"visual pigment-like receptor peropsin", OPSX. Sounds
mysterious. Let's find out more about it, and in the process, see a
typical Swiss-Prot entry.
Click on the gene name, OPSX.
You see the NiceProt View of Swiss-Prot: O14718. Peruse
this entry and try to find out just what this rhodopsin-like protein
is thought to do. Under Comments, you'll learn that it's found
in the retina (the RPE or retinal pigment epithelium), and that it
may detect light, or perhaps monitors levels of retinoids, the
general class of compounds that are the actual light absorbers in
opsins. Also under Comments - Similarity, you see, as
mentioned earlier, that this protein is a member of the large family
of G protein-coupled receptors. If you click "G protein-coupled
receptors" under the Keywords, you find a list of all purported
7-transmembrane receptor proteins in SwissProt. The human genome alone
contains 350 of
them! See if you can verify this statement, without counting. Now
back up to the NiceProt view.
Under References click the journal citation, "Proc. Natl.
Acad. Sci. U.S.A. 94:9893-9898(1997)". From the resulting page, you
can read a full article in the Proceedings of the National Academy of
Sciences (PNAS) about this protein. Like many journals, PNAS puts
full articles online just 6 to 12 months after publication.
Looking further down the page, you find cross-references to the
protein or its gene in other databases, predicted structural features
of the protein, and last, the sequence. Note also, at the bottom of
the page, links to a number of ExPASy tools listed for further
analysis of this sequence. Try some of them. For example, I just
learned in about ten seconds from Compute pI/MW that the
isoelectric pH (or pI) of this protein is 8.78. And I learned in no
time at all from ScanProSite that the sequence contains
signatures indicating that the protein is probably a G
protein-coupled receptor (no surprise, but comforting) and that it
has a retinal binding site. ProSite is a tool for finding
signatures of function in new sequences. When you finish playing with
these powerful tools, return to your SwissProt search results
by use of the back button of your browser. If you're lost, go
back to ExPASy and do the search again.
Now let's compare the sequences with each other. We'll use the
program ClustalW to make a multiple sequence alignment.
Scroll down the result page and check the boxes at the left of
these entries
- OPSB (blue-sensitive opsin)
- OPSD (rhodopsin)
- OPSG (green-sensitive opsin)
- OPSR (red-sensitive opsin)
- OPSX (visual pigment-like receptor peropsin)
At the top of the page, at Send selected sequences to,
select Clustal W (multiple alignment) from the menu, and click
Submit.
ClustalW has been implemented at many web sites. This one,
at EMBnet.org,
automatically receives the FASTA files from the selected entries,
allows you to make some settings of the alignment criteria, and then
does the alignment. We will just accept the default alignment
settings. First, scroll in the Input Sequences box and verify
that it contains five FASTA files, one right after the other. To make
them easier to identify in subsequent outputs, edit the name of each
FASTA comment line (begins with ">") as follows:
- Change "sp|P03999|OPSB_HUMAN Blue-sensitive opsin (Blue cone
photoreceptor pigment) - Homo sapiens (Human)." to "Blue".
- Change "sp|P08100|OPSD_HUMAN Rhodopsin (Opsin 2) - Homo
sapiens (Human)." to "Rhodopsin".
- Change "sp|P04001|OPSG_HUMAN Green-sensitive opsin (Green cone
photoreceptor pigment) - Homo sapiens (Human)." to "Green".
- Change "sp|P04000|OPSR_HUMAN Red-sensitive opsin (Red cone
photoreceptor pigment) - Homo sapiens (Human)." to "Red".
- Change "sp|O14718|OPSX_HUMAN Visual pigment-like receptor
peropsin - Homo sapiens (Human)." to "Peropsin".
In all cases, be sure to leave the ">" in the first line of
each FASTA entry. To save some work in case something goes wrong,
select the edited contents of the Input Sequences box, copy
it, and paste it onto an empty word-processor page, and save the file
in text format. Name it Opsins.txt.
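If you have many sequences, editing comment lines by hand gets tedious. The Python sketch below does the same renaming automatically, assuming comment lines shaped like the ones quoted above (">sp|accession|ENTRY_NAME description"). The function name shorten_headers, the SHORT_NAMES table, and the four-letter sequence in the example are made up for illustration.

```python
# Short labels keyed by the Swiss-Prot entry names quoted in the text.
SHORT_NAMES = {
    "OPSB_HUMAN": "Blue",
    "OPSD_HUMAN": "Rhodopsin",
    "OPSG_HUMAN": "Green",
    "OPSR_HUMAN": "Red",
    "OPSX_HUMAN": "Peropsin",
}


def shorten_headers(fasta_text: str, names: dict = SHORT_NAMES) -> str:
    """Replace each long FASTA comment line with a short label."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split("|")
            if len(fields) >= 3:
                parts = fields[2].split()
                if parts:
                    entry = parts[0]  # for example OPSB_HUMAN
                    # Fall back to the entry name when no label is known.
                    line = ">" + names.get(entry, entry)
        out.append(line)
    return "\n".join(out)


# Made-up sequence; only the comment line follows the real layout.
print(shorten_headers(">sp|P03999|OPSB_HUMAN Blue-sensitive opsin\nMRKM"))
```

Lines that do not start with ">" pass through untouched, so the sequences themselves are never altered.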
Click Run ClustalW.
The resulting page is called ClustalW query receipt, and it
contains links to several output files.
Click clustalw (aln).
You see the typical ClustalW alignment file, showing our
five protein sequences aligned to maximize identical and similar
residues. Below each line of five sequences are symbols to show the
extent of similarity among the sequences. An asterisk (*) means that
the same residue is always (that is, for all of these
sequences) found at that location; for example, the first asterisk
marks a location where only N (asparagine) is found. Colon (:) means
that all residues at this location are very similar; for example, the
first colon is where only F (phenylalanine), I (isoleucine), and L
(leucine) -- residues with large, nonpolar sidechains -- occur.
Period (.) means somewhat similar residues; for example, at the first
period, serine, threonine, and glutamine occur -- all polar, but
varied in size. If there is no mark then the residues at that
location display no predominant common properties.
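The rule behind these symbols can be sketched in a few lines of Python. The residue groups below are the "strong" and "weak" similarity groups as I recall them from Clustal documentation; treat the exact lists as an assumption and check them against the program's own documentation before relying on them. The function names column_symbol and consensus_line are hypothetical, and the three short sequences are invented.

```python
# Assumed Clustal similarity groups (verify against the ClustalW documentation).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return '*', ':', '.', or ' ' for one column of an alignment."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # a gap in the column gets no mark
    if len(residues) == 1:
        return "*"  # the same residue in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one weak group
    return " "


def consensus_line(aligned: list) -> str:
    """Build the symbol line for a list of equal-length aligned sequences."""
    return "".join(column_symbol("".join(col)) for col in zip(*aligned))


# Made-up three-sequence alignment for illustration only.
print(repr(consensus_line(["NFSW", "NIAD", "NLGG"])))
```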
Once more, as a safety measure, copy this alignment to your
clipboard, and paste it onto an empty word-processor page. Then save
the file in text format. Name it OpsMSA.txt. Remember that it
is still on your clipboard, for pasting at our next stop. This
multiple sequence alignment is one type of input you can use to make
a phylogenetic tree.
What does the family tree of human opsins look like?
Point your browser to http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html
This is one home of the program Phylip, one of the most rigorous
tools for constructing phylogenetic trees from aligned sequences.
Under Proteins, next to protdist, click "advanced form."
You are about to run protdist, a program that computes the
"distance" of sequences from each other. These so-called distance
matrices will be used by Phylip to construct your
tree.
Enter your email into the top box.
In the alignment file box, paste your multiple sequence alignment
from ClustalW.
Click "Bootstrap Options" and make these settings:
- Check the box for "Perform a bootstrap before analysis"
- Enter an odd number for a seed
- Enter 100 replicates
At the top of the page, click "Run protdist".
protdist constructs distance matrices by a process called
"bootstrapping". Bootstrapping is a bias-reducing procedure in which
protdist builds an alignment of pseudosequences by picking
residue positions at random and stringing the residues at those
positions together until the sequence is the same length as the
original ClustalW alignment. From this pseudosequence alignment,
protdist determines the relative number of sequence differences
among the five proteins, as determined from a random sampling of
their sequences. The result of the process is called a distance
matrix, and you will see it soon. This process is repeated, 100 times
in our case, to make 100 distance matrices. The tree we will
ultimately produce represents a consensus of the 100 matrices.
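The resampling step described above can be sketched in Python. This is only an illustration of the idea, not the code Phylip runs: it assumes that positions are drawn with replacement (the usual bootstrap convention) and that the same randomly chosen columns are used for every sequence, so the pseudosequences stay aligned. The function names and the tiny alignment are made up.

```python
import random


def bootstrap_alignment(aligned: list, rng: random.Random) -> list:
    """Build one pseudosequence alignment by resampling columns."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # Draw column positions with replacement until the original length is reached.
    columns = rng.choices(range(length), k=length)
    # Apply the same columns to every sequence so the alignment is preserved.
    return ["".join(seq[i] for i in columns) for seq in aligned]


def bootstrap_replicates(aligned: list, replicates: int = 100, seed: int = 1) -> list:
    """Return a list of resampled alignments, one per replicate."""
    rng = random.Random(seed)  # the seed makes the run repeatable
    return [bootstrap_alignment(aligned, rng) for _ in range(replicates)]


# Made-up alignment of three short sequences.
toy = ["ACDE", "ACDF", "GCDE"]
for replicate in bootstrap_replicates(toy, replicates=3, seed=7):
    print(replicate)
```

Each replicate would then be handed to a distance calculation, giving one matrix per replicate.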
There may be a delay of a few minutes before the result page
appears. If the server is busy, you may be informed that results are
being sent by email. If so, check your email in two or three minutes.
You will receive five messages, the first one simply containing a
link to your result page. Click the URL, or paste it into your
browser and press <return> to open the page.
On the Phylip: protdist page that results, click
outfile to see the output from protdist. The file
contains 100 matrices containing numbers that represent the relative
number of differences among the five sequences. Each matrix has the
sequence names in the first column, and you should imagine that these
sequence names are also the headings for the remaining columns. The
number at the intersection of the row Blue and the column with the
imaginary heading Peropsin gives the relative magnitude of the
sequence differences between the blue opsin and peropsin. The
matrices have zeros on the diagonal because each pseudosequence is
identical to itself.
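To see how such a matrix comes about, here is a deliberately simplified Python sketch. It uses the plain fraction of differing positions as the distance, whereas protdist applies more sophisticated models of amino-acid substitution, so the numbers will not match the program's output. The function names and the three short sequences are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of compared positions at which two aligned sequences differ."""
    # Skip any column where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def distance_matrix(aligned: list) -> list:
    """Square matrix of pairwise distances, in the order the sequences are given."""
    return [[p_distance(a, b) for b in aligned] for a in aligned]


# Made-up alignment; names appear in the first column, as in the protdist output.
names = ["SeqA", "SeqB", "SeqC"]
matrix = distance_matrix(["ACDE", "ACDF", "GCDE"])
for name, row in zip(names, matrix):
    print(name.ljust(10), "  ".join(f"{value:.4f}" for value in row))
```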
Click the Back button of your browser to return to the
Phylip: protdist page.
On the first pull-down menu of the Phylip: protdist page,
pick "neighbor." Read the menu carefully: don't pick
"weighbor".
Click "Run the selected program on outfile" to run Phylip with the
output file of matrices you just examined. You are running a
procedure called "neighbor joining" to construct an evolutionary
tree.
On the Phylip: neighbor page that appears next, beside
"Distance method?" Make sure "Neighbor-joining" is selected.
Click "Bootstrap options" and make these settings:
- Check "Analyze multiple data sets (M)"
- Enter 100 data sets (same as number of replicates from
protdist)
- Enter an odd number for a seed
- Check "Compute a consensus tree"
Scroll down to "Other options".
This entry area gives you the option of designating an
outgroup for the root of your tree. An outgroup is the
sequence you think is most distant from the others, possibly the
common ancestor of all. We don't know that in this case, so leave the
default of 1.
At the top of the page, click "Run neighbor".
The resulting files are
outfile.consense -- your tree, in a text file, and
outtree.consense -- your tree in a format used by
tree-printing programs.
Click on outfile.consense to see the tree.
Scroll down to the bottom of this file to see the consensus tree.
This tree is "unrooted", meaning that we do not know the ancestor of
all these sequences. We learn from this tree which sequences are most
alike and which are most different. We also learn how often the
connections of this tree were made the same way in the 100 trees made
from those 100 distance matrices. The numbers on the branches
indicate the number of times that partition of the species into the
two sets separated by that branch occurred among the 100 trees. For
example, the separation of Red and Green from the other three,
indicating that Red and Green are more similar to each other than to
the other three, occurred in all 100 trees. The separation of Blue
and Peropsin from the other three occurred in only 82 of the 100
trees. In the other 18 trees, Rhodopsin and Peropsin were separated
from the other three. (Can you extract this information from this
file?) In the tree branching shown, the majority rules, and the
results of 18 of the trees are discarded.
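The counting behind those branch numbers can be illustrated in Python. In this sketch, each tree is reduced to the set of partitions (splits) it contains, and each split is written as the smaller of its two sides. The 82 and 18 counts are the ones quoted above; the function names and the data layout are assumptions for illustration, not the format of the consense program.

```python
from collections import Counter

TAXA = frozenset(["Blue", "Green", "Peropsin", "Red", "Rhodopsin"])


def canonical_split(side, taxa=TAXA) -> frozenset:
    """Name a split by its smaller side so both descriptions count as one."""
    side = frozenset(side)
    other = frozenset(taxa) - side
    if len(side) != len(other):
        return min(side, other, key=len)
    # Tie-break for even splits: use the side without the first taxon by name.
    return side if min(taxa) not in side else other


def split_support(trees, taxa=TAXA) -> Counter:
    """Count in how many trees each split occurs."""
    counts = Counter()
    for splits in trees:
        for split in {canonical_split(s, taxa) for s in splits}:
            counts[split] += 1
    return counts


# 82 trees group Blue with Peropsin; 18 group Rhodopsin with Peropsin.
majority = [{"Red", "Green"}, {"Blue", "Peropsin"}]
minority = [{"Red", "Green"}, {"Rhodopsin", "Peropsin"}]
support = split_support([majority] * 82 + [minority] * 18)
for split, count in support.most_common():
    print(sorted(split), count)
```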
You can save this file by selecting all, copying, and pasting it into a
word-processor document. Call it outfile_consense.txt.
Return to the Phylip: neighbor page and click on
outtree.consense. This is your tree in Newick format, which is
widely used by tree-printing programs like Phylodendron. Let's
use this program to give us a tree in attractive graphics, rather
than text.
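Newick format is just nested parentheses ending in a semicolon, which you can confirm with a small Python sketch. It assumes a tree written as nested tuples of names, and it leaves out the numbers (such as branch lengths after a colon) that real files usually carry. The function name to_newick and the tuple layout are made up for illustration.

```python
def to_newick(tree) -> str:
    """Write a tree given as nested tuples of names in Newick format."""
    def render(node) -> str:
        if isinstance(node, str):
            return node  # a leaf is just its name
        # An internal node is its children, comma-separated, in parentheses.
        return "(" + ",".join(render(child) for child in node) + ")"
    return render(tree) + ";"


# The branching described in the text, written as nested tuples.
tree = (("Red", "Green"), "Rhodopsin", ("Blue", "Peropsin"))
print(to_newick(tree))
```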
Point your browser to www.es.embnet.org/Doc/phylodendron/treeprint-form.html.
Paste the contents of your outtree.consense file into the
Tree Data box. Select Phenogram from among the Tree Styles.
From the menu at Extra Options, Output, select GIF
format for your output file. Give your tree a title, such as "Human
Visual Opsins and Opsin-Like Proteins". Finally, click
Submit.
Your GIF-format tree appears in your browser window. To keep it,
choose Save As ... from the File menu. Call the file
OpsinTree.gif. My tree looks like this:
What is the structure of an opsin?
By now, I'm particularly curious about peropsin, but it's not
likely that the structure of a recently discovered protein of unknown
function has been determined. But it is likely that all opsins
are similar in structure, so let's see if we can find an opsin in
the database for macromolecular structures, the Protein
Data Bank (PDB). It will give us an idea of what kind of
thing an opsin is.
In fact, the PDB does not contain molecular structures at all. It
is better to say that it contains models of macromolecules.
These models are interpretations of data from one of the two main
methods of macromolecular structure determination: x-ray
crystallography and NMR
spectroscopy. When researchers determine the structure of a
macromolecule, they deposit a file containing the three-dimensional
coordinates of all the atoms in the model. This coordinate
file -- along with an online molecular graphics tool (like **) or
a computer graphics program like Deep
View -- is all that you need to see and study the molecule
on your computer. Next we will retrieve a model from the PDB and view
it with an online graphics tool. We'll also visit the home of a
topnotch computer graphics program that you can download FREE and use
on your home computer.
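A coordinate file is plain text, so you can also read it with a short script. The Python sketch below assumes the classic fixed-column PDB text format, in which each ATOM line stores the atom name, residue name, chain, residue number, and x, y, z coordinates at set column positions. The function name parse_atom_line is hypothetical, and the sample line is made up, not copied from a real entry.

```python
def parse_atom_line(line: str) -> dict:
    """Extract the main fields from one ATOM or HETATM line."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return {
        "atom": line[12:16].strip(),        # columns 13-16
        "residue": line[17:20].strip(),     # columns 18-20
        "chain": line[21],                  # column 22
        "residue_number": int(line[22:26]), # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
    }


# Made-up line, assembled piece by piece so the column widths are explicit.
SAMPLE = (
    "ATOM  "     # columns 1-6: record name
    "    1"      # columns 7-11: atom serial number
    " "          # column 12: blank
    " N  "       # columns 13-16: atom name
    " "          # column 17: alternate location
    "MET"        # columns 18-20: residue name
    " "          # column 21: blank
    "A"          # column 22: chain
    "   1"       # columns 23-26: residue number
    "    "       # columns 27-30: insertion code and blanks
    "  27.340"   # columns 31-38: x
    "  24.430"   # columns 39-46: y
    "   2.614"   # columns 47-54: z
)
print(parse_atom_line(SAMPLE))
```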
Point your browser to http://www.rcsb.org/pdb/.
The PDB home page contains a simple search box under Search the
Archive. You can search for models using simple keywords or PDB
ID codes. A PDB code has four characters, like 1CYO. How
would you ever know a model by its code? When a new structure is
published, the authors usually give the PDB code in the last
reference of the bibliography. With that code, you can go straight to
the model you want to see. But more often, your question, like ours,
is more general. For such cases, PDB also provides forms for more
sophisticated searches. For now, let's just see if any opsin models
are available. Type "opsin" into the search box and click Find a
structure.
As of 9/8/2003, this search returns 95 models (on 3/30/2003, it returned 88
models), and you can see
from the first one that our search is too broad. Among other things,
we're finding netropsin, an antitumor drug. There's also
bacteriorhodopsin, and the last time I looked, bacteria had no eyes,
so this is not likely to be a visual pigment. Looking over the first
two pages of hits, I see one promising sign: some entries for bovine
rhodopsin. Some of the hits appear to be fragments of this molecule.
So let's use a more precise search tool to see if other bovine
rhodopsin models are available. Return to PDB home.
In the Search the Archive box, click
SearchFields.
We have gone from the simplest to the most sophisticated search
tool. SearchFields is a customizable form that allows many
search criteria. The criteria names are links to the definitions of
the criteria, providing information on the contents of PDB files and
the criteria that will look in specific parts of the files. At the
bottom of the form are criteria you can add to the form. Then you can
bookmark a form and always find it with the criteria you want. Now
let's get serious and see if there are PDB models that are similar to
human rhodopsin.
Scroll down to the list of criteria you can add to the form. Check
to add these criteria: FASTA search, Ligand and prosthetic
groups, and Source. Click New Form. It looks like
you have come back to the same page, but now the new search criteria
are available. You can now search with a FASTA sequence, you can
limit the search to models containing specific nonprotein ligands (like
retinal, the prosthetic group of visual pigments), and you can
specify the source organism from which the macromolecule is
obtained.
Find your FASTA sequence of human rhodopsin and paste it into the
FASTA Search box. To limit your search to models containing
the visual prosthetic group, type "retinal" into the Ligand and
Prosthetic groups box. Click Search.
This search may take a few minutes. The tool is looking for
sequence homology among more than 20,000 entries in the PDB. On
9/8/2003, I got only 5 hits for this search. The first one had PDB
code 1LN6, and was listed as a model of bovine rhodopsin. If
your search produces other hits, find 1LN6 among them.
Beside FASTA result, you see the number 8.8e-155. This is
an alignment score meaning that the probability that this entry and
the human rhodopsin sequence are similar just by chance is 8.8 x
10^-155; not bloody likely, in other words.
Click alignment. In a new browser window, you see
the alignment between the human rhodopsin sequence and that of
1LN6. After alignment, they are over 90% identical. If two
proteins are more than about 40% identical, they are almost certain
to be practically identical in structure. So this model will show us
what human rhodopsin looks like. Close the alignment window to reveal
again the search results.
Things to come in the future:
-- go the visualization page and use QuickPDB to see the
structure
-- then download and examine with Deep View (link to tutorial),
noting covalent link to retinal
-- then examine peropsin: does it have a retinal binding site? use
PROSITE within Deep View
-- then make a homology model of peropsin using
SwissMODEL
-- that should do it.