One Interface, Many Databases
Literature and Data
Over the last several years, NLM and NCBI have worked to use a single retrieval tool as a common interface for most
of their databases. At NLM this interface is branded as "PubMed," "NLM Catalog," "Journals," etc.; at NCBI it's called
"Entrez Protein," "Entrez Nucleotide," etc., depending on which NCBI database you search. You have probably used that interface when you've
looked up articles using PubMed. A common tool for diverse types of data shows how closely integrated data needs to become. We'll discuss that point in more detail in a later module in our course, Information Integration.

When we want to discover information about a protein
sequence, a DNA sequence, a gene annotation, or references to a
human disease, a similar interface is available.

Evolution and Taxonomy
You are probably not surprised that an agency tasked with holding important molecular data has both protein and nucleic acid databases. But since the unifying lens through which we understand the relationships of sequences is evolution, we also need to understand the relationship of the sequences to taxonomy. The TaxBrowser links all records to the organism type from which the sample was taken.

Beginning with the TaxBrowser and choosing your species, you'll be able to find nucleic acid sequences, protein sequences, or genes and more in a very specific way, because taxonomy is a structured vocabulary carefully curated at NCBI.
Note that there are Direct links and Subtree links. That's because all taxonomies are trees: Direct links count the records attached to the taxon itself, while Subtree links also count the records attached to everything beneath it. The example at right, taken today for Homo sapiens, includes a few more nucleotide and protein links in Subtree than in Direct. Why? NCBI has a few sequences isolated from fossil material of our distant cousin, Homo sapiens neanderthalensis.
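
The difference between Direct and Subtree counts can be sketched with a toy tree. This is only an illustration: the `Taxon` class and the record counts are made up for the example and are not NCBI data or an NCBI API.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    """One node in a toy taxonomy tree (hypothetical, for illustration)."""
    name: str
    direct: int = 0                      # records attached to this taxon itself
    children: list = field(default_factory=list)

    def subtree(self) -> int:
        """Records attached to this taxon plus all of its descendants."""
        return self.direct + sum(child.subtree() for child in self.children)


# Made-up counts: a handful of fossil-derived sequences under the subspecies.
neanderthal = Taxon("Homo sapiens neanderthalensis", direct=3)
sapiens = Taxon("Homo sapiens", direct=100, children=[neanderthal])

print("Direct: ", sapiens.direct)
print("Subtree:", sapiens.subtree())
```

With these invented numbers, the Subtree count exceeds the Direct count by exactly the records held under the child taxon, which is the pattern you see in the TaxBrowser table.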
Entrez Gene
Over the last few years, NCBI has developed the Entrez Gene database as a central focus uniting sequence, genomic, and functional information. You will use it later in this exercise and in the course. At the NCBI Web site, click on the Direct Link column on the Gene row; it should look like the image above. That should lead you to the Entrez Gene page for the 36,000+ Current genes held there.

Note that there are many more genes in the database. As new information is gathered, old gene designations are retired. Why keep old data? To interpret older papers!
If, at the NCBI Entrez Gene page for Homo sapiens, you add to the query the Boolean operator AND and the keyword p53, you'll find that the first item returned is the gene we're interested in. Click on that item, and you'll get to the TP53 (tumor protein p53) Gene page, which links to a rich variety of specific NCBI resources about this gene.
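
The same Boolean query can also be written out as a URL for NCBI's E-utilities ESearch service. The sketch below only builds the URL and does not contact NCBI. The helper name `build_gene_query` is hypothetical, and restricting the search with the `[Organism]` field tag is an assumption about how the Web page narrows the query to Homo sapiens.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_query(organism: str, keyword: str, retmax: int = 5) -> str:
    """Build an ESearch URL for the Gene database (hypothetical helper).

    The two parts of the query are joined with the Boolean operator AND,
    just as you would type them into the Entrez search box.
    """
    term = f"{organism}[Organism] AND {keyword}"
    params = {"db": "gene", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_gene_query("Homo sapiens", "p53")
print(url)
```

Pasting the printed URL into a browser should return a list of matching Gene record identifiers; the order and number of hits will depend on the current contents of the database.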

Federated Search - All Databases
The Entrez engine is used so widely that you can scan all of NCBI's databases
at once from a single page.

Each of these searches looks through database records. To understand the power and limitations of these searches, it's useful to examine a sample NCBI Record.
Page last updated
January 23, 2008