informatics institute logo Informatics Institute UMDNJ logo
Bioinformatics
 

ENTREZ: The NCBI Interface

One Interface, Many Databases

Literature and Data

Over the last several years, NLM and NCBI have worked to use a single retrieval tool, Entrez, as the common interface for most of their databases. At the NLM this interface is branded as "PubMed," "NLM Catalog," "Journals," etc.; at NCBI it is called "Entrez Protein," "Entrez Nucleotide," etc., depending on which NCBI database you search. You have probably used this interface when you've looked up articles using PubMed. A common tool for diverse types of data shows how closely integrated data needs to become. We'll discuss that point in more detail in a later module in our course, Information Integration.

[Image: PubMed]

When we want to discover information about a protein sequence, a DNA sequence, a gene annotation, or references to a human disease, a similar interface is available.

[Image: Entrez Protein]


Evolution and Taxonomy

You are probably not surprised that an agency tasked with holding important molecular data has both protein and nucleic acid databases. But since the unifying lens through which we understand the relationships of sequences is evolution, we also need to understand the relationship of the sequences to taxonomy. The TaxBrowser links all records to the organism type from which the sample was taken.

[Image: TaxBrowser]

 

Beginning with the TaxBrowser and choosing your species, you'll be able to find nucleic acid sequences, protein sequences, or genes and more in a very specific way, because taxonomy is a structured vocabulary carefully curated at NCBI.

Note that there are Direct links and Subtree links. That's because all taxonomies are trees: Direct links count the records attached to the taxon itself, while Subtree links also count the records attached to every taxon beneath it. In the example for Homo sapiens, taken at the time of writing, the Subtree column shows a few more nucleotide and protein links than the Direct column. Why? NCBI has a few sequences isolated from fossil materials of our distant cousin, Homo sapiens neanderthalensis.
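The difference between the two columns can be sketched with a toy tree. This is only an illustration: the `Taxon` class, the `subtree_count` function, and the record counts are all hypothetical and are not taken from NCBI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Taxon:
    """One node in a toy taxonomy tree (hypothetical, for illustration)."""
    name: str
    direct: int = 0  # records attached to this node itself (made-up number)
    children: List["Taxon"] = field(default_factory=list)


def subtree_count(node: Taxon) -> int:
    """Records attached to this node plus all of its descendants."""
    return node.direct + sum(subtree_count(child) for child in node.children)


# Made-up counts, chosen only to show the arithmetic.
neanderthal = Taxon("Homo sapiens neanderthalensis", direct=3)
human = Taxon("Homo sapiens", direct=100, children=[neanderthal])

print(human.direct)          # the "Direct" figure: 100
print(subtree_count(human))  # the "Subtree" figure: 103
```

A taxon with no children has identical Direct and Subtree figures, which is why the two columns differ only where a species has subspecies or other child entries.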

[Image: Taxonomy links]

Entrez Gene

Over the last few years, NCBI has developed the Entrez Gene database as a central focus uniting sequence, genomic, and functional information. You will use it later in this exercise and in the course. On the NCBI website, click the entry in the Direct links column of the Gene row; it should look like the image above. That should lead you to the Entrez Gene page for the 36,000+ Current genes that NCBI holds.

[Image: Current genes]

Note that there are many more genes in the database. As new information is gathered, old gene designations are retired. Why keep old data? To interpret older papers!

If, on the NCBI Entrez Gene page for Homo sapiens, you add the Boolean operator AND and the keyword p53 to the query, you'll find that the first item returned is the gene we're interested in. Click on that item, and you'll get to the TP53 Tumor protein p53 Gene page, which links to a rich variety of specific NCBI resources about this gene.
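The same Boolean query can also be written out as a plain string. The sketch below assumes you want to pass it to NCBI's E-utilities `esearch` service rather than type it into the web page; that route is not part of this exercise, and the helper names `build_gene_query` and `build_esearch_url` are hypothetical. The code only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_query(organism: str, keyword: str) -> str:
    """Restrict to one organism, then AND in a free-text keyword."""
    return f'"{organism}"[Organism] AND {keyword}'


def build_esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Encode the query as an esearch URL (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


term = build_gene_query("Homo sapiens", "p53")
url = build_esearch_url("gene", term)

print(term)  # "Homo sapiens"[Organism] AND p53
print(url)
```

The `[Organism]` field tag plays the role of choosing the species first; AND then narrows that set to records matching the keyword.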

[Image: Gene]

 


Federated Search - All Databases

Because the Entrez engine sits behind so many of NCBI's databases, you can search all of them at once from a single page.

[Image: Federated search]


Each of these searches looks through database records. To understand the power and limitations of these searches, it's useful to examine a sample NCBI Record.

Page last updated January 23, 2008
