Informatics Institute, UMDNJ
Bioinformatics
 

Introduction to NCBI: Databases

Nucleic Acid Sequence Databases Are International

NCBI, the National Center for Biotechnology Information, is the principal US institution charged with maintaining public nucleic acid sequence databases. It is one of three cooperating international organizations that house DNA and RNA sequence information.

After looking quickly at the questions below, take ten minutes to browse through the sites of the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ).

 

 

Questions for Discussion in Class

  • Take a few moments to look at the top pages of the Japanese and European institutions and compare them to NCBI.
    • How does the organizational structure of EMBL differ from that of NCBI?
  • Scientists submit their own information to NCBI. Review the process of submitting to GenBank.
    • What do you think of the process?
    • What are the advantages and disadvantages of this kind of process?

More Biological Data: Protein Sequences, Structures and Functions

The primary data collection and analysis job at NCBI has traditionally been directed at DNA sequences. But NCBI has taken on the responsibility of distributing other kinds of data, since bioinformatics is all about integrating information.

The protein sequence entries maintained at NCBI are compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations of annotated coding regions in GenBank. Data describing three-dimensional structures (mostly of proteins) are included in NCBI's Molecular Modeling Database (MMDB) and are derived from PDB.
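To make "translations of annotated coding regions" concrete, here is a minimal sketch of how a protein sequence can be derived from a coding DNA sequence. It assumes the standard genetic code only; real GenBank coding-region annotations can specify other translation tables and exceptions, which this sketch ignores. The function name `translate_cds` is hypothetical and is not part of any NCBI tool.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G
# (third position varying fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon.

    Hypothetical helper for illustration. Codons containing ambiguous
    bases (such as N) are translated as "X". Trailing bases that do
    not form a complete codon are ignored.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCATTGTAATGGGCCGCTGA"))
```

A protein record built this way is only as reliable as the coding-region annotation it came from, which is one reason curated sources such as SwissProt are compiled alongside the translations.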

Protein function can be considered at several levels, from the biochemical and the cellular to the physiological and the medical. NCBI is home to Online Mendelian Inheritance in Man (OMIM), a comprehensive catalog linking human genes to genetic disorders.

NCBI and Genetics

DNA and protein resources are linked to the Unique Human Gene Sequence Collection (UniGene) and to references for genomic biology, including gene maps (the positions of genes on chromosomes) for a number of entire genomes, including the human genome. Entrez Gene is among the best sources for comprehensive, gene-centered information.

Finally, because sequence divergence in proteins and nucleic acids is tightly tied to evolution, taxonomy data is also housed at NCBI, and sequence records are linked to that taxonomy resource.

Review in Class:

  1. What is the main function of the Protein Data Bank (PDB)?
  2. What universities participate in running PDB?
  3. Who runs SwissProt? What do they see their main job to be?
  4. How do I use the Tax Browser?

Too Much Data: Where Do You Start?

Scientists inside and outside of the international databases continue to realize the value of open resources such as GenBank as well as highly curated databases like PDB and SwissProt. In an attempt to create nucleic acid and protein sequence databases that could be used as bases for comparison, NCBI created RefSeq. Look at the questions below and then take a look at RefSeq. You will find the FAQ particularly helpful.

Review in Class:

  1. What is the difference between a RefSeq record and a common GenBank record?
  2. Why don't all genes or proteins have a RefSeq that we can use?
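When working with mixed lists of accession numbers, it helps to recognize RefSeq records at a glance. The sketch below assumes the common RefSeq accession form: a two-letter prefix, an underscore, an identifier, and an optional version suffix (for example `NM_000546.6`). Ordinary GenBank accessions contain no underscore. The function name `looks_like_refseq` is hypothetical, and the check inspects only the format of the string; it does not confirm that a record exists.

```python
import re

# Assumed RefSeq accession shape: two uppercase letters, an underscore,
# an alphanumeric identifier, and an optional ".version" suffix.
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def looks_like_refseq(accession: str) -> bool:
    """Return True if the accession has the RefSeq-style format.

    Hypothetical helper for illustration; it checks format only.
    """
    return bool(REFSEQ_PATTERN.match(accession.strip()))


for acc in ["NM_000546.6", "NP_000537", "U49845.1", "AF123456"]:
    kind = "RefSeq-style" if looks_like_refseq(acc) else "not RefSeq-style"
    print(f"{acc}: {kind}")
```

The prefix also hints at the molecule type (for example, `NM_` for mRNA and `NP_` for protein); the RefSeq FAQ lists the full set.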

NEXT: The ENTREZ Interface

Page last updated September 10, 2006
