Informatics Institute, UMDNJ
Bioinformatics
 

Introduction to NCBI Resources

Introduction

In this workshop you will:

  • Learn about resources describing the biology and molecular genetics of an important gene, p53, its protein, and related proteins
  • Discover some tools that NCBI provides to study these genes in the context of their genomes
  • Look into the variation reported for these proteins

This tutorial presumes that you are comfortable with basic biochemistry and molecular biology.

The Institution

The National Center for Biotechnology Information was established in 1988. It creates public databases and develops tools to access these data. Take a look at what NCBI says about itself and what it says bioinformatics is all about.

Databases

Nucleic Acid Sequences

NCBI is one of three cooperating international organizations that house DNA and RNA sequence information. At NCBI, that sequence collection is called GenBank. GenBank contains all publicly available sequences, with annotations. The same sequence content can also be found at EMBL (European Molecular Biology Laboratory) and the DDBJ (DNA Data Bank of Japan).

Even though the sequences are the same, each of the three organizations offers its own ways to access the data and its own set of tools to search and display information.

Protein Sequences

The main data business at NCBI is housing and organizing DNA sequences. The protein entries maintained at NCBI are compiled from a variety of sources: protein translations of cDNA submissions to NCBI; protein sequences from four independent databases (SwissProt, PIR, PRF, and PDB); and translations of annotated coding regions in GenBank and RefSeq.

Bioinformatics is all about integrating information, so NCBI includes these data in its suite of offerings.

Other Data Sets

NCBI has taken on the responsibility of linking and distributing many kinds of data, a few of which include:

  • Literature
    • PubMed, the National Library of Medicine's comprehensive database of citations in the biomedical field
    • Online Mendelian Inheritance in Man (OMIM), once a famous medical genetics reference book, now available only online
    • Online Mendelian Inheritance in Animals (OMIA), a newer companion reference introduced in the last couple of years
    • Books, a bookshelf of useful, if not entirely current, references and texts
  • Structures
    • The Molecular Modeling Database (MMDB), an NCBI-curated collection of 3D protein structures derived from the Protein Data Bank (PDB)
  • Data collections: sets of annotated data
    • RefSeq, an annotated subset of GenBank
    • Gene, a compendium of gene-centered information about sequence, genomic organization, and function
    • TaxBrowser, a tool to examine NCBI's taxonomy database and link to sequences, structures, and assemblies of data associated with specific taxa

Access to all of these data (and more) is provided through the Entrez Browser, the search engine that NCBI and NLM provide for probing the literature and data they house. Next we'll examine that search engine.
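Entrez can also be queried from a script through NCBI's E-utilities web service, which is not covered in this workshop. The sketch below only builds ESearch URLs for a few of the databases listed above and prints them; it does not contact NCBI. The helper name build_esearch_url, the search term "p53", and the choice of databases are illustrative assumptions, not part of the workshop material.

```python
from urllib.parse import urlencode

# Public base URL of NCBI's E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=5):
    """Return an ESearch URL that queries one Entrez database for a term.

    build_esearch_url is a hypothetical helper written for this example.
    db, term, and retmax are standard ESearch query parameters.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Illustrative selection of Entrez database identifiers:
# literature, genes, protein sequences, and nucleotide sequences.
for db in ("pubmed", "gene", "protein", "nuccore"):
    print(build_esearch_url(db, "p53"))
```

Pasting one of the printed URLs into a web browser should return a list of matching record identifiers for that database, which is roughly what the Entrez Browser does behind its search box.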

Entrez Browser

Page last updated January 23, 2008
