Informatics Institute, UMDNJ
Bioinformatics
 

Introduction to NCBI: The Nucleic Acid Record

What is in an NCBI record?

The record you "see" when you retrieve an NCBI sequence is derived from a highly structured set of files. Understanding the structure of those files can help you find the records you want.

Searching for the Record

Give yourself the experience of searching in several different ways. In the examples below, each step names the exact term to search for.

  • Search for Nucleic Acids corresponding to p53.
    • What appears to be the difference between the three kinds of records? The example below is not from a p53 search.
    • [Figure: nucleotide record types]
    • How many do you find? (When this page was last updated, the total was a bit over 11,000.)
    • From the brief description, can you be sure that all of these sequences are for the p53 gene?
    • If not, why did they appear in your search?
      • The first sequence you will probably have found is NM_015328. Look at that Nucleic Acid record. Where does the term p53 appear on that page? Do you understand why a search for p53 may not be specific for just p53 genes?
  • Search for p53 AND Homo sapiens in the Nucleic Acids database.
    • How many records do you find?
    • Look at the tabs
    • Did this strategy to search for human p53 sequences work very well?
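
If you would rather script these searches, NCBI's E-utilities offer the same Entrez search over HTTP. The minimal sketch below only builds the esearch URLs for the two searches above; it assumes the `nucleotide` database name and the `rettype=count` option, which asks esearch to report just the number of hits. The helper name `esearch_count_url` is my own illustration, not part of any NCBI library.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities esearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_count_url(term: str, db: str = "nucleotide") -> str:
    """Build an esearch URL that asks only for the hit count (rettype=count).

    esearch_count_url is a hypothetical helper written for this page.
    """
    params = {"db": db, "term": term, "rettype": "count"}
    # urlencode escapes spaces as "+", so the query survives in the URL.
    return f"{ESEARCH_URL}?{urlencode(params)}"


# The two searches from the exercise above.
for term in ["p53", "p53 AND Homo sapiens"]:
    print(esearch_count_url(term))
```

Opening one of these URLs in a browser should return a short XML document whose Count element holds the number of hits. Expect today's totals to differ from the figure given above.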

The Record

Each Nucleic Acid record has a set structure of defined data fields. A field may contain terms from a limited, formal vocabulary, or it may hold free text and data. Free text and data can include variant English spellings (haemoglobin vs. hemoglobin, for example), spelling errors, or even data errors.
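
Because a free-text field may use either spelling, a search on one spelling can miss records that use the other. One way to guard against this is to OR the variants together. The sketch below assumes a small, hand-made table of British/American spelling pairs; the table and the function name `variant_query` are illustrative only, not anything NCBI maintains.

```python
# Hypothetical, hand-made table of British/American spelling pairs.
SPELLING_VARIANTS = {
    "haemoglobin": "hemoglobin",
    "oesophagus": "esophagus",
    "tumour": "tumor",
}


def variant_query(word: str) -> str:
    """Return a search term covering every known spelling of word."""
    word = word.lower()
    forms = {word}
    for british, american in SPELLING_VARIANTS.items():
        if word in (british, american):
            forms.update((british, american))
    if len(forms) == 1:
        return word  # no known variant: search the word as given
    # Parenthesise so the OR group can be combined safely with AND.
    return "(" + " OR ".join(sorted(forms)) + ")"


print(variant_query("hemoglobin"))  # (haemoglobin OR hemoglobin)
```

The returned string can then be combined with other terms, for example `(haemoglobin OR hemoglobin) AND Homo sapiens`.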

To understand what elements make up a GenBank/NCBI record, take a look at NCBI's interactive sample record. In particular, make sure you understand the following elements of the record by clicking through the corresponding terms on that NCBI page.

[Figures: elements of the sample record, parts 1 and 2]
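
The "highly structured set of files" is easy to see in the GenBank flat-file view, where each top-level field name (LOCUS, DEFINITION, ACCESSION, and so on) starts in the first column and indented lines continue that field. The sketch below groups the lines of one record by those field names. It is a simplified reading of the format (for example, repeated REFERENCE blocks are merged into one list), and `TOY_RECORD` is a made-up toy record, not a real GenBank entry.

```python
# Made-up toy record in simplified GenBank flat-file layout (not a real entry).
TOY_RECORD = """\
LOCUS       TOY0001       24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy sequence for illustration only.
ACCESSION   TOY0001
SOURCE      synthetic construct
  ORGANISM  synthetic construct
FEATURES             Location/Qualifiers
     source          1..24
ORIGIN
        1 acgtacgtac gtacgtacgt acgt
//
"""


def split_sections(flatfile: str) -> dict[str, list[str]]:
    """Group the lines of one flat-file record by top-level keyword.

    Simplifying assumption: a keyword starts in column 1 and every indented
    line continues the most recent keyword's section.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in flatfile.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line and not line[0].isspace():
            keyword, _, rest = line.partition(" ")
            current = keyword
            sections.setdefault(current, []).append(rest.strip())
        elif current is not None:
            sections[current].append(line.strip())
    return sections


print(list(split_sections(TOY_RECORD)))
# ['LOCUS', 'DEFINITION', 'ACCESSION', 'SOURCE', 'FEATURES', 'ORIGIN']
```

For real records, an established parser such as Biopython's `SeqIO.read(handle, "genbank")` handles the full format.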

Of course, the best way to understand an environment is to work in it. The next set of exercises explores how much information we can derive from NCBI, using the protein p53 as an example.

NEXT: Exercise

Page last updated January 23, 2008
