Informatics Institute, UMDNJ
EMBOSS-Explorer: Working with Sequences


Since EMBOSS is a sequence analysis package, you need to understand how to use sequences within EMBOSS.

The EMBOSS programs can read sequences from various sequence databases if the sequence is referred to in the form database:entry. This format is known as a USA (Uniform Sequence Address). You can see the databases we have set up for you using the program showdb.
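To make the database:entry form concrete, here is a minimal Python sketch that splits such an address into its two parts. It is an illustration only and not part of EMBOSS: the function name split_usa and the example address mydb:ENTRY1 are hypothetical, and the sketch handles only the simple database:entry form described above.

```python
def split_usa(usa):
    """Split a simple 'database:entry' address into (database, entry).

    Hypothetical helper for illustration; it only understands the
    plain database:entry form and rejects anything else.
    """
    database, separator, entry = usa.partition(":")
    if not separator or not database or not entry:
        raise ValueError(f"not in database:entry form: {usa!r}")
    return database, entry


if __name__ == "__main__":
    # "mydb" and "ENTRY1" are made-up names, not real database entries.
    database, entry = split_usa("mydb:ENTRY1")
    print("database:", database)
    print("entry:", entry)
```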

However, since databases are constantly changing and are fairly large, in our UMDNJ installation we leave it up to you to download the sequences from NCBI, EMBL, Swiss-Prot, or any other database that contains the sequences you want to work with. You store the sequences and results on your own computer, so it is important that you back up your data!

In much of this workshop, we're going to look at members of the platelet-derived growth factor receptor family or the globin protein family. The general principles are, of course, applicable to any sequences you would like to analyze.


Exercise 2 - infoseq

A) Within EMBOSS, the program infoseq is a small utility that lists any combination of a sequence's USA, name, accession number, type (nucleic or protein), length, percentage G+C (for nucleic sequences), and description.

EMBOSS contains other applications for analyzing your sequence, including those listed under the headings "Nucleic 2D Structure", "Nucleic Codon Usage", and "Nucleic Composition".

How many applications are listed under each of these headings: "Nucleic 2D Structure", "Nucleic Codon Usage", and "Nucleic Composition"?

B) Download these two globin sequences from the NCBI in FASTA format: NM_000518 (an mRNA) and HS_11 (a genomic sequence). You could have retrieved these sequences yourselves, but by giving them to you we all start with the same information. These links will probably open in a browser window. Copy and paste the sequences into a text editor, like Windows Notepad.

Note: Do not use word processors to manipulate sequence data that will be used in sequence analysis. Word-processing programs (like MS Word) embed a lot of hidden formatting code in their files. We don't see that code, but sequence analysis programs do, and they cannot interpret it. Be warned.
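If you are unsure whether pasted text is still clean FASTA, a quick check along the following lines can help. This is a rough Python sketch under stated assumptions, not an EMBOSS tool: the function name fasta_problems and the set of allowed characters (letters plus "*" and "-") are choices made for this illustration.

```python
import string

# Assumption for this sketch: sequence lines contain only letters,
# plus "*" (stop) and "-" (gap). Anything else is reported.
ALLOWED = set(string.ascii_letters + "*-")


def fasta_problems(text):
    """Return a list of human-readable problems found in FASTA text."""
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith(">"):
        problems.append("first line is not a '>' header")
    for number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            continue  # header lines may contain any descriptive text
        bad = {ch for ch in line.strip() if ch not in ALLOWED}
        if bad:
            problems.append(
                f"line {number}: unexpected characters {sorted(bad)}"
            )
    return problems


if __name__ == "__main__":
    # A made-up toy record, not one of the workshop sequences.
    for problem in fasta_problems(">toy example\nACGT\nAC1GT\n"):
        print(problem)
```

An empty list means the sketch found nothing suspicious. It does not prove the file is valid for every EMBOSS program.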

Navigate in EMBOSS-Explorer to infoseq and paste in the NM_000518 sequence. By default the options indicate NOT to print anything out. Change the options to print out the name, accession, type, length, GC content, and description.

What type of sequence is this?
What is the length of this sequence?
What is the % GC-content of this sequence?
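To see how values like these can be derived from a FASTA record, here is a minimal Python sketch that reports the name, description, type, length, and percentage G+C. It is not the infoseq implementation. The function name summarize_fasta is hypothetical, the type guess (nucleic if every letter is one of A, C, G, T, U, or N) is a simplification, and the text is assumed to hold a single record. Its %GC may differ from infoseq's for sequences that contain ambiguity codes.

```python
def summarize_fasta(text):
    """Summarize a single FASTA record held in a string.

    Returns a dict with name, description, type, and length, plus
    "pgc" (percentage G+C) when the sequence looks nucleic.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")

    # The header is ">name description..."; split at the first space.
    name, _, description = lines[0][1:].partition(" ")
    sequence = "".join(lines[1:]).upper()

    # Simplified guess: nucleic if only A, C, G, T, U, or N occur.
    is_nucleic = bool(sequence) and set(sequence) <= set("ACGTUN")

    summary = {
        "name": name,
        "description": description,
        "type": "nucleic" if is_nucleic else "protein",
        "length": len(sequence),
    }
    if is_nucleic:
        gc_count = sequence.count("G") + sequence.count("C")
        summary["pgc"] = round(100.0 * gc_count / len(sequence), 2)
    return summary


if __name__ == "__main__":
    # A made-up toy record, not one of the workshop sequences.
    print(summarize_fasta(">demo toy sequence\nGGCC\nAATT\n"))
```

Pasting one of the downloaded records into the text argument lets you compare the sketch's length and %GC with what infoseq reports.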


Page last modified September 29, 2008
