Since EMBOSS is a sequence analysis package, you need to understand
how to use sequences within EMBOSS.
The EMBOSS programs can read sequences from various sequence databases
if the sequence is referred to in the form database:entry. This
format is known as a USA (Uniform Sequence Address). You can see the
databases we have set up for you using the program showdb.
However,
because databases change constantly and are fairly large,
in our UMDNJ installation we leave it up to you to download the sequences from NCBI, EMBL, Swiss-Prot,
or any other database that contains the sequences you want to work
with. You store the sequences and results on your own computer, so it is important that
you back up your data!
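As a small illustration of the database:entry form described above, here is a minimal Python sketch that splits such an address into its two parts. It is not part of EMBOSS; the function name split_usa and the address "exampledb:ENTRY1" are made up for this example, and the sketch handles only the simple database:entry form.

```python
def split_usa(usa):
    """Split a simple 'database:entry' address into (database, entry).

    Hypothetical helper for illustration only; it does not cover
    any other address form that EMBOSS may accept.
    """
    database, sep, entry = usa.partition(":")
    if not sep or not database or not entry:
        raise ValueError(f"expected 'database:entry', got {usa!r}")
    return database, entry


# "exampledb" and "ENTRY1" are placeholder names, not real database entries.
print(split_usa("exampledb:ENTRY1"))
```

The database part is the name you would look up with showdb; the entry part identifies one sequence in that database.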
In much of this workshop, we're going to look at members of the platelet-derived growth factor receptor family or at the globin proteins. The general principles are,
of course, applicable to any sequences you would like to analyze.
Exercise 2 - infoseq
A) Within EMBOSS, the program infoseq is a small utility to list the
sequences' USA, name, accession number, type (nucleic or protein),
length, percentage G+C (for nucleic), and/or description.
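To make the length and percentage G+C columns concrete, the following Python sketch computes similar values for a single FASTA record. It is only an approximation of the idea, not infoseq's actual implementation; the function names and the toy sequence are invented for this example, and the type guess is deliberately crude.

```python
def parse_fasta(text):
    """Return (name, description, sequence) for the first record in FASTA text."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA: first line must start with '>'")
    # The header is '>name description'; split at the first space.
    name, _, description = lines[0][1:].partition(" ")
    seq_lines = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # stop at the next record
            break
        seq_lines.append(ln)
    return name, description, "".join(seq_lines).upper()


def guess_type(seq):
    """Crude guess: nucleic if only A, C, G, T, U, or N occur, else protein."""
    return "nucleic" if set(seq) <= set("ACGTUN") else "protein"


def gc_percent(seq):
    """Percentage of residues that are G or C."""
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq) if seq else 0.0


# Toy record made up for this sketch; it is not one of the workshop sequences.
toy = ">toy1 made-up example record\nATGCGC\nGGAT\n"
name, desc, seq = parse_fasta(toy)
print(name, guess_type(seq), len(seq), gc_percent(seq), desc)
```

For the toy record, 6 of the 10 bases are G or C, so the G+C percentage is 60.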
There are other applications within EMBOSS to analyze your sequence.
These include the applications under the headings "Nucleic 2D Structure", "Nucleic
Codon Usage", and "Nucleic Composition".
How many applications are listed under each of these headings?
B) Download these two globin sequences from the NCBI in FASTA format: NM_000518 (mRNA) and HS_11 (a genomic sequence). You could have retrieved these sequences yourselves, but by giving them to you we all start with the same information. These links will probably open in a browser window. Copy and paste the sequences into a text editor, like Windows Notepad.
Note: Do not use word processors to manipulate sequence data that will be used in sequence analysis. Word-processing programs (like MS Word) embed a lot of hidden formatting code in the text. We don't see that code, but analysis programs do, and they cannot interpret it. Be warned.
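If you are unsure whether a saved file is really plain text, a quick check along the following lines can help. This is a rough Python sketch under simple assumptions (plain ASCII only, header line starting with ">"); the function name looks_like_plain_fasta and the sample data are invented for this example.

```python
def looks_like_plain_fasta(data):
    """Return True if `data` (bytes) is plain ASCII text starting with a '>' header."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return False
    # Allow printable ASCII plus tab, carriage return, and newline only.
    if any(not (ch.isprintable() or ch in "\t\r\n") for ch in text):
        return False
    return text.lstrip().startswith(">")


# A made-up plain-text record passes the check.
print(looks_like_plain_fasta(b">toy1 made-up record\nATGCGC\n"))
# Bytes containing control characters, as binary document files do, fail it.
print(looks_like_plain_fasta(b"PK\x03\x04 not plain text"))
```

The first call prints True and the second prints False. A file that fails this check should be re-saved from a plain text editor before it is pasted into an analysis program.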
Navigate in EMBOSS-Explorer to infoseq and paste in the NM_000518 sequence.
By default the options indicate NOT to print anything out. Change
the options to print out the name, accession, type, length, GC
content, and description.
What type of sequence is this?
What is the length of this sequence?
What is the % GC-content of this sequence?
Page last modified
September 29, 2008