Informatics Institute, UMDNJ
Bioinformatics

EMBOSS-Explorer: Multiple Sequence Analysis

The simultaneous alignment of many nucleotide or amino acid sequences is now an essential tool in molecular biology. Multiple alignments are used to find diagnostic patterns to characterize protein families; to detect or demonstrate homology between new sequences and existing families of sequences; to help predict the secondary and tertiary structures of the new sequences; to suggest oligonucleotide primers for PCR; and as an essential prelude to molecular evolutionary analysis.

One of the most popular programs for performing multiple sequence alignments is clustalw. EMBOSS provides an interface to clustalw called emma. clustalw (and thus emma) creates a multiple sequence alignment from a group of related sequences using progressive pairwise alignments. It can also produce a dendrogram showing the clustering relationships used to create the alignment. The alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster can then be aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences can be aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is built by a series of progressive pairwise alignments that include increasingly dissimilar sequences and clusters, until every sequence has been included. When gaps are inserted into a sequence to produce an alignment, they are inserted at the same position in all the sequences of the cluster. Each pairwise alignment uses the method of Needleman and Wunsch, extended for use with clusters of aligned sequences.
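The Needleman-Wunsch step described above can be sketched for two plain sequences. This is a minimal Python sketch of global alignment under simplifying assumptions: a fixed match/mismatch score and a linear gap penalty, with illustrative default values chosen here. clustalw itself scores residues with substitution matrices, uses more elaborate gap penalties, and aligns clusters (profiles) as well as single sequences, so this is not emma's implementation.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not clustalw/emma defaults.
    """
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap inserted in b
                score[i][j - 1] + gap,  # gap inserted in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "ACT")` returns a score of 1 with the rows `ACGT` and `AC-T`: a single gap lets the three shared residues line up, and both rows come out the same length, as expected of a global alignment.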

Exercise 10 - emma and prettyplot

We have obtained a number of beta globin sequences for you and placed them all in a single text file.

Use emma to align the sequences. Change the "Output Sequence Format" to "GCG MSF".

The output file shows the regions of greatest similarity among the sequences. This process has aligned sequences from humans, zebrafish, cows and chickens. The sequences are very similar, but there are some differences: note the gaps that have been inserted. Also note that, because this is a global alignment algorithm, gaps have been inserted to make all the sequences the same length. Differences between the aligned sequences can be very difficult to see in this format.
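Because every row of a global alignment has the same length, residues can be compared column by column. The Python sketch below computes percent identity between two aligned rows and lists the columns that contain gaps. The functions and the toy rows are hypothetical illustrations, not part of EMBOSS. Percent identity has several definitions in use; this one divides identical positions by the number of columns where neither row has a gap. The sketch treats "-", "." and "~" as gap characters, on the assumption that MSF output typically marks gaps with periods.

```python
GAP_CHARS = set("-.~")  # assumed gap symbols; MSF files typically use "."


def percent_identity(row1, row2):
    """Percent identity of two rows taken from the same alignment.

    Convention (one of several): identical positions divided by the
    number of columns in which neither row has a gap.
    """
    if len(row1) != len(row2):
        raise ValueError("rows of a global alignment must have equal length")
    compared = [(x.upper(), y.upper()) for x, y in zip(row1, row2)
                if x not in GAP_CHARS and y not in GAP_CHARS]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


def gap_columns(rows):
    """0-based indices of columns that contain a gap in at least one row."""
    return [i for i, column in enumerate(zip(*rows))
            if any(c in GAP_CHARS for c in column)]


if __name__ == "__main__":
    # Hypothetical toy rows, not the globin sequences from this exercise.
    rows = ["MVHLT-PEEK", "MVHLTAPEEK", "MV-LSGEEK-"]
    print(percent_identity(rows[0], rows[1]))
    print(gap_columns(rows))
```

On these toy rows, the first two sequences are 100% identical over the 9 ungapped columns they share, and columns 2, 5 and 9 contain gaps.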

The program prettyplot makes your results easier to interpret by displaying the aligned sequences one above another. To use prettyplot, we need the aligned sequence data from emma. To get it, click the "outseq" link in the right pane. You should see only the sequences.

Save this page as a plain-text file (for example, with Notepad). Then go back, click on prettyplot, select the file you just saved as input, and run prettyplot.

A graphic display of your alignment will appear on your screen. Identical residues are shown in red, and similar residues in green. This type of display can give you a first impression of the regions of conservation.
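The idea behind the red/green highlighting can be approximated by labelling each alignment column. This Python sketch is a hypothetical simplification, not prettyplot's actual rules: prettyplot judges similarity with a scoring matrix and thresholds, whereas the sketch uses a small, made-up table of amino-acid groups (`SIMILAR_GROUPS`) and leaves any column containing a gap unmarked.

```python
# Hypothetical amino-acid similarity groups, for illustration only;
# prettyplot itself uses a substitution matrix, not this table.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]
GAP_CHARS = set("-.~")  # assumed gap symbols


def classify_columns(rows):
    """Label each alignment column 'identical', 'similar', or '' (neither)."""
    labels = []
    for column in zip(*rows):
        residues = {c.upper() for c in column}
        if residues & GAP_CHARS:
            labels.append("")  # columns with a gap are left unmarked here
        elif len(residues) == 1:
            labels.append("identical")  # all rows share one residue
        elif any(residues <= set(group) for group in SIMILAR_GROUPS):
            labels.append("similar")  # all residues fall in one assumed group
        else:
            labels.append("")
    return labels


if __name__ == "__main__":
    # Hypothetical toy rows, not data from this exercise.
    print(classify_columns(["MVHLTK", "MVHITE", "MVKLSD"]))
```

In this scheme the first two toy columns are identical (M and V in every row), the next three are similar (H/K, L/I, T/S fall within one group each), and the last column (K, E, D) is neither.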

Page last modified September 29, 2008