BioPerl

http://www.bioperl.org

BioPerl is a toolkit of Perl modules for building bioinformatics solutions in Perl. It is built in an object-oriented manner, so many modules depend on each other to achieve a task. The modules in the bioperl-live repository constitute the core functionality of BioPerl. Auxiliary modules are also available as CVS modules in our repository: graphical interfaces (bioperl-gui), persistent storage in an RDBMS (bioperl-db), running and parsing the results of hundreds of bioinformatics applications (the Run package), and software to automate bioinformatic analyses (bioperl-pipeline).

In short, it is an open-source, object-oriented set of Perl modules, each of which performs a specific task. This design lets you accomplish tasks by writing short scripts that use pre-written Perl modules. Examples appear in the sections below.

 

How to get started?

http://www.bioperl.org/wiki/Getting_Started

User Documentation: http://www.bioperl.org/Core/Latest/modules.html

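At the shell, the documentation of any installed module can be read with perldoc, for example: perldoc Bio::SeqIO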

Mailing lists: bioperl-announce is used for announcements; bioperl-l is for help and discussion. If you are using BioPerl and need help, subscribe to bioperl-l.

We already have it installed, but if you want to install it yourself, go to http://www.bioperl.org/wiki/Getting_BioPerl.

I/O

data input/output (*IO modules)

– SeqIO: FASTA, GenBank, EMBL, …
– AlignIO: ClustalW, MSF, Phylip, …
– TreeIO: Newick, Nexus, NHX, …
– SearchIO: BLAST, FASTA, HMMER, …
– MapIO: MapMaker
– Matrix::IO: Scoring, Phylip
– Assembly::IO: Ace, Phrap
– OntologyIO: InterPro, GO, SO
– and others …

Format conversion:

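#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;

# input stream: read GenBank records from myseq.gbk
my $in = Bio::SeqIO->new(-format => "genbank", -file => "myseq.gbk");
# output stream: the leading ">" opens myseq.fa for writing
my $out = Bio::SeqIO->new(-format => "fasta", -file => ">myseq.fa");
while (my $seq = $in->next_seq()) {
   $out->write_seq($seq);
}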

*IO modules read formatted data, generating in-memory representations (i.e. objects):

#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-format => "fasta", -file => "myseq.fa");
# next_seq() returns one Bio::Seq object per record, and undef at end of file
while (my $seq = $seqio->next_seq()) {
   printf("Sequence %s has length: %d\n", $seq->display_id, $seq->length);
}

Bio::Seq methods
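
As a minimal sketch of commonly used Bio::Seq methods, the script below builds a sequence object in memory and calls a few accessors on it. The ID, description and sequence are made up for illustration; objects returned by Bio::SeqIO's next_seq can be queried the same way.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Bio::Seq;

# Hypothetical example sequence; the ID and description are invented.
my $seq = Bio::Seq->new(
    -id       => 'demo1',
    -desc     => 'made-up coding sequence',
    -seq      => 'ATGGCCAAGTAA',
    -alphabet => 'dna',
);

print "id:       ", $seq->display_id, "\n";
print "desc:     ", $seq->desc, "\n";
print "alphabet: ", $seq->alphabet, "\n";
print "length:   ", $seq->length, "\n";
print "sequence: ", $seq->seq, "\n";

# subseq takes 1-based, inclusive coordinates and returns a string
print "first 3:  ", $seq->subseq(1, 3), "\n";

# revcom and translate each return a new sequence object
print "revcom:   ", $seq->revcom->seq, "\n";
print "protein:  ", $seq->translate->seq, "\n";
```

Note that seq and subseq return plain strings, while revcom and translate return objects, which is why ->seq is called on their results.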

When reading from GenBank/EMBL records, additional data are available:

Bio::Annotation objects

my $annotation = $seq->annotation();
my @keys = $annotation->get_all_annotation_keys();
for my $ref ($annotation->get_Annotations("reference")) {
     # a Bio::Annotation::Reference object
     printf("title: %s\nauthors: %s\njournal: %s\n\n", $ref->title(), $ref->authors(), $ref->location());
}

Bio::SeqFeature objects
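
A minimal sketch of working with sequence features follows. To stay self-contained it attaches one made-up CDS feature to an in-memory sequence instead of parsing a GenBank file; the sequence, coordinates and gene name are assumptions for illustration only. Features parsed from GenBank/EMBL records are reached through the same get_SeqFeatures call.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Hypothetical sequence with one invented CDS feature.
my $seq = Bio::Seq->new(
    -id       => 'demo2',
    -seq      => 'ATGGCCAAGTAAGGATCC',
    -alphabet => 'dna',
);

$seq->add_SeqFeature(
    Bio::SeqFeature::Generic->new(
        -primary_tag => 'CDS',
        -start       => 1,
        -end         => 12,
        -strand      => 1,
        -tag         => { gene => 'demoA' },   # made-up gene name
    )
);

# Each feature has a type (primary tag), a location, and optional tag/value pairs.
for my $feat ($seq->get_SeqFeatures) {
    printf "%s\t%d..%d\tstrand %d\n",
        $feat->primary_tag, $feat->start, $feat->end, $feat->strand;

    # check has_tag first: get_tag_values throws if the tag is absent
    if ($feat->has_tag('gene')) {
        print "\tgene: ", join(', ', $feat->get_tag_values('gene')), "\n";
    }
}
```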

Example 1 (Creating a Sequence)

#!/usr/bin/perl


use Bio::Seq;  



# create a sequence object of some DNA  
my $seq_obj = Bio::Seq->new(-id => 'testseq', -seq => 'CATGTAGATAG', -alphabet => 'dna', -desc => 'example 1');  

# print out some details about it
print $seq_obj->seq, "\n";

# To modify the sequence, you can do
#$seq_obj->seq("AAAACCCCCGGGGTTTTT");

print "seq is ", $seq_obj->length, " bases long\n";  
print "revcom seq is ", $seq_obj->revcom->seq, "\n";

Example 2 (Writing a Sequence to a file)

#!/usr/bin/perl


use Bio::Seq;  
use Bio::SeqIO;


# create a sequence object of some DNA  
my $seq_obj = Bio::Seq->new(-id => 'testseq', -seq => 'CATGTAGATAG', -alphabet => 'dna', -desc => 'example 1');  

# print out some details about it
print $seq_obj->seq, "\n";
print "seq is ", $seq_obj->length, " bases long\n";  
print "revcom seq is ", $seq_obj->revcom->seq, "\n";

# write it to a file in Fasta format   
my $out = Bio::SeqIO->new(-file => '>testseq.fsa', -format => 'Fasta');  
$out->write_seq($seq_obj);

Example 3 (Retrieving a sequence from a file)

#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;

my $seqio_obj = Bio::SeqIO->new(-file => "sequence.fasta", -format => "fasta");

# to read a single sequence, call next_seq once:
# my $seq_obj = $seqio_obj->next_seq;

# if there are multiple sequences in the file, loop until next_seq returns undef:
while (my $seq_obj = $seqio_obj->next_seq) {
     # print the sequence
     print $seq_obj->seq, "\n";
}

Problem 1: Write a program that reads a GenBank file and converts it to a FASTA file.

Example 4 (Retrieving a sequence from a database)

#!/usr/bin/perl

use strict;
use warnings;
use Bio::DB::GenBank; # SwissProt, GenPept, EMBL, EntrezGene, RefSeq, etc

my $db_obj = Bio::DB::GenBank->new;

# the lookup is a method of the database object;
# you can also use get_Seq_by_acc('A12345') or get_Seq_by_version('A12345.2')
my $seq_obj = $db_obj->get_Seq_by_id(115387102);

Example 5 (Retrieve multiple sequences from a database)

#!/usr/bin/perl

use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# SLEN limits the size of the sequence
my $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] AND 0:3000[SLEN]";
my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $query);
my $gb_obj = Bio::DB::GenBank->new;
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
while (my $seq_obj = $stream_obj->next_seq) {
     # do something with the sequence object
     print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}

Example 6 (Remote BLAST)

See NCBI's website for code to do this: BLAST Page -> Help -> Developer Information -> Web service interface

Example 7 (Local BLAST)

Run Blast:

#!/usr/bin/perl

use strict;
use warnings;
use diagnostics;
use Bio::SearchIO;

# the query file is given on the command line
my $query_file = shift or die "Usage: $0 <file>\n";

# run BLAST; -m 8 writes tabular output
system("/usr/local/ncbi/blast -p blastn -d /usr/local/ncbi/db/nt -m 8 -i $query_file > /tmp/blast.out") == 0
     or die "BLAST failed: $?\n";

# -m 8 output is parsed with the 'blasttable' format;
# use -format => 'blast' only for the default pairwise report
my $searchio = Bio::SearchIO->new(-format => 'blasttable', -file => '/tmp/blast.out');
my $result = $searchio->next_result;
while (my $hit = $result->next_hit) {
     # only the first HSP of each hit is examined here
     my $hsp = $hit->next_hsp;
     if (defined($hsp)) {
          print join("\t", $hit->name, $hsp->percent_identity, $hit->significance, $hsp->hsp_length,
                     $hsp->start('query'), $hsp->end('query'), $hsp->strand('hit'),
                     $hsp->start('hit'), $hsp->end('hit')), "\n";
     }
}
   

# How would you get information about the next hsp?
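
One possible answer, as a sketch: next_hsp can be called repeatedly, so replacing the single call with an inner while loop visits every HSP of a hit. To keep the example runnable without a BLAST installation, it parses a few made-up tabular (-m 8 style) rows from an in-memory filehandle; the query and subject names and all numbers are invented.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Bio::SearchIO;

# Made-up tabular BLAST rows: two HSPs for subjA, one for subjB.
# Columns: query, subject, %identity, length, mismatches, gap opens,
#          q.start, q.end, s.start, s.end, e-value, bit score
my $table = join '', map { join("\t", @$_) . "\n" } (
    [qw(query1 subjA 98.00 100 2 0 1   100 201 300 1e-50 180)],
    [qw(query1 subjA 90.00 50  5 0 150 199 400 449 1e-10 60)],
    [qw(query1 subjB 85.00 40  6 0 10  49  5   44  1e-05 40)],
);
open my $fh, '<', \$table or die "Cannot open in-memory table: $!";

my $searchio = Bio::SearchIO->new(-format => 'blasttable', -fh => $fh);

my %hsp_count;   # number of HSPs seen per hit name
while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        # next_hsp returns undef once all HSPs of this hit have been read
        while (my $hsp = $hit->next_hsp) {
            $hsp_count{ $hit->name }++;
            print join("\t", $hit->name,
                       $hsp->start('query'), $hsp->end('query'),
                       $hsp->start('hit'),   $hsp->end('hit'),
                       $hsp->evalue), "\n";
        }
    }
}
```

With a real report, pass -file => '/tmp/blast.out' instead of -fh; the nested loops stay the same.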

On your own

Go through the Beginners HOWTO and BPTutorial