BioPerl is a toolkit of Perl modules useful for building bioinformatics solutions in Perl. It is built in an object-oriented manner, so many modules depend on each other to achieve a task. The collection of modules in the bioperl-live repository constitutes the core functionality of BioPerl. Additionally, auxiliary modules for creating graphical interfaces (bioperl-gui), persistent storage in an RDBMS (bioperl-db), running and parsing the results from hundreds of bioinformatics applications (the Run package), and automating bioinformatic analyses (bioperl-pipeline) are all available as CVS modules in our repository.
In short, it is an open-source, object-oriented set of Perl modules. Each module performs a certain task. This design makes it easier to accomplish tasks by writing scripts that use pre-written Perl modules. Some examples are given below.
Getting started: http://www.bioperl.org/wiki/Getting_Started
User Documentation: http://www.bioperl.org/Core/Latest/modules.html
At the shell: module documentation can be read with perldoc, for example: perldoc Bio::SeqIO
Mailing lists: bioperl-announce is used for announcements; bioperl-l is for help and discussion. If you are using BioPerl and need help, you should subscribe to bioperl-l.
BioPerl is already installed here, but if you want to install it yourself, go to http://www.bioperl.org/wiki/Getting_BioPerl.
Data input/output (*IO modules):
– SeqIO: FASTA, GenBank, EMBL, …
– AlignIO: ClustalW, MSF, Phylip, …
– TreeIO: Newick, Nexus, NHX, …
– SearchIO: BLAST, FASTA, HMMER, …
– MapIO: MapMaker
– Matrix::IO: Scoring, Phylip
– Assembly::IO: Ace, Phrap
– OntologyIO: InterPro, GO, SO
– and others …
Format conversion:
#!/usr/bin/perl
use strict;
use Bio::SeqIO;
my $in = Bio::SeqIO->new(-format => "genbank", -file => "myseq.gbk");
my $out = Bio::SeqIO->new(-format => "fasta", -file => ">myseq.fa");
while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}
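The same reader/writer pattern should carry over to the other *IO modules. As a minimal sketch, assuming BioPerl is installed, the following converts alignments with Bio::AlignIO. The helper name convert_alignments and the file names myaln.aln and myaln.fasta are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;

# Hypothetical helper: copy every alignment from $in_file to $out_file,
# changing the format on the way. Returns the number of alignments written.
sub convert_alignments {
    my ($in_file, $in_format, $out_file, $out_format) = @_;
    my $in  = Bio::AlignIO->new(-format => $in_format,  -file => $in_file);
    my $out = Bio::AlignIO->new(-format => $out_format, -file => ">$out_file");
    my $count = 0;
    while (my $aln = $in->next_aln()) {
        $out->write_aln($aln);
        $count++;
    }
    $out->close();    # flush the output file
    return $count;
}

# Example call, skipped when the (hypothetical) input file is absent
if (-e 'myaln.aln') {
    my $n = convert_alignments('myaln.aln', 'clustalw', 'myaln.fasta', 'fasta');
    print "Converted $n alignment(s)\n";
}
```

As with SeqIO, a leading ">" in the -file argument opens the file for writing.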
*IO modules read formatted data, generating in-memory representations (i.e. objects):
#!/usr/bin/perl
use strict;
use Bio::SeqIO;
my $seqio = Bio::SeqIO->new(-format => "fasta", -file => "myseq.fa");
while (my $seq = $seqio->next_seq()) {
    printf("Sequence %s has length: %d\n", $seq->display_id, $seq->length);
}
When reading from GenBank/EMBL records, additional data are available:
my $annotation = $seq->annotation();
my @keys = $annotation->get_all_annotation_keys();
for my $ref ($annotation->get_Annotations("reference")) {
    # a Bio::Annotation::Reference object
    printf("title: %s\nauthors: %s\njournal: %s\n\n", $ref->title(), $ref->authors(), $ref->location());
}
#!/usr/bin/perl
use Bio::Seq;
# create a sequence object of some DNA
my $seq_obj = Bio::Seq->new(-id => 'testseq', -seq => 'CATGTAGATAG', -alphabet => 'dna', -desc => 'example 1');
# print out some details about it
print $seq_obj->seq, "\n";
# To modify the sequence, you can do
#$seq_obj->seq("AAAACCCCCGGGGTTTTT");
print "seq is ", $seq_obj->length, " bases long\n";
print "revcom seq is ", $seq_obj->revcom->seq, "\n";
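Other sequence methods follow the same calling pattern. The sketch below, assuming BioPerl is installed, uses subseq and translate on a made-up nine-base sequence. The ID 'example2' and the sequence itself are illustrative and not from the examples above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;

# a short made-up coding sequence (start codon, one codon, stop codon)
my $cds_obj = Bio::Seq->new(-id => 'example2', -seq => 'ATGGCCTAA', -alphabet => 'dna');

# subseq returns a plain string for the given 1-based, inclusive range
my $first_codon = $cds_obj->subseq(1, 3);
print "first codon is $first_codon\n";

# translate returns a new sequence object holding the protein
my $protein_obj = $cds_obj->translate;
print "translation is ", $protein_obj->seq, "\n";
```

Note that subseq returns a string while translate, like revcom, returns a sequence object, so its residues are read with ->seq.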
#!/usr/bin/perl
use Bio::Seq;
use Bio::SeqIO;
# create a sequence object of some DNA
my $seq_obj = Bio::Seq->new(-id => 'testseq', -seq => 'CATGTAGATAG', -alphabet => 'dna', -desc => 'example 1');
# print out some details about it
print $seq_obj->seq, "\n";
print "seq is ", $seq_obj->length, " bases long\n";
print "revcom seq is ", $seq_obj->revcom->seq, "\n";
# write it to a file in Fasta format
my $out = Bio::SeqIO->new(-file => '>testseq.fsa', -format => 'Fasta');
$out->write_seq($seq_obj);
#!/usr/bin/perl
use Bio::SeqIO;
my $seqio_obj = Bio::SeqIO->new(-file => "sequence.fasta", -format => "fasta");
# read the first sequence from the file
my $seq_obj = $seqio_obj->next_seq;
# if there are multiple sequences in the file, you can loop over the remaining ones:
while ($seq_obj = $seqio_obj->next_seq) {
    # print the sequence
    print $seq_obj->seq, "\n";
}
#!/usr/bin/perl
use Bio::DB::GenBank; # SwissProt, GenPept, EMBL, EntrezGene, RefSeq, etc
my $db_obj = Bio::DB::GenBank->new;
# you can also use get_Seq_by_acc('A12345') or get_Seq_by_version('A12345.2')
my $seq_obj = $db_obj->get_Seq_by_id(115387102);
#!/usr/bin/perl
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;
# SLEN limits the size of the sequence
my $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] AND 0:3000[SLEN]";
my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $query);
my $gb_obj = Bio::DB::GenBank->new;
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
while (my $seq_obj = $stream_obj->next_seq) {
    # do something with the sequence object
    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}
Note: look at NCBI's website for code to do this: BLAST page -> Help -> Developer Information -> Web service interface
Run Blast:
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use Bio::SearchIO;
# run BLAST with tabular output (-m 8); replace <file> with the query file
system("/usr/local/ncbi/blast -p blastn -d /usr/local/ncbi/db/nt -m 8 -i <file> > /tmp/blast.out");
# -m 8 output is parsed with the 'blasttable' format;
# use the 'blast' format instead for the default report output
my $searchio = Bio::SearchIO->new(-format => 'blasttable', -file => '/tmp/blast.out');
#my $searchio = Bio::SearchIO->new(-format => 'blast', -file => '/tmp/blast.out');
my $result = $searchio->next_result;
while (my $hit = $result->next_hit) {
    my $hsp = $hit->next_hsp;
    if (defined($hsp)) {
        print $hit->name."\t".$hsp->percent_identity."\t".$hit->significance."\t".$hsp->hsp_length."\t".$hsp->start('query')."\t".$hsp->end('query')."\t".$hsp->strand('hit')."\t".$hsp->start('hit')."\t".$hsp->end('hit')."\n";
    }
}
# How would you get information about the next hsp?
For more, go through the Beginners HOWTO and the BPTutorial.