Intro to BioPerl(pdf)

Intro to BioPerl

Install BioPerl

http://www.bioperl.org/    #BioPerl website
#click on HOWTO in the Documentation box for some code to help you get started

cd ~/Downloads
wget http://bioperl.org/DIST/BioPerl-1.6.1.tar.gz
tar xzvf BioPerl-1.6.1.tar.gz
cd BioPerl-1.6.1
sudo ./Build.PL         #type your password when prompted
                        #hit enter when prompted to accept the defaults [Y/n]
sudo ./Build install

Object attributes

object    $seq_obj
desc      $seq_obj->desc
seq       $seq_obj->seq
length    $seq_obj->length

LOCUS       NP_001123420             440 aa            linear   MAM 11-MAR-2011
DEFINITION  L-gulonolactone oxidase [Sus scrofa].
ACCESSION   NP_001123420
VERSION     NP_001123420.1  GI:194018724
DBSOURCE    REFSEQ: accession NM_001129948.1
KEYWORDS    .
SOURCE      Sus scrofa (pig)
  ORGANISM  Sus scrofa
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae;
            Sus.
REFERENCE   1  (residues 1 to 440)
  AUTHORS   Hasan,L., Vogeli,P., Stoll,P., Kramer,S.S., Stranzinger,G. and
            Neuenschwander,S.
  TITLE     Intragenic deletion in the gene encoding L-gulonolactone oxidase
            causes vitamin C deficiency in pigs
  JOURNAL   Mamm. Genome 15 (4), 323-333 (2004)
   PUBMED   15112110
FEATURES             Location/Qualifiers
     Protein         1..440
                     /product="L-gulonolactone oxidase"
                     /EC_number="1.1.3.8"
                     /note="L-gulono-gamma-lactone oxidase; GLO; LGO"
                     /calculated_mol_wt=50221
     CDS             1..440
                     /gene="GULO"
                     /coded_by="NM_001129948.1:228..1550"
                     /db_xref="GeneID:396759"
ORIGIN
        1 mvhghkgvkf qnwaktygcc pemyyqptsv eeirevlala rqqnkrvkvv ggghspsdia
       61 ctdgfmihmg kmnrvlkvdm ekkqvtveag illadlhpql dkhglalsnl gavsdvtagg
      121 vigsgthntg ikhgilatqv veltlltpdg tvlvcsessn aevfqaarvh lgclgviltv
      181 tlqcvpqfhl qettfpstlk evldnldshl kkseyfrflw fphsenvsvi yqdhtnkpps
      241 ssanwfwdya igfyllefll wistfvpglv gwinrfffwl lfngkkencn lshkiftyec
      301 rfkqhvqdwa iprektkeal lelkamleah pkvvahypve vrftraddil lspcfqrdsc
      361 ymniimyrpy gkdvprldyw layetimkkv ggrphwakah nctrkdfekm ypafrkfcai
      421 rekldptgmf lnaylekvfy
//

Use BioPerl to retrieve sequences from a list of GenBank accessions

Pseudocode:
1. Put your GenBank accessions into an array ("NP_848862", "NP_071556", "NP_001123420", "NP_001029215")
2. Loop through the array values (@accessions)
3. Connect to GenBank and get the accession record (sequence object)
4. Print out the record description, sequence and length

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;    #module for connecting to GenBank database

my $db_obj = Bio::DB::GenBank->new;    #create your new database connection object

#the accessions listed in the pseudocode
my @accessions = ("NP_848862", "NP_071556", "NP_001123420", "NP_001029215");

foreach my $acc (@accessions) {
    my $seq_obj = $db_obj->get_Seq_by_acc($acc);    #connect to GenBank & get accession info
    print $seq_obj->desc . "\n";                    #print out the accession description
    print $seq_obj->seq . "\n";                     #print out the accession sequence
    print $seq_obj->length . "\n";                  #print out the sequence length
}

Script 1

Use the Data::Dumper module to see what is in an object

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;    #module for connecting to GenBank database
use Data::Dumper;        ####module to print variable data

#same setup as in Script 1
my $db_obj = Bio::DB::GenBank->new;
my @accessions = ("NP_848862", "NP_071556", "NP_001123420", "NP_001029215");

foreach my $acc (@accessions) {
    my $seq_obj = $db_obj->get_Seq_by_acc($acc);    #connect to GenBank & get accession info
    print Dumper($seq_obj);                         ####print object info
    die;                                            ####and die, so the script stops after the first record
    print $seq_obj->desc . "\n";                    #print out the accession description
    print $seq_obj->seq . "\n";                     #print out the accession sequence
    print $seq_obj->length . "\n";                  #print out the sequence length
}

Script 1
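To see what Dumper output looks like without connecting to GenBank, here is a minimal sketch that dumps a hand-made object. The class name My::Seq and its two fields are hypothetical stand-ins chosen for illustration; a real BioPerl sequence object has a different and much larger internal layout.

```perl
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    #print hash keys in sorted order so the output is stable

#hypothetical object: a hash reference blessed into a made-up class
my $toy_obj = bless {
    desc => 'toy protein',
    seq  => 'MVHGHKG',
}, 'My::Seq';

my $dump = Dumper($toy_obj);    #Dumper returns the object's contents as a string
print $dump;
```

The dump shows the class name next to the key/value pairs stored in the object, which is the same kind of view you get for $seq_obj.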

Print out your accession records to a file in GenBank format

Pseudocode:
1. Put your GenBank accessions into an array ("NP_848862", "NP_071556", "NP_001123420", "NP_001029215")
2. Use BioPerl to open a file for writing your records in genbank format
3. Loop through the array values (@accessions)
4. Connect to GenBank and get the accession record (sequence object)
5. Write the GenBank records to the outfile

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;    #module for connecting to GenBank database
use Bio::SeqIO;          #module for sequence Input/Output writing

#same setup as in Script 1
my $db_obj = Bio::DB::GenBank->new;
my @accessions = ("NP_848862", "NP_071556", "NP_001123420", "NP_001029215");

my $outfile_obj = Bio::SeqIO->new(
    -file   => '>gulo.gb',    #open file for writing data
    -format => 'genbank'      #in genbank format; fasta format is also available
);

foreach my $acc (@accessions) {
    my $seq_obj = $db_obj->get_Seq_by_acc($acc);    #get accession sequence object from GenBank
    $outfile_obj->write_seq($seq_obj);              #write the GenBank record to your file
}

Script 2

Handling Errors when Record is not found by BioPerl

Pseudocode:
1. Put your GenBank accessions into an array
2. Loop through the array values (@accessions)
3. Connect to GenBank and get the accession record (object)
4. Print out sequence only if record is found

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;    #module for connecting to GenBank database

my $db_obj = Bio::DB::GenBank->new;               #create your new database connection object
my @accessions = ("NP_ABC123", "NP_848862");      #array of accessions

foreach my $acc (@accessions) {
    my $seq_obj;
    eval {
        $seq_obj = $db_obj->get_Seq_by_acc($acc);    #connect to and get accession info
    };                                               #a semicolon is required when using eval
    if ($@) {                                        #if error was found by eval; $@ holds the error message caught by eval
        print "$acc not found.\n";
    }
    else {
        print $seq_obj->seq . "\n";                  #print out the accession sequence
    }
}

Script 3
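The eval pattern can also be tried offline. The sketch below assumes a made-up lookup table in place of GenBank: %fake_db, fetch_record and the toy sequence are all hypothetical, and fetch_record simply dies for an unknown accession so that eval has an error to catch.

```perl
#!/usr/bin/perl -w
use strict;

#hypothetical stand-in for GenBank: accession => toy sequence (not the real record)
my %fake_db = ( "NP_848862" => "MKTAYIAK" );

#hypothetical stand-in for get_Seq_by_acc: dies when the accession is unknown
sub fetch_record {
    my ($acc) = @_;
    die "no record for $acc\n" unless exists $fake_db{$acc};
    return $fake_db{$acc};
}

my @found;      #sequences that were retrieved
my @missing;    #accessions that raised an error

foreach my $acc ("NP_ABC123", "NP_848862") {
    my $seq = eval { fetch_record($acc) };    #eval returns the block's value, or undef if the block died
    if ($@) {                                 #$@ is set only when the eval block died
        print "$acc not found.\n";
        push @missing, $acc;
    }
    else {
        print "$seq\n";
        push @found, $seq;
    }
}
```

Because the die happens inside eval, the loop keeps going after the bad accession instead of ending the script.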

Writing your own subroutines (functions)

- subroutines are for creating your own functions, similar to length( ) and substr( )
- keeps your script concise and organized
- saves you from repeating blocks of code across multiple places in your script; code reuse

#!/usr/bin/perl -w
use strict;

my $first_number  = 5;
my $second_number = 8;
my $total = get_total($first_number, $second_number);
print "The total of $first_number and $second_number is $total\n";

sub get_total {
    my ($value1, $value2) = @_;    # @_ is Perl's special array holding the arguments passed to the subroutine
    my $value3 = $value1 + $value2;
    return $value3;
}

Script 4
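Since @_ is an ordinary array, a subroutine is not limited to two arguments. As a small sketch in the same style, here is a variant that adds up however many values it is given. The name get_total_of_list is hypothetical and is not part of Script 4.

```perl
#!/usr/bin/perl -w
use strict;

#hypothetical variant of get_total that accepts any number of values
sub get_total_of_list {
    my @values = @_;    #copy every argument that was passed in
    my $total = 0;
    foreach my $value (@values) {
        $total += $value;
    }
    return $total;
}

my @numbers = (5, 8, 10);
my $total = get_total_of_list(@numbers);
print "The total of @numbers is $total\n";
```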

Programming Assignment Part 1

Write a script that uses BioPerl and eval, gets sequences from NCBI, and prints them to a file in FASTA format. This script will be used for next week's Multiple Sequence Alignment class.

Pseudocode:
1. Create a file that has the GenBank protein accessions from script 1
2. Read in (shift) the file that contains GenBank protein accessions in a single column
3. Put accessions into an array
4. Loop through the array values (@accessions)
5. Connect to GenBank and get the accession record (object)
6. Print sequence in FASTA format to a file only if record is found

# you are basically adding code from script 2 to script 3 for this assignment

accession file     ->     Perl Script     ->     FASTA file
NP_848862                                        >L-gulono..
NP_071556                                        MVHGYKG
                                                 VKFQNWA
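Steps 2 and 3 use file reading, which the scripts above do not show. One possible sketch follows; the helper name read_accessions is hypothetical, and the file name is taken from the command line with shift. It covers only the file-reading part, not the BioPerl part of the assignment.

```perl
#!/usr/bin/perl -w
use strict;

#hypothetical helper: read one accession per line from a file into an array
sub read_accessions {
    my ($file) = @_;
    open(my $in, '<', $file) or die "Cannot open $file: $!\n";
    my @accessions;
    while (my $line = <$in>) {
        chomp $line;                 #remove the trailing newline
        next if $line =~ /^\s*$/;    #skip blank lines
        push @accessions, $line;
    }
    close($in);
    return @accessions;
}

my $file = shift;    #first command-line argument, if one was given
if (defined $file) {
    my @accessions = read_accessions($file);
    print scalar(@accessions) . " accessions read\n";
}
```

Run it with the accession file as the only argument, for example: perl read_accessions.pl accessions.txt (both file names here are placeholders).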

Programming Assignment Part 2

Write a script to translate the mystery_ccds.fa sequence into a protein sequence in all 6 reading frames

- Create two subroutines
    - one to do the actual translation part (use the substr function here)
    - one to get the reverse complement of a sequence (use the tr and reverse functions)
- Use the %translation hash provided in the file dna2rna.pl
- Print out results in FASTA format and include in the header the frame that was used: +1, +2, +3, -1, -2, -3
- Identify which frame is the proper translation
- Submit your BioPerl and translation scripts to me by next week

wget http://140.226.65.107/mystery_ccds.fa
wget http://140.226.65.107/dna2rna.pl

Perl has an operator called tr (transliterate) which replaces every matching character in a string. It's basically a search and replace for characters in a string.

#!/usr/bin/perl -w
use strict;

my $sequence = "CAT";
$sequence =~ tr/GATC/CTAG/;    #use the same number of characters inside each set of forward slashes so each character maps one-to-one
                               #this is global replacement by default
print "$sequence\n";           #prints GTA
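tr also returns the number of characters it matched, which makes it handy for counting. A minimal sketch with a made-up sequence: when the replacement list is left empty, the string is not modified and only the count is used.

```perl
#!/usr/bin/perl -w
use strict;

my $sequence = "GATTACAGGC";    #made-up example sequence

#with an empty replacement list the string is left unchanged and tr returns the number of matches
my $gc_count = ($sequence =~ tr/GC//);
my $length   = length($sequence);

print "$gc_count of $length bases are G or C\n";
```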

Perl has a function called reverse( ) which allows you to reverse a string

#!/usr/bin/perl -w
use strict;

my $sequence = "GTA";
my $reversed_sequence = reverse($sequence);
print "$reversed_sequence\n";    #prints ATG
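One thing to watch for: reverse( ) only reverses the characters of a string when it is called in scalar context, such as an assignment to a scalar variable. In list context, for example directly inside print, it reverses the order of a list instead, and a one-item list comes back unchanged. The sketch below shows both cases; the variable names are only illustrative.

```perl
#!/usr/bin/perl -w
use strict;

my $sequence = "GTA";

#list context: reverses a one-item list, so the string itself is unchanged
my $in_list_context = join("", reverse($sequence));

#scalar context: reverses the characters of the string
my $in_scalar_context = scalar reverse($sequence);

print "$in_list_context\n";
print "$in_scalar_context\n";
```

Assigning the result to a scalar variable first, as in the script above, or writing scalar reverse(...) avoids the surprise.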
