
This example shows how to use subroutines to answer a question such as: what are the genes on chromosome 22 that contain a given motif and have small exons?

10.4.3 Main Program

Let's test these subroutines with Example 10-5, which contains some subroutine definitions that will be added to the BeginPerlBioinfo.pm module:

Example 10-5. GenBank library subroutines

    #!/usr/bin/perl
    # Test program of GenBank library subroutines

    use strict;
    use warnings;
    # Don't use BeginPerlBioinfo,
    # since all subroutines are defined in this file
    # use BeginPerlBioinfo;     # see Chapter 6 about this module

    # Declare and initialize variables
    my $fh;     # variable to store filehandle
    my $record;
    my $dna;
    my $annotation;
    my $offset;
    my $library = 'library.gb';

    # Perform some standard subroutines for test
    $fh = open_file($library);

    $offset = tell($fh);

    while ( $record = get_next_record($fh) ) {

        ($annotation, $dna) = get_annotation_and_dna($record);

        if ( search_sequence($dna, 'AAA[CG].') ) {
            print "Sequence found in record at offset $offset\n";
        }
        if ( search_annotation($annotation, 'homo sapiens') ) {
            print "Annotation found in record at offset $offset\n";
        }

        $offset = tell($fh);
    }

    exit;

    ################################################################
    # Subroutines
    ################################################################

    # open_file
    #
    #   - given filename, set filehandle

    sub open_file {

        my($filename) = @_;
        my $fh;

        unless ( open($fh, '<', $filename) ) {
            print "Cannot open file $filename\n";
            exit;
        }
        return $fh;
    }

    # get_next_record
    #
    #   - given filehandle, get the next GenBank record

    sub get_next_record {

        my($fh) = @_;

        my($record) = '';
        my($save_input_separator) = $/;

        # A GenBank record ends with a line containing only "//"
        $/ = "//\n";

        $record = <$fh>;

        $/ = $save_input_separator;

        return $record;
    }

    # get_annotation_and_dna
    #
    #   - given GenBank record, get annotation and DNA

    sub get_annotation_and_dna {

        my($record) = @_;

        my($annotation) = '';
        my($dna) = '';

        # Now separate the annotation from the sequence data
        ($annotation, $dna) = ($record =~ /^(LOCUS.*ORIGIN\s*\n)(.*)\/\/\n/s);

        # Clean the sequence of any whitespace or / characters
        #  (the / has to be written \/ in the character class, because
        #   / is the pattern delimiter, so it must be "escaped" with \)
        $dna =~ s/[\s\/]//g;

        return($annotation, $dna);
    }

    # search_sequence
    #
    #   - search sequence with regular expression

    sub search_sequence {

        my($sequence, $regularexpression) = @_;

        my(@locations) = ();

        while ( $sequence =~ /$regularexpression/ig ) {
            # pos needs the variable name; a bare pos refers to $_
            push( @locations, pos($sequence) );
        }

        return (@locations);
    }

    # search_annotation
    #
    #   - search annotation with regular expression

    sub search_annotation {

        my($annotation, $regularexpression) = @_;

        my(@locations) = ();

        # note the /s modifier: . matches any character including newline
        while ( $annotation =~ /$regularexpression/isg ) {
            push( @locations, pos($annotation) );
        }

        return (@locations);
    }

Example 10-5 generates the following output on our little GenBank library:

    Sequence found in record at offset 0
    Annotation found in record at offset 0
    Sequence found in record at offset 6256
    Annotation found in record at offset 6256
    Sequence found in record at offset 12366
    Annotation found in record at offset 12366
    Sequence found in record at offset 17730
    Annotation found in record at offset 17730
    Sequence found in record at offset 22340
    Annotation found in record at offset 22340

The tell function reports the current byte offset in the file, that is, how far the file has been read so far. To get the offset of the beginning of a record, you must therefore call tell first and then read the record.
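To make the ordering of tell and the read concrete, here is a minimal sketch that records the starting offset of each record and then uses seek to jump straight back to one of them. It is an illustration under stated assumptions, not code from BeginPerlBioinfo.pm: the subroutine names index_records and fetch_record are hypothetical, and the three tiny records are made-up placeholders rather than real GenBank entries.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a tiny stand-in library: three placeholder records,
# each ending in "//\n" as GenBank records do.
my @records = (
    "LOCUS A\nORIGIN\n acgt\n//\n",
    "LOCUS BB\nORIGIN\n ttgacc\n//\n",
    "LOCUS CCC\nORIGIN\n gg\n//\n",
);
my ($out, $filename) = tempfile(UNLINK => 1);
print $out @records;
close $out;

# index_records (hypothetical helper)
#   - given a filename, return the byte offset where each record starts
sub index_records {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open file $file: $!\n";
    local $/ = "//\n";              # read one whole record per <$fh>
    my @offsets;
    my $offset = tell($fh);         # position BEFORE the first read
    while ( defined(my $record = <$fh>) ) {
        push @offsets, $offset;     # start of the record just read
        $offset = tell($fh);        # start of the next record
    }
    close $fh;
    return @offsets;
}

# fetch_record (hypothetical helper)
#   - given a filename and a saved offset, read the record that starts there
sub fetch_record {
    my ($file, $offset) = @_;
    open(my $fh, '<', $file) or die "Cannot open file $file: $!\n";
    seek($fh, $offset, 0) or die "Cannot seek to $offset: $!\n";  # 0 = from start of file
    local $/ = "//\n";
    my $record = <$fh>;
    close $fh;
    return $record;
}

my @offsets = index_records($filename);
print "Record starts at offset $_\n" for @offsets;

# Jump directly to the second record without reading the first
my $second = fetch_record($filename, $offsets[1]);
print $second;
```

If tell were called after the read instead of before it, each saved offset would point at the following record, and the seek would land one record too far.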

10.4.4 Parsing Annotations at the Top Level