IT-SC 218
    $verbose and print $helpful_but_verbose_message;

Of course, the if statement is more flexible, because it lets you easily add more statements to the block, and add elsif and else conditions with their own blocks. But for simple situations, the and operator works well.[1]

[1] You can even chain logical operators one after the other to build up more complicated expressions, and use parentheses to group them. Personally, I don't like that style much, but in Perl, there's more than one way to do it.

The logical operator or evaluates and returns the left argument if it's true; if the left argument doesn't evaluate to true, the or operator then evaluates and returns the right argument. So here's another way to write a one-line statement that you'll often see in Perl programs:

    open(MYFILE, $file) or die "I cannot open file $file: $!";

This is basically equivalent to our frequent:

    unless(open(MYFILE, $file)) {
        print "I cannot open file $file\n";
        exit;
    }

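The short-circuit behavior of and and or described above can be tried out in a small script. This is only a minimal sketch: the variable names, the messages, and the file path are made up for illustration, and the path is simply assumed not to exist. The eval block is an addition that traps the die so the script can report the error and carry on.

```perl
#!/usr/bin/perl
# Sketch of short-circuit evaluation with "and" and "or".
# All names and the file path below are hypothetical.
use strict;
use warnings;

my $verbose  = 1;
my @messages = (  );

# "and" evaluates its right side only if the left side is true
$verbose and push(@messages, 'verbose mode is on');

# The same thing written with an if statement
if ($verbose) {
    push(@messages, 'verbose mode is on');
}

# "or" returns the left argument if it is true, otherwise the right one
my $false_value = 0;
my $true_value  = 5;
my $fallback = ($false_value or 'right argument');
my $kept     = ($true_value  or 'right argument');

# open returns false on failure, so die runs only when the open fails
my $file = '/no/such/directory/no_such_file.txt';   # assumed not to exist
my $opened = eval {
    open(my $fh, '<', $file) or die "I cannot open file $file: $!\n";
    close($fh);
    1;
};
my $error = $opened ? '' : $@;

print join("\n", @messages), "\n";
print "fallback = $fallback, kept = $kept\n";
print "error: $error" if $error;
```

Note the parentheses around the or expressions in the assignments: or binds more loosely than =, so without them the assignment would happen first.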
Let's go back and take a look at the parseREBASE subroutine with the line:

    ( 1 .. /Rich Roberts/ ) and next;

The left argument is the range 1 .. /Rich Roberts/. When you're in that range of lines, the range operator returns a true value. Because it's true, the and operator goes on to evaluate the other side and finds the next function, which takes you straight back to the next iteration of the enclosing foreach loop. So if you're between the first line and the Rich Roberts line, you skip the rest of the loop.

Similarly, the line:

    /^\s*$/ and next;

takes you back to the next iteration of the foreach if the left argument, which matches a blank line, is true.

The other parts of this parseREBASE subroutine have already been discussed, during the design phase.

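To see the two skipping idioms at work, here is a small sketch on invented data. The sample text only imitates a header that ends at a "Rich Roberts" line, and the enzyme names are hypothetical. One assumption differs from the description above: the sketch reads lines with a while loop over a filehandle rather than a foreach over an array, because a constant such as 1 in a scalar-context range is compared with the current input line number in $., and $. is only updated as lines are read from a filehandle.

```perl
#!/usr/bin/perl
# Sketch: skip header lines with the range operator, and blank lines with a match.
# The sample text and enzyme names are invented for illustration.
use strict;
use warnings;

my $text = join("\n",
    'Made-up header line',
    'Another header line',
    'Rich Roberts',
    '',
    'EnzymeA AAGCTT',
    '',
    'EnzymeB GGATCC',
) . "\n";

# Read the text through an in-memory filehandle so that $. counts the lines
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @entries = (  );

while (<$fh>) {

    # True from input line 1 through the line matching /Rich Roberts/
    ( 1 .. /Rich Roberts/ ) and next;

    # Skip blank lines
    /^\s*$/ and next;

    chomp;
    push(@entries, $_);
}

close($fh);

print "$_\n" foreach @entries;
```

Only the two made-up enzyme lines survive: the three header lines are skipped by the range test, and the blank lines by the pattern match.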
9.2.5 Finding the Restriction Sites
So now it's time to write a main program and see our code in action. Let's start with a little pseudocode to see what still needs to be done:

    # Get DNA
    get_file_data
    extract_sequence_from_fasta_data

    # Get the REBASE data into a hash, from file "bionet"
    parseREBASE('bionet');

    for each user query {
        If query is defined in the hash
            Get positions of query in DNA
            Report on positions, if any
    }

You now need to write a subroutine that finds the positions of the query in the DNA. Remember that trick of putting a global search in a while loop from Example 5-7 and take heart. No sooner said than:

    Given arguments $query and $dna

    while ( $dna =~ /$query/ig ) {
        save the position of the match
    }

    return @positions

When you used this trick before, you just counted how many matches there were, not what the positions were. Let's check the documentation for clues, specifically the list of built-in functions. It looks like the pos function will solve the problem. It gives the location where the last m//g search on a variable left off.
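As a quick check of what pos reports, here is a minimal sketch using a made-up sequence and a literal pattern (both are assumptions for illustration, not data from the book). It records the value of pos after each match of a global search in a while loop.

```perl
#!/usr/bin/perl
# Sketch: what pos returns after each match in an m//g loop.
# The sequence is invented for illustration.
use strict;
use warnings;

my $dna = 'AAGCTTGGAAGCTT';
my @offsets = (  );

while ( $dna =~ /AAGCTT/ig ) {
    # pos gives the offset just past the end of the latest match
    push(@offsets, pos($dna));
}

print "@offsets\n";    # prints: 6 14
```

The values are offsets just past each match, counted from 0, so they still need adjusting before they can be reported as starting positions.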
Example 9-3 shows the main program followed by the required subroutine. It's a simple subroutine, given Perl functions like pos that make it easy.
Example 9-3. Make restriction map from user queries

    #!/usr/bin/perl
    # Make restriction map from user queries on names of restriction enzymes

    use strict;
    use warnings;
    use BeginPerlBioinfo;     # see Chapter 6 about this module

    # Declare and initialize variables
    my %rebase_hash = (  );
    my @file_data = (  );
    my $query = '';
    my $dna = '';
    my $recognition_site = '';
    my $regexp = '';
    my @locations = (  );

    # Read in the file "sample.dna"
    @file_data = get_file_data("sample.dna");

    # Extract the DNA sequence data from the contents of the file "sample.dna"
    $dna = extract_sequence_from_fasta_data(@file_data);

    # Get the REBASE data into a hash, from file "bionet"
    %rebase_hash = parseREBASE('bionet');

    # Prompt user for restriction enzyme names, create restriction map
    do {
        print "Search for what restriction enzyme (or quit)?: ";

        $query = <STDIN>;

        chomp $query;

        # Exit if empty query
        if ($query =~ /^\s*$/ ) {
            exit;
        }

        # Perform the search in the DNA sequence
        if ( exists $rebase_hash{$query} ) {

            ($recognition_site, $regexp) = split ( " ", $rebase_hash{$query});

            # Create the restriction map
            @locations = match_positions($regexp, $dna);

            # Report the restriction map to the user
            if (@locations) {
                print "Searching for $query $recognition_site $regexp\n";
                print "A restriction site for $query at locations:\n";
                print join(" ", @locations), "\n";
            } else {
                print "A restriction site for $query is not in the DNA:\n";
            }
        }
        print "\n";
    } until ( $query =~ /quit/ );

    exit;

    #
    # Subroutine
    #

    # Find locations of a match of a regular expression in a string
    #
    # return an array of positions where the regular expression
    # appears in the string

    sub match_positions {

        my($regexp, $sequence) = @_;

        use strict;

        use BeginPerlBioinfo;     # see Chapter 6 about this module

        # Declare variables
        my @positions = (  );

        # Determine positions of regular expression matches
        while ( $sequence =~ /$regexp/ig ) {
            push ( @positions, pos($sequence) - length($&) + 1);
        }

        return @positions;
    }

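Example 9-3 depends on the BeginPerlBioinfo module and on the files sample.dna and bionet. To try the subroutine by itself, the following sketch copies its logic into a standalone script and calls it with a hand-written regular expression on a made-up sequence; the test data and the variable names @sites and @pairs are assumptions for illustration only.

```perl
#!/usr/bin/perl
# Sketch: the match_positions logic from Example 9-3, run on invented data
use strict;
use warnings;

sub match_positions {

    my($regexp, $sequence) = @_;

    my @positions = (  );

    while ( $sequence =~ /$regexp/ig ) {
        # pos is the offset just past the match; convert it to a 1-based start
        push(@positions, pos($sequence) - length($&) + 1);
    }

    return @positions;
}

# Hypothetical sequence with two GC[AT]GC sites
my $dna = 'AAGCAGCTTGCTGCAA';
my @sites = match_positions('GC[AT]GC', $dna);
print "@sites\n";    # prints: 3 10

# Matches found this way never overlap: each search resumes after the last match
my @pairs = match_positions('AA', 'AAAA');
print "@pairs\n";    # prints: 1 3
```

The second call shows a limit worth keeping in mind: because each search resumes where the previous match ended, a site that overlaps an earlier match (here, the AA starting at position 2) is not reported.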
Here is some sample output from Example 9-3:

    Search for what restriction enzyme (or quit)?: AceI
    Searching for AceI GCWGC GC[AT]GC
    A restriction site for AceI at locations:
    54 94 582 660 696 702 840 855 957

    Search for what restriction enzyme (or quit)?: AccII
    Searching for AccII CGCG CGCG
    A restriction site for AccII at locations:
    181

    Search for what restriction enzyme (or quit)?: AaeI
    A restriction site for AaeI is not in the DNA:

    Search for what restriction enzyme (or quit)?: quit

Notice the length($&) in the subroutine match_positions. That $& is a special variable that's set after a successful regular-expression match. It stands for the sequence that matched the regular expression. Since pos gives the position of the first base following the match, you have to subtract the length of the matching sequence, then add one (to make the bases start at position 1 instead of position 0), to report the starting position of the match. Other special variables include $`, which contains everything in the string before the successful match, and $', which contains everything in the string after the successful match. So, for example:

    '123456' =~ /34/

succeeds at setting these special variables like so: $` = '12', $& = '34', and $' = '56'.

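The special variables can be inspected directly with a few lines of code. The sketch below repeats the '123456' example; the variable names are made up for illustration. It also uses $-[0], the first element of Perl's built-in @- array, which holds the 0-based offset where the last match started. That array is not used in Example 9-3; it appears here only as a cross-check on the pos arithmetic.

```perl
#!/usr/bin/perl
# Sketch: the special variables set by a successful match
use strict;
use warnings;

my $string = '123456';
my ($before, $matched, $after, $start) = ('', '', '', 0);

if ( $string =~ /34/ ) {
    $before  = $`;          # everything before the match
    $matched = $&;          # the text that matched
    $after   = $';          # everything after the match

    # $-[0] is the 0-based offset of the start of the match
    $start   = $-[0] + 1;
}

print "before=$before matched=$matched after=$after start=$start\n";
# prints: before=12 matched=34 after=56 start=3
```

The 1-based start computed from $-[0] is the same value that pos($sequence) - length($&) + 1 produces inside a global-match loop.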
What we have here is admittedly bare bones, but it does work. See the exercises at the end of the chapter for ways to extend this code.
9.3 Perl Operations