IT-SC 132
substr outside of string at example6-4 line 26.
Use of uninitialized value in regexp compilation at example6-4 line 45.
Use of uninitialized value in print at example6-4 line 46.
GACGTCTTCTAAGGCGA
So, the first bug is fixed. The second bug remains, with a few warnings that are, perhaps, hard to understand. But focus on the first error message, and see that it complains about line 26:

my $base2 = substr($subsequence, 1, 1);

So, there's something wrong with $subsequence. Often, error messages will be off by one line, so it may well be that the error starts on the line before, the first time $subsequence is operated on by substr. But that's not the case here. Nonetheless, the warnings have pointed directly to the problem. In this case, you still have to take a little initiative: look back at the $subsequence variable and notice the extra my declaration within the if block on line 20 that is preventing the variable from being initialized properly. Now, declaring a variable scoped within a block that overrides another variable of the same name outside the block is not necessarily a bug. In fact, it's perfectly legal, so the programmers who wrote the warnings did not flag it as an obvious error. However, it seems to have caused a real problem here!
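The effect of the extra my can be reproduced in a few lines. The following is only a minimal sketch, not the code of example6-4: the subroutine names shadowed and assigned are hypothetical, and the two-base subsequence is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from example6-4): the inner "my" creates a new
# variable that exists only inside the if block, so the outer one stays undef.
sub shadowed {
    my ($dna) = @_;
    my $subsequence;
    if (length($dna) >= 2) {
        my $subsequence = substr($dna, 0, 2);    # a different variable!
    }
    return $subsequence;
}

# Corrected version: assign to the variable declared outside the block.
sub assigned {
    my ($dna) = @_;
    my $subsequence;
    if (length($dna) >= 2) {
        $subsequence = substr($dna, 0, 2);
    }
    return $subsequence;
}

my $bad  = shadowed('GACGTC');
my $good = assigned('GACGTC');
print "shadowed: ", (defined($bad)  ? $bad  : "undefined"), "\n";
print "assigned: ", (defined($good) ? $good : "undefined"), "\n";
```

Neither use strict; nor use warnings; complains about the inner declaration itself, because it is legal Perl. The trouble shows up only later, when the outer variable is used while still uninitialized.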
One final point: if you go back to the original, buggy program, notice there's no use strict; in the program. If you add that and run the program without arguments, you get the following:

perl example6-4
Global symbol "$recievingcommitment" requires explicit package name at example6-4 line 47.
Execution of example6-4 aborted due to compilation errors.

Fixing the misspelled variable, and running the program with the argument, you get:

perl example6-4 AA
GACGTCTTCTAAGGCGA

You can see that use strict; didn't help for the other bug. Remember, it's best to employ both use strict; and use warnings;.
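To see how use strict; catches a misspelled variable name, here is a small sketch. It assumes nothing about example6-4 itself: the helper compile_error and the variable names $sequence and $sequnce are invented for illustration. The faulty code is kept in a string and compiled with eval, so that the surrounding script still runs and can report the problem.

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet of Perl held in a string,
# returning the error message, or an empty string on success.
sub compile_error {
    my ($code) = @_;
    my $ok = eval "$code\n; 1";
    return $ok ? '' : $@;
}

# $sequence and $sequnce are made-up names standing in for the real ones.
my $buggy = <<'END';
use strict;
my $sequence = 'GACGTC';
my $copy = $sequnce;    # misspelled, so never declared
END

print "Problem found by use strict:\n", compile_error($buggy);
```

The exact wording of the message varies a little between Perl versions, but it names the undeclared variable, which is usually enough to spot the typo.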
6.7 Exercises
Exercise 6.1
Write a subroutine to concatenate two strings of DNA.
Exercise 6.2
Write a subroutine to report the percentage of each nucleotide in DNA. You've seen the plus operator +. You will also want to use the divide operator / and the multiply operator *. Count the number of each nucleotide, divide by the total length of the DNA, then multiply by 100 to get the percentage. Your arguments should be the DNA and the nucleotide you want to report on. The int function can be used to discard digits after the decimal point, if needed.
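As a hint on the arithmetic only, here is a minimal sketch of the /, *, and int operations applied to plain numbers. The subroutine name percentage is hypothetical, and counting the nucleotides is left to you.

```perl
use strict;
use warnings;

# Hypothetical helper: what percentage of $total is $part?
sub percentage {
    my ($part, $total) = @_;
    return $part / $total * 100;    # divide first, then multiply by 100
}

my $exact = percentage(1, 3);       # a value with digits after the decimal point
my $whole = int($exact);            # int discards the fractional part

print "exact: $exact\n";
print "whole: $whole\n";
```

Note that int truncates rather than rounds, so a value just below a whole number is reported as the next lower integer.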
Exercise 6.3
Write a subroutine to prompt a user with any message, and collect the user's answer. The subroutine's argument should be the message, and the return value should be the one-line answer.
Exercise 6.4
Write a subroutine to look for command-line arguments such as -help, -h, and --help. Recall that command-line arguments appear in the @ARGV array. Call your subroutine from a main program. If you give the program any of the named command-line arguments, when you pass them into the subroutine it should return a true value. If this is the case, have the program print out a help message stored in a $USAGE variable and exit.
Exercise 6.5
Write a subroutine to check if a file exists, is a regular file, and is nonzero in size. Use the file test operators (see Appendix B).
Exercise 6.6
Use Exercise 6.3 in a subroutine that keeps prompting until a valid file is entered by the user or until five attempts have failed.
Exercise 6.7
Write a module that contains subroutines that report various statistics on DNA sequences, for instance length, GC content, presence or absence of poly-T sequences (long stretches of mostly T's at the 5' (left) end of many DNA sequences), or other measures of interest.
Exercise 6.8
Write a subroutine to do something a biologist normally does. Here's an opportunity to look around the lab and write a useful program!
Exercise 6.9
Read the documentation about the debugger and become familiar with its use by applying it during your programming.
Exercise 6.10
Write a subroutine that alters an array of lines in a file. Use pass by reference for the array. Pass the subroutine a reference to the array, a regular expression, and a string to replace the regular expression. All the lines of the array should be altered
by substituting the matches found for the regular expression by the replacement string.
Chapter 7. Mutations and Randomization
As every biologist knows, mutation is a fundamental topic in biology. Mutations in DNA occur all the time in cells. Most of them don't affect the actions of proteins and are benign.
Some of them do affect the proteins and may result in diseases such as cancer. Mutations can also lead to nonviable offspring that dies during development; occasionally they can
lead to evolutionary change. Many cells have very complex mechanisms to repair mutations.
Mutations in DNA can arise from radiation, chemical agents, replication errors, and other causes. We're going to model mutations as random events, using Perl's random number
generator.
Randomization is a computer technique that crops up regularly in everyday programs, most commonly in cryptography, such as when you want to generate a hard-to-guess
password. But it's also an important branch of algorithms: many of the fastest algorithms employ randomization.
Using randomization, it's possible to simulate and investigate the mechanisms of mutations in DNA and their effect upon the biological activity of their associated proteins.
Simulation is a powerful tool for studying systems and predicting what they will do; randomization allows you to better simulate the ordered chaos of a biological system.
The ability to simulate mutations with computer programs can aid in the study of evolution, disease, and basic cellular processes such as division and DNA repair
mechanisms. Computer models of cell development and function, now in their early stages, will become much more accurate and useful in coming years, and mutation is a
basic biological mechanism these models will incorporate.
From the standpoint of programming technique, as well as from the standpoint of modeling evolution, mutation, and disease, randomization is a powerful and, luckily for
us, easy-to-use programming skill.
Here's a breakdown of what we will accomplish in this chapter:

• Randomly select an index into an array and a position in a string: these are the basic tools for picking random locations in DNA or other data
• Model mutation with random numbers by learning how to randomly select a nucleotide in DNA and then mutate it to some other random nucleotide
• Use random numbers to generate DNA sequence data sets, which can be used to study the extent of randomness in actual genomes
• Repeatedly mutate DNA to study the effect of mutations accumulating over time during evolution
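As a preview of the first item, here is a minimal sketch of picking a random array index and a random string position with Perl's built-in rand and int functions. The subroutine names random_index and random_position are hypothetical and chosen only for this illustration; they are not necessarily the ones the chapter goes on to develop.

```perl
use strict;
use warnings;

# Hypothetical helper: return a random index into an array (0 .. last index).
sub random_index {
    my (@array) = @_;
    return int(rand(scalar @array));
}

# Hypothetical helper: return a random position in a string (0 .. length - 1).
sub random_position {
    my ($string) = @_;
    return int(rand(length($string)));
}

my @nucleotides = ('A', 'C', 'G', 'T');
my $dna         = 'GACGTCTTCTAAGGCGA';

my $base     = $nucleotides[ random_index(@nucleotides) ];
my $position = random_position($dna);

print "Random nucleotide: $base\n";
print "Random position in DNA: $position\n";
```

Because rand returns a fractional number that is at least 0 and less than its argument, taking int of the result gives a whole number that is always a valid index or position. The output differs from run to run.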