IT-SC 99
allows you to count non-ACGT characters without specifying them individually. In later chapters, youll use those
while loops to good effect.
However, theres an even faster way to count bases. You can use the tr transliteration function from
Chapter 4 ; its faster, which is helpful if you have a lot of DNA to count:
a = dna =~ trAa; c = dna =~ trCc;
g = dna =~ trGg; t = dna =~ trTt;
The tr function returns the count of the specified characters it finds in the string, and if the set of replacement characters is empty, it doesnt actually change the string. So it
makes a good character counter. Notice that with tr, you have to spell out the upper- and lowercase letters. Also, because tr doesnt accept character classes, theres no direct way
to count nonbases. You could, however, say:
basecount = dna = ~ trACGTacgt; nonbase = length dna - basecount
The program however, runs faster using tr than using the while loops of Example 5-7
. You may find it a bit much to have three really, four versions of this base-counting
program, especially since much of the code in each version is identical. The only part of the program that really changed was the part that did the counting of the bases. Wouldnt
it have been convenient to have a way to just alter the part that counts the bases? In
Chapter 6 , youll see how subroutines allow you to partition your programs in just such
a way.
5.8 Exercises
Exercise 5.1 Use a loop to write a nonhalting program. The conditional must always evaluate
to true
, every time through the loop. Note that some systems will catch that youre in an infinite loop and will stop the program automatically. You will stop
your program differently, depending on which operating system you use. Ctrl-C works on Unix and Linux, a Windows MS-DOS command window, or a MacOS
X shell window.
Exercise 5.2 Prompt the user to enter two short strings of DNA. Concatenate the two strings
of DNA by appending the second to the first using the .=
assignment operator. Print the two strings as concatenated, and then print the second string lined up
over its copy at the end of the concatenated strings. For example, if the input strings are
AAAA and
TTTT , print:
AAAATTTT TTTT
Exercise 5.3
IT-SC 100
Write a program that prints all the numbers from 1 to 100. Your program should have much fewer than 100 lines of code
Exercise 5.4 Write a program to calculate the reverse complement of a strand of DNA. Do not
use the s or the tr functions. Use the substr function, and examine each base one at a time in the original while you build up the reverse complement. Hint:
you might find it easier to examine the original right to left, rather than left to right, although either is possible.
Exercise 5.5 Write a program to report on the percentage of hydrophobic amino acids in a
protein sequence. To find which amino acids are hydrophobic, consult any introductory text on proteins, molecular biology, or cell biology. You will find
information sources in
Appendix A .
Exercise 5.6 Write a program that checks if two strings given as arguments are reverse
complements of each other. Use the Perl built-in functions split, pop, shift, and eq eq actually an operator.
Exercise 5.7 Write a program to report how GC-rich some sequence is. In other words, just
give the percentage of G and C in the DNA. Exercise 5.8
Modify Example 5-3
to not only find motifs by regular expressions but to print out the motif that was found. For example, if you search, using regular
expressions, for the motif EE.EE, your program should print EETVKNDEE. You can use the special variable
. After a successful pattern match, this special variable is set to hold the pattern that was matched.
Exercise 5.9 Write a program that switches two bases in a DNA string at specified positions.
Hint: you can use the Perl functions substr
or slice
. Exercise 5.10
Write a program that writes a temporary file and then deletes it. The unlink function removes a file: just say, for example:
unlink tmpfile; but also check to see if
unlink is successful.
IT-SC 101
Chapter 6. Subroutines and Bugs
In this chapter youll extend your basic knowledge in two directions: Subroutines
Using the Perl debugger Subroutines are an important way to structure programs. Youll use them in
Chapter 7 ,
where youll learn how to use randomization to simulate the mutation of DNA. The Perl debugger examines a programs behavior in slow motion and helps you find those
pesky bugs.
6.1 Subroutines
Subroutines are an important way to organize a program and are used in all major programming languages.
A subroutine wraps up a bit of code, gives the code a name, and provides a way to pass in some values for its calculations and then report back the results. The rest of the program
can then use the subroutines code just by calling its name, giving the needed values to pass in to the subroutine code and then collecting the results. This use or invocation of
a subroutine is commonly referred to as calling the subroutine. You can think of a subroutine as a program within a program; just as you run programs to get results, so
your programs call subroutines to get results. Once you have a subroutine, you can use it in a program simply by knowing which values to pass in and what kind of values to
expect it to pass out.
6.1.1 Advantages of Subroutines
Subroutines provide several benefits. They endow programs with abstraction, modularization, and the ability to create large programs by organizing the code into
manageable chunks with defined inputs and outputs.
Say you need to calculate something, for instance the mean of a distribution at several places in a program or in several different programs. By writing this calculation as a
subroutine, you can write it once, and then call it whenever you need it, thus making your program:
Shorter, since youre reusing the code. Easier to test, since you can test the subroutine separately.
Easier to understand, since it reduces clutter and better organizes programs. More reliable, since you have less code when you reuse subroutines, so there are fewer