IT-SC 275
pdb/44/pdb444d.ent
pdb/pdb1a4o.ent
Notice how variable names such as $file and @files have been reused in this code, using lexical scoping in the inner blocks with my. If the overall structure of the program weren't so short and simple, this could get really hard to read. When the program says $file, does it mean this $file or that $file? This code is an example of how to get into trouble. It works, but it's hard to read, despite its brevity.
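To make the scoping point concrete, here is a minimal sketch of one name referring to two different variables. The file and directory names are made up for illustration and are not taken from Example 11-2.

```perl
use strict;
use warnings;

# Hypothetical names, chosen only to illustrate lexical scoping with "my"
my $file = 'outer.ent';
my @seen = ();

foreach my $directory ('3c', '44') {

    # This "my" declares a new $file, visible only inside this block;
    # it hides (shadows) the outer $file until the block ends
    my $file = "$directory/inner.ent";
    push(@seen, $file);
}

# Outside the block, $file refers to the outer variable again
push(@seen, $file);

print join("\n", @seen), "\n";
```

Both variables are legitimately called $file, and Perl keeps them apart without complaint. The reader has to work out which one is meant at each point, which is why giving the inner variable a distinct name is usually kinder.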
In fact, there's a deeper problem with Example 11-2. It's not well designed. By extending Example 11-1, it can now list subdirectories. But what if there are further levels of subdirectories?
11.2.2 Recursion
If you have a subroutine that lists the contents of directories and recursively calls itself to list the contents of any subdirectories it finds, you can call it on the top-level directory,
and it eventually lists all the files.
Let's write another program that does just that. A recursive subroutine is defined simply as a subroutine that calls itself. Here is the pseudocode and the code (Example 11-3), followed by a discussion of how recursion works:

    subroutine list_recursively

        open folder

        for each item in the folder

            if it's a file
                print its name

            else if it's a folder
                list_recursively

Example 11-3. A recursive subroutine to list a filesystem
#!/usr/bin/perl
# Demonstrate a recursive subroutine to list a subtree of a filesystem

use strict;
use warnings;
use BeginPerlBioinfo;     # see Chapter 6 about this module

list_recursively('pdb');

exit;

# Subroutine

# list_recursively
#
#   list the contents of a directory,
#   recursively listing the contents of any subdirectories

sub list_recursively {

    my($directory) = @_;

    my @files = (  );

    # Open the directory
    unless(opendir(DIRECTORY, $directory)) {
        print "Cannot open directory $directory\n";
        exit;
    }

    # Read the directory, ignoring special entries "." and ".."
    @files = grep (!/^\.\.?$/, readdir(DIRECTORY));

    closedir(DIRECTORY);

    # If file, print its name
    # If directory, recursively print its contents

    # Notice that we need to prepend the directory name
    foreach my $file (@files) {

        # If the directory entry is a regular file
        if (-f "$directory/$file") {

            print "$directory/$file\n";

        # If the directory entry is a subdirectory
        }elsif( -d "$directory/$file") {

            # Here is the recursive call to this subroutine
            list_recursively("$directory/$file");
        }
    }
}
Here's the output of Example 11-3 (notice that it's the same as the output of Example 11-2):

pdb/3c/pdb43c9.ent
pdb/3c/pdb43ca.ent
pdb/44/pdb144d.ent
pdb/44/pdb144l.ent
pdb/44/pdb244d.ent
pdb/44/pdb244l.ent
pdb/44/pdb344d.ent
pdb/44/pdb444d.ent
pdb/pdb1a4o.ent
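The sample pdb directory goes only one level deep, so the output can't show what recursion buys you. The sketch below applies the same idea to a made-up, in-memory tree that has one extra level. The data, the name list_tree, and the "deeper" entry are all assumptions for illustration. Hash references stand in for directories, so the sketch runs without any files on disk.

```perl
use strict;
use warnings;

# A made-up, in-memory stand-in for a directory tree (an assumption for
# illustration): hash references play the role of directories, and any
# other value plays the role of a regular file
my %tree = (
    '3c'          => { 'pdb43c9.ent' => 1, 'pdb43ca.ent' => 1 },
    '44'          => { 'pdb144d.ent' => 1, 'deeper' => { 'extra.ent' => 1 } },
    'pdb1a4o.ent' => 1,
);

# Return the paths of all "files" below $node, descending to any depth
sub list_tree {
    my($node, $path) = @_;

    my @found = ();

    foreach my $name (sort keys %$node) {
        if (ref($node->{$name}) eq 'HASH') {
            # A "subdirectory": the subroutine calls itself
            push(@found, list_tree($node->{$name}, "$path/$name"));
        } else {
            # A "file": record its full path
            push(@found, "$path/$name");
        }
    }
    return @found;
}

my @paths = list_tree(\%tree, 'pdb');
print "$_\n" foreach @paths;
```

The subroutine never needs to know how deep the tree is. Each call handles one level and hands any nested level to a fresh call, so pdb/44/deeper/extra.ent is found with no extra code.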
Look over the code for Example 11-3 and compare it to Example 11-2. As you can see, the programs are largely identical. Example 11-2 is all one main program; Example 11-3 has almost identical code but has packaged it up as a subroutine that is called by a short main program. The main program of Example 11-3 simply calls a recursive function, giving it a directory name (for a directory that exists on my computer; you may need to change the directory name when you attempt to run this program on your own computer). Here is the call:

    list_recursively('pdb');

I don't know if you feel let down, but I do. This looks just like any other subroutine call. Clearly, the recursion must be defined within the subroutine. It's not until the very end of the list_recursively subroutine, where the program finds (using the -d file test operator) that one of the contents of the directory that it's listing is itself a directory, that there's a significant difference in the code as compared with Example 11-2. At that point, Example 11-2 has code to once again look for regular files or for directories. But this subroutine in Example 11-3 simply calls a subroutine, which happens to be itself, namely, list_recursively:

    list_recursively("$directory/$file");

That's recursion.

As you've seen here, there are times when the data (for instance, the hierarchical structure of a filesystem) is well matched by the capabilities of recursive programs. The fact that the recursive call happens at the end of the subroutine means that it's a special type of recursion called tail recursion. (Strictly speaking, a tail call is the very last action a subroutine performs before returning; here the call sits inside a foreach loop that may go on to process more entries, so the label applies only loosely.) Although recursion can be slow, due to all the subroutine calls it can create, the good news about tail recursion is that many compilers can optimize the code to make it run much faster. Using recursion can result in clean, short, easy-to-understand programs. Although Perl doesn't yet optimize it, current plans for Perl 6 include support for optimizing tail recursion.
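If the cost of the subroutine calls ever matters, the same traversal can be written without recursion by keeping your own list of directories still to be read. The following is a sketch, not a replacement for Example 11-3. The name list_iteratively is hypothetical, and it differs in three ways: it returns a sorted list of files instead of printing, it uses a lexical directory handle rather than the bareword DIRECTORY, and it warns and moves on when a directory can't be opened.

```perl
use strict;
use warnings;

# list_iteratively: a hypothetical, non-recursive counterpart to
# list_recursively. Returns the regular files found below $top.
sub list_iteratively {
    my($top) = @_;

    my @pending = ($top);    # directories still waiting to be read
    my @found   = ();        # regular files found so far

    while (@pending) {
        my $directory = pop(@pending);

        # A lexical handle ($dh) instead of the bareword DIRECTORY
        my $dh;
        unless (opendir($dh, $directory)) {
            warn "Cannot open directory $directory\n";
            next;
        }
        my @entries = grep { !/^\.\.?$/ } readdir($dh);
        closedir($dh);

        foreach my $entry (@entries) {
            if (-f "$directory/$entry") {
                push(@found, "$directory/$entry");
            } elsif (-d "$directory/$entry") {
                # Instead of a recursive call, remember the directory for later
                push(@pending, "$directory/$entry");
            }
        }
    }

    @found = sort @found;
    return @found;
}

# 'pdb' is the directory name used in Example 11-3; change it as needed
print "$_\n" foreach list_iteratively('pdb');
```

The @pending array does the bookkeeping that Perl otherwise does for you with each recursive call. The recursive version is shorter and mirrors the shape of the data, and for directory trees of ordinary depth that clarity usually matters more than the cost of the calls.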
11.2.3 Processing Many Files