Recursion Files and Folders

    pdb/44/pdb444d.ent
    pdb/pdb1a4o.ent

Notice how variable names such as $file and @files have been reused in this code, using lexical scoping in the inner blocks with my. If the overall structure of the program wasn't so short and simple, this could get really hard to read. When the program says $file, does it mean this $file or that $file? This code is an example of how to get into trouble. It works, but it's hard to read, despite its brevity.

In fact, there's a deeper problem with Example 11-2. It's not well designed. By extending Example 11-1, it can now list subdirectories. But what if there are further levels of subdirectories?
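
To make the point about reused names concrete, here is a minimal sketch of the same kind of shadowing. It is not Example 11-2 itself: the directory names, the file names, and the @seen array are made up for illustration, and nothing is read from disk.

```perl
use strict;
use warnings;

# Hypothetical subdirectory names standing in for a real directory listing
my @files = ('3c', '44');
my @seen;

foreach my $file (@files) {

    # This inner @files hides the outer @files until the block ends.
    # The $file on the right-hand side is still the outer loop variable.
    my @files = ("pdb${file}a.ent");

    foreach my $file (@files) {
        # Here $file means the inner one (a made-up file name)
        push @seen, $file;
    }

    # Back outside the inner loop, $file means the outer one again
    push @seen, $file;
}

print "$_\n" for @seen;
```

Each block sees only its own $file and @files, which is exactly why the program works and also why a reader has to track which block they are in to know what a name refers to.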

11.2.2 Recursion

If you have a subroutine that lists the contents of directories and recursively calls itself to list the contents of any subdirectories it finds, you can call it on the top-level directory, and it eventually lists all the files.

Let's write another program that does just that. A recursive subroutine is defined simply as a subroutine that calls itself. Here is the pseudocode and the code (Example 11-3), followed by a discussion of how recursion works:

    subroutine list_recursively

        open folder

        for each item in the folder

            if it's a file
                print its name

            else if it's a folder
                list_recursively

Example 11-3. A recursive subroutine to list a filesystem

    #!/usr/bin/perl
    # Demonstrate a recursive subroutine to list a subtree of a filesystem

    use strict;
    use warnings;
    use BeginPerlBioinfo;     # see Chapter 6 about this module

    list_recursively('pdb');

    exit;

    # Subroutine
    #
    # list_recursively
    #
    #   list the contents of a directory,
    #   recursively listing the contents of any subdirectories

    sub list_recursively {

        my($directory) = @_;

        my @files = ();

        # Open the directory
        unless(opendir(DIRECTORY, $directory)) {
            print "Cannot open directory $directory\n";
            exit;
        }

        # Read the directory, ignoring special entries "." and ".."
        @files = grep (!/^\.\.?$/, readdir(DIRECTORY));

        closedir(DIRECTORY);

        # If file, print its name
        # If directory, recursively print its contents

        # Notice that we need to prepend the directory name
        foreach my $file (@files) {

            # If the directory entry is a regular file
            if (-f "$directory/$file") {

                print "$directory/$file\n";

            # If the directory entry is a subdirectory
            }elsif( -d "$directory/$file") {

                # Here is the recursive call to this subroutine
                list_recursively("$directory/$file");
            }
        }
    }

Here's the output of Example 11-3 (notice that it's the same as the output of Example 11-2):

    pdb/3c/pdb43c9.ent
    pdb/3c/pdb43ca.ent
    pdb/44/pdb144d.ent
    pdb/44/pdb144l.ent
    pdb/44/pdb244d.ent
    pdb/44/pdb244l.ent
    pdb/44/pdb344d.ent
    pdb/44/pdb444d.ent
    pdb/pdb1a4o.ent

Look over the code for Example 11-3 and compare it to Example 11-2. As you can see, the programs are largely identical.
Example 11-2 is all one main program; Example 11-3 has almost identical code but has packaged it up as a subroutine that is called by a short main program. The main program of Example 11-3 simply calls a recursive function, giving it a directory name (for a directory that exists on my computer; you may need to change the directory name when you attempt to run this program on your own computer). Here is the call:

    list_recursively('pdb');

I don't know if you feel let down, but I do. This looks just like any other subroutine call. Clearly, the recursion must be defined within the subroutine. It's not until the very end of the list_recursively subroutine, where the program finds (using the -d file test operator) that one of the contents of the directory that it's listing is itself a directory, that there's a significant difference in the code as compared with Example 11-2. At that point, Example 11-2 has code to once again look for regular files or for directories. But this subroutine in Example 11-3 simply calls a subroutine, which happens to be itself, namely, list_recursively:

    list_recursively("$directory/$file");

That's recursion.

As you've seen here, there are times when the data (for instance, the hierarchical structure of a filesystem) is well matched by the capabilities of recursive programs. Here the recursive call is the last statement the subroutine executes for a directory entry. Recursion in which the recursive call is the final action of the subroutine is a special type called tail recursion. (Strictly speaking, the call in Example 11-3 sits inside a foreach loop, so the subroutine still goes on to the next entry after each call returns; it's a tail call only in a loose sense.) Although recursion can be slow, due to all the subroutine calls it can create, the good news about tail recursion is that many compilers can optimize the code to make it run much faster. Using recursion can result in clean, short, easy-to-understand programs. Although Perl doesn't yet optimize tail recursion, current plans for Perl 6 include support for optimizing it.
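
If you'd like to watch the recursion descend without needing a pdb directory on your computer, the following is a minimal sketch, not part of Example 11-3. It assumes a hypothetical subroutine collect_recursively that returns the entries it finds instead of printing them, indenting each one by its depth in the recursion. The throwaway directory tree and its file names are invented for the demonstration. The sketch also uses a lexical directory handle ($dh), so each call has a handle of its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# collect_recursively (hypothetical name)
#
#   return the entries under a directory, one per element,
#   indented by two spaces for each level of recursion
sub collect_recursively {
    my ($directory, $depth) = @_;
    $depth = 0 unless defined $depth;

    # A lexical handle belongs to this one call of the subroutine
    opendir(my $dh, $directory)
        or die "Cannot open directory $directory: $!\n";

    # Read the directory, ignoring "." and "..", in sorted order
    my @entries = sort grep { !/^\.\.?$/ } readdir($dh);
    closedir($dh);

    my $indent = '  ' x $depth;
    my @lines;

    foreach my $entry (@entries) {
        my $path = "$directory/$entry";

        if (-f $path) {
            push @lines, "$indent$entry";
        } elsif (-d $path) {
            push @lines, "$indent$entry/";
            # The recursive call, one level deeper
            push @lines, collect_recursively($path, $depth + 1);
        }
    }
    return @lines;
}

# Build a small temporary tree (made-up names), removed when the program ends
my $root = tempdir(CLEANUP => 1);
make_path("$root/pdb/44/deeper");
foreach my $name ('pdb/top.ent', 'pdb/44/mid.ent', 'pdb/44/deeper/low.ent') {
    open(my $fh, '>', "$root/$name")
        or die "Cannot create $root/$name: $!\n";
    close($fh);
}

my @listing = collect_recursively("$root/pdb");
print "$_\n" for @listing;
```

The same subroutine handles the directory three levels down without any extra code, which is the point of the recursive design: each call deals with one directory and hands any subdirectory to another call of itself.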

11.2.3 Processing Many Files