11.2 Files and Folders

sequencer produced on those days. After a few years, you could have quite a number of files. Then, one day you discover a new sequence of DNA that seems to be implicated in cell division. You do a BLAST search (see Chapter 12) but find no significant hits for your new DNA. At that point you want to know whether you've seen this DNA before in any previous sequencing runs.[1]

What you need to do is run a comparison subroutine on each of the hundreds or thousands of files in all your various sequencing run subdirectories. But that's going to take several days of repetitive, boring work sitting at the computer screen.

You can write a program in much less time than that. Then all you have to do is sit back and examine the results of any significant matches your program finds. To write the program, however, you have to know how to manipulate all the files and folders in Perl. The following sections show you how to do it.

[1] You may do a comparison by keeping copies of all your sequencing runs in one large BLAST library; building such a BLAST library can be done using the techniques shown in this section.

11.2.1 Opening Directories

A filesystem is organized in a tree structure. The metaphor is apt. Starting from anyplace on the tree, you can proceed up the branches and get to any leaves that stem from your starting place. If you start from the root of the tree, you can reach all the leaves. Similarly, in a filesystem, if you start at a certain directory, you can reach all the files in all the subdirectories that stem from your starting place, and if you start at the root (which, strangely enough, is also called the "top" of the filesystem), you can reach all the files.

You've already had plenty of practice opening, reading from, writing to, and closing files. I will show a simple method with which you can open a folder (also called a directory) and get the filenames of all the files in that folder. Following that, you'll see how to get the names of all files from all directories and subdirectories from a certain starting point.

Let's look at the Perlish way to list all the files in a folder, beginning with some pseudocode:

    open folder
    read contents of folder (files and subfolders)
    print their names

Example 11-1 shows the actual Perl code.

Example 11-1. Listing the contents of a folder (or directory)

    #!/usr/bin/perl
    # Demonstrating how to open a folder and list its contents

    use strict;
    use warnings;
    use BeginPerlBioinfo;     # see Chapter 6 about this module

    my @files = ();
    my $folder = 'pdb';

    # open the folder
    unless (opendir(FOLDER, $folder)) {
        print "Cannot open folder $folder!\n";
        exit;
    }

    # read the contents of the folder (i.e., the files and subfolders)
    @files = readdir(FOLDER);

    # close the folder
    closedir(FOLDER);

    # print them out, one per line
    print join("\n", @files), "\n";

    exit;

Since you're running this program on a folder that contains PDB files, this is what you'll see:

    .
    ..
    3c
    44
    pdb1a4o.ent

If you want to list the files in the current directory, you can give the directory name the special name . for the current directory, like so:

    my $folder = '.';

On Unix or Linux systems, the special files . and ..
refer to the current directory and the parent directory, respectively. These aren't "really" files, at least not files you'd want to read; you can avoid listing them with the wonderful and amazing grep function. grep allows you to select elements from an array based on a test, such as a regular expression. Here's how to filter out the array entries . and ..:

    @files = grep(!/^\.\.?$/, @files);

grep selects all elements that don't match the regular expression, because the match is negated by the ! operator. The regular expression /^\.\.?$/ looks for a string that begins (the beginning is indicated with the ^ metacharacter) with a period \. (escaped with a backslash, since a period is a metacharacter), followed by 0 or 1 periods \.? (the ? matches 0 or 1 of the preceding item), and nothing more (indicated by the $ end-of-string metacharacter). So it matches exactly . and .., and nothing else.

In fact, this is so often used when reading a directory that it's usually combined into one step:

    @files = grep(!/^\.\.?$/, readdir(FOLDER));

Okay, now all the files are listed. But wait: what if some of these files aren't files at all but are subfolders? You can use the handy file test operators to test each filename and then even open each subfolder and list the files in them. First, some pseudocode:

    open folder
    for each item in the folder
        if it's a file
            print its name
        else if it's a folder
            open the folder
            print the names of the contents of the folder

Example 11-2 shows the program.

Example 11-2. List contents of a folder and its subfolders

    #!/usr/bin/perl
    # Demonstrating how to open a folder and list its contents
    #   --distinguishing between files and subfolders, which
    #     are themselves listed

    use strict;
    use warnings;
    use BeginPerlBioinfo;     # see Chapter 6 about this module

    my @files = ();
    my $folder = 'pdb';

    # Open the folder
    unless (opendir(FOLDER, $folder)) {
        print "Cannot open folder $folder!\n";
        exit;
    }

    # Read the folder, ignoring special entries "." and ".."
    @files = grep(!/^\.\.?$/, readdir(FOLDER));

    closedir(FOLDER);

    # If file, print its name
    # If folder, print its name and contents
    #
    # Notice that we need to prepend the folder name!
    foreach my $file (@files) {

        # If the folder entry is a regular file
        if (-f "$folder/$file") {
            print "$folder/$file\n";

        # If the folder entry is a subfolder
        } elsif (-d "$folder/$file") {

            my $folder = "$folder/$file";

            # open the subfolder and list its contents
            unless (opendir(FOLDER, $folder)) {
                print "Cannot open folder $folder!\n";
                exit;
            }

            my @files = grep(!/^\.\.?$/, readdir(FOLDER));

            closedir(FOLDER);

            foreach my $file (@files) {
                print "$folder/$file\n";
            }
        }
    }

    exit;

Here's the output of Example 11-2:

    pdb/3c/pdb43c9.ent
    pdb/3c/pdb43ca.ent
    pdb/44/pdb144d.ent
    pdb/44/pdb144l.ent
    pdb/44/pdb244d.ent
    pdb/44/pdb244l.ent
    pdb/44/pdb344d.ent
    pdb/44/pdb444d.ent
    pdb/pdb1a4o.ent

Notice how variable names such as $folder, $file, and @files have been reused in this code, using lexical scoping in the inner blocks with my. If the overall structure of the program weren't so short and simple, this could get really hard to read. When the program says $file, does it mean this $file or that $file? This code is an example of how to get into trouble. It works, but it's hard to read, despite its brevity.

In fact, there's a deeper problem with Example 11-2. It's not well designed. By extending Example 11-1, it can now list subdirectories. But what if there are further levels of subdirectories?
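Setting the deeper-levels question aside for a moment, the naming problem by itself has a straightforward fix: give each level its own variable names, and move the repeated open, filter, and close steps into one subroutine. What follows is a minimal sketch of that idea, not the book's own code. The subroutine names folder_entries and files_two_levels are hypothetical, and the sketch assumes, like the examples above, a folder named pdb in the current directory. It also swaps in a lexical directory handle (opendir(my $dh, ...)) and die in place of the FOLDER bareword handle and the print-then-exit error handling. Like Example 11-2, it looks only one level down.

```perl
#!/usr/bin/perl
# Sketch: Example 11-2 restated with distinct names per level and a
# helper subroutine. Still lists only the top folder and its immediate
# subfolders (no deeper levels).
use strict;
use warnings;

# Hypothetical helper: return the entries of $folder without "." and "..".
# Dies with the system error ($!) if the folder can't be opened.
sub folder_entries {
    my ($folder) = @_;
    opendir(my $dh, $folder) or die "Cannot open folder $folder: $!\n";
    my @entries = grep { !/^\.\.?$/ } readdir($dh);
    closedir($dh);
    return @entries;
}

# Hypothetical helper: return the paths of regular files directly inside
# $top, plus every entry (file or folder) inside each immediate subfolder,
# which is what Example 11-2 prints.
sub files_two_levels {
    my ($top) = @_;
    my @paths;
    foreach my $entry (folder_entries($top)) {
        my $path = "$top/$entry";
        if (-f $path) {
            push @paths, $path;                 # a regular file at the top
        } elsif (-d $path) {
            foreach my $subentry (folder_entries($path)) {
                push @paths, "$path/$subentry"; # an entry one level down
            }
        }
    }
    return @paths;
}

# Assumed starting folder, as in Examples 11-1 and 11-2.
my $top_folder = 'pdb';
if (-d $top_folder) {
    foreach my $path (files_two_levels($top_folder)) {
        print "$path\n";
    }
}
```

With folder_entries factored out, the open/filter/close logic appears once instead of twice, and names like $path and $subentry make it clear which level each variable belongs to. Note that readdir returns entries in whatever order the filesystem supplies them, so you may want to sort the results before printing.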

11.2.2 Recursion