IT-SC 304
program or data online. Using the appropriate Perl modules, you can connect to the web site, send it your input, collect the output, and then parse and reformat it as you wish. It's
actually not hard to do. O'Reilly's Perl Cookbook, a companion volume to Programming Perl, is an excellent source of short programs and helpful descriptions
to get you started.
Perl is a great way to automate other programs. The next section shows an example of a Perl program that starts another program and collects, parses, reformats, and outputs the
results. This program will control another program on the same computer. The example will be from a Unix or Linux environment; consult your Perl documentation on how to
get the same functionality from your Windows or Macintosh platform.
11.5.1 The Stride Secondary Structure Predictor
We will use an external program to calculate the secondary structure from the 3D coordinates of a PDB file. As a secondary structure assignment engine, I use a program
called stride, which outputs a secondary structure report. stride is available from EMBL
(http://www.embl-heidelberg.de/stride/stride_info.html) and runs on Unix,
Linux, Windows, Macintosh, and VMS systems. The program works very simply; just give it a command-line argument of a PDB filename and collect the output, as is done in the
subroutine call_stride that follows.
Example 11-7 is the entire program: two subroutines and a main program, followed
by a discussion.
Example 11-7. Call another program for secondary structure prediction
#!/usr/bin/perl
# Call another program to perform secondary structure prediction

use strict;
use warnings;

# Call "stride" on a file, collect the report
my(@stride_output) = call_stride('pdb/c1/pdb1c1f.ent');

# Parse the stride report into primary sequence, and
# secondary structure prediction
my($sequence, $structure) = parse_stride(@stride_output);

# Print out the beginnings of the sequence and the
# secondary structure
print substr($sequence, 0, 80), "\n";
print substr($structure, 0, 80), "\n";

exit;

# Subroutines for Example 11-7

# call_stride
#
# --given a PDB filename, return the output from the "stride"
#   secondary structure prediction program

sub call_stride {

    use strict;
    use warnings;

    my($filename) = @_;

    # The stride program options
    my($stride) = '/usr/local/bin/stride';
    my($options) = '';
    my(@results) = (  );

    # Check for presence of PDB file
    unless ( -e $filename ) {
        print "File \"$filename\" doesn\'t seem to exist\n";
        exit;
    }

    # Start up the program, capture and return the output
    @results = `$stride $options $filename`;

    return @results;
}

# parse_stride
#
# --given stride output, extract the primary sequence and the
#   secondary structure prediction, returning them in a
#   two-element array.

sub parse_stride {

    use strict;
    use warnings;

    my(@stridereport) = @_;
    my($seq) = '';
    my($str) = '';
    my $length;

    # Extract the lines of interest
    my(@seq) = grep(/^SEQ /, @stridereport);
    my(@str) = grep(/^STR /, @stridereport);

    # Process those lines to discard all but the sequence
    # or structure information
    for (@seq) { $_ = substr($_, 10, 50) }
    for (@str) { $_ = substr($_, 10, 50) }

    # Return the information as an array of two strings
    $seq = join('', @seq);
    $str = join('', @str);

    # Delete unwanted spaces from the ends of the strings.
    # ($seq has no spaces that are wanted, but $str may)
    $seq =~ s/(\s+)$//;
    $length = length($1);
    $str =~ s/\s{$length}$//;

    return( ($seq, $str) );
}
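To get a feel for what the grep, substr, and trimming steps in parse_stride do, here is a minimal sketch that runs the same kind of processing on a few made-up lines. The report lines are invented for illustration (a 10-character prefix followed by a 50-column field) and are not real stride output; the sketch also guards the trimming step with an if, which is my own variation and not part of Example 11-7.

```perl
use strict;
use warnings;

# Made-up report lines for illustration only (NOT real stride output):
# a 10-character prefix followed by a 50-column data field.
my @report = (
    sprintf("%-10s%-50s\n", 'SEQ  1',  'ACDEFGHIKL' x 5),
    sprintf("%-10s%-50s\n", 'STR',     ' ' x 5 . 'H' x 45),
    sprintf("%-10s%-50s\n", 'SEQ  51', 'MNP'),
    sprintf("%-10s%-50s\n", 'STR',     ' EE'),
    "REM  a line that is neither SEQ nor STR\n",
);

# Keep only the lines of interest, then cut out the 50-column field
# that starts at offset 10
my @seq = map { substr($_, 10, 50) } grep { /^SEQ / } @report;
my @str = map { substr($_, 10, 50) } grep { /^STR / } @report;

# Join the pieces into one string each
my $seq = join('', @seq);
my $str = join('', @str);

# Trim the padding from the end of the sequence, then remove the same
# number of trailing spaces from the structure string, so that any
# meaningful spaces inside the structure string are kept
if ( $seq =~ s/(\s+)$// ) {
    my $length = length($1);
    $str =~ s/\s{$length}$//;
}

# The two strings should now have the same length
print length($seq), " ", length($str), "\n";
```

The point of measuring the padding on the sequence first is that a space in the structure string can be meaningful, so you cannot simply strip all trailing whitespace from it.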
As you can see in the subroutine call_stride, variables have been made for the program name ($stride) and for the options you may want to pass ($options). Since these
are parts of the program you may want to change, put them as variables near the top of the code, to make them easy to find and alter. The argument to the subroutine is the PDB
filename ($filename). Of course, if you expect the options to change frequently, you can make them another argument to the subroutine.
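If you do expect the options to change, one way to pass them in might look like the following sketch. The name call_stride_with_options is hypothetical, and letting the caller override the program path as a third argument is an extra assumption of mine; it also uses die where the example uses print and exit.

```perl
use strict;
use warnings;

# Sketch: a variant of call_stride that takes the options (and, as an
# added assumption, the program path) as arguments.
sub call_stride_with_options {
    my ($filename, $options, $stride) = @_;

    # Fall back to the values used in Example 11-7
    $options = ''                      unless defined $options;
    $stride  = '/usr/local/bin/stride' unless defined $stride;

    # Check for presence of the PDB file
    unless ( -e $filename ) {
        die "File \"$filename\" doesn't seem to exist\n";
    }

    # Backticks in list context return one element per line of output
    my @results = `$stride $options $filename`;

    return @results;
}
```

You would call it as call_stride_with_options('pdb/c1/pdb1c1f.ent', $options), where $options holds whatever flags your version of stride documents.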
Since you're dealing with a program that takes a file, do a little error checking to see if a file by that name actually exists. Use the -e file test operator. Or you can omit this and
let the stride program figure it out, and capture its error output. But that requires parsing the stride output for its error output, which involves figuring out how stride reports errors.
This can get complicated, so I'd stick with using the -e file test operator.
The actual running of the program and collecting its output happens in just one line. The
program to be run is enclosed in backticks, which run the program (first expanding variables) and, in list context as here, return the output as a list of lines.
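If you do want to look at a program's error output, the following sketch shows one way to capture it on a Unix or Linux system, together with the exit code. The helper run_and_check is a hypothetical name, and nothing here knows how stride itself reports errors; it only collects whatever the command prints.

```perl
use strict;
use warnings;

# Sketch: run a command with backticks, fold its error output into the
# captured output, and return the exit code followed by the lines.
# run_and_check is a hypothetical helper, not part of Example 11-7.
sub run_and_check {
    my ($command) = @_;

    # "2>&1" is Unix shell syntax that sends error output to the same
    # place as regular output, so the backticks capture both
    my @lines = `$command 2>&1`;

    # $? holds the wait status; the exit code is in its high byte.
    # A value of -1 means the command could not be started at all.
    my $exit_code = $? == -1 ? -1 : $? >> 8;

    return ($exit_code, @lines);
}
```

A nonzero exit code tells you something went wrong without having to parse the messages, although what each code means is up to the program being run.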
There are other ways to run a program. One common way is the system function call. It behaves differently from the backticks: it doesn't return the output of the command it
calls; it just returns the exit status, an integer indicating success or failure of the command. Other methods include qx (an alternative syntax for backticks), the open function with a pipe, and the fork and exec
functions.
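As a rough sketch of two of these alternatives, the helpers below wrap system and a piped open. The names exit_code_of and lines_from are hypothetical, and the list form of the piped open shown here is the Unix-style usage; check your Perl documentation before relying on it on other platforms.

```perl
use strict;
use warnings;

# Sketch: two alternatives to backticks. Both helper names are hypothetical.

# system runs the command and returns its wait status; the command's
# output is not captured, it goes wherever the script's own output goes
sub exit_code_of {
    my (@command) = @_;
    my $status = system(@command);
    return $status == -1 ? -1 : $status >> 8;
}

# open with the '-|' mode starts the command and lets you read its
# output through a filehandle
sub lines_from {
    my (@command) = @_;
    open(my $pipe, '-|', @command)
        or die "Cannot start '$command[0]': $!\n";
    my @lines = <$pipe>;
    close($pipe);
    return @lines;
}

# Possible use with the program from Example 11-7:
# my @report = lines_from('/usr/local/bin/stride', 'pdb/c1/pdb1c1f.ent');
```

Passing the program and its arguments as a list, as both helpers do, avoids going through the shell, so a filename containing spaces or other special characters is handed to the program unchanged.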
11.5.2 Parsing Stride Output