The Stride Secondary Structure Predictor

program or data online. Using the appropriate Perl modules, you can connect to the web site, send it your input, collect the output, and then parse and reformat as you wish. It's actually not hard to do. O'Reilly's Perl Cookbook, a companion volume to Programming Perl, is an excellent source of short programs and helpful descriptions to get you started.

Perl is a great way to automate other programs. The next section shows an example of a Perl program that starts another program and collects, parses, reformats, and outputs the results. This program will control another program on the same computer. The example will be from a Unix or Linux environment; consult your Perl documentation on how to get the same functionality from your Windows or Macintosh platform.

11.5.1 The Stride Secondary Structure Predictor

We will use an external program to calculate the secondary structure from the 3D coordinates of a PDB file. As a secondary structure assignment engine, I use a program called stride that outputs a secondary structure report. stride is available from EMBL (http://www.embl-heidelberg.de/stride/stride_info.html) and runs on Unix, Linux, Windows, Macintosh, and VMS systems. The program works very simply; just give it a command-line argument of a PDB filename and collect the output in the subroutine call_stride that follows.

Example 11-7 is the entire program: two subroutines and a main program, followed by a discussion.

Example 11-7. Call another program for secondary structure prediction

    #!/usr/bin/perl
    # Call another program to perform secondary structure prediction

    use strict;
    use warnings;

    # Call "stride" on a file, collect the report
    my(@stride_output) = call_stride('pdb/c1/pdb1c1f.ent');

    # Parse the stride report into primary sequence, and secondary
    # structure prediction
    my($sequence, $structure) = parse_stride(@stride_output);

    # Print out the beginnings of the sequence and the secondary structure
    print substr($sequence, 0, 80), "\n";
    print substr($structure, 0, 80), "\n";

    exit;

    ####################################################################
    # Subroutines for Example 11-7
    ####################################################################

    # call_stride
    #
    #--given a PDB filename, return the output from the "stride"
    #  secondary structure prediction program

    sub call_stride {

        use strict;
        use warnings;

        my($filename) = @_;

        # The stride program options
        my($stride)  = '/usr/local/bin/stride';
        my($options) = '';
        my(@results) = ();

        # Check for presence of PDB file
        unless ( -e $filename ) {
            print "File \"$filename\" doesn\'t seem to exist\n";
            exit;
        }

        # Start up the program, capture and return the output
        @results = `$stride $options $filename`;

        return @results;
    }

    # parse_stride
    #
    #--given stride output, extract the primary sequence and the
    #  secondary structure prediction, returning them in a
    #  two-element array.

    sub parse_stride {

        use strict;
        use warnings;

        my(@stridereport) = @_;
        my($seq) = '';
        my($str) = '';
        my $length;

        # Extract the lines of interest
        my(@seq) = grep(/^SEQ /, @stridereport);
        my(@str) = grep(/^STR /, @stridereport);

        # Process those lines to discard all but the sequence
        # or structure information
        for (@seq) { $_ = substr($_, 10, 50) }
        for (@str) { $_ = substr($_, 10, 50) }

        # Return the information as an array of two strings
        $seq = join('', @seq);
        $str = join('', @str);

        # Delete unwanted spaces from the ends of the strings.
        # ($seq has no spaces that are wanted, but $str may)
        $seq =~ s/(\s+)$//;
        $length = length($1);
        $str =~ s/\s{$length}$//;

        return( ($seq, $str) );
    }

As you can see in the subroutine call_stride, variables have been made for the program name ($stride) and for the options you may want to pass ($options). Since these are parts of the program you may want to change, put them as variables near the top of the code, to make them easy to find and alter. The argument to the subroutine is the PDB filename ($filename). Of course, if you expect the options to change frequently, you can make them another argument to the subroutine.

Since you're dealing with a program that takes a file, do a little error checking to see if a file by that name actually exists. Use the -e file test operator. Or you can omit this check, let the stride program figure it out, and capture its error output. But that requires parsing the stride output for error messages, which involves figuring out how stride reports errors. This can get complicated, so I'd stick with using the -e file test operator.

The actual running of the program and collecting its output happens in just one line. The program to be run is enclosed in backticks, which run the program (first expanding variables) and return the output as an array of lines.

There are other ways to run a program. One common way is the system function call.
It behaves differently from the backticks: it doesn't return the output of the command it calls; it just returns the exit status, an integer indicating success or failure of the command. Other methods include qx (an alternative way of writing backticks), the open system call, and the fork and exec functions.
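
The differences between these ways of running a program can be seen in a small sketch. It is not part of Example 11-7, and it does not call stride: as a stand-in, it assumes the running Perl interpreter (available as $^X) can be used as the external command, asked to print the numbers 1 and 2 on separate lines. The variable names ($command, @lines, @piped) are chosen here purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an external program such as stride: the running
# Perl interpreter ($^X), asked to print the numbers 1 and 2, one per line.
my $command = qq{$^X -le "print for 1..2"};

# Backticks (qx{} is the same operator) capture standard output.
# In list context the output comes back as one element per line.
my @lines = `$command`;
print "backticks captured ", scalar(@lines), " lines\n";

# system runs the command but does not capture its output; the child
# writes straight to our standard output. The return value is the wait
# status, and shifting it right by 8 bits gives the exit code.
my $status    = system($command);
my $exit_code = $status >> 8;
print "system exit code: $exit_code\n";

# open with the '-|' mode starts the command and lets us read its
# output one line at a time through a filehandle.
open(my $pipe, '-|', $command) or die "Cannot run '$command': $!";
my @piped;
while (my $line = <$pipe>) {
    chomp $line;
    push @piped, $line;
}
close($pipe);
print "pipe read: @piped\n";
```

Backticks are the natural fit when the whole report is wanted at once, as in call_stride. The piped open may suit a program whose output is large enough that you would rather process it line by line, and system is enough when only success or failure matters.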

11.5.2 Parsing Stride Output