Some Notes About the Code
7.5.1 Some Notes About the Code
Notice in the main program that when it calls: random_DNA = make_random_DNA_set 10, 10, 10 ; you dont need to declare and initialize variables such as minimum_length . You can just fill in the actual numbers when you call the subroutine. However its often a good idea to put such things in variables declared at the top of the program, where its easy to find and change them. Here, you set the maximum and minimum lengths to 10 and ask for 10 sequences. Lets restate the problem we just solved. You have to compare all pairs of DNA, and for each pair, calculate the percentage of positions that have the same nucleotides. Then, you have to take the mean of these percentages. Heres the code that accomplishes this in the main program of Example 7-4 : Iterate through all pairs of sequences for my k = 0 ; k scalar random_DNA - 1 ; ++k { for my i = k + 1 ; i scalar random_DNA ; ++i { Calculate and save the matching percentage percent = matching_percentagerandom_DNA[k], random_DNA[i]; pushpercentages, percent; } } To look at each pair, you use a nested loop. A nested loop is simply a loop within another loop. These are fairly common in programming but must be handled with care. They may seem a little complex; take some time to see how the nested loop works, because its IT-SC 167 common to have to select all combinations of two or more elements from a set. The nested loop involves looking at n n-1 2 pairs of sequences, which is a square function of the size of the data set. This can get very big Try gradually increasing the size of the data set and rerunning the program, and youll see your compute time increase, and more than gradually. See how the looping works? First sequence 0 indexed by K is paired with sequences 1,2,3,...,9, in turn indexed by i . Then sequence 1 is paired with 2,3,...,9, etc. Finally, 8 is paired with 9. Recall that array elements are numbered starting at 0, so the last element of an array with 10 elements is numbered 9. Also recall that scalar random_DNA returns the number of elements in the array. You might find it a worthwhile exercise to let the number of sequences be some small value, say 3 or 4, and think through paper and pencil in hand how the nested loops and the variables k and i evolve during the running of the program. Or you can use the Perl debugger to watch how it happens.7.6 Exercises
Parts
» OReilly.Beginning.Perl For Bioinformatics
» The Organization of Proteins
» In Silico Biology and Computer Science
» A Low and Long Learning Curve
» Ease of Programming Rapid Prototyping
» Portability, Speed, and Program Maintenance
» Perl May Already Be Installed No Internet Access?
» Downloading Binary Versus Source Code
» Unix and Linux Macintosh Windows
» Unix or Linux How to Run Perl Programs
» Text Editors Getting Started with Perl
» Finding Help Getting Started with Perl
» Saves and Backups Error Messages
» Individual Approaches to Programming Programming Strategies
» The Design Phase The Programming Process
» Algorithms The Programming Process
» Pseudocode and Code Comments
» Representing Sequence Data Sequences and Strings
» Control Flow Comments Revisited Command Interpretation
» Assignment Print Exit Statements
» Concatenating DNA Fragments Sequences and Strings
» Using the Perl Documentation
» Calculating the Reverse Complement in Perl
» Proteins, Files, and Arrays Reading Proteins in Files
» Arrays Sequences and Strings
» Scalar and List Context Exercises
» Conditional tests and matching braces
» Code Layout Motifs and Loops
» Getting User Input from the Keyboard Turning Arrays into Scalars with join
» Regular expressions and character classes
» Counting Nucleotides Motifs and Loops
» Exploding Strings into Arrays
» Operating on Strings Motifs and Loops
» Writing to Files Motifs and Loops
» Advantages of Subroutines Subroutines
» Arguments Scoping and Subroutines
» Scoping Scoping and Subroutines
» Command-Line Arguments and Arrays
» Subroutines: Pass by Value Subroutines: Pass by Reference
» Modules and Libraries of Subroutines
» use warnings; and use strict; Fixing Bugs with Comments and Print Statements
» How to start and stop the debugger Debugger command summary
» Stepping through statements with the debugger
» Setting breakpoints The Perl Debugger
» Fixing another bug use warnings; and use strict; redux
» Exercises Subroutines and Bugs
» Random Number Generators Mutations and Randomization
» Seeding the Random Number Generator Control Flow
» Making a Sentence Randomly Selecting an Element of an Array
» Formatting A Program Using Randomization
» Select a random position in a string Choose a random nucleotide
» Improving the Design Combining the Subroutines to Simulate Mutation
» Exercises Mutations and Randomization
» A Gene Expression Database Gene Expression Data Using Unsorted Arrays
» Gene Expression Data Using Sorted Arrays and Binary Search
» Gene Expression Data Using Hashes
» Translating Codons to Amino Acids
» The Redundancy of the Genetic Code
» Using Hashes for the Genetic Code
» Translating DNA into Proteins
» FASTA Format Reading DNA from Files in FASTA Format
» A Design to Read FASTA Files
» A Subroutine to Read FASTA Files
» Writing Formatted Sequence Data
» Regular Expressions Restriction Maps and Regular Expressions
» Background Planning the Program
» Restriction Enzyme Data Restriction Maps and Restriction Enzymes
» Logical Operators and the Range Operator
» Finding the Restriction Sites
» Exercises Restriction Maps and Regular Expressions
» Using Arrays Separating Sequence and Annotation
» Pattern modifiers Examples of pattern modifiers
» Separating annotations from sequence
» Using Arrays Parsing Annotations
» When to Use Regular Expressions
» Main Program Parsing Annotations
» Parsing Annotations at the Top Level
» Features Parsing the FEATURES Table
» Parsing Parsing the FEATURES Table
» DBM Essentials Indexing GenBank with DBM
» Overview of PDB Protein Data Bank
» Opening Directories Files and Folders
» Processing Many Files Files and Folders
» Extracting Primary Sequence Parsing PDB Files
» Finding Atomic Coordinates Parsing PDB Files
» The Stride Secondary Structure Predictor
» Parsing Stride Output Controlling Other Programs
» String Matching and Homology
» Extracting Annotation and Alignments
» Parsing BLAST Alignments Parsing BLAST Output
» The printf Function Presenting Data
» here Documents format and write
» Bioperl Tutorial Script Bioperl
» The Art of Program Design Web Programming
» Algorithms and Sequence Alignment Object-Oriented Programming Complex Data Structures
Show more