Gene Expression Data Using Sorted Arrays and Binary Search
8.2.3 Gene Expression Data Using Sorted Arrays and Binary Search
You might try storing all the gene names in alphabetical order in the array and then use the following search technique. First, look at the middle element. You can tell the size of the array, as weve seen, with the expression scalar array . If your gene name is before that middle element alphabetically, you ignore the second half of the array and pick the middle element of the remaining half of the array. You continue, at each step narrowing the search to half the previous number of elements, until you find a match or discover there is none. Here it is in pseudocode: Given a sorted array, and an element: Until you find the element or discover its not there, Pick the midpoint of the array, array[scalararray2] Compare your element with the element at the midpoint If that matches your element, youre done. Else, ignore the half of the array that your element is not in } To compare two strings alphabetically in Perl, you use the cmp operator, which returns 0 if the two strings are the same, -1 if they are in alphabetical order, and 1 if they are in reverse alphabetical order. For example, the following returns 0: ZZZ cmp ZZZ; This returns -1: AAA cmp ZZZ; IT-SC 173 Finally, this returns 1: ZZZ cmp AAA; This algorithm is called a binary search, and it considerably speeds up the process of searching in an array, for example, to search 30,000 genes takes only about 15 times through the loop, maximum. As compared to 15,000 comparisons, average, for the unsorted array. Of course, you also have to sort the list, which might take awhile. If you need to keep adding elements, you have to either insert them in the right place or add them to the end and sort the array again. All that inserting or sorting might slow things down considerably. But if youre just sorting it once and then doing lots of lookups, a binary search might be worth doing. While were at it, lets look at how to sort an array. Heres how to sort an array of strings alphabetically: array = sort array; Heres how to sort an array of numbers in ascending order: array = sort { a = b } array; Many other kinds of sorting can be done, but these are the most common. For more details, see the Perl documentation for the sort function.8.2.4 Gene Expression Data Using Hashes
Parts
» OReilly.Beginning.Perl For Bioinformatics
» The Organization of Proteins
» In Silico Biology and Computer Science
» A Low and Long Learning Curve
» Ease of Programming Rapid Prototyping
» Portability, Speed, and Program Maintenance
» Perl May Already Be Installed No Internet Access?
» Downloading Binary Versus Source Code
» Unix and Linux Macintosh Windows
» Unix or Linux How to Run Perl Programs
» Text Editors Getting Started with Perl
» Finding Help Getting Started with Perl
» Saves and Backups Error Messages
» Individual Approaches to Programming Programming Strategies
» The Design Phase The Programming Process
» Algorithms The Programming Process
» Pseudocode and Code Comments
» Representing Sequence Data Sequences and Strings
» Control Flow Comments Revisited Command Interpretation
» Assignment Print Exit Statements
» Concatenating DNA Fragments Sequences and Strings
» Using the Perl Documentation
» Calculating the Reverse Complement in Perl
» Proteins, Files, and Arrays Reading Proteins in Files
» Arrays Sequences and Strings
» Scalar and List Context Exercises
» Conditional tests and matching braces
» Code Layout Motifs and Loops
» Getting User Input from the Keyboard Turning Arrays into Scalars with join
» Regular expressions and character classes
» Counting Nucleotides Motifs and Loops
» Exploding Strings into Arrays
» Operating on Strings Motifs and Loops
» Writing to Files Motifs and Loops
» Advantages of Subroutines Subroutines
» Arguments Scoping and Subroutines
» Scoping Scoping and Subroutines
» Command-Line Arguments and Arrays
» Subroutines: Pass by Value Subroutines: Pass by Reference
» Modules and Libraries of Subroutines
» use warnings; and use strict; Fixing Bugs with Comments and Print Statements
» How to start and stop the debugger Debugger command summary
» Stepping through statements with the debugger
» Setting breakpoints The Perl Debugger
» Fixing another bug use warnings; and use strict; redux
» Exercises Subroutines and Bugs
» Random Number Generators Mutations and Randomization
» Seeding the Random Number Generator Control Flow
» Making a Sentence Randomly Selecting an Element of an Array
» Formatting A Program Using Randomization
» Select a random position in a string Choose a random nucleotide
» Improving the Design Combining the Subroutines to Simulate Mutation
» Exercises Mutations and Randomization
» A Gene Expression Database Gene Expression Data Using Unsorted Arrays
» Gene Expression Data Using Sorted Arrays and Binary Search
» Gene Expression Data Using Hashes
» Translating Codons to Amino Acids
» The Redundancy of the Genetic Code
» Using Hashes for the Genetic Code
» Translating DNA into Proteins
» FASTA Format Reading DNA from Files in FASTA Format
» A Design to Read FASTA Files
» A Subroutine to Read FASTA Files
» Writing Formatted Sequence Data
» Regular Expressions Restriction Maps and Regular Expressions
» Background Planning the Program
» Restriction Enzyme Data Restriction Maps and Restriction Enzymes
» Logical Operators and the Range Operator
» Finding the Restriction Sites
» Exercises Restriction Maps and Regular Expressions
» Using Arrays Separating Sequence and Annotation
» Pattern modifiers Examples of pattern modifiers
» Separating annotations from sequence
» Using Arrays Parsing Annotations
» When to Use Regular Expressions
» Main Program Parsing Annotations
» Parsing Annotations at the Top Level
» Features Parsing the FEATURES Table
» Parsing Parsing the FEATURES Table
» DBM Essentials Indexing GenBank with DBM
» Overview of PDB Protein Data Bank
» Opening Directories Files and Folders
» Processing Many Files Files and Folders
» Extracting Primary Sequence Parsing PDB Files
» Finding Atomic Coordinates Parsing PDB Files
» The Stride Secondary Structure Predictor
» Parsing Stride Output Controlling Other Programs
» String Matching and Homology
» Extracting Annotation and Alignments
» Parsing BLAST Alignments Parsing BLAST Output
» The printf Function Presenting Data
» here Documents format and write
» Bioperl Tutorial Script Bioperl
» The Art of Program Design Web Programming
» Algorithms and Sequence Alignment Object-Oriented Programming Complex Data Structures
Show more