Gene Expression Data Using Hashes
8.2.4 Gene Expression Data Using Hashes
You can also use hashes to find a gene in your data. To do so, load the hash so that the keys are the gene names and the values are the expression measurements. Then a single lookup on the hash, with the name of the desired gene as the key, returns the result of the experiment for that gene, and you've got your answer. This approach is also cleaner than storing the gene name and the expression result in one scalar string; here the key is a scalar, and the value is a separate scalar.
Furthermore, because of how hashes are implemented, you get an answer back very quickly: a decent hash doesn't have to search hard to find the value of a key. Hash lookups are typically faster than binary searches. Plus, you'd know whether the gene being searched for was in the data, because you can explicitly ask whether a hash value is defined by saying something like:
    if (defined $myhash{$mykey}) { ... }
Also, you'll get a warning message if you have warnings turned on and you use an undefined value (for example, by printing it).
Another advantage of hashes over binary searching is that you can add or remove elements without re-sorting an entire array. Finally, because hashes are built into Perl as a basic datatype, they are easy to use, and you won't have to do much programming to accomplish your goal. It is usually more important to save time writing a program than it is to save time running it. I mention this in Chapter 3, but it's worth emphasizing. To a programmer, the lazy way is often the most efficient way: let the machine do the work.
Don't get the idea that hashes are always the right way to go, however. For instance, they don't store their elements in sorted order, so if you need to look at the data that way, you have to sort it explicitly, like so:
    @sorted_keys = sort keys %my_hash;
This is doable, but it can be a bit slow on a large hash. You could also sort the values, of course.
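The points above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the gene names, the expression values, and the helper `lookup_expression` are hypothetical and do not come from the book's data. The sketch tests for a key with `exists`, which asks whether the key is present at all, whereas `defined` asks whether the stored value is defined.

```perl
use strict;
use warnings;

# Hypothetical gene names and expression values, for illustration only.
my %expression = (
    xkr1  => 1.7,
    tlk3  => 0.2,
    abc22 => 3.4,
);

# Hypothetical helper: return the expression value for a gene,
# or undef if the gene is not a key in the hash.
sub lookup_expression {
    my ($data, $gene) = @_;
    return exists $data->{$gene} ? $data->{$gene} : undef;
}

for my $gene (qw(tlk3 zzz9)) {
    my $value = lookup_expression(\%expression, $gene);
    if (defined $value) {
        print "$gene: $value\n";
    } else {
        print "$gene: not in the data\n";
    }
}

# Adding and removing genes requires no re-sorting.
$expression{newgene1} = 0.9;
delete $expression{xkr1};

# Hashes are unordered, so sort the keys explicitly when order matters.
my @sorted_keys = sort keys %expression;
print "@sorted_keys\n";
```

Passing the hash by reference keeps the helper from copying the whole hash on every call, which matters once the data set grows.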
To conclude the discussion of data structures for our expression data example, here's an informal survey of the properties of some different data structures in Perl for searching, adding and deleting, and maintaining sorted order in a set of gene names:
- Use a hash if you just need to see if something is in a set and don't need to list the set in order.
- A sorted array combined with a binary search algorithm will do if you need an ordered set and pretty fast lookup and don't need to add or remove elements very often.
- An array, in conjunction with the Perl functions push and pop, works well if you don't need to sort the elements but do need to quickly get at the most recently added element.
- A Perl array with the functions push and shift will serve if you don't need the elements sorted but need to add elements. It's especially useful when you always remove the oldest element, that is, the element that has been in the array the longest.
For more information, see Appendix A and especially Mastering Algorithms with Perl, published by O'Reilly.
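As a minimal sketch of the two array idioms in the survey, the following uses push with pop to reach the most recently added element, and push with shift to remove the oldest one. The gene names are hypothetical placeholders.

```perl
use strict;
use warnings;

# Hypothetical gene names, added in this order.
my @genes;
push @genes, 'geneA', 'geneB', 'geneC';

# push + pop: pop removes the most recently added element.
my $newest = pop @genes;

# push + shift: shift removes the element that has been
# in the array the longest.
my $oldest = shift @genes;

print "newest: $newest, oldest: $oldest, remaining: @genes\n";
```

Both pop and shift return undef on an empty array, so a real program would likely check the array's size before relying on the result.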