Hashes The Genetic Code
Chapter 8. The Genetic Code
Up to this point weve used Perl to search for motifs, simulate DNA mutations, generate random sequences, and transcribe DNA to RNA. These are all important activities, and they serve as a good introduction to the computational techniques you can use to study biological systems. In this chapter, well write Perl programs to simulate how the genetic code directs the translation of DNA into protein. I will start by introducing the hash datatype. Then, after a brief discussion of how different data structures hashes, arrays, and databases can store and access experimental information, we will write a program to translate DNA to protein. Well also continue exploring regular expressions and write code to handle FASTA files.8.1 Hashes
There are three main datatypes in Perl. Youve already seen two: scalar variables and arrays. Now well start to use the third: hashes also called associative arrays. A hash provides very fast lookup of the value associated with a key. As an example, say you have a hash called english_dictionary . Yes, hashes start with the percent sign. If you want to look up the definition of the word recreant, you say: definition = english_dictionary{recreant}; The scalar recreant is the key, and the scalar definition thats returned is the value. As you see from this example, hashes like arrays change their leading character to a dollar sign when you access a single element, because the value returned from a hash lookup is a scalar value. You can tell a hash lookup from an array element by the type of braces they use: arrays use square brackets [ ]; hashes use curly braces { }. If you want to assign a value to a key, its similarly an easy, single statement: english_dictionary{recreant} = One who calls out in surrender.; Also, if you want to initialize a hash with some key-value pairs, its done much like initializing arrays, but every pair becomes a key-value: classification = dog, mammal, robin, bird, asp, reptile, ; which initializes the key dog with the value mammal , and so on. Theres another way of writing this, which shows whats happening a little more clearly. The following IT-SC 170 does exactly the same thing as the preceding code, while showing the key-value relationship more clearly: classification = dog = mammal, robin = bird, asp, = reptile, ; You can get an array of all the keys of a hash: keys = keys my_hash; You can get an array of all the values of a hash: values = values my_hash; You use hashes in lots of different situations, especially when your data is in the form of key-value or you need to look up the value of a key fast. For instance, later in this chapter, well develop programs that use hashes to retrieve information about a gene. The gene name is the key; the information about the gene is the value of that key. Mathematically, a Perl hash always represents a finite function. The name hash comes from something called a hash function, which practically any book on algorithms will define, if youve a mind to look it up. Lets skip the details of how they work under the hood and just talk about their behavior.8.2 Data Structures and Algorithms for Biology
Parts
» OReilly.Beginning.Perl For Bioinformatics
» The Organization of Proteins
» In Silico Biology and Computer Science
» A Low and Long Learning Curve
» Ease of Programming Rapid Prototyping
» Portability, Speed, and Program Maintenance
» Perl May Already Be Installed No Internet Access?
» Downloading Binary Versus Source Code
» Unix and Linux Macintosh Windows
» Unix or Linux How to Run Perl Programs
» Text Editors Getting Started with Perl
» Finding Help Getting Started with Perl
» Saves and Backups Error Messages
» Individual Approaches to Programming Programming Strategies
» The Design Phase The Programming Process
» Algorithms The Programming Process
» Pseudocode and Code Comments
» Representing Sequence Data Sequences and Strings
» Control Flow Comments Revisited Command Interpretation
» Assignment Print Exit Statements
» Concatenating DNA Fragments Sequences and Strings
» Using the Perl Documentation
» Calculating the Reverse Complement in Perl
» Proteins, Files, and Arrays Reading Proteins in Files
» Arrays Sequences and Strings
» Scalar and List Context Exercises
» Conditional tests and matching braces
» Code Layout Motifs and Loops
» Getting User Input from the Keyboard Turning Arrays into Scalars with join
» Regular expressions and character classes
» Counting Nucleotides Motifs and Loops
» Exploding Strings into Arrays
» Operating on Strings Motifs and Loops
» Writing to Files Motifs and Loops
» Advantages of Subroutines Subroutines
» Arguments Scoping and Subroutines
» Scoping Scoping and Subroutines
» Command-Line Arguments and Arrays
» Subroutines: Pass by Value Subroutines: Pass by Reference
» Modules and Libraries of Subroutines
» use warnings; and use strict; Fixing Bugs with Comments and Print Statements
» How to start and stop the debugger Debugger command summary
» Stepping through statements with the debugger
» Setting breakpoints The Perl Debugger
» Fixing another bug use warnings; and use strict; redux
» Exercises Subroutines and Bugs
» Random Number Generators Mutations and Randomization
» Seeding the Random Number Generator Control Flow
» Making a Sentence Randomly Selecting an Element of an Array
» Formatting A Program Using Randomization
» Select a random position in a string Choose a random nucleotide
» Improving the Design Combining the Subroutines to Simulate Mutation
» Exercises Mutations and Randomization
» A Gene Expression Database Gene Expression Data Using Unsorted Arrays
» Gene Expression Data Using Sorted Arrays and Binary Search
» Gene Expression Data Using Hashes
» Translating Codons to Amino Acids
» The Redundancy of the Genetic Code
» Using Hashes for the Genetic Code
» Translating DNA into Proteins
» FASTA Format Reading DNA from Files in FASTA Format
» A Design to Read FASTA Files
» A Subroutine to Read FASTA Files
» Writing Formatted Sequence Data
» Regular Expressions Restriction Maps and Regular Expressions
» Background Planning the Program
» Restriction Enzyme Data Restriction Maps and Restriction Enzymes
» Logical Operators and the Range Operator
» Finding the Restriction Sites
» Exercises Restriction Maps and Regular Expressions
» Using Arrays Separating Sequence and Annotation
» Pattern modifiers Examples of pattern modifiers
» Separating annotations from sequence
» Using Arrays Parsing Annotations
» When to Use Regular Expressions
» Main Program Parsing Annotations
» Parsing Annotations at the Top Level
» Features Parsing the FEATURES Table
» Parsing Parsing the FEATURES Table
» DBM Essentials Indexing GenBank with DBM
» Overview of PDB Protein Data Bank
» Opening Directories Files and Folders
» Processing Many Files Files and Folders
» Extracting Primary Sequence Parsing PDB Files
» Finding Atomic Coordinates Parsing PDB Files
» The Stride Secondary Structure Predictor
» Parsing Stride Output Controlling Other Programs
» String Matching and Homology
» Extracting Annotation and Alignments
» Parsing BLAST Alignments Parsing BLAST Output
» The printf Function Presenting Data
» here Documents format and write
» Bioperl Tutorial Script Bioperl
» The Art of Program Design Web Programming
» Algorithms and Sequence Alignment Object-Oriented Programming Complex Data Structures
Show more