Background The Genetic Code
8.2.6 DBM
Perl has a simple, built-in way to store hash data, called database management DBM. Its simple to use: after starting up, it ties a hash to a file on your computer disk, so you can save a hash to reuse at a later date. This is, in effect, a simple and very useful database. Apart from the initialization, you use it as you would any other hash. You can store your genes and expression data in a DBM file and then use it as a hash. Theres more on DBM in Chapter 108.3 The Genetic Code
The genetic code is how a cell translates the information contained in its DNA into amino acids and then proteins, which do the real work in the cell.8.3.1 Background
Herein is a short introduction for the nonbiologists. As stated earlier, DNA encodes the primary structure i.e., the amino acid sequence of proteins. DNA has four nucleotides, and proteins have 20 amino acids. The encoding works by taking each group of three nucleotides from the DNA and translating them to an amino acid or a stop signal. Each group of three nucleotides is called a codon. Well see in detail how this coding and translation works. Actually, transcription first uses DNA to make RNA, and then translation uses RNA to make proteins. This is called the central dogma of molecular biology. But in this course, Ill abbreviate the process and somewhat inaccurately call the entire process from DNA to protein translation. The reason for this cavalier distinction is that the whole business is much easier to simulate on computer using strings to represent the DNA, RNA, and proteins. In fact, as shown in Chapter 4 , transcribing DNA to RNA is very easy indeed. In your computer simulations, you can simply skip that step, since its just a matter of changing one letter to another. The actual process in the cell, of course, is much more complex. IT-SC 176 Note that with four kinds of bases, each group of three bases of DNA can represent as many as 4 x 4 x 4 = 64 possible amino acids. Since there are only 20 amino acids plus a stop signal, the genetic code has evolved some redundancy, so that some amino acids are represented by more than one codon. Every possible three bases of DNA—each codon— represents some amino acid apart from the three codons that represent a stop signal. The chart in Figure 8-1 shows how the various bases combine to form amino acids. There are many interesting things to note about the genetic code. For our purposes, the most important is redundancy—the way more than one codon translates to the same amino acid. Well program this using character classes and regular expressions, as youll soon see. [2] [2] Also note that the genetic code in Figure 8-1 is properly based on RNA, where uracil appears instead of thymine. In our programs, were going to go directly from DNA to amino acids, so our codons will use thymine instead of uracil. Figure 8-1. The genetic code The machinery of the cell actually starts at some point along the RNA and reads the sequences codon after codon, attaching the encoded amino acid to the end of the growing protein sequence. Example 8-1 simulates this, reading the string of DNA three bases at a time and concatenating the symbol for the encoded amino acid to the end of the growing protein string. In the cell, the process stops when a codon is encountered.8.3.2 Translating Codons to Amino Acids
Parts
» OReilly.Beginning.Perl For Bioinformatics
» The Organization of Proteins
» In Silico Biology and Computer Science
» A Low and Long Learning Curve
» Ease of Programming Rapid Prototyping
» Portability, Speed, and Program Maintenance
» Perl May Already Be Installed No Internet Access?
» Downloading Binary Versus Source Code
» Unix and Linux Macintosh Windows
» Unix or Linux How to Run Perl Programs
» Text Editors Getting Started with Perl
» Finding Help Getting Started with Perl
» Saves and Backups Error Messages
» Individual Approaches to Programming Programming Strategies
» The Design Phase The Programming Process
» Algorithms The Programming Process
» Pseudocode and Code Comments
» Representing Sequence Data Sequences and Strings
» Control Flow Comments Revisited Command Interpretation
» Assignment Print Exit Statements
» Concatenating DNA Fragments Sequences and Strings
» Using the Perl Documentation
» Calculating the Reverse Complement in Perl
» Proteins, Files, and Arrays Reading Proteins in Files
» Arrays Sequences and Strings
» Scalar and List Context Exercises
» Conditional tests and matching braces
» Code Layout Motifs and Loops
» Getting User Input from the Keyboard Turning Arrays into Scalars with join
» Regular expressions and character classes
» Counting Nucleotides Motifs and Loops
» Exploding Strings into Arrays
» Operating on Strings Motifs and Loops
» Writing to Files Motifs and Loops
» Advantages of Subroutines Subroutines
» Arguments Scoping and Subroutines
» Scoping Scoping and Subroutines
» Command-Line Arguments and Arrays
» Subroutines: Pass by Value Subroutines: Pass by Reference
» Modules and Libraries of Subroutines
» use warnings; and use strict; Fixing Bugs with Comments and Print Statements
» How to start and stop the debugger Debugger command summary
» Stepping through statements with the debugger
» Setting breakpoints The Perl Debugger
» Fixing another bug use warnings; and use strict; redux
» Exercises Subroutines and Bugs
» Random Number Generators Mutations and Randomization
» Seeding the Random Number Generator Control Flow
» Making a Sentence Randomly Selecting an Element of an Array
» Formatting A Program Using Randomization
» Select a random position in a string Choose a random nucleotide
» Improving the Design Combining the Subroutines to Simulate Mutation
» Exercises Mutations and Randomization
» A Gene Expression Database Gene Expression Data Using Unsorted Arrays
» Gene Expression Data Using Sorted Arrays and Binary Search
» Gene Expression Data Using Hashes
» Translating Codons to Amino Acids
» The Redundancy of the Genetic Code
» Using Hashes for the Genetic Code
» Translating DNA into Proteins
» FASTA Format Reading DNA from Files in FASTA Format
» A Design to Read FASTA Files
» A Subroutine to Read FASTA Files
» Writing Formatted Sequence Data
» Regular Expressions Restriction Maps and Regular Expressions
» Background Planning the Program
» Restriction Enzyme Data Restriction Maps and Restriction Enzymes
» Logical Operators and the Range Operator
» Finding the Restriction Sites
» Exercises Restriction Maps and Regular Expressions
» Using Arrays Separating Sequence and Annotation
» Pattern modifiers Examples of pattern modifiers
» Separating annotations from sequence
» Using Arrays Parsing Annotations
» When to Use Regular Expressions
» Main Program Parsing Annotations
» Parsing Annotations at the Top Level
» Features Parsing the FEATURES Table
» Parsing Parsing the FEATURES Table
» DBM Essentials Indexing GenBank with DBM
» Overview of PDB Protein Data Bank
» Opening Directories Files and Folders
» Processing Many Files Files and Folders
» Extracting Primary Sequence Parsing PDB Files
» Finding Atomic Coordinates Parsing PDB Files
» The Stride Secondary Structure Predictor
» Parsing Stride Output Controlling Other Programs
» String Matching and Homology
» Extracting Annotation and Alignments
» Parsing BLAST Alignments Parsing BLAST Output
» The printf Function Presenting Data
» here Documents format and write
» Bioperl Tutorial Script Bioperl
» The Art of Program Design Web Programming
» Algorithms and Sequence Alignment Object-Oriented Programming Complex Data Structures
Show more