Exercises The Genetic Code

GPHLRVAQVWL_PQEAP_LLMTPLPPHLHGCALTSVMAVAAVGWAVAGGALAGTLRASLVRARKGSTCTI
PGPAAGTGAAGTSAGSCWGPRTSSCPDRNHSDHSPQCADMPHTHHTCGLTV_SAAAAAGDAGWVWPPRAA
ERICGAKQSPEQAWPQPLSLTLPGAAGLDQGQASCALHPHPGAHCCPAHCHPAPVTSCADSESLAWGLSL
CTPDSTTPGWPWPSSQ_SGCSPHGTTHCSCHTRS_SS_CPVCGRCSRWAHSPHSRTCCPPRHLEALGLNH
LPPYLRSRRTPSRPPPATREPAQTTRRRPRRLQRRPQLLPLLVTPPRQRTPIAQAPCVVSAANQ_VAGLE
PPRPLSAAI

8.7 Exercises

Exercise 8.1 Write a subroutine that checks a string and returns true if it's a DNA sequence. Write another that checks for protein sequence data.

Exercise 8.2 Write a program that can search by name for a gene in an unsorted array.

Exercise 8.3 Write a program that can search by name for a gene in a sorted array; use the Perl sort function to sort an array. For extra credit: write a binary search subroutine to do the searching.

Exercise 8.4 Write a subroutine that inserts an element into a sorted array. Hint: use the splice Perl function to insert the element, as shown in Chapter 4.

Exercise 8.5 Write a program that searches by name for a gene in a hash. Get the genes from your own work or try downloading a list of all genes for a given organism from www.ncbi.nlm.nih.gov or one of the web sites given in Appendix A. Make a hash of all the genes (key=name, value=gene ID or sequence). Hint: you may have to write a short Perl program to reformat the list of genes you start with to make it easy to populate the Perl hash.

Exercise 8.6 Write a subroutine that checks an array of data and returns true if it's in FASTA format. Note that FASTA expects the standard IUB/IUPAC amino acid and nucleic acid codes, plus the dash (-) that represents a gap of unknown length. Also, the asterisk (*) represents a stop codon for amino acids. Be careful using an asterisk in regular expressions; use a \ to escape it to match an actual asterisk.

The remaining problems deal with the effect of mutations in DNA on the proteins they encode. They combine the subject of randomization and mutations from Chapter 7 plus the subject of the genetic code from this chapter.

Exercise 8.7 For each codon, make note of what effect single nucleotide mutations have on the codon: does the same amino acid result, or does the codon now encode a different amino acid? Which one? Write a subroutine that, given a codon, returns a list of all the amino acids that may result from any single mutation in the codon.

Exercise 8.8 Write a subroutine that, given an amino acid, randomly changes it to one of the amino acids calculated in Exercise 8.7.

Exercise 8.9 Write a program that randomly mutates the amino acids in a protein but restricts the possibilities to those that can occur due to a single mutation in the original codons, as in Exercises 8.7 and 8.8.

Exercise 8.10 Some codons are more likely than others to occur in random DNA. For instance, 6 of the 64 possible codons code for the amino acid serine, but only 2 of the 64 code for phenylalanine. Write a subroutine that, given an amino acid, returns the probability that it's coded by a randomly generated codon (see Chapter 7).

Exercise 8.11 Write a subroutine that takes as arguments an amino acid; a position (1, 2, or 3); and a nucleotide. It then takes each codon that encodes the specified amino acid (there may be from one to six such codons), and mutates it at the specified position to the specified nucleotide. Finally, it returns the set of amino acids that are encoded by the mutated codons.

Exercise 8.12 Write a program that, given two amino acids, returns the probability that a single mutation in their underlying but unspecified codons results in the codon of one amino acid mutating to the codon of the other amino acid.
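The hint in Exercise 8.6 about escaping the asterisk can be illustrated with a minimal sketch. The subroutine names contains_stop and count_stops are hypothetical, chosen only for this illustration; this is not a solution to the exercise, just a demonstration that \* matches a literal asterisk while an unescaped * is a quantifier.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the peptide string contains a
# literal '*' (the FASTA symbol for a stop codon), 0 otherwise.
sub contains_stop {
    my ($peptide) = @_;
    # \* matches an actual asterisk; a bare * would be a quantifier.
    return $peptide =~ /\*/ ? 1 : 0;
}

# Hypothetical helper: counts the literal asterisks in the string.
sub count_stops {
    my ($peptide) = @_;
    # Assigning the global match to an empty list yields the match count.
    my $count = () = $peptide =~ /\*/g;
    return $count;
}

print contains_stop('MAVL*GT') ? "stop found\n" : "no stop\n";
print count_stops('MA*VL*GT'), "\n";
```

A pattern that begins with an unescaped asterisk, such as /*/, is rejected by Perl at compile time because the quantifier has nothing to apply to, which is one reason the backslash is needed.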

Chapter 9. Restriction Maps and Regular Expressions

In this chapter, I'll give an overview of Perl regular expressions and Perl operators, two essential features of the language we've been using all along. We'll also investigate the programming of a standard, fundamental molecular-biology technique: the discovery of a restriction map for a sequence. Restriction digests were one of the original ways to fingerprint DNA; this can now be simulated on the computer. Restriction maps and their associated restriction digests are common calculations in the laboratory and are provided by several software packages. They are essential tools in the planning of cloning experiments; they can be used to insert a desired stretch of DNA into a cloning vector, for instance. Restriction maps also find application in sequencing projects, for instance in shotgun or directed sequencing.
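As a rough preview of the idea, locating the recognition sites of one enzyme in a sequence can be sketched in a few lines. The subroutine name find_sites and the test sequence are assumptions made up for this illustration; the only biological fact relied on is that the enzyme EcoRI recognizes GAATTC. The sketch reports where each recognition site starts, not where the enzyme cuts.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based start positions of every
# occurrence of the recognition site $site in the sequence $dna.
sub find_sites {
    my ($dna, $site) = @_;
    my @positions;
    # \Q...\E treats $site as a literal string; /g continues the search
    # after each match; /i ignores case.
    while ($dna =~ /\Q$site\E/gi) {
        # $-[0] is the 0-based offset at which the last match started.
        push @positions, $-[0] + 1;
    }
    return @positions;
}

my $dna   = 'AAGAATTCTTGGAATTCA';    # made-up test sequence
my @sites = find_sites($dna, 'GAATTC');
print "GAATTC found at positions: @sites\n";
```

For the made-up sequence above, the sites start at positions 3 and 12. Recognition sites that contain ambiguous bases would need a real regular expression in place of the literal string used here.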

9.1 Regular Expressions

We've been dealing with regular expressions for a while now. This section fills in some background and ties together the somewhat scattered discussions of regular expressions from earlier parts of the book.

Regular expressions are interesting, important, and rich in capabilities. Jeffrey Friedl's book Mastering Regular Expressions (O'Reilly) is entirely devoted to them. Perl makes particularly good use of regular expressions, and the Perl documentation explains them well. Regular expressions are useful when programming with biological data such as sequence, or with GenBank, PDB, and BLAST files.

Regular expressions are ways of representing, and searching for, many strings with one string. Although they are not strictly the same thing, it's useful to think of regular expressions as a kind of highly developed set of wildcards. The special characters in regular expressions are more properly known as metacharacters.

Most people are familiar with wildcards, which are found in search engines or in the game of poker. You might find the reference to every word that starts with biolog by typing biolog*, for instance. Or you may find yourself holding five aces. Different situations may use different wildcards. Perl regular expressions use * to mean "0 or more of the preceding item," not "followed by anything" as in the wildcard example just given.

In computer science, these kinds of wildcards or metacharacters have an important history, both practically and theoretically. The asterisk character in particular is called the Kleene closure after the eminent logician who invented it. As a nod to the theory, I'll mention there is a simple model of a computer, less powerful than a Turing machine, that can deal with exactly the same kinds of languages that can be described by regular expressions. This machine model is called a finite state automaton.
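The difference between the search-engine wildcard and the Perl metacharacter can be seen in a minimal sketch. The word list is made up for this illustration. In the first pattern, * applies only to the preceding g; in the second, it applies to the dot, which stands for any character except a newline.

```perl
use strict;
use warnings;

# Made-up word list for the comparison.
my @words = ('biology', 'biological', 'biolo', 'geology');

# As a regular expression, /^biolog*/ means "biolo" followed by zero or
# more g's, so it also matches the word "biolo".
my @star = grep { /^biolog*/ } @words;

# The search-engine wildcard biolog* corresponds to /^biolog.*/:
# "biolog" followed by zero or more of any character.
my @wild = grep { /^biolog.*/ } @words;

print "regex biolog*  matches: @star\n";
print "regex biolog.* matches: @wild\n";
```

The first pattern matches biology, biological, and biolo; the second matches only biology and biological, which is what the wildcard search was meant to find.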
