Exercises Protein Data Bank

Next, you want to save just those positions (columns) of these lines that hold the sequence or structure information; you don't need the keywords, position numbers, or the PDB entry name at the end of the lines.

Finally, join the arrays into single strings. Here, there's one detail to handle: you need to remove any unneeded spaces from the ends of the strings. Notice that stride sometimes leaves spaces in the structure prediction, and in this example it has left some at the end of the structure prediction. So you shouldn't throw away all the spaces at the ends of both strings. Instead, throw away all the spaces at the end of the sequence string, because they are just superfluous spaces on the line. Then count how many spaces you removed, and throw the same number away at the end of the structure prediction string. This preserves the spaces that correspond to undetermined secondary structure.

Example 11-7 contains a main program that calls two subroutines. Since the subroutines are short, they are included in the program itself, so there's no need here for the BeginPerlBioinfo module.

Here's the output of Example 11-7:

GGLQVKNFDFTVGKFLTVGGFINNSPQRFSVNVGESMNSLSLHLDHRFNYGADQNTIVM NSTLKGDNGWETEQRSTNFTL
TTTTTTBTTT EEEEEEETTTT EEEEEEEEETTEEEEEEEEEEEETTEEEEEEEEEETTGGG B EEE

The first line shows the amino acids, and the second line shows the prediction of the secondary structure. See the exercises in the next section for a subroutine that will improve that output.
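The join-and-trim step described above can be sketched as follows. This is not the code of Example 11-7; the subroutine name join_and_trim and the sample chunks are hypothetical, chosen only to illustrate the idea. The sketch assumes the sequence and structure columns have already been extracted into two arrays of equal-width chunks.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from Example 11-7).
# Joins the extracted chunks into two strings, strips the trailing
# spaces from the sequence, and then strips at most the same number
# of trailing spaces from the structure string.
sub join_and_trim {
    my ($seq_chunks, $str_chunks) = @_;

    my $sequence  = join '', @$seq_chunks;
    my $structure = join '', @$str_chunks;

    # Remove trailing spaces from the sequence and count them.
    my $removed = 0;
    if ($sequence =~ s/( +)\z//) {
        $removed = length $1;
    }

    # Remove up to the same number of spaces from the end of the
    # structure string; any remaining trailing spaces are kept because
    # they stand for residues with undetermined secondary structure.
    $structure =~ s/ {0,$removed}\z//;

    return ($sequence, $structure);
}

# Made-up sample data: two chunks of six columns each.
my @seq_chunks = ('GGLQVK', 'NFD   ');
my @str_chunks = ('TT  EE', 'E     ');

my ($sequence, $structure) = join_and_trim(\@seq_chunks, \@str_chunks);
print "$sequence\n$structure\n";
```

With this sample data, three spaces are removed from each string, so the two strings end up the same length and the last two residues still line up with the spaces that mark undetermined structure.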

11.6 Exercises

Exercise 11.1
Use File::Find and the file test operators to find the oldest and largest files on the hard drive of your computer. (You can delete them or store them elsewhere if you're running short on disk space.)

Exercise 11.2
Find all the Perl programs on your computer. Hint: Use File::Find. What do all Perl programs have in common?

Exercise 11.3
Parse the HEADER, TITLE, and KEYWORDS record types of all PDB files on your computer. Make a hash whose keys are the words from those record types and whose values are lists of the filenames that contain each word. Save it as a DBM file and build a query program for it. In the end, you should be able to ask for, say, "sugar", and get a list of all PDB files that contain that word in the HEADER, TITLE, or KEYWORDS records.

Exercise 11.4
Parse out the record types of a PDB file using regular expressions, as in Chapter 10, instead of iterating through an array of input lines, as in this chapter.

Exercise 11.5
Write a program that extracts the secondary structure information contained in the HELIX, SHEET, and TURN record types of PDB files. Print out the secondary structure and the primary sequence together, so that it's easy to see which secondary structure includes a given residue. Consider using a special alphabet for secondary structure, so that every residue in a helix is represented by H, for example.

Exercise 11.6
Write a program that finds all PDB files under a given folder and runs a program (such as stride, or the program you wrote in Exercise 11.5) that reports on the secondary structure of each PDB file. Store the results in a DBM file keyed on the filename.

Exercise 11.7
Write a subroutine that, given two strings, prints them out one over the other, but with line breaks similar to the stride program output. Use this subroutine to print out the strings from Example 11-7.

Exercise 11.8
Write a recursive subroutine to determine the size of an array. You may want to use the pop or unshift functions. (Ignore the fact that scalar(@array) returns the size of @array.)

Exercise 11.9
Write a recursive subroutine that extracts the primary amino acid sequence from the SEQRES record type of a PDB file.

Exercise 11.10
(Extra credit) Given an atom and a distance, find all other atoms in a PDB file that are within that distance of the atom.

Exercise 11.11
(Extra credit) Write a program to find some correlation between the primary amino acid sequence and the location of alpha helices.

Chapter 12. BLAST

In biological research, the search for sequence similarity is very important. For instance, a researcher who has discovered a potentially important DNA or protein sequence wants to know if it's already been identified and characterized by another researcher. If it hasn't, the researcher wants to know if it resembles any known sequence from any organism. This information can provide vital clues as to the role of the sequence in the organism.

The Basic Local Alignment Search Tool (BLAST) is one of the most popular software tools in biological research. It tests a query sequence against a library of known sequences in order to find similarity. BLAST is actually a collection of programs with versions for query-to-database pairs such as nucleotide-nucleotide, protein-nucleotide, protein-protein, nucleotide-protein, and more. This chapter examines the output from the nucleotide-nucleotide version of the program, BLASTN. For simplicity's sake, I'll simply refer to it here as BLAST.

The main goal of this chapter is to show how to write code to parse a BLAST output file using regular expressions. The code is simple and basic, but it does the job. Once you understand the basics, you can build more features into your parser or obtain one of the fancier BLAST output parsers that are available via the Web. In either case, you'll know enough about output parsers to use or extend them.

This chapter also gives you a brief introduction to Bioperl, which is a collection of Perl bioinformatics modules. The Bioperl project is an example of an open source project that you, the Perl bioinformatics programmer, can put to good use. The Perl programming language is itself an open source project. The program and its source code are available for use and modification with only very reasonable restrictions and at no cost.

12.1 Obtaining BLAST

There are several implementations of BLAST. The most popular is probably the one offered free of charge by the National Center for Biotechnology Information (NCBI): http://www.ncbi.nlm.nih.gov/BLAST. The NCBI web site features a publicly available BLAST server, a comprehensive set of databases, and a well-organized collection of documents and tutorials, in addition to the BLAST software available for downloading.

Also popular is the WU-BLAST implementation from Washington University. The main web site, including a list of other WU-BLAST servers, can be found at http://blast.wustl.edu. Older versions of WU-BLAST are available at no charge. Newer versions are free if you qualify as a research or nonprofit organization and agree to the licensing arrangements from Washington University, where the program is developed and maintained. If you work at a major research organization, you may already have a site license for the WU-BLAST program. If you are a for-profit company, there is a rather hefty charge for the newer WU-BLAST program; older versions are freely
