10.20 Extracting and Rearranging Datafile Columns
10.20.1 Problem
You want to pull out columns from a datafile or rearrange them into a different order.
10.20.2 Solution
Use a utility that can produce columns from a file on demand.
10.20.3 Discussion
cvt_file.pl serves as a tool that converts entire files from one format to another. Another common datafile operation is to manipulate its columns. This is necessary, for example, when importing a file into a program that doesn't understand how to extract or rearrange input columns for itself. Perhaps you want to omit columns from the middle of a file so you can use it with LOAD DATA, which cannot skip over columns in the middle of data lines. Or perhaps you have a version of mysqlimport older than 3.23.17, which doesn't support the --columns option that allows you to indicate the order in which table columns appear in the file. To work around these problems, you can rearrange the datafile instead.

Recall that this chapter began with a description of a scenario involving a 12-column CSV file somedata.csv from which only columns 2, 11, 5, and 9 were needed. You can convert the file to tab-delimited format like this:

    cvt_file.pl --iformat=csv somedata.csv > somedata.txt

But then what? If you just want to knock out a short script to extract those specific four columns, that's fairly easy: write a loop that reads input lines and writes only the columns you want in the proper order. Assuming input in tab-delimited, linefeed-terminated format, a simple Perl program to pull out the four columns can be written like this:

    #! /usr/bin/perl -w
    # yank_4col.pl - 4-column extraction example

    # Extracts column 2, 11, 5, and 9 from 12-column input, in that order.
    # Assumes tab-delimited, linefeed-terminated input lines.

    use strict;

    while (<>)
    {
        chomp;
        my @in = split (/\t/, $_);      # split at tabs
        # extract columns 2, 11, 5, and 9
        print join ("\t", $in[1], $in[10], $in[4], $in[8]) . "\n";
    }

    exit (0);

Run the script as follows to read the file containing 12 data columns and write output that contains only the four columns in the desired order:

    yank_4col.pl somedata.txt > tmp

But yank_4col.pl is a special-purpose script, useful only within a highly limited context. With just a little more work, it's possible to write a more general utility, yank_col.pl, that allows any set of columns to be extracted. With such a tool, you'd specify the column list on the command line like this:

    yank_col.pl --columns=2,11,5,9 somedata.txt > tmp

Because the script doesn't use a hardcoded column list, it can be used to pull out an arbitrary set of columns in any order. Columns can be specified as a comma-separated list of column numbers or column ranges. For example, --columns=1,4-7,10 means columns 1, 4, 5, 6, 7, and 10. yank_col.pl looks like this:

    #! /usr/bin/perl -w
    # yank_col.pl - extract columns from input

    # Example: yank_col.pl --columns=2,11,5,9 filename

    # Assumes tab-delimited, linefeed-terminated input lines.

    use strict;
    use Getopt::Long;
    $Getopt::Long::ignorecase = 0;  # options are case sensitive

    my $prog = "yank_col.pl";
    my $usage = <<EOF;
    Usage: $prog [options] [data_file]

    Options:
    --help
        Print this message
    --columns=column-list
        Specify columns to extract, as a comma-separated list of column positions
    EOF

    my $help;
    my $columns;

    GetOptions (
        "help"      => \$help,      # print help message
        "columns=s" => \$columns    # specify column list
    ) or die "$usage\n";

    die "$usage\n" if defined ($help);

    # use a conditional expression rather than "my ... if", whose behavior
    # is undefined when the condition is false
    my @col_list = defined ($columns) ? split (/,/, $columns) : ();
    @col_list or die "$usage\n";    # nonempty column list is required

    # make sure column specifiers are positive integers, and convert from
    # 1-based to 0-based values

    my @tmp;
    for (my $i = 0; $i < @col_list; $i++)
    {
        if ($col_list[$i] =~ /^\d+$/)               # single column number
        {
            die "Column specifier $col_list[$i] is not a positive integer\n"
                unless $col_list[$i] > 0;
            push (@tmp, $col_list[$i] - 1);
        }
        elsif ($col_list[$i] =~ /^(\d+)-(\d+)$/)    # column range m-n
        {
            my ($begin, $end) = ($1, $2);
            die "$col_list[$i] is not a valid column specifier\n"
                unless $begin > 0 && $end > 0 && $begin <= $end;
            while ($begin <= $end)
            {
                push (@tmp, $begin - 1);
                ++$begin;
            }
        }
        else
        {
            die "$col_list[$i] is not a valid column specifier\n";
        }
    }
    @col_list = @tmp;

    while (<>)      # read input
    {
        chomp;
        my @val = split (/\t/, $_, 10000);  # split, preserving all fields
        # extract desired columns, mapping undef to empty string (can
        # occur if an index exceeds number of columns present in line)
        @val = map { defined ($_) ? $_ : "" } @val[@col_list];
        print join ("\t", @val) . "\n";
    }

    exit (0);

The input processing loop converts each line to an array of values, then pulls out from the array the values corresponding to the requested columns. To avoid looping through the array, it uses Perl's notation that allows a list of subscripts to be specified all at once to request multiple array elements.
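The slice-and-map step can also be tried on its own, outside the script. The following is a minimal sketch rather than part of yank_col.pl: pick_columns() is a hypothetical helper name, and the sample values are made up. It shows a slice returning fields in the requested order, with a position past the end of the array mapped to an empty string.

```perl
#!/usr/bin/perl
# Sketch only: pick_columns() is a hypothetical helper, not part of yank_col.pl.
use strict;
use warnings;

# Return the fields at the given 0-based positions, in the order requested.
# A position past the end of the field list yields undef from the slice,
# which is then mapped to an empty string.
sub pick_columns
{
    my ($fields, $positions) = @_;
    my @picked = @{$fields}[@{$positions}];     # array slice
    return map { defined ($_) ? $_ : "" } @picked;
}

my @val = ("a", "b", "c", "d");
my @out = pick_columns (\@val, [3, 0, 7]);      # position 7 does not exist
print join ("|", @out), "\n";
```

Copying the slice into @picked before mapping is a choice made for this sketch, so that the helper only reads from the caller's array.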
If @col_list contains the values 2, 6, and 3, for example, these two expressions are equivalent:

    ($val[2], $val[6], $val[3])
    @val[@col_list]

What if you want to extract columns from a file that's not in tab-delimited format, or produce output in another format? In that case, combine yank_col.pl with cvt_file.pl. Suppose you want to pull out all but the password column from the colon-delimited /etc/passwd file and write the result in CSV format. Use cvt_file.pl both to preprocess /etc/passwd into tab-delimited format for yank_col.pl and to post-process the extracted columns into CSV format:

    cvt_file.pl --idelim=":" /etc/passwd \
        | yank_col.pl --columns=1,3-7 \
        | cvt_file.pl --oformat=csv > passwd.csv

If you don't want to type all of that as one long command, use temporary files for the intermediate steps:

    cvt_file.pl --idelim=":" /etc/passwd > tmp1
    yank_col.pl --columns=1,3-7 tmp1 > tmp2
    cvt_file.pl --oformat=csv tmp2 > passwd.csv
    rm tmp1 tmp2

Forcing split to Return Every Field

The Perl split function is extremely useful, but normally it doesn't return trailing empty fields. This means that if you write out only as many fields as split returns, output lines may not have the same number of fields as input lines. To avoid this problem, pass a third argument to indicate the maximum number of fields to return. This forces split to return as many fields as are actually present on the line, or the number requested, whichever is smaller. If the value of the third argument is large enough, the practical effect is to cause all fields to be returned, empty or not. Scripts shown in this chapter use a field count value of 10,000:

    # split line at tabs, preserving all fields
    my @val = split (/\t/, $_, 10000);

In the (unlikely?) event that an input line has more fields than that, split stops dividing the line once the limit is reached, so everything beyond that point is left unsplit in the last field returned.
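The effect of the third argument can be checked with a few lines of Perl. This is a small sketch with a made-up sample line, not code from the chapter's scripts. It also shows a negative limit, which Perl's split treats as "no upper bound"; that variant is offered here as an alternative and is not what the chapter's scripts use.

```perl
#!/usr/bin/perl
# Sketch only: the sample line and variable names are made up for illustration.
use strict;
use warnings;

my $line = "a\tb\t\t\t";    # five tab-separated fields, the last three empty

my @default  = split (/\t/, $line);         # trailing empty fields are dropped
my @limited  = split (/\t/, $line, 10000);  # large limit keeps them
my @negative = split (/\t/, $line, -1);     # negative limit: no upper bound
my @small    = split (/\t/, $line, 2);      # limit reached: rest stays unsplit

printf "default:     %d fields\n", scalar (@default);
printf "limit 10000: %d fields\n", scalar (@limited);
printf "limit -1:    %d fields\n", scalar (@negative);
printf "limit 2:     %d fields, last field is %d characters long\n",
       scalar (@small), length ($small[-1]);
```

With this sample line, the default call reports two fields and the calls with limits of 10000 and -1 report five each. The call with a limit of 2 returns two fields, the second of which still contains the remaining tabs.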
If you expect input lines with more than 10,000 fields, you can bump up the number even higher.
10.21 Validating and Transforming Data