
Records: 134 Deleted: 0 Skipped: 2 Warnings: 13

These values provide some general information about the import operation:

• Records indicates the number of records found in the file.

• Deleted and Skipped are related to treatment of input records that duplicate existing table records on unique index values. Deleted indicates how many records were deleted from the table and replaced by input records, and Skipped indicates how many input records were ignored in favor of existing records.

• Warnings is something of a catch-all that indicates the number of problems found while loading data values into columns. Either a value stores into a column properly, or it doesn't. In the latter case, the value ends up in MySQL as something different and MySQL counts it as a warning. Storing a string 'abc' into a numeric column results in a stored value of 0, for example.

What do these values tell you? The Records value normally should match the number of lines in the input file. If it differs from the file's line count, that's a sign that MySQL is interpreting the file as having a format that differs from the format it actually has. In this case, you're also likely to see a high Warnings value, which indicates that many values had to be converted because they didn't match the expected data type. The solution to this problem often is to specify the proper FIELDS and LINES clauses.

Otherwise, the values may not tell you a lot. You can't tell from these numbers which input records had problems or which columns were bad. There is some work being done for MySQL 4 to make additional warning information available. In the meantime, see Recipe 10.38 for a script that examines your datafile and attempts to pinpoint troublesome data values.
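The Records-versus-line-count check described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes you have already obtained the information string from your client program and read the datafile's raw bytes yourself.

```python
import re

def parse_load_data_info(info):
    """Turn a string like 'Records: 134 Deleted: 0 ...' into a dict of ints."""
    return {name: int(value)
            for name, value in re.findall(r"(\w+):\s*(\d+)", info)}

def count_lines(data, terminator=b"\n"):
    """Count the lines in raw file bytes for a given line terminator."""
    if not data:
        return 0
    n = data.count(terminator)
    # A final line that lacks a trailing terminator still counts as a line.
    if not data.endswith(terminator):
        n += 1
    return n

def records_match(info, data, terminator=b"\n"):
    """True if the Records value equals the number of lines in the file."""
    return parse_load_data_info(info)["Records"] == count_lines(data, terminator)
```

If records_match() returns false, the mismatch suggests that the line terminator MySQL used is not the one the file actually contains.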

10.10 Don't Assume LOAD DATA Knows More than It Does

10.10.1 Problem

You think LOAD DATA is smarter than it really is.

10.10.2 Solution

Don't assume that LOAD DATA knows anything at all about the format of your datafile. And make sure you yourself know what its format is. If the file has been transferred from one machine to another, its contents may have been changed in subtle ways of which you're not aware.

10.10.3 Discussion

Many LOAD DATA frustrations occur because people expect MySQL to know things that it cannot possibly know. LOAD DATA makes certain assumptions about the structure of input files, represented as the default settings for the line and field terminators, and for the quote and escape character settings. If your input doesn't match those assumptions, you need to tell MySQL about it.

When in doubt, check the contents of your datafile using a hex dump program or other utility that displays a visible representation of whitespace characters like tab, carriage return, and linefeed. Under Unix, the od program can display file contents in a variety of formats. If you don't have od or some comparable utility, the transfer directory of the recipes distribution contains hex dumpers written in Perl and Python (hexdump.pl and hexdump.py), as well as a couple of programs that display printable representations of all characters of a file (see.pl and see.py). You may find them useful for examining files to see what they really contain.

In some cases, you may be surprised to discover that a file's contents are different than you think. This is in fact quite likely if the file has been transferred from one machine to another:

• An FTP transfer between machines running different operating systems typically translates line endings to those that are appropriate for the destination machine if the transfer is performed in text mode rather than in binary (image) mode. Suppose you have tab-delimited, linefeed-terminated records in a datafile that load into MySQL on a Unix system just fine using the default LOAD DATA settings. If you copy the file to a Windows machine with FTP using a text transfer mode, the linefeeds probably will be converted to carriage return/linefeed pairs. On that machine, the file will not load properly with the same LOAD DATA statement, because its contents will have been changed. Does MySQL have any way of knowing that? No. So it's up to you to tell it, by adding a LINES TERMINATED BY '\r\n' clause to the statement. Transfers between any two systems with dissimilar default line endings can cause these changes. For example, a Macintosh file containing carriage returns may contain linefeeds after transfer to a Unix system. You should either account for such changes with a LINES TERMINATED BY clause that reflects the modified line-ending sequence, or transfer the file in binary mode so that its contents do not change.

• Datafiles pasted into email messages often do not survive intact. Mail software may wrap (break) long lines or convert line-ending sequences. If you must transfer a datafile by email, it's best sent as an attachment.
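If od or the scripts from the recipes distribution are not at hand, a rough check of a file's line endings can be sketched in Python. The following is an illustration only, not the distribution's hexdump.py or see.py. The function names are hypothetical, and it assumes the whole file fits in memory and has been read in binary mode so that no translation takes place.

```python
def count_line_endings(data):
    """Count CRLF pairs, lone carriage returns, and lone linefeeds in bytes."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf   # carriage returns not part of a CRLF pair
    lf = data.count(b"\n") - crlf   # linefeeds not part of a CRLF pair
    # Keys are written the way they would appear inside a quoted SQL string.
    return {"\\r\\n": crlf, "\\r": cr, "\\n": lf}

def suggest_lines_clause(data):
    """Suggest a LINES clause for the most common line ending, or None."""
    counts = count_line_endings(data)
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        return None                 # no line terminators found at all
    return "LINES TERMINATED BY '%s'" % best

def show_whitespace(data):
    """Return a printable form of bytes with tab, CR, and LF made visible."""
    names = {9: "\\t", 10: "\\n", 13: "\\r"}
    return "".join(
        names.get(b, chr(b) if 32 <= b < 127 else "\\x%02x" % b)
        for b in data
    )
```

For example, after reading a file with open(path, "rb").read(), passing the bytes to suggest_lines_clause() gives a clause you might add to the statement. A file with mixed line endings deserves a closer look with show_whitespace() before you trust the suggestion.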

10.11 Skipping Datafile Lines