  PlyrMgr longtext,
  PostWins int(11) default NULL,
  PostLosses int(11) default NULL
);
Of the three programs, DBTools does the best job of determining the structure of the MySQL table. It uses the index information present in the Access file to write the KEY definition, and to create string columns with the proper lengths. MySQLFront doesn't produce the key definition, and it defines strings as LONGTEXT columns, even the PlyrMgr column, which never contains a value longer than one character. The quality of the output produced by guess_table.pl appears to be somewhere in between. It doesn't write the key definition, but neither does it write every string column as the longest possible type. On the other hand, the column lengths are somewhat conservative. All in all, that's not bad, considering that guess_table.pl doesn't have available to it all the information contained in the original Access file. And you can use it on a cross-platform basis.

These results indicate that if you're using Windows and your records are stored in an Access file, you're probably best off letting DBTools create your MySQL tables for you. In other situations, such as when you're running under Unix or your datafile comes from a source other than Access, guess_table.pl can be beneficial.
10.38 A LOAD DATA Diagnostic Utility
10.38.1 Problem
LOAD DATA or mysqlimport indicates a nonzero warning count when you load a datafile into MySQL, but you have no idea which rows or columns were problematic.
10.38.2 Solution
Run the file through a utility that diagnoses which data values caused the warnings.
10.38.3 Discussion
As a bulk loader, LOAD DATA is very efficient; it can run many times faster than a set of INSERT statements that add the same rows. However, LOAD DATA also is not very informative. It returns only a message that indicates the number of records processed and a few other status counts. For example, in the previous section, we generated a datafile managers.txt to use with guess_table.pl for guessing the structure of the baseball1.com managers table. If you create that table using the resulting CREATE TABLE statement and then load the datafile into it, you will observe the following result:
mysql> LOAD DATA LOCAL INFILE 'managers.txt' INTO TABLE managers
    -> IGNORE 1 LINES;
Query OK, 2841 rows affected (0.06 sec)
Records: 2841  Deleted: 0  Skipped: 0  Warnings: 5082
Evidently, there were quite a few problems with the file. Unfortunately, the message produced by LOAD DATA doesn't tell you anything about which rows and columns caused them. The mysqlimport program is similarly terse, because its message is the same as the one returned by LOAD DATA.

We'll revisit this example at the end of the section, but first consider LOAD DATA's output style. On the one hand, the minimal-report approach is the right one to take. If warning information were to be returned to the client, it potentially could include a diagnostic message for each input row, or even for each column. This might be overwhelming and certainly would entirely defeat the high-efficiency nature of LOAD DATA. On the other hand, more information about the source of errors could be useful for fixing the file to eliminate the warnings.
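That one status line is all the feedback a client program receives, so any diagnostic tool has to work from its counters. Here is a minimal Python sketch of extracting them. Python is an assumption here (the recipes utilities are Perl scripts), and parse_load_info is a hypothetical name. The format matched is the one shown above, which is also what the MySQL C API function mysql_info() returns after a LOAD DATA statement.

```python
import re
from typing import Dict, Optional

# The status line LOAD DATA returns, for example
# "Records: 2841  Deleted: 0  Skipped: 0  Warnings: 5082"
INFO_RE = re.compile(
    r"Records:\s*(\d+)\s+Deleted:\s*(\d+)\s+Skipped:\s*(\d+)\s+Warnings:\s*(\d+)"
)


def parse_load_info(info: str) -> Optional[Dict[str, int]]:
    """Return the four LOAD DATA counters as a dict, or None if info doesn't match."""
    m = INFO_RE.search(info)
    if m is None:
        return None
    keys = ("records", "deleted", "skipped", "warnings")
    return dict(zip(keys, (int(g) for g in m.groups())))


counts = parse_load_info("Records: 2841  Deleted: 0  Skipped: 0  Warnings: 5082")
print(counts["warnings"])  # prints 5082
```

The regular expression accepts any amount of whitespace between the counters, so it works whether the fields are separated by one space or two.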
It's on the MySQL development to-do list to allow LOAD DATA errors to be logged to another table so that you can get extended diagnostic information. In the meantime, you can use the load_diag.pl utility included in the transfer directory of the recipes distribution. load_diag.pl is useful for "preflighting" a datafile to get an idea of how well the file will load into the table you intend it for, and for pinpointing problems so that you can clean up the file before loading it into MySQL for real.
load_diag.pl also can help you identify patterns of problems in situations where it may be beneficial to write a preprocessing filter. Suppose you periodically receive files containing data to be loaded into a given MySQL table. The more frequently this occurs, the more motivated you'll be to automate as much of the data transfer process as possible. This may involve writing a filter to convert data values from the format in which you receive them to a format more appropriate for MySQL. Running the datafiles through load_diag.pl can help you assess which columns tend to be problematic. That tells you where to concentrate your efforts when you write a transformation program that rewrites the files so they load cleanly into MySQL.

To run load_diag.pl, specify the name of the database and table you intend to load the datafile into, as well as the name of the file itself:
load_diag.pl db_name tbl_name file_name
load_diag.pl won't actually load anything into the table named on the command line, but it needs to know what the table is so that it can create a temporary table with the same column structure to use for testing. Initially, load_diag.pl loads the entire datafile into the temporary table to see if there are any warnings. If not, there's nothing else to do, so load_diag.pl drops the temporary table and exits. Otherwise, it loads each line of the datafile into the table individually to determine which lines caused problems, using the following procedure:
• It writes the line to a temporary file and issues a LOAD DATA statement to load the file into the table. If the warning count is zero, the line is assumed to be okay.
• If the warning count for the line is nonzero, load_diag.pl examines each of its columns in turn, using a series of single-column LOAD DATA statements to find out which ones generate warnings.
• If a column-specific warning occurs and the data value is empty, load_diag.pl determines whether the warning goes away by loading a NULL value instead. It does this because if a datafile contains empty values, you can often get better results by loading NULL than by loading empty strings. For example, if you load an empty string into an INT column, MySQL converts the value to 0 and issues a warning. If a datafile turns out to have significantly fewer warnings when loading NULL rather than empty strings, you may find it useful to run the file through to_null.pl before loading it.
• It's also possible for warnings to occur if a line contains fewer or more columns than the table has, so load_diag.pl checks that, too.
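The procedure above can be sketched in Python roughly as follows. This is a minimal sketch, not load_diag.pl's own code. It assumes a hypothetical `load` callable that runs LOAD DATA LOCAL against the scratch table, restricted to the given columns, and returns the warning count, where None stands for NULL. The message wording follows the report format shown later in this section.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical loader: loads rows (lists of field values; None means NULL) into
# the named columns of a scratch table and returns the LOAD DATA warning count.
Loader = Callable[[List[List[Optional[str]]], Sequence[str]], int]


def diagnose(lines: List[str], columns: Sequence[str],
             load: Loader) -> Dict[int, List[str]]:
    """Return {line_number: [messages]} for each line that caused warnings."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    if load(rows, columns) == 0:
        return {}  # whole file loads cleanly: no per-line tests needed

    findings: Dict[int, List[str]] = {}
    for lineno, fields in enumerate(rows, start=1):
        if load([fields], columns) == 0:
            continue  # this line loads without warnings
        msgs: List[str] = []
        if len(fields) < len(columns):
            msgs.append("too few columns")
        elif len(fields) > len(columns):
            msgs.append("excess number of columns")
        # Test each column on its own to find the ones that warn.
        for i, col in enumerate(columns):
            if i < len(fields):
                value = fields[i]
            else:
                msgs.append(f"column {i + 1} ({col}): missing from input line")
                value = ""  # a missing value is tested as an empty one
            if load([[value]], [col]) == 0:
                continue
            note = ""
            # For empty values, check whether NULL loads without a warning.
            if value == "" and load([[None]], [col]) == 0:
                note = " (inserting NULL worked better)"
            msgs.append(f"column {i + 1} ({col}): bad value = ({value}){note}")
        findings[lineno] = msgs
    return findings
```

Injecting the loader as a function keeps the per-line and per-column logic separate from the database connection. It also lets you exercise the logic with a stand-in that mimics a table's conversion rules.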
load_diag.pl prints diagnostic information about its findings while testing each input line, then prints a summary report after the entire file has been processed. The report indicates the number of lines in the file, how many warnings the initial full-file load caused, and the number of lines that had too few or too many columns. The report also includes a list that shows for each column how many values were missing, the number of warnings that occurred, how many of those warnings occurred for empty values, and the number of empty-value warnings that went away by loading NULL instead.

As you might guess, all this activity means that load_diag.pl isn't nearly as efficient as LOAD DATA. In fact, it has the potential to exercise your server rather heavily. But its goal is to provide maximal information, not minimal execution time. Note too that if your MySQL server has logging enabled, using load_diag.pl with large datafiles can cause the logs to grow quickly.
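The per-column counters in the summary are nested. Every empty-value warning is also counted in the column's total, and every warning fixed by NULL is also an empty-value warning. The missing count is tracked separately, because a missing value does not necessarily warn in a single-column test. The following is a minimal sketch of that bookkeeping. The names ColumnStats, tally, and print_report and the event tuple layout are hypothetical and are not taken from load_diag.pl.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class ColumnStats:
    missing: int = 0         # lines too short to contain the column
    warnings: int = 0        # values whose single-column load warned
    empty_warnings: int = 0  # ...of those, values that were empty or missing
    null_improved: int = 0   # ...of those, cases where loading NULL removed the warning


# One event per column test: (column, was_missing, warned, value, null_ok)
Event = Tuple[str, bool, bool, str, bool]


def tally(columns: List[str], events: Iterable[Event]) -> Dict[str, ColumnStats]:
    """Accumulate per-column summary counters from per-line test results."""
    stats = {c: ColumnStats() for c in columns}
    for col, was_missing, warned, value, null_ok in events:
        s = stats[col]
        if was_missing:
            s.missing += 1
        if warned:
            s.warnings += 1
            if value == "":
                s.empty_warnings += 1
                if null_ok:
                    s.null_improved += 1
    return stats


def print_report(stats: Dict[str, ColumnStats]) -> None:
    """Print one row of counters per column."""
    print(f"{'Column':<12}{'Missing':>8}{'Warnings':>10}{'Empty':>8}{'NULL ok':>9}")
    for col, s in stats.items():
        print(f"{col:<12}{s.missing:>8}{s.warnings:>10}"
              f"{s.empty_warnings:>8}{s.null_improved:>9}")
```

Suppose you feed tally the warnings from the diag_test example later in this section: ("date", False, True, "01-20-2001", False), ("date", False, True, "02-28-2002", False), and ("num", True, True, "", True). The result reproduces that example's per-column counts. A missing CHAR value that never warns, such as plyrmgr in the managers example, increments only missing.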
To see how load_diag.pl works, assume you have a simple table named diag_test that contains string, date, and number columns:
CREATE TABLE diag_test
(
    str     CHAR(10),
    date    DATE,
    num     INT
);
Assume you also have a datafile named diag_sample.dat that you plan to load into the table:

str1    01-20-2001    97
str2    02-28-2002
        03-01-2002    64    extra junk
To see if the file will have any problems loading, check it like this:
load_diag.pl cookbook diag_test diag_sample.dat
line 1: 1 warning
  column 2 (date): bad value = (01-20-2001)
line 2: 2 warnings
  too few columns
  column 2 (date): bad value = (02-28-2002)
  column 3 (num): missing from input line
  column 3 (num): bad value = () (inserting NULL worked better)
line 3: 1 warning
  excess number of columns

Number of lines in file: 3
Warnings found when loading entire file: 4
Lines containing too few column values: 1
Lines containing excess column values: 1
Warnings per column:
Column      Times     Total      Warnings for    Improved
            missing   warnings   empty columns   with NULL
str         0         0          0               0
date        0         2          0               0
num         1         1          1               1
It appears that the dates don't load very well. That's not surprising, because they appear to be in U.S. format and should be rewritten in ISO format. Converting empty fields to \N may also be beneficial, and you can get rid of the extra column value in line 3. Using some of the utilities developed earlier in this chapter, perform all those transformations, writing the result to a temporary file:
yank_col.pl --columns=1-3 diag_sample.dat \
    | cvt_date.pl --iformat=us --oformat=iso \
    | to_null.pl > tmp
The tmp file produced by that command looks like this:

str1    2001-01-20    97
str2    2002-02-28    \N
\N      2002-03-01    64
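The three filters in the preceding pipeline are Perl scripts from the recipes distribution. Purely as an illustration, the same three transformations on this sample can be written as one Python function. It keeps columns 1 through 3 and pads short lines. It rewrites U.S. MM-DD-YYYY dates as ISO dates and turns empty fields into \N. This is not a port of yank_col.pl, cvt_date.pl, or to_null.pl, and the function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, List

US_DATE = re.compile(r"(\d{1,2})-(\d{1,2})-(\d{4})")


def yank_columns(fields: List[str], first: int, last: int) -> List[str]:
    """Keep columns first..last (1-based), padding short lines with empty fields."""
    kept = fields[first - 1:last]
    return kept + [""] * (last - first + 1 - len(kept))


def us_to_iso(value: str) -> str:
    """Rewrite an MM-DD-YYYY value as YYYY-MM-DD; leave other values alone."""
    m = US_DATE.fullmatch(value)
    if m is None:
        return value
    month, day, year = m.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


def empty_to_null(value: str) -> str:
    r"""Replace an empty field with \N, which LOAD DATA reads as NULL."""
    return r"\N" if value == "" else value


def clean(lines: Iterable[str], ncols: int) -> Iterator[str]:
    """Apply the three transformations to each tab-delimited line."""
    for line in lines:
        fields = yank_columns(line.rstrip("\n").split("\t"), 1, ncols)
        yield "\t".join(empty_to_null(us_to_iso(f)) for f in fields)


sample = ["str1\t01-20-2001\t97", "str2\t02-28-2002", "\t03-01-2002\t64\textra junk"]
for out in clean(sample, 3):
    print(out)
```

The three printed lines match the tmp file shown above. Padding happens before the \N conversion, so a column that is missing entirely from a line also becomes NULL.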
Using load_diag.pl to check the new file produces the following result:
load_diag.pl cookbook diag_test tmp
File loaded with no warnings, no per-record tests performed
This indicates that if you load tmp into the diag_test table, you should get good results, and indeed that is true:
mysql> LOAD DATA LOCAL INFILE 'tmp' INTO TABLE diag_test;
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0
Clearly, that's a lot of messing around just to make a three-line file load into MySQL better. But the point of the example is to illustrate that the feedback load_diag.pl provides can help you figure out what's wrong with a datafile so that you can clean it up.
In addition to the required arguments that name the database, table, and datafile, load_diag.pl understands several options:
--columns=name1,name2,name3,...
By default, load_diag.pl assumes the datafile contains columns that correspond in number and order to the columns in the table. If that is not true, use this option to specify the names of the columns that are present in the file, in the order they appear.
--labels
This option indicates that the datafile contains an initial row of labels that should be skipped. Loading labels into a table typically results in spurious warnings.
--skip-full-load
Skip the initial test that loads the entire datafile.
--tmp-table=tbl_name
Specify the name to use for the temporary table. The default is _load_diag_n, where n is load_diag.pl's process ID.
If necessary, you can also specify standard connection parameter options such as --user or --host. Any options must precede the database name argument.
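Several of these options correspond directly to clauses of the LOAD DATA statement that such a tool has to issue. The sketch below shows one plausible mapping. It is an illustration only, not load_diag.pl's code, and the function names are hypothetical. It omits FIELDS and LINES clauses because tab and linefeed are already LOAD DATA's default terminators. The string quoting assumes MySQL's default SQL mode, in which backslash acts as an escape character.

```python
import os
from typing import Optional, Sequence


def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_string(s: str) -> str:
    """Quote a MySQL string literal, escaping backslashes and single quotes."""
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def default_tmp_table() -> str:
    # Same form as the documented default: _load_diag_ plus the process ID.
    return f"_load_diag_{os.getpid()}"


def load_statement(file_name: str, table: str, labels: bool = False,
                   columns: Optional[Sequence[str]] = None) -> str:
    """Build a LOAD DATA LOCAL statement for tab-delimited, linefeed-terminated input."""
    sql = (f"LOAD DATA LOCAL INFILE {quote_string(file_name)} "
           f"INTO TABLE {quote_ident(table)}")
    if labels:   # --labels: skip the initial row of column labels
        sql += " IGNORE 1 LINES"
    if columns:  # --columns: the file's column names, in file order
        sql += " (" + ", ".join(quote_ident(c) for c in columns) + ")"
    return sql


print(load_statement("managers.txt", default_tmp_table(), labels=True))
```

The clause order is significant. In LOAD DATA syntax, IGNORE n LINES must come before the column list, so the function appends them in that order.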
Use of load_diag.pl is subject to the following constraints and limitations:
• The input must be in tab-delimited, linefeed-terminated format.
• Record loading is performed with the LOCAL option of the LOAD DATA statement. LOCAL capability requires MySQL 3.22.15 or higher and, as of 3.23.49, requires that your MySQL distribution not have been built with that capability disabled.
• When load_diag.pl creates the temporary table, it omits any indexes that are present in the original table. This results in faster record loading, particularly for the initial test that loads the entire datafile. On the other hand, not using indexes means that load_diag.pl won't find warnings that result from duplicate key values on unique indexes.

Returning to the example with which this section began, what about all those warnings that resulted from loading the managers.txt file into the managers table? load_diag.pl identifies them all as being due to missing or empty columns at the end of some of the lines:
load_diag.pl --labels cookbook managers managers.txt
line 2: 2 warnings
  column 14 (postwins): bad value = () (inserting NULL worked better)
  column 15 (postlosses): bad value = () (inserting NULL worked better)
line 3: 2 warnings
  column 14 (postwins): bad value = () (inserting NULL worked better)
  column 15 (postlosses): bad value = () (inserting NULL worked better)
...
line 2839: 2 warnings
  column 14 (postwins): bad value = () (inserting NULL worked better)
  column 15 (postlosses): bad value = () (inserting NULL worked better)
line 2842: 2 warnings
  column 14 (postwins): bad value = () (inserting NULL worked better)
  column 15 (postlosses): bad value = () (inserting NULL worked better)

Number of lines in file: 2842
Warnings found when loading entire file: 5082
Lines containing too few column values: 416
Lines containing excess column values: 0
Warnings per column:
Column      Times     Total      Warnings for    Improved
            missing   warnings   empty columns   with NULL
lahmanid    0         0          0               0
year        0         0          0               0
team        0         0          0               0
lg          0         0          0               0
div         0         0          0               0
g           0         0          0               0
w           0         0          0               0
l           0         0          0               0
pct         0         0          0               0
std         0         0          0               0
half        0         0          0               0
mgrorder    0         0          0               0
plyrmgr     16        0          0               0
postwins    416       2533       2533            2533
postlosses  416       2533       2533            2533
From this result, we can determine that 416 lines were missing the postwins and postlosses columns, and 16 of those were missing the plyrmgr column as well. The remaining warnings came from lines in which the postwins and postlosses columns were present but empty. The entire-file warning count of 5082 can be accounted for as the number of plyrmgr values that were missing, plus the total warnings from the postwins and postlosses columns (16 + 2533 + 2533 = 5082). Each missing value counts because a line with too few fields generates a warning during the full-file load. The Total warnings value for the plyrmgr column is zero because it's a CHAR column, and thus loading empty values into it is legal. The Total warnings value for postwins and postlosses is nonzero because they are INT columns, and loading empty values into them results in a conversion-to-zero operation. All of these problems can be made to go away by converting empty or missing values to \N. Run the file through yank_col.pl to force each line to have 15 columns, and run the result through to_null.pl to convert empty values to \N:
yank_col.pl --columns=1-15 managers.txt | to_null.pl > tmp
Then see what load_diag.pl has to say about the resulting file:
load_diag.pl --labels cookbook managers tmp
File loaded with no warnings, no per-record tests performed

If you load tmp into the managers table, no problems should occur:
mysql> LOAD DATA LOCAL INFILE 'tmp' INTO TABLE managers
    -> IGNORE 1 LINES;
Query OK, 2841 rows affected (0.13 sec)
Records: 2841  Deleted: 0  Skipped: 0  Warnings: 0
10.39 Exchanging Data Between MySQL and Microsoft Access