10.43 Importing XML into MySQL

10.43.1 Problem

You want to import an XML document into a MySQL table.

10.43.2 Solution

Set up an XML parser to read the document. Then use the records in the document to construct and execute INSERT statements.

10.43.3 Discussion

Importing an XML document depends on being able to parse the document and extract record contents from it. The way you do this will depend on how the document is written. For example, one format might represent column names and values as attributes of <column> elements:

    <?xml version="1.0" encoding="UTF-8"?>
    <rowset>
      <row>
        <column name="subject" value="Jane" />
        <column name="test" value="A" />
        <column name="score" value="47" />
      </row>
      <row>
        <column name="subject" value="Jane" />
        <column name="test" value="B" />
        <column name="score" value="50" />
      </row>
    ...
    </rowset>

Another format is to use column names as element names and column values as the contents of those elements:

    <?xml version="1.0" encoding="UTF-8"?>
    <rowset>
      <row>
        <subject>Jane</subject>
        <test>A</test>
        <score>47</score>
      </row>
      <row>
        <subject>Jane</subject>
        <test>B</test>
        <score>50</score>
      </row>
    ...
    </rowset>

Due to the various structuring possibilities, it's necessary to make some assumptions about the format you expect the XML document to have. For the example here, I'll assume the second format just shown. One way to process this kind of document is to use the XML::XPath module, which allows you to refer to elements within the document using path expressions. For example, the path //row selects all the <row> elements under the document root, and the path * selects all children of a given element. We can use these paths with XML::XPath to obtain first a list of all the <row> elements, and then for each row a list of all its columns.

The following script, xml_to_mysql.pl, takes three arguments:

    xml_to_mysql.pl db_name tbl_name xml_file

The filename argument indicates which document to import, and the database and table name arguments indicate which table to import it into. xml_to_mysql.pl processes the command-line arguments and connects to MySQL (not shown), then processes the document:

    #! /usr/bin/perl -w
    # xml_to_mysql.pl - read XML file into MySQL

    use strict;
    use DBI;
    use XML::XPath;

    # ... process command-line options (not shown) ...

    # ... connect to database (not shown) ...

    # Open file for reading
    my $xp = XML::XPath->new (filename => $file_name);
    my $row_list = $xp->find ("//row");     # find set of <row> elements
    print "Number of records: " . $row_list->size () . "\n";
    foreach my $row ($row_list->get_nodelist ())    # loop through rows
    {
        my @name;   # array for column names
        my @val;    # array for column values
        my $col_list = $row->find ("*");    # children (columns) of row
        foreach my $col ($col_list->get_nodelist ())    # loop through columns
        {
            # save column name and value
            push (@name, $col->getName ());
            push (@val, $col->string_value ());
        }
        # construct INSERT statement, then execute it
        my $stmt = "INSERT INTO $tbl_name ("
                    . join (",", @name)
                    . ") VALUES ("
                    . join (",", ("?") x scalar (@val))
                    . ")";
        $dbh->do ($stmt, undef, @val);
    }

    $dbh->disconnect ();

    exit (0);

The script creates an XML::XPath object, which opens and parses the document. Then the object is queried for the set of <row> elements, using the path //row. The size of this set indicates how many records the document contains.

To process each row, the script uses the path * to ask for all the children of the row object. Each child corresponds to a column within the row; passing * as the path to find() this way is convenient because we need not know in advance which columns to expect. xml_to_mysql.pl obtains the name and value from each column and saves them in the @name and @val arrays. After all the columns have been processed, the arrays are used to construct an INSERT statement that names those columns that were found to be present in the row and that includes a placeholder for each data value. (Recipe 2.7 discusses placeholder list construction.) Then the script issues the statement, passing the column values to do() to bind them to the placeholders.

In the previous section, we used mysql_to_xml.pl to export the contents of the expt table as an XML document. xml_to_mysql.pl can be used to perform the converse operation of importing the document back into MySQL:

    xml_to_mysql.pl cookbook expt expt.xml

As it processes the document, the script generates and executes the following set of statements:

    INSERT INTO expt (subject,test,score) VALUES ('Jane','A','47')
    INSERT INTO expt (subject,test,score) VALUES ('Jane','B','50')
    INSERT INTO expt (subject,test) VALUES ('Jane','C')
    INSERT INTO expt (subject,test) VALUES ('Jane','D')
    INSERT INTO expt (subject,test,score) VALUES ('Marvin','A','52')
    INSERT INTO expt (subject,test,score) VALUES ('Marvin','B','45')
    INSERT INTO expt (subject,test,score) VALUES ('Marvin','C','53')
    INSERT INTO expt (subject,test) VALUES ('Marvin','D')

Note that these statements do not all insert the same number of columns. Statements with "missing" columns correspond to rows with NULL values.
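
The statement construction step can be tried on its own, without a database or an XML parser. The following is a minimal sketch and not part of xml_to_mysql.pl: make_insert and quote_ident are hypothetical helper names. The backtick quoting of identifiers is an added assumption (the script interpolates the names unquoted). It is included here because the column names come from the document rather than from the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: quote a MySQL identifier with backticks,
# doubling any backtick embedded in the name.
sub quote_ident
{
    my ($name) = @_;
    $name =~ s/`/``/g;
    return "`$name`";
}

# Hypothetical helper: build an INSERT statement that names the given
# columns and contains one ? placeholder per column.
sub make_insert
{
    my ($tbl_name, @name) = @_;

    die "no columns given\n" unless @name;
    return "INSERT INTO " . quote_ident ($tbl_name) . " ("
           . join (",", map { quote_ident ($_) } @name)
           . ") VALUES ("
           . join (",", ("?") x scalar (@name))
           . ")";
}

# Two rows with different sets of columns, as in the expt example:
# the second row has no score element, so the statement omits it.
print make_insert ("expt", qw(subject test score)), "\n";
print make_insert ("expt", qw(subject test)), "\n";
```

The returned string can be passed to do() together with the array of column values, in the same way the script passes its own statement string and @val.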

10.44 Epilog