
        dist_id   INT UNSIGNED NOT NULL AUTO_INCREMENT, # distribution ID
        name      VARCHAR(40),                          # distribution name
        ver_num   NUMERIC(5,2),                         # version number
        rel_date  DATE NOT NULL,                        # release date
        PRIMARY KEY (dist_id)
    );
    CREATE TABLE tmp_item
    (
        dist_id   INT UNSIGNED NOT NULL,   # parent distribution ID
        dist_file VARCHAR(255) NOT NULL    # name of file in distribution
    );

Then determine the IDs of the distributions you want to keep (that is, the most recent version of each distribution). The IDs are found as follows, using queries similar to those just described in the multiple-table delete section:

    mysql> CREATE TABLE tmp
        -> SELECT name, MAX(ver_num) AS newest
        -> FROM swdist_head
        -> GROUP BY name;
    mysql> CREATE TABLE tmp2
        -> SELECT swdist_head.dist_id
        -> FROM swdist_head, tmp
        -> WHERE swdist_head.name = tmp.name AND swdist_head.ver_num = tmp.newest;

Next, select into the new tables the records that should be retained:

    mysql> INSERT INTO tmp_head
        -> SELECT swdist_head.*
        -> FROM swdist_head, tmp2
        -> WHERE swdist_head.dist_id = tmp2.dist_id;
    mysql> INSERT INTO tmp_item
        -> SELECT swdist_item.*
        -> FROM swdist_item, tmp2
        -> WHERE swdist_item.dist_id = tmp2.dist_id;

Finally, replace the original tables with the new ones:

    mysql> DROP TABLE swdist_head;
    mysql> ALTER TABLE tmp_head RENAME TO swdist_head;
    mysql> DROP TABLE swdist_item;
    mysql> ALTER TABLE tmp_item RENAME TO swdist_item;
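To make it clearer what the tmp and tmp2 queries compute, here is a minimal in-memory Perl sketch of the same logic, with no database involved. The rows are hypothetical sample data, and the subroutine names newest_by_name, keeper_ids, and retain_rows are illustrative only. Like the SQL join on ver_num = newest, it keeps every row whose version equals the newest one for its name, and it assumes name and ver_num are never NULL.

```perl
#!/usr/bin/perl
# In-memory sketch of the "copy the records to keep into new tables" method.
# All rows below are hypothetical sample data.
use strict;
use warnings;

# Like: CREATE TABLE tmp SELECT name, MAX(ver_num) AS newest ... GROUP BY name
sub newest_by_name {
    my ($head) = @_;
    my %newest;
    for my $row (@$head) {
        my (undef, $name, $ver) = @$row;
        $newest{$name} = $ver
            if !defined $newest{$name} || $ver > $newest{$name};
    }
    return \%newest;
}

# Like tmp2: IDs whose ver_num equals the newest version for their name
sub keeper_ids {
    my ($head) = @_;
    my $newest = newest_by_name($head);
    return [ map { $_->[0] } grep { $_->[2] == $newest->{$_->[1]} } @$head ];
}

# Like INSERT INTO tmp_xxx SELECT ...: keep rows whose dist_id is a keeper
sub retain_rows {
    my ($rows, $ids) = @_;
    my %keep = map { ($_ => 1) } @$ids;
    return [ grep { $keep{$_->[0]} } @$rows ];
}

# Head rows: [dist_id, name, ver_num]; item rows: [dist_id, dist_file]
my @head = (
    [1, "pkg_a", 1.00], [2, "pkg_a", 1.20],
    [3, "pkg_b", 0.90], [4, "pkg_b", 1.10],
    [5, "pkg_c", 2.00],
);
my @item = (
    [1, "a-1.00.tar"], [2, "a-1.20.tar"], [3, "b-0.90.tar"],
    [4, "b-1.10.tar"], [5, "c-2.00.tar"],
);

my $ids      = keeper_ids(\@head);
my $new_head = retain_rows(\@head, $ids);
my $new_item = retain_rows(\@item, $ids);
print "keep: @$ids\n";    # keep: 2 4 5
```

The final DROP TABLE and RENAME steps correspond to replacing @head and @item with $new_head and $new_item.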

12.21.6 Performing a Multiple-Table Delete by Writing a Program

The preceding two methods for deleting related rows from multiple tables are SQL-only techniques. Another approach is to write a program that generates the DELETE statements for you. The program should determine the key values (the distribution IDs) for the records to delete, then process the keys to turn them into appropriate DELETE statements. Identifying the IDs can be done the same way as shown for the previous methods, but you have some latitude in how you use them to delete records:

• Handle each ID individually. Construct DELETE statements that remove records from the tables one ID at a time.
• Handle the IDs as a group. Construct an IN() clause that names all the IDs, and use it with each table to delete all the matching IDs at once.
• If the ID list is huge, break it into smaller groups to construct shorter IN() clauses.
• You can also solve the problem by reversing the perspective. Select the IDs for the distributions you want to retain and use them to construct a NOT IN() clause that deletes all the other distributions. This will usually be less efficient, because MySQL generally won't use an index for NOT IN operations.

I'll show how to implement each method using Perl. For each of the first three methods, begin by generating a list of the distribution IDs for the records to be deleted:

    # Identify the newest version for each distribution name
    $dbh->do ("CREATE TABLE tmp
               SELECT name, MAX(ver_num) AS newest
               FROM swdist_head
               GROUP BY name");
    # Identify the IDs for versions that are older than those.
    my $ref = $dbh->selectcol_arrayref (
                    "SELECT swdist_head.dist_id
                     FROM swdist_head, tmp
                     WHERE swdist_head.name = tmp.name
                     AND swdist_head.ver_num < tmp.newest");
    # selectcol_arrayref() returns a reference to a list. Convert the reference
    # to a list, which will be empty if $ref is undef or points to an empty list.
    my @val = ($ref ? @{$ref} : ());

At this point, @val contains the list of IDs for the records to remove. To process them individually, run the following loop:

    # Use the ID list to delete records, one ID at a time
    foreach my $val (@val)
    {
        $dbh->do ("DELETE FROM swdist_head WHERE dist_id = ?", undef, $val);
        $dbh->do ("DELETE FROM swdist_item WHERE dist_id = ?", undef, $val);
    }

The loop generates statements like these (shown with the bound values in place of the ? placeholders):

    DELETE FROM swdist_head WHERE dist_id = 1
    DELETE FROM swdist_item WHERE dist_id = 1
    DELETE FROM swdist_head WHERE dist_id = 3
    DELETE FROM swdist_item WHERE dist_id = 3
    DELETE FROM swdist_head WHERE dist_id = 2
    DELETE FROM swdist_item WHERE dist_id = 2

A drawback of this approach is that for large tables, the ID list may be quite large and you'll generate lots of DELETE statements. To be more efficient, combine the IDs into a single IN() clause that names them all at once. Generate the ID list the same way as for the first method, then process the list like this:[4]

    # Use the ID list to delete records for all IDs at once. If the list
    # is empty, don't bother; there's nothing to delete.
    if (@val)
    {
        # generate list of comma-separated "?" placeholders, one per value
        my $where = "WHERE dist_id IN (" . join (",", ("?") x @val) . ")";
        $dbh->do ("DELETE FROM swdist_head $where", undef, @val);
        $dbh->do ("DELETE FROM swdist_item $where", undef, @val);
    }

[4] In Perl, you can't bind an array to a placeholder, but you can construct the query string to contain the proper number of ? characters (see Recipe 2.7). Then pass the array to be bound to the statement, and each element will be bound to the corresponding placeholder.

This method generates only one DELETE statement per table:

    DELETE FROM swdist_head WHERE dist_id IN (1,3,2)
    DELETE FROM swdist_item WHERE dist_id IN (1,3,2)

If the list of IDs is extremely large, you may be in danger of producing DELETE statements that exceed the maximum query length. That limit is set by the server's max_allowed_packet variable: a megabyte by default in older MySQL releases, with larger defaults in newer ones.
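The per-ID and IN() methods differ only in how they turn the ID list into statements, so the statement building can be checked without a database connection. The following is a minimal pure-Perl sketch under that assumption. The subroutine names placeholders, per_id_statements, and in_list_statements are hypothetical. Each statement is represented as an array reference holding the SQL text followed by its bind values, which mirrors the arguments passed to $dbh->do() apart from the undef attribute argument.

```perl
#!/usr/bin/perl
# Build (but do not execute) the DELETE statements for the per-ID and
# IN() methods. Each statement is [ $sql, @bind_values ].
use strict;
use warnings;

# Comma-separated "?" placeholders, one per value: "?,?,?" for 3 values
sub placeholders {
    my ($n) = @_;
    return join (",", ("?") x $n);
}

# Per-ID method: two single-placeholder statements for each ID
sub per_id_statements {
    my (@ids) = @_;
    my @stmts;
    for my $id (@ids) {
        for my $table (qw(swdist_head swdist_item)) {
            push @stmts, ["DELETE FROM $table WHERE dist_id = ?", $id];
        }
    }
    return @stmts;
}

# IN() method: one statement per table; nothing to do for an empty list
sub in_list_statements {
    my (@ids) = @_;
    return () unless @ids;
    my $where = "WHERE dist_id IN (" . placeholders (scalar @ids) . ")";
    return map { ["DELETE FROM $_ $where", @ids] } qw(swdist_head swdist_item);
}

my @val     = (1, 3, 2);
my @per_id  = per_id_statements (@val);
my @grouped = in_list_statements (@val);
printf "%d per-ID statements, %d IN() statements\n",
    scalar @per_id, scalar @grouped;    # 6 per-ID statements, 2 IN() statements
print $grouped[0][0], "\n";    # DELETE FROM swdist_head WHERE dist_id IN (?,?,?)
```

Comparing the two lists shows the trade-off described above: the per-ID method produces two statements per ID, while the IN() method always produces two statements whose length grows with the number of IDs.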
If the ID list is too large for a single statement, break it into smaller groups and use each one to construct a shorter IN() clause:

    # Use the ID list to delete records, using parts of the list at a time.
    my $grp_size = 1000;    # number of IDs to delete at once
    for (my $i = 0; $i < @val; $i += $grp_size)
    {
        my $j = (@val < $i + $grp_size ? @val : $i + $grp_size);
        my @group = @val[$i .. $j-1];
        # generate list of comma-separated "?" placeholders, one per value
        my $where = "WHERE dist_id IN (" . join (",", ("?") x @group) . ")";
        $dbh->do ("DELETE FROM swdist_head $where", undef, @group);
        $dbh->do ("DELETE FROM swdist_item $where", undef, @group);
    }

Each of the preceding programming methods finds the IDs of the records to remove and then deletes them. You can also achieve the same objective using reverse logic: select the IDs for the records you want to keep, then delete everything else. This approach can be useful if you expect to retain fewer records than you'll delete. To implement it, determine the newest version for each distribution and find the associated IDs. Then use the ID list to construct a NOT IN() clause:

    # Identify the newest version for each distribution name
    $dbh->do ("CREATE TABLE tmp
               SELECT name, MAX(ver_num) AS newest
               FROM swdist_head
               GROUP BY name");
    # Identify the IDs for those versions.
    my $ref = $dbh->selectcol_arrayref (
                    "SELECT swdist_head.dist_id
                     FROM swdist_head, tmp
                     WHERE swdist_head.name = tmp.name
                     AND swdist_head.ver_num = tmp.newest");
    # selectcol_arrayref() returns a reference to a list. Convert the reference
    # to a list, which will be empty if $ref is undef or points to an empty list.
    my @val = ($ref ? @{$ref} : ());

    # Use the ID list to delete records for all other IDs at once.
    # The WHERE clause is empty if the list is empty (in that case,
    # no records are to be kept, so they all can be deleted).
    my $where = "";
    if (@val)
    {
        # generate list of comma-separated "?" placeholders, one per value
        $where = "WHERE dist_id NOT IN (" . join (",", ("?") x @val) . ")";
    }
    $dbh->do ("DELETE FROM swdist_head $where", undef, @val);
    $dbh->do ("DELETE FROM swdist_item $where", undef, @val);

Note that with this reverse-logic approach, you must use the entire ID list in a single NOT IN() clause. If you try breaking the list into smaller groups and using NOT IN() with each of those, you'll empty your tables completely when you don't intend to: each partial statement deletes every row whose ID is not in that particular group, which includes the rows you meant to keep from all the other groups.
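The warning about splitting a NOT IN() list is easy to confirm with a small simulation. This is a minimal pure-Perl sketch, assuming a hypothetical set of six IDs of which 2, 4, and 6 are to be kept. The subroutine chunk reuses the same group-splitting arithmetic as the $grp_size loop, and apply_not_in (also a hypothetical name) models the effect of DELETE ... WHERE dist_id NOT IN (...) by returning the IDs that survive it.

```perl
#!/usr/bin/perl
# Simulate why a NOT IN() list must not be split into groups.
use strict;
use warnings;

# Split a list into consecutive groups of at most $size elements
sub chunk {
    my ($size, @list) = @_;
    my @groups;
    for (my $i = 0; $i < @list; $i += $size) {
        my $j = (@list < $i + $size ? @list : $i + $size);
        push @groups, [ @list[$i .. $j-1] ];
    }
    return @groups;
}

# Model "DELETE ... WHERE dist_id NOT IN (@$group)": rows whose ID is in
# the group survive, everything else is deleted.
sub apply_not_in {
    my ($rows, $group) = @_;
    my %in = map { ($_ => 1) } @$group;
    return [ grep { $in{$_} } @$rows ];
}

my @all  = (1 .. 6);     # hypothetical existing IDs
my @keep = (2, 4, 6);    # hypothetical IDs to retain

# One NOT IN() with the whole list: only the unwanted rows go away.
my $single = apply_not_in (\@all, \@keep);

# NOT IN() applied group by group (groups of 2): each pass also deletes
# the keepers that belong to the other groups.
my $rows = \@all;
$rows = apply_not_in ($rows, $_) for chunk (2, @keep);

print "single NOT IN leaves: @$single\n";                    # 2 4 6
print "chunked NOT IN leaves: ", scalar (@$rows), " rows\n";    # 0 rows
```

Grouping is safe for IN(), because each group deletes only its own IDs. With NOT IN(), the groups interact, so the whole list has to go into one statement.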

12.21.7 Performing a Multiple-Table Delete Using mysql