
If a UNIQUE index does happen to allow NULL values, NULL is special because it is the one value that can occur multiple times. The rationale for this is that it is not possible to know whether one unknown value is the same as another, so multiple unknown values are allowed.

It may of course be that you'd want the person table to reflect the real world, in which people do sometimes have the same name. In this case, you cannot set up a unique index based on the name columns, because duplicate names must be allowed. Instead, each person must be assigned some sort of unique identifier, which becomes the value that distinguishes one record from another. In MySQL, a common technique for this is the AUTO_INCREMENT column:

    CREATE TABLE person
    (
        id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
        last_name  CHAR(20),
        first_name CHAR(20),
        address    CHAR(40),
        PRIMARY KEY (id)
    );

In this case, when you create a record with an id value of NULL, MySQL assigns that column a unique ID automatically. Another possibility is to assign identifiers externally and use those IDs as unique keys. For example, citizens in a given country might have unique taxpayer ID numbers. If so, those numbers can serve as the basis for a unique index:

    CREATE TABLE person
    (
        tax_id     INT UNSIGNED NOT NULL,
        last_name  CHAR(20),
        first_name CHAR(20),
        address    CHAR(40),
        PRIMARY KEY (tax_id)
    );
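The two behaviors described above (several NULL values coexisting in a UNIQUE index, and an ID being assigned automatically when id is NULL) can be tried out with a short script. What follows is a minimal sketch, not MySQL itself: it assumes Python's built-in sqlite3 module as a stand-in engine, in which an INTEGER PRIMARY KEY column plays the role of MySQL's AUTO_INCREMENT column. The nullable_key table and the sample names are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: SQLite stand-in for the AUTO_INCREMENT table. An
# INTEGER PRIMARY KEY column receives an automatically assigned
# value when NULL is inserted into it.
conn.execute(
    """
    CREATE TABLE person
    (
        id         INTEGER PRIMARY KEY,
        last_name  CHAR(20),
        first_name CHAR(20),
        address    CHAR(40)
    )
    """
)

# Two people with the same name are allowed, because only id must be unique.
for _ in range(2):
    conn.execute(
        "INSERT INTO person (id, last_name, first_name) VALUES (NULL, ?, ?)",
        ("Smith", "John"),
    )

# The automatically assigned identifiers, one per record.
ids = [row[0] for row in conn.execute("SELECT id FROM person ORDER BY id")]

# Hypothetical table showing that a UNIQUE index admits several NULLs.
conn.execute("CREATE TABLE nullable_key (val CHAR(10) UNIQUE)")
conn.execute("INSERT INTO nullable_key (val) VALUES (NULL)")
conn.execute("INSERT INTO nullable_key (val) VALUES (NULL)")
null_count = conn.execute(
    "SELECT COUNT(*) FROM nullable_key WHERE val IS NULL"
).fetchone()[0]
```

After the script runs, ids holds two distinct values even though both records carry the same name, and null_count is 2 because neither NULL is treated as a duplicate of the other.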

14.2.4 See Also

AUTO_INCREMENT columns are discussed further in Chapter 11.

14.3 Dealing with Duplicates at Record-Creation Time

14.3.1 Problem

You've created a table with a unique index to prevent duplicate values in the indexed column or columns. But this results in an error if you attempt to insert a duplicate record, and you want to avoid having to deal with such errors.

14.3.2 Solution

One approach is to just ignore the error. Another is to use either an INSERT IGNORE or REPLACE statement, each of which modifies MySQL's duplicate-handling behavior. For bulk-loading operations, LOAD DATA has modifiers that allow you to specify how to handle duplicates.

14.3.3 Discussion

By default, MySQL generates an error when you insert a record that duplicates an existing unique key. For example, you'll see the following result if the person table contains a unique index on the last_name and first_name columns:

    mysql> INSERT INTO person (last_name, first_name)
        -> VALUES('X1','Y1');
    Query OK, 1 row affected (0.00 sec)
    mysql> INSERT INTO person (last_name, first_name)
        -> VALUES('X1','Y1');
    ERROR 1062 at line 1: Duplicate entry 'X1-Y1' for key 1

If you're issuing the statements from the mysql program interactively, you can simply say, "Okay, that didn't work," ignore the error, and continue. But if you write a program to insert the records, an error may terminate the program. One way to avoid this is to modify the program's error-handling behavior to trap the error and then ignore it. See Recipe 2.3 for information about error-handling techniques.

If you want to prevent the error from occurring in the first place, you might consider using a two-query method to solve the duplicate-record problem: issue a SELECT to see if the record is already present, followed by an INSERT if it's not. But that doesn't really work. Another client might insert the same record after the SELECT and before the INSERT, in which case the error would still occur. To make sure that doesn't happen, you could use a transaction or lock the tables, but then you're up from two statements to four. MySQL provides two single-query solutions to the problem of handling duplicate records:

• Use INSERT IGNORE rather than INSERT. If a record doesn't duplicate an existing record, MySQL inserts it as usual. If the record is a duplicate, the IGNORE keyword tells MySQL to discard it silently without generating an error:

      mysql> INSERT IGNORE INTO person (last_name, first_name)
          -> VALUES('X2','Y2');
      Query OK, 1 row affected (0.00 sec)
      mysql> INSERT IGNORE INTO person (last_name, first_name)
          -> VALUES('X2','Y2');
      Query OK, 0 rows affected (0.00 sec)

  The row count value indicates whether the record was inserted or ignored. From within a program, you can obtain this value by checking the rows-affected function provided by your API. See Recipe 2.5 and Recipe 9.2.

• Use REPLACE rather than INSERT. If the record is new, it's inserted just as with INSERT. If it's a duplicate, the new record replaces the old one:

      mysql> REPLACE INTO person (last_name, first_name)
          -> VALUES('X3','Y3');
      Query OK, 1 row affected (0.00 sec)
      mysql> REPLACE INTO person (last_name, first_name)
          -> VALUES('X3','Y3');
      Query OK, 2 rows affected (0.00 sec)

  The rows-affected value in the second case is 2 because the original record is deleted and the new record is inserted in its place.

INSERT IGNORE and REPLACE should be chosen according to the duplicate-handling behavior you want to effect. INSERT IGNORE keeps the first of a set of duplicated records and discards the rest. REPLACE keeps the last of a set of duplicates and kicks out any earlier ones. INSERT IGNORE is more efficient than REPLACE because it doesn't actually insert duplicates. Thus, it's most applicable when you just want to make sure a copy of a given record is present in a table. REPLACE, on the other hand, is often more appropriate for tables in which other non-key columns may need updating.
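The three behaviors discussed so far (trapping the duplicate-key error, silently discarding the duplicate, and replacing the existing record) can be compared side by side from within a program. This is a hedged sketch rather than MySQL-specific code: it assumes Python's built-in sqlite3 module as a stand-in, so the error surfaces as sqlite3.IntegrityError instead of MySQL error 1062, and the ignore form is spelled INSERT OR IGNORE rather than INSERT IGNORE. The note column and the helper function names are hypothetical additions used to make the keep-first versus keep-last difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person
    (
        last_name  CHAR(20) NOT NULL,
        first_name CHAR(20) NOT NULL,
        note       CHAR(20),            -- hypothetical non-key column
        PRIMARY KEY (last_name, first_name)
    )
    """
)


def insert_trapping_error(last, first, note):
    """Plain INSERT that traps the duplicate-key error.

    Returns True if the record was inserted, False if it was a duplicate.
    """
    try:
        conn.execute(
            "INSERT INTO person (last_name, first_name, note) VALUES (?, ?, ?)",
            (last, first, note),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def insert_ignore(last, first, note):
    """Ignore-style insert; returns the rows-affected count (1 or 0)."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO person (last_name, first_name, note) "
        "VALUES (?, ?, ?)",
        (last, first, note),
    )
    return cur.rowcount


def replace(last, first, note):
    """REPLACE: a duplicate overwrites the existing record."""
    conn.execute(
        "REPLACE INTO person (last_name, first_name, note) VALUES (?, ?, ?)",
        (last, first, note),
    )


def note_for(last, first):
    """Fetch the note stored for one person."""
    row = conn.execute(
        "SELECT note FROM person WHERE last_name = ? AND first_name = ?",
        (last, first),
    ).fetchone()
    return row[0]


trapped = [
    insert_trapping_error("X1", "Y1", "first"),
    insert_trapping_error("X1", "Y1", "second"),
]  # [True, False]

ignored = [
    insert_ignore("X2", "Y2", "first"),
    insert_ignore("X2", "Y2", "second"),
]  # [1, 0]

replace("X3", "Y3", "first")
replace("X3", "Y3", "second")
```

After these calls, note_for("X2", "Y2") is "first" (the ignore form kept the earlier record) and note_for("X3", "Y3") is "second" (REPLACE kept the later one). The rows-affected value that MySQL reports for a replacing REPLACE is not checked here, because the stand-in engine may count it differently.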
Suppose you're maintaining a table named passtbl for a web application that contains email addresses and passwords and that is keyed by email address:

    CREATE TABLE passtbl
    (
        email    CHAR(60) NOT NULL,
        password CHAR(20) BINARY NOT NULL,
        PRIMARY KEY (email)
    );

How do you create records for new users, and change passwords for existing users? Without REPLACE, creating a new user and changing an existing user's password must be handled differently. A typical algorithm for handling record maintenance might look like this:

• Issue a SELECT to see if a record already exists with a given email value.
• If no such record exists, add a new one with INSERT.
• If the record does exist, update it with UPDATE.

All of that must be performed within a transaction or with the tables locked to prevent other users from changing the tables while you're using them. With REPLACE, you can simplify both cases to the same single-statement operation:

    REPLACE INTO passtbl (email,password) VALUES(address,passval);

If no record with the given email address exists, MySQL creates a new one. If a record does exist, MySQL replaces it; in effect, this updates the password column of the record associated with the address.

INSERT IGNORE and REPLACE have the benefit of eliminating overhead that might otherwise be required for a transaction. But this benefit comes at the price of portability, because both are MySQL-specific statements. If portability is a high priority, you might prefer to stick with a transactional approach.

For bulk-load operations in which you use the LOAD DATA statement to load a set of records from a file into a table, duplicate-record handling can be controlled using the statement's IGNORE and REPLACE modifiers. These produce behavior analogous to that of the INSERT IGNORE and REPLACE statements. See Recipe 10.8 for more information.
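The contrast between the three-step algorithm and the single-statement REPLACE for passtbl can be sketched as follows. As an assumption, the sketch uses Python's built-in sqlite3 module as a stand-in for a MySQL connection (SQLite also accepts REPLACE INTO); the BINARY column attribute is omitted because it is MySQL syntax. The names set_password_3step and set_password_replace and the sample addresses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE passtbl
    (
        email    CHAR(60) NOT NULL,
        password CHAR(20) NOT NULL,
        PRIMARY KEY (email)
    )
    """
)


def set_password_3step(email, password):
    """SELECT, then INSERT or UPDATE, wrapped in an explicit transaction."""
    conn.execute("BEGIN")
    try:
        row = conn.execute(
            "SELECT 1 FROM passtbl WHERE email = ?", (email,)
        ).fetchone()
        if row is None:
            # No such record exists: create it.
            conn.execute(
                "INSERT INTO passtbl (email, password) VALUES (?, ?)",
                (email, password),
            )
        else:
            # The record exists: change its password.
            conn.execute(
                "UPDATE passtbl SET password = ? WHERE email = ?",
                (password, email),
            )
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def set_password_replace(email, password):
    """Single statement that handles both the new and the existing user."""
    conn.execute(
        "REPLACE INTO passtbl (email, password) VALUES (?, ?)",
        (email, password),
    )
    conn.commit()


# Create each user, then change the password, once per approach.
set_password_3step("a@example.com", "old1")
set_password_3step("a@example.com", "new1")
set_password_replace("b@example.com", "old2")
set_password_replace("b@example.com", "new2")

rows = dict(conn.execute("SELECT email, password FROM passtbl"))
```

Both approaches leave exactly one record per address holding the newer password. One design point to keep in mind: because REPLACE deletes the old record and inserts a new one, any column not named in the statement is reset to its default rather than preserved, so the single-statement form fits best when every non-key column is supplied, as it is here.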

14.4 Counting and Identifying Duplicates