If a UNIQUE index does happen to allow NULL values, NULL is special because it is the one value that can occur multiple times. The rationale for this is that it is not possible to know whether one unknown value is the same as another, so multiple unknown values are allowed.

It may of course be that you'd want the person table to reflect the real world, in which people do sometimes have the same name. In this case, you cannot set up a unique index based on the name columns, because duplicate names must be allowed. Instead, each person must be assigned some sort of unique identifier, which becomes the value that distinguishes one record from another. In MySQL, a common technique for this is the AUTO_INCREMENT column:

    CREATE TABLE person
    (
        id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
        last_name  CHAR(20),
        first_name CHAR(20),
        address    CHAR(40),
        PRIMARY KEY (id)
    );
In this case, when you create a record with an id value of NULL, MySQL assigns that column a unique ID automatically. Another possibility is to assign identifiers externally and use those IDs as unique keys. For example, citizens in a given country might have unique taxpayer ID numbers. If so, those numbers can serve as the basis for a unique index:

    CREATE TABLE person
    (
        tax_id     INT UNSIGNED NOT NULL,
        last_name  CHAR(20),
        first_name CHAR(20),
        address    CHAR(40),
        PRIMARY KEY (tax_id)
    );
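The two behaviors described above, repeated NULL values in a unique index and automatically assigned IDs, can be tried out with a short script. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for a MySQL server, the table name names and the helper add_person are hypothetical, and SQLite's INTEGER PRIMARY KEY plays the role that AUTO_INCREMENT plays in MySQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")

# Part 1: a unique index on name columns that are allowed to be NULL.
conn.execute("CREATE TABLE names (last_name TEXT, first_name TEXT)")
conn.execute("CREATE UNIQUE INDEX name_idx ON names (last_name, first_name)")
conn.execute("INSERT INTO names VALUES (NULL, 'Y1')")
conn.execute("INSERT INTO names VALUES (NULL, 'Y1')")  # accepted: NULLs never collide
null_rows = conn.execute(
    "SELECT COUNT(*) FROM names WHERE last_name IS NULL"
).fetchone()[0]

# Part 2: duplicate names are allowed; the id column distinguishes the records.
# (MySQL would declare this column as INT UNSIGNED NOT NULL AUTO_INCREMENT.)
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)"
)

def add_person(last_name, first_name):
    """Insert a record with an id of NULL and return the ID that was assigned."""
    cur = conn.execute(
        "INSERT INTO person (id, last_name, first_name) VALUES (NULL, ?, ?)",
        (last_name, first_name),
    )
    return cur.lastrowid

first_id = add_person("X1", "Y1")
second_id = add_person("X1", "Y1")  # same name, different ID
```

Both NULL rows are stored, and the two people with identical names receive different id values.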
14.2.4 See Also
AUTO_INCREMENT columns are discussed further in Chapter 11.
14.3 Dealing with Duplicates at Record-Creation Time
14.3.1 Problem
You've created a table with a unique index to prevent duplicate values in the indexed column or columns. But this results in an error if you attempt to insert a duplicate record, and you want to avoid having to deal with such errors.
14.3.2 Solution
One approach is to just ignore the error. Another is to use either an INSERT IGNORE or REPLACE statement, each of which modifies MySQL's duplicate-handling behavior. For bulk-loading operations, LOAD DATA has modifiers that allow you to specify how to handle duplicates.
14.3.3 Discussion
By default, MySQL generates an error when you insert a record that duplicates an existing unique key. For example, you'll see the following result if the person table contains a unique index on the last_name and first_name columns:

    mysql> INSERT INTO person (last_name, first_name)
        -> VALUES('X1','Y1');
    Query OK, 1 row affected (0.00 sec)
    mysql> INSERT INTO person (last_name, first_name)
        -> VALUES('X1','Y1');
    ERROR 1062 at line 1: Duplicate entry 'X1-Y1' for key 1
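Seen from a program rather than from the mysql client, the same duplicate-key failure typically arrives as an exception that the program can catch. The following is a minimal sketch, not a definitive implementation: it assumes Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL, and insert_person is a hypothetical helper. With a MySQL driver you would check for error 1062 specifically rather than catching every integrity error.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (last_name TEXT NOT NULL, first_name TEXT NOT NULL,"
    " UNIQUE (last_name, first_name))"
)

def insert_person(last_name, first_name):
    """Try a plain INSERT; return True if stored, False if it was a duplicate."""
    try:
        conn.execute(
            "INSERT INTO person (last_name, first_name) VALUES (?, ?)",
            (last_name, first_name),
        )
        return True
    except sqlite3.IntegrityError:
        # Duplicate key: trap the error and carry on instead of terminating.
        return False

results = [insert_person("X1", "Y1"), insert_person("X1", "Y1")]
row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

The second call reports a duplicate without stopping the program, and the table still holds a single row.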
If you're issuing the statements from the mysql program interactively, you can simply say, "Okay, that didn't work," ignore the error, and continue. But if you write a program to insert the records, an error may terminate the program. One way to avoid this is to modify the program's error-handling behavior to trap the error and then ignore it. See Recipe 2.3 for information about error-handling techniques.

If you want to prevent the error from occurring in the first place, you might consider using a two-query method to solve the duplicate-record problem: issue a SELECT to see if the record is already present, followed by an INSERT if it's not. But that doesn't really work. Another client might insert the same record after the SELECT and before the INSERT, in which case the error would still occur. To make sure that doesn't happen, you could use a transaction or lock the tables, but then you're up from two statements to four. MySQL provides two single-query solutions to the problem of handling duplicate records:
• Use INSERT IGNORE rather than INSERT. If a record doesn't duplicate an existing record, MySQL inserts it as usual. If the record is a duplicate, the IGNORE keyword tells MySQL to discard it silently without generating an error:

    mysql> INSERT IGNORE INTO person (last_name, first_name)
        -> VALUES('X2','Y2');
    Query OK, 1 row affected (0.00 sec)
    mysql> INSERT IGNORE INTO person (last_name, first_name)
        -> VALUES('X2','Y2');
    Query OK, 0 rows affected (0.00 sec)

  The row count value indicates whether the record was inserted or ignored. From within a program, you can obtain this value by checking the rows-affected function provided by your API. See Recipe 2.5 and Recipe 9.2.
• Use REPLACE rather than INSERT. If the record is new, it's inserted just as with INSERT. If it's a duplicate, the new record replaces the old one:

    mysql> REPLACE INTO person (last_name, first_name)
        -> VALUES('X3','Y3');
    Query OK, 1 row affected (0.00 sec)
    mysql> REPLACE INTO person (last_name, first_name)
        -> VALUES('X3','Y3');
    Query OK, 2 rows affected (0.00 sec)

  The rows-affected value in the second case is 2 because the original record is deleted and the new record is inserted in its place.
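The two single-statement approaches can be compared from a program as follows. This is a sketch under stated assumptions rather than MySQL itself: it uses Python's sqlite3 module with an in-memory SQLite database, where MySQL's INSERT IGNORE is spelled INSERT OR IGNORE, and the helper run and the address values are hypothetical. The rows-affected count SQLite reports for REPLACE need not match the value of 2 that MySQL prints, so the sketch checks the stored data for that case instead.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (last_name TEXT NOT NULL, first_name TEXT NOT NULL,"
    " address TEXT, UNIQUE (last_name, first_name))"
)

def run(statement, row):
    """Execute one statement and return its rows-affected count."""
    return conn.execute(statement, row).rowcount

# SQLite's spelling of MySQL's INSERT IGNORE.
ignore_sql = (
    "INSERT OR IGNORE INTO person (last_name, first_name, address) VALUES (?, ?, ?)"
)
replace_sql = (
    "REPLACE INTO person (last_name, first_name, address) VALUES (?, ?, ?)"
)

# Same key inserted twice with IGNORE: the second attempt is discarded.
ignore_counts = [
    run(ignore_sql, ("X2", "Y2", "old")),
    run(ignore_sql, ("X2", "Y2", "new")),
]

# Same key inserted twice with REPLACE: the second record takes over.
run(replace_sql, ("X3", "Y3", "old"))
run(replace_sql, ("X3", "Y3", "new"))

# Map each last_name to the address that ended up in the table.
addresses = dict(conn.execute("SELECT last_name, address FROM person"))
```

The ignored insert reports zero rows affected and leaves the original address in place, while the replaced record carries the newer address.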
INSERT IGNORE and REPLACE should be chosen according to the duplicate-handling behavior you want to effect. INSERT IGNORE keeps the first of a set of duplicated records and discards the rest. REPLACE keeps the last of a set of duplicates and kicks out any earlier ones. INSERT IGNORE is more efficient than REPLACE because it doesn't actually insert duplicates. Thus, it's most applicable when you just want to make sure a copy of a given record is present in a table. REPLACE, on the other hand, is often more appropriate for tables in which other non-key columns may need updating. Suppose you're maintaining a table named passtbl for a web application that contains email addresses and passwords and that is keyed by email address:

    CREATE TABLE passtbl
    (
        email    CHAR(60) NOT NULL,
        password CHAR(20) BINARY NOT NULL,
        PRIMARY KEY (email)
    );
How do you create records for new users, and change passwords for existing users? Without REPLACE, creating a new user and changing an existing user's password must be handled differently. A typical algorithm for handling record maintenance might look like this:

• Issue a SELECT to see if a record already exists with a given email value.
• If no such record exists, add a new one with INSERT.
• If the record does exist, update it with UPDATE.
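The three steps can be sketched in code as follows. This is an illustration under stated assumptions, not a definitive implementation: it uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, set_password is a hypothetical helper, and BEGIN IMMEDIATE is SQLite's way of taking the write lock before the SELECT (a MySQL program would start a transaction or lock the table in its own syntax).

```python
import sqlite3

# isolation_level=None lets the sketch issue BEGIN/COMMIT explicitly.
# Assumption: in-memory SQLite stands in for a MySQL server.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE passtbl (email TEXT NOT NULL PRIMARY KEY, password TEXT NOT NULL)"
)

def set_password(email, password):
    """SELECT first, then INSERT or UPDATE, all inside one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before looking
    try:
        exists = conn.execute(
            "SELECT 1 FROM passtbl WHERE email = ?", (email,)
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO passtbl (email, password) VALUES (?, ?)",
                (email, password),
            )
        else:
            conn.execute(
                "UPDATE passtbl SET password = ? WHERE email = ?",
                (password, email),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

set_password("user@example.com", "first")   # new user: INSERT path
set_password("user@example.com", "second")  # existing user: UPDATE path
rows = conn.execute("SELECT email, password FROM passtbl").fetchall()
```

Four statements are needed per call (begin, select, insert or update, commit), which is the overhead the single-statement alternatives avoid.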
All of that must be performed within a transaction or with the tables locked to prevent other users from changing the tables while you're using them. With REPLACE, you can simplify both cases to the same single-statement operation, where address and passval stand for the actual email address and password values:

    REPLACE INTO passtbl (email,password) VALUES(address,passval);

If no record with the given email address exists, MySQL creates a new one. If a record does exist, MySQL replaces it; in effect, this updates the password column of the record associated with the address.
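A program can wrap that single statement in one small function that serves both new and existing users. The sketch below rests on the same assumptions as before: Python's sqlite3 module and an in-memory SQLite database stand in for MySQL (SQLite also accepts REPLACE INTO), and set_password and the sample addresses are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE passtbl (email TEXT NOT NULL PRIMARY KEY, password TEXT NOT NULL)"
)

def set_password(email, password):
    """Create the record if it is new, or replace it if the email exists."""
    conn.execute(
        "REPLACE INTO passtbl (email, password) VALUES (?, ?)",
        (email, password),
    )

set_password("user@example.com", "first")    # creates the record
set_password("user@example.com", "second")   # replaces it
set_password("other@example.com", "secret")  # unrelated new record
rows = sorted(conn.execute("SELECT email, password FROM passtbl"))
```

No SELECT, explicit transaction, or branching is needed; the key on email decides whether the statement inserts or replaces.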
INSERT IGNORE and REPLACE have the benefit of eliminating overhead that might otherwise be required for a transaction. But this benefit comes at the price of portability, because both are MySQL-specific statements. If portability is a high priority, you might prefer to stick with a transactional approach.
For bulk-load operations in which you use the LOAD DATA statement to load a set of records from a file into a table, duplicate-record handling can be controlled using the statement's IGNORE and REPLACE modifiers. These produce behavior analogous to that of the INSERT IGNORE and REPLACE statements. See Recipe 10.8 for more information.
14.4 Counting and Identifying Duplicates