
contain slashes, so it's easier to use a different character around the pattern to avoid having to escape the slashes with backslashes:

    m#^(http://|ftp://|mailto:)#i

The alternatives in the pattern are grouped within parentheses because otherwise the ^ will anchor only the first of them to the beginning of the string. The i modifier follows the pattern because protocol specifiers in URLs are not case sensitive. The pattern is otherwise fairly unrestrictive, because it allows anything to follow the protocol specifier. I leave it to you to add further restrictions as necessary.
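To see why the grouping matters, here is a minimal sketch that compares a grouped pattern with an ungrouped one. The exact pattern text and the function names has_known_protocol and loose_match are assumptions made for illustration; adjust the protocol list to suit your own data.

```perl
use strict;
use warnings;

# Assumed form of the protocol pattern: alternatives grouped so that ^
# applies to all of them. The function name is hypothetical.
sub has_known_protocol
{
    my ($url) = @_;
    return ($url =~ m#^(http://|ftp://|mailto:)#i ? 1 : 0);
}

# Same alternatives without the parentheses: ^ anchors only "http://",
# so "ftp://" and "mailto:" can match anywhere in the string.
sub loose_match
{
    my ($url) = @_;
    return ($url =~ m#^http://|ftp://|mailto:#i ? 1 : 0);
}

foreach my $url ("HTTP://www.example.com/", "see ftp://ftp.example.com/")
{
    printf "%-28s grouped=%d ungrouped=%d\n",
        $url, has_known_protocol ($url), loose_match ($url);
}
```

The second test string is accepted only by the ungrouped pattern, which is the behavior the parentheses are there to prevent.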

10.28 Validation Using Table Metadata

10.28.1 Problem

You need to check input values against the legal members of an ENUM or SET column.

10.28.2 Solution

Get the column definition, extract the list of members from it, and check data values against the list.

10.28.3 Discussion

Some forms of validation involve checking input values against information stored in a database. This includes values to be stored in an ENUM or SET column, which can be checked against the valid members stored in the column definition. Database-backed validation also applies when you have values that must match those listed in a lookup table to be considered legal. For example, input records that contain customer IDs can be required to match a record in a customers table, or state abbreviations in addresses can be verified against a table that lists each state. This section describes ENUM- and SET-based validation, and Recipe 10.29 discusses how to use lookup tables.

One way to check input values that correspond to the legal values of ENUM or SET columns is to get the list of legal column values into an array using the information returned by SHOW COLUMNS, then perform an array membership test. For example, the favorite-color column color from the profile table is an ENUM that is defined as follows:

    mysql> SHOW COLUMNS FROM profile LIKE 'color'\G
    *************************** 1. row ***************************
      Field: color
       Type: enum('blue','red','green','brown','black','white')
       Null: YES
        Key:
    Default: NULL
      Extra:

If you extract the list of enumeration members from the Type value and store them in an array @members, you can perform the membership test like this:

    $valid = grep (/^$val$/i, @members);

The pattern constructor begins and ends with ^ and $ to require $val to match an entire enumeration member rather than just a substring. It is also followed by an i to specify a case-insensitive comparison, because ENUM columns are not case sensitive.

In Recipe 9.7, we wrote a function get_enumorset_info() that returns ENUM or SET column metadata.
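If you want to experiment with the membership test without a database connection, the following sketch pulls the members out of a Type string by hand. The names parse_members and is_member are hypothetical, this is not the Recipe 9.7 code, and it assumes that member names contain no quote characters. The \Q...\E around $val is my addition; it keeps any pattern metacharacters in the input value from being interpreted.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the quoted members from a Type value such
# as enum('blue','red'). Assumes no member contains a quote character.
sub parse_members
{
    my ($type) = @_;

    # Capture everything between the outer parentheses
    my ($list) = $type =~ /^(?:enum|set)\((.*)\)$/i
        or return ();
    # Each quoted item in the list is one member
    return ($list =~ /'([^']*)'/g);
}

# Case-insensitive, whole-string membership test
sub is_member
{
    my ($val, @members) = @_;
    return scalar (grep (/^\Q$val\E$/i, @members));
}

my @members = parse_members (
    "enum('blue','red','green','brown','black','white')");
foreach my $val ("Red", "re", "purple")
{
    printf "%-6s => %s\n", $val,
        is_member ($val, @members) ? "legal" : "illegal";
}
```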
The metadata returned by get_enumorset_info() includes the list of members, so it's easy to use that function to write another utility routine, check_enum_value(), that gets the legal enumeration values and performs the membership test. The routine takes four arguments: a database handle, the table name and column name for the ENUM column, and the value to check. It returns true or false to indicate whether or not the value is legal:

    sub check_enum_value
    {
    my ($dbh, $tbl_name, $col_name, $val) = @_;

        my $valid = 0;
        my $info = get_enumorset_info ($dbh, $tbl_name, $col_name);
        if ($info && $info->{type} eq "enum")
        {
            # use case-insensitive comparison; ENUM
            # columns are not case sensitive
            $valid = grep (/^$val$/i, @{$info->{values}});
        }
        return ($valid);
    }

For single-value testing, such as to validate a value submitted in a web form, that kind of test works well. However, if you're going to be testing a lot of values (like an entire column in a datafile), it's better to read the enumeration values into memory once, then use them repeatedly to check each of the data values. Furthermore, it's a lot more efficient to perform hash lookups than array lookups (in Perl at least). To do so, retrieve the legal enumeration values and store them as keys of a hash. Then test each input value by checking whether or not it exists as a hash key. It's a little more work to construct the hash, which is why check_enum_value() doesn't do so. But for bulk validation, the improved lookup speed more than makes up for the hash construction overhead. [4]

    [4] If you want to check for yourself the relative efficiency of array membership tests versus hash lookups, try the lookup_time.pl script in the transfer directory of the recipes distribution.
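The lookup_time.pl script is not reproduced here. As an independent rough sketch of the same kind of comparison, you can use the core Benchmark module. The member list below is synthetic, and the names in_array and in_hash are made up for this example; the timings you get will depend on your machine and on the number of members.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic member list standing in for real column metadata
my @members = map { "member$_" } 1 .. 200;

# Hash keyed by lowercased member, built once up front
my %members;
$members{lc ($_)} = 1 foreach @members;

# Array membership test: scans every member on each call
sub in_array
{
    my ($val) = @_;
    return ((grep { lc ($_) eq lc ($val) } @members) ? 1 : 0);
}

# Hash membership test: a single key lookup on each call
sub in_hash
{
    my ($val) = @_;
    return (exists ($members{lc ($val)}) ? 1 : 0);
}

my @probes = ("MEMBER1", "member200", "nosuchmember");

# Run each test for at least one CPU second and print a comparison table
cmpthese (-1, {
    array => sub { in_array ($_) foreach @probes },
    hash  => sub { in_hash ($_) foreach @probes },
});
```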
Begin by getting the metadata for the column, then convert the list of legal enumeration members to a hash:

    my $ref = get_enumorset_info ($dbh, $tbl_name, $col_name);
    my %members;
    foreach my $member (@{$ref->{values}})
    {
        # convert hash key to consistent case; ENUM isn't case sensitive
        $members{lc ($member)} = 1;
    }

The loop makes each enumeration member exist as the key of a hash element. The hash key is what's important here; the value associated with it is irrelevant. The example shown sets the value to 1, but you could use undef or any other value.

Note that the code converts the hash keys to lowercase before storing them. This is done because hash key lookups in Perl are case sensitive. That's fine if the values that you're checking also are case sensitive, but ENUM columns are not. By converting the enumeration values to a given lettercase before storing them in the hash, then converting the values you want to check similarly, you perform in effect a case-insensitive key existence test:

    $valid = exists ($members{lc ($val)});

The preceding example converts enumeration values and input values to lowercase. You could just as well use uppercase, as long as you do so for all values consistently.

Note that the existence test may fail if the input value is the empty string. You'll have to decide how to handle that case on a column-by-column basis. For example, if the column allows NULL values, you might interpret the empty string as equivalent to NULL and thus as being a legal value.

The validation procedure for SET values is similar to that for ENUM values, except that an input value might consist of any number of SET members, separated by commas. For the value to be legal, each element in it must be legal. In addition, because "any number of members" includes "none," the empty string is a legal value for any SET column.
For one-shot testing of individual input values, you can use a utility routine check_set_value() that is similar to check_enum_value():

    sub check_set_value
    {
    my ($dbh, $tbl_name, $col_name, $val) = @_;

        my $valid = 0;
        my $info = get_enumorset_info ($dbh, $tbl_name, $col_name);
        if ($info && $info->{type} eq "set")
        {
            return (1) if $val eq "";   # empty string is legal element
            # use case-insensitive comparison; SET
            # columns are not case sensitive
            $valid = 1;     # assume valid until we find out otherwise
            foreach my $v (split (/,/, $val))
            {
                if (!grep (/^$v$/i, @{$info->{values}}))
                {
                    $valid = 0; # value contains an invalid element
                    last;
                }
            }
        }
        return ($valid);
    }

For bulk testing, construct a hash from the legal SET members. The procedure is the same as for producing a hash from ENUM elements:

    my $ref = get_enumorset_info ($dbh, $tbl_name, $col_name);
    my %members;
    foreach my $member (@{$ref->{values}})
    {
        # convert hash key to consistent case; SET isn't case sensitive
        $members{lc ($member)} = 1;
    }

To validate a given input value against the SET member hash, convert it to the same lettercase as the hash keys, split it at commas to get a list of the individual elements of the value, then check each one. If any of the elements are invalid, the entire value is invalid:

    $valid = 1;     # assume valid until we find out otherwise
    foreach my $elt (split (/,/, lc ($val)))
    {
        if (!exists ($members{$elt}))
        {
            $valid = 0; # value contains an invalid element
            last;
        }
    }

After the loop terminates, $valid is true if the value is legal for the SET column, and false otherwise. Empty strings are always legal SET values, but this code doesn't perform any special-case test for an empty string. No such test is necessary, because in that case the split operation returns an empty list, the loop never executes, and $valid remains true.
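To try the bulk SET test without a database, you can substitute a hand-built structure for the metadata. The sketch below assumes only what this section uses from get_enumorset_info(), namely a hash reference with type and values entries. The member names, the input values, and the routine name set_value_ok are invented for illustration.

```perl
use strict;
use warnings;

# Stand-in for the column metadata; the member names are made up
my $ref = { type => "set", values => [ "Small", "Medium", "Large" ] };

# Build the lookup hash once, using lowercase keys
my %members;
foreach my $member (@{$ref->{values}})
{
    $members{lc ($member)} = 1;
}

# Hypothetical routine: true if every comma-separated element of $val
# is a key of the member hash. An empty string yields no elements, so
# the loop body never runs and the value is accepted.
sub set_value_ok
{
    my ($val, $members) = @_;

    foreach my $elt (split (/,/, lc ($val)))
    {
        return (0) if !exists ($members->{$elt});
    }
    return (1);
}

# Check a batch of input values against the same hash
foreach my $val ("small,LARGE", "", "medium,huge")
{
    printf "%-12s => %s\n", "'$val'",
        set_value_ok ($val, \%members) ? "legal" : "illegal";
}
```

Passing the hash by reference lets the same routine serve several SET columns, each with its own member hash.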

10.29 Validation Using a Lookup Table