Use a Hash as a Cache of Already-Seen Lookup Values

This reduces the database traffic to a single query. However, for a large lookup table, that may still be a lot of traffic, and you may not want to hold the entire table in memory.

Performing Lookups with Other Languages

The example shown here for bulk testing of lookup values uses a Perl hash to determine whether or not a given value is present in a set of values:

    $valid = exists ($members{$val});

Similar data structures exist for other languages. In PHP, you can use an associative array and perform a key lookup like this:

    $valid = isset ($members[$val]);

In Python, use a dictionary and check input values using the has_key method (Python 3 removed has_key; there, write val in members instead):

    valid = members.has_key (val)

For lookups in Java, use a HashMap and test values with the containsKey method:

    valid = members.containsKey (val);

The transfer directory of the recipes distribution contains some sample code for lookup operations in each of these languages.
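As a rough illustration of the bulk approach in one of the other languages, the following Python sketch loads every lookup value with a single query and then tests input values against the resulting dictionary. It uses the standard sqlite3 module with an in-memory table purely as a stand-in for a real database connection. The table name lookup_tbl, the column val, and the functions load_members and is_valid are hypothetical names chosen for this example.

```python
import sqlite3

def load_members(conn, tbl_name):
    """Read every lookup value with one query and return them as dict keys."""
    # tbl_name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT val FROM {tbl_name}")
    return {row[0]: 1 for row in cursor}

def is_valid(members, val):
    """A value is valid if it exists as a key; the stored value is irrelevant."""
    return val in members

# Stand-in for the real database: an in-memory SQLite table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup_tbl (val TEXT)")
conn.executemany("INSERT INTO lookup_tbl (val) VALUES (?)",
                 [("apple",), ("banana",), ("cherry",)])

members = load_members(conn, "lookup_tbl")
results = [(v, is_valid(members, v)) for v in ["apple", "kiwi", "cherry"]]
```

Only one SELECT is issued regardless of how many input values are tested, at the cost of holding every lookup value in memory.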

10.29.6 Use a Hash as a Cache of Already-Seen Lookup Values

Another lookup technique is to mix use of individual queries with a hash that stores lookup value existence information. This approach can be useful if you have a very large lookup table. Begin with an empty hash:

    my %members;    # hash for lookup values

Then, for each value to be tested, check whether or not it's present in the hash. If not, issue a query to see if the value is present in the lookup table, and record the result of the query in the hash. The validity of the input value is determined by the value associated with the key, not by the existence of the key:

    if (!exists ($members{$val}))   # haven't seen this value yet
    {
        my $count = $dbh->selectrow_array (
                        "SELECT COUNT(*) FROM $tbl_name WHERE val = ?",
                        undef, $val);
        # store true/false to indicate whether value was found
        $members{$val} = ($count > 0);
    }
    $valid = $members{$val};

For this method, the hash acts as a cache, so that you run a lookup query for any given value only once, no matter how many times it occurs in the input. For datasets that have a reasonable number of repeated values, this approach avoids issuing a separate query for every single value, while requiring an entry in the hash only for each unique value. It thus stands between the other two approaches in terms of the tradeoff between database traffic and program memory requirements for the hash.

Note that the hash is used in a somewhat different manner for this method than for the previous method. Previously, the existence of the input value as a key in the hash determined the validity of the value, and the value associated with the hash key was irrelevant. For the hash-as-cache method, the meaning of key existence in the hash changes from "it's valid" to "it's been tested before." For each key, the value associated with it indicates whether the input value is present in the lookup table. If you store as keys only those values that are found to be in the lookup table, you'll issue a query for each instance of an invalid value in the input dataset, which is inefficient.
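The hash-as-cache idea can also be sketched in Python, with a query counter added to show that each distinct value reaches the database only once, whether or not it is valid. This is a minimal sketch under stated assumptions. The sqlite3 in-memory table stands in for the real lookup table, and the LookupCache class, its queries counter, and the lookup_tbl and val names are hypothetical.

```python
import sqlite3

class LookupCache:
    """Cache of already-seen values; queries the table at most once per value."""

    def __init__(self, conn, tbl_name):
        self.conn = conn
        self.tbl_name = tbl_name   # trusted identifier, interpolated into SQL
        self.members = {}          # value -> True/False (found in table or not)
        self.queries = 0           # number of queries issued, for illustration

    def is_valid(self, val):
        if val not in self.members:          # haven't seen this value yet
            row = self.conn.execute(
                f"SELECT COUNT(*) FROM {self.tbl_name} WHERE val = ?", (val,)
            ).fetchone()
            self.queries += 1
            # Store True/False for valid AND invalid values alike, so that
            # repeated invalid values do not trigger repeated queries.
            self.members[val] = row[0] > 0
        return self.members[val]

# Stand-in for the real database: an in-memory SQLite table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup_tbl (val TEXT)")
conn.executemany("INSERT INTO lookup_tbl (val) VALUES (?)",
                 [("apple",), ("banana",)])

cache = LookupCache(conn, "lookup_tbl")
inputs = ["apple", "kiwi", "apple", "kiwi", "banana", "kiwi"]
results = [cache.is_valid(v) for v in inputs]
```

With six input values but only three distinct ones, the sketch issues three queries. The invalid value kiwi is cached as False after its first test rather than being looked up again each time it appears.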

10.30 Converting Two-Digit Year Values to Four-Digit Form