
    $val =~ s/\D//g;
    $valid = (length ($val) == 16);

Innings pitched

In baseball, one statistic recorded for pitchers is the number of innings pitched, measured in thirds of innings (corresponding to the number of outs recorded). These values are numeric, but must satisfy a specific additional constraint: a fractional part is allowed, but if present, must consist of a single digit 0, 1, or 2. That is, legal values are of the form 0, .1, .2, 1, 1.1, 1.2, 2, and so forth. To match an unsigned integer (optionally followed by a decimal point and perhaps a fractional digit of 0, 1, or 2), or a fractional digit with no leading integer, use this pattern:

    /^(\d+(\.[012]?)?|\.[012])$/

The alternatives in the pattern are grouped within parentheses because otherwise the ^ anchors only the first of them to the beginning of the string and the $ anchors only the second to the end.
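As a rough illustration of how the innings-pitched pattern behaves, it can be wrapped in a small test function. This is only a sketch: the function name is_innings_pitched() and the sample values are hypothetical and are not part of the recipes distribution.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the recipes distribution): returns 1 if
# $val looks like a legal innings-pitched value, 0 otherwise.
sub is_innings_pitched
{
  my $val = shift;

  # the outer parentheses make ^ and $ apply to both alternatives
  return ($val =~ /^(\d+(\.[012]?)?|\.[012])$/ ? 1 : 0);
}

# try a few sample values (chosen for illustration only)
foreach my $val ("0", ".1", "1.2", "7.", "1.3", "1.10", "")
{
  printf "%-8s %s\n", "'$val'",
         is_innings_pitched ($val) ? "legal" : "illegal";
}
```

Note that a value such as 7. is accepted, because the pattern allows a decimal point with no fractional digit after it. Values such as 1.3 and 1.10 are rejected.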

10.26 Using Patterns to Match Dates or Times

10.26.1 Problem

You need to make sure a string looks like a date or time.

10.26.2 Solution

Use a pattern that matches the type of temporal value you expect. Be sure to consider issues such as how strict to be about delimiters between subparts and the lengths of the subparts.

10.26.3 Discussion

Dates are a validation headache because they come in so many formats. Pattern tests are extremely useful for weeding out illegal values, but often insufficient for full verification: a date might have a number where you expect a month, but if the number is 13, the date isn't valid. This section introduces some patterns that match a few common date formats. Recipe 10.31 revisits this topic in more detail and discusses how to combine pattern tests with content verification.

To require values to be dates in ISO (CCYY-MM-DD) format, use this pattern:

    /^\d{4}-\d{2}-\d{2}$/

The pattern requires - as the delimiter between date parts. To allow either - or / as the delimiter, use a character class between the numeric parts (the slashes are escaped with a backslash to prevent them from being interpreted as the end of the pattern constructor):

    /^\d{4}[-\/]\d{2}[-\/]\d{2}$/

Or you can use a different delimiter around the pattern and avoid the backslashes:

    m|^\d{4}[-/]\d{2}[-/]\d{2}$|

To allow any non-digit delimiter (which corresponds to how MySQL operates when it interprets strings as dates), use this pattern:

    /^\d{4}\D\d{2}\D\d{2}$/

If you don't require the full number of digits in each part (to allow leading zeros in values like 03 to be missing, for example), just look for three nonempty digit sequences:

    /^\d+\D\d+\D\d+$/

Of course, that pattern is so general that it will also match other values such as U.S. Social Security numbers (which have the format 012-34-5678). To constrain the subpart lengths by requiring two to four digits in the year part and one or two digits in the month and day parts, use this pattern:

    /^\d{2,4}\D\d{1,2}\D\d{1,2}$/

For dates in other formats such as MM-DD-YY or DD-MM-YY, similar patterns apply, but the subparts are arranged in a different order.
This pattern matches both of those formats:

    /^\d{2}-\d{2}-\d{2}$/

If you need to check the values of individual date parts, use parentheses in the pattern and extract the substrings after a successful match. If you're expecting dates to be in ISO format, for example, do something like this:

    if ($val =~ /^(\d{2,4})\D(\d{1,2})\D(\d{1,2})$/)
    {
      ($year, $month, $day) = ($1, $2, $3);
    }

The library file lib/Cookbook_Utils.pm in the recipes distribution contains several of these pattern tests, packaged as function calls. If the date doesn't match the pattern, they return undef. Otherwise, they return a reference to an array containing the broken-out values for the year, month, and day. This can be useful for performing further checking on the components of the date. For example, is_iso_date() looks for dates that match ISO format. It's defined as follows:

    sub is_iso_date
    {
    my $s = shift;

      return (undef) unless $s =~ /^(\d{2,4})\D(\d{1,2})\D(\d{1,2})$/;
      return ([ $1, $2, $3 ]);    # return year, month, day
    }

To use the function, do something like this:

    my $ref = is_iso_date ($val);
    if (defined ($ref))
    {
      # $val matched ISO format pattern;
      # check its subparts using $ref->[0] through $ref->[2]
    }
    else
    {
      # $val didn't match ISO format pattern
    }

You'll often find additional processing necessary with dates, because although date-matching patterns help to weed out values that are syntactically malformed, they don't assess whether the individual components contain legal values. To do that, some range checking is necessary. That topic is covered later in Recipe 10.31.

If you're willing to skip subpart testing and just want to rewrite the pieces, you can use a substitution.
For example, to rewrite values assumed to be in MM-DD-YY format into YY-MM-DD format, do this:

    $val =~ s/^(\d+)\D(\d+)\D(\d+)$/$3-$1-$2/;

Time values are somewhat more orderly than dates, usually being written with hours first and seconds last, with two digits per part:

    /^\d{2}:\d{2}:\d{2}$/

To be more lenient, you can allow the hours part to have a single digit, or the seconds part to be missing:

    /^\d{1,2}:\d{2}(:\d{2})?$/

You can mark parts of the time with parentheses if you want to range-check the individual parts, or perhaps to reformat the value to include a seconds part of 00 if it happens to be missing. However, this requires some care with the parentheses and the ? characters in the pattern if the seconds part is optional. You want to allow the entire :\d{2} at the end of the pattern to be optional, but not to save the : character in $3 if the third time section is present. To accomplish that, use (?:pat), an alternative grouping notation that doesn't save the matched substring. Within that notation, use parentheses around the digits to save them. Then $3 will be undef if the seconds part is not present, but will contain the seconds digits otherwise:

    if ($val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?$/)
    {
      my ($hour, $min, $sec) = ($1, $2, $3);
      $sec = "00" if !defined ($sec);    # seconds missing; use 00
      $val = "$hour:$min:$sec";
    }

To rewrite times in 12-hour format with AM and PM suffixes into 24-hour format, you can do something like this:

    if ($val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?\s*(AM|PM)?$/i)
    {
      my ($hour, $min, $sec) = ($1, $2, $3);
      # supply missing seconds
      $sec = "00" unless defined ($sec);
      # convert 0 .. 11 -> 12 .. 23 for PM times
      $hour += 12 if defined ($4) && uc ($4) eq "PM";
      $val = "$hour:$min:$sec";
    }

The time parts are placed into $1, $2, and $3, with $3 set to undef if the seconds part is missing. The suffix goes into $4 if it's present. If the suffix is AM or missing (undef), the value is interpreted as an AM time. If the suffix is PM, the value is interpreted as a PM time.
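The 12-hour conversion shown in this section adds 12 to the hour of any PM time, so it implicitly assumes hour values of 0 through 11. On a conventional 12-hour clock the hour 12 needs special handling: 12:30 AM is 00:30 and 12:30 PM stays 12:30. The following is a minimal sketch of one way to handle that. The function name to_24_hour() and its interface are hypothetical and are not part of the recipes distribution. Rejecting suffixed hours outside the range 1 to 12 is an assumption made here, not something the recipe requires.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions). Returns the
# time as hh:mm:ss on a 24-hour clock, or undef if $val doesn't match
# the pattern or has an AM/PM suffix with an hour outside 1 .. 12.
sub to_24_hour
{
  my $val = shift;

  return (undef)
    unless $val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?\s*(AM|PM)?$/i;
  my ($hour, $min, $sec, $suffix) = ($1, $2, $3, $4);

  $sec = "00" unless defined ($sec);    # supply missing seconds
  if (defined ($suffix))
  {
    # assumption: a suffixed time uses hours 1 .. 12 only
    return (undef) if $hour < 1 || $hour > 12;
    $hour %= 12;                        # 12 -> 0; 1 .. 11 unchanged
    $hour += 12 if uc ($suffix) eq "PM";
  }
  # values with no suffix are left on whatever clock they were written in
  return (sprintf ("%02d:%s:%s", $hour, $min, $sec));
}
```

With this sketch, to_24_hour ("12:30 AM") yields 00:30:00, to_24_hour ("1:05:09 pm") yields 13:05:09, and to_24_hour ("13:00 PM") yields undef. Like the patterns in this section, it checks only the form of the value. Range checking of the minutes and seconds parts is still a separate step.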

10.26.4 See Also