10.24.1 Problem

You want to classify values into broad categories.

10.24.2 Solution

Use a pattern that is similarly broad.

10.24.3 Discussion

If you need to know only whether values are empty, nonempty, or consist only of certain types of characters, patterns such as the following may suffice:

    Pattern        Type of value the pattern matches
    /^$/           Empty value
    /./            Nonempty value
    /^\s*$/        Whitespace, possibly empty
    /^\s+$/        Nonempty whitespace
    /\S/           Nonempty, and not just whitespace
    /^\d+$/        Digits only, nonempty
    /^[a-z]+$/i    Alphabetic characters only (case insensitive), nonempty
    /^\w+$/        Alphanumeric or underscore characters only, nonempty
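
As a minimal sketch of how such broad patterns might be applied, the following script tests a value against several of them in order and reports the first category that matches. The classify_broad() routine and its category labels are hypothetical names chosen for this illustration; the order of the tests is also an assumption, because a value such as 12345 satisfies more than one pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# classify_broad() is a hypothetical helper for this sketch, not a
# library routine. It returns a label for the first broad category
# that the value matches, testing the most specific patterns first.
sub classify_broad
{
  my $val = shift;

  return "empty"           if $val =~ /^$/;
  return "whitespace"      if $val =~ /^\s+$/;
  return "digits"          if $val =~ /^\d+$/;
  return "alphabetic"      if $val =~ /^[a-z]+$/i;
  return "word characters" if $val =~ /^\w+$/;
  return "other nonempty";
}

foreach my $val ("", "   ", "12345", "abc", "abc_123", "a b c")
{
  printf "%-12s %s\n", "'$val'", classify_broad ($val);
}
```

Note that $ also matches just before a trailing newline, so a value read from input should normally be passed through chomp before it is tested.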

10.25 Using Patterns to Match Numeric Values

10.25.1 Problem

You need to make sure a string looks like a number.

10.25.2 Solution

Use a pattern that matches the type of number you're looking for.

10.25.3 Discussion

Patterns can be used to classify values into several types of numbers:

    Pattern                         Type of value the pattern matches
    /^\d+$/                         Unsigned integer
    /^-?\d+$/                       Negative or unsigned integer
    /^[-+]?\d+$/                    Signed or unsigned integer
    /^[-+]?(\d+(\.\d*)?|\.\d+)$/    Floating-point number

The pattern /^\d+$/ matches unsigned integers by requiring a nonempty value that consists only of digits from the beginning to the end of the value. If you care only that a value begins with an integer, you can match an initial numeric part and extract it. To do this, match just the initial part of the string (omit the $ that requires the pattern to match to the end of the string) and place parentheses around the \d+ part. Then refer to the matched number as $1 after a successful match:

    if ($val =~ /^(\d+)/)
    {
      $val = $1;  # reset value to matched subpart
    }

You could also add zero to the value, which causes Perl to perform an implicit string-to-number conversion that discards the non-numeric suffix:

    if ($val =~ /^\d+/)
    {
      $val += 0;
    }

However, if you run Perl with the -w option (which I recommend), this form of conversion generates warnings for values that actually have a non-numeric part. It will also convert string values like 0013 to the number 13, which may be unacceptable in some contexts.

Some kinds of numeric values have a special format or other unusual constraints. Here are a few examples, and how to deal with them:

Zip Codes

Zip and Zip+4 Codes are postal codes used for mail delivery in the United States. They have values like 12345 or 12345-6789 (that is, five digits, possibly followed by a dash and four more digits). To match one form or the other, or both forms, use the following patterns:

    Pattern                Type of value the pattern matches
    /^\d{5}$/              Zip Code, five digits only
    /^\d{5}-\d{4}$/        Zip+4 Code
    /^\d{5}(-\d{4})?$/     Zip or Zip+4 Code

Credit card numbers

Credit card numbers typically consist of digits, but it's common for values to be written with spaces, dashes, or other characters between groups of digits. For example, the following numbers would be considered equivalent:

    0123456789012345
    0123 4567 8901 2345
    0123-4567-8901-2345

To match such values, use this pattern:

    /^[- \d]+$/

(Note that Perl allows the \d digit specifier within character classes.) However, that pattern doesn't identify values of the wrong length, and it may be useful to remove extraneous characters. If you require credit card values to contain 16 digits, use a substitution to remove all non-digits, then check the length of the result:

    $val =~ s/\D//g;
    $valid = (length ($val) == 16);

Innings pitched

In baseball, one statistic recorded for pitchers is the number of innings pitched, measured in thirds of innings (corresponding to the number of outs recorded). These values are numeric, but must satisfy a specific additional constraint: a fractional part is allowed, but if present, it must consist of a single digit 0, 1, or 2. That is, legal values are of the form 0, .1, .2, 1, 1.1, 1.2, 2, and so forth. To match an unsigned integer (optionally followed by a decimal point and perhaps a fractional digit of 0, 1, or 2), or a fractional digit with no leading integer, use this pattern:

    /^(\d+(\.[012]?)?|\.[012])$/

The alternatives in the pattern are grouped within parentheses because otherwise the ^ anchors only the first of them to the beginning of the string and the $ anchors only the second to the end.
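
The patterns discussed in this section can be collected into small test routines. The following is only a sketch: every routine name in it (is_integer(), is_float(), leading_integer(), is_zip(), is_16_digit_card(), is_innings()) is hypothetical and chosen for this illustration, and the 16-digit requirement for card numbers is the assumption stated above, not a general rule for credit cards.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# All routine names below are hypothetical helpers for this sketch.
# Each test routine returns 1 if the value matches and 0 otherwise.

# Signed or unsigned integer
sub is_integer { return $_[0] =~ /^[-+]?\d+$/ ? 1 : 0; }

# Floating-point number: digits with an optional fractional part,
# or a fractional part with no leading digits
sub is_float { return $_[0] =~ /^[-+]?(\d+(\.\d*)?|\.\d+)$/ ? 1 : 0; }

# Return the leading integer of a value as a string (so leading zeros
# are preserved), or undef if the value does not begin with a digit
sub leading_integer
{
  my $val = shift;
  return $val =~ /^(\d+)/ ? $1 : undef;
}

# Zip or Zip+4 Code
sub is_zip { return $_[0] =~ /^\d{5}(-\d{4})?$/ ? 1 : 0; }

# Card number containing only digits, spaces, and dashes, with
# exactly 16 digits once the non-digits are removed (an assumption)
sub is_16_digit_card
{
  my $val = shift;
  return 0 unless $val =~ /^[- \d]+$/;
  $val =~ s/\D//g;
  return length ($val) == 16 ? 1 : 0;
}

# Innings pitched: fractional digit, if present, must be 0, 1, or 2
sub is_innings { return $_[0] =~ /^(\d+(\.[012]?)?|\.[012])$/ ? 1 : 0; }

print leading_integer ("0013abc"), "\n";                 # prints 0013
print is_zip ("12345-6789"), "\n";                       # prints 1
print is_16_digit_card ("0123-4567-8901-2345"), "\n";    # prints 1
print is_innings ("6.2"), " ", is_innings ("6.3"), "\n"; # prints 1 0
```

Because leading_integer() returns the matched text and does not add zero to it, a value such as 0013abc yields 0013 instead of 13, and no warning is generated under -w.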

10.26 Using Patterns to Match Dates or Times