You must, of course, install the module file in a directory where Perl will find it. For details on library installation, see Recipe 2.4.
A significant benefit of putting a collection of utility routines into a library file is that you can use it for all kinds of programs. It's rare for a data manipulation problem to be completely unique. If you can pick and choose at least a few validation routines from a library, it's possible to reduce the amount of code you need to write, even for highly specialized programs.
10.22 Validation by Direct Comparison
10.22.1 Problem
You need to make sure a value is equal to or not equal to some specific value, or that it lies within a given range of values.
10.22.2 Solution
Perform a direct comparison.
10.22.3 Discussion
The simplest kind of validation is to perform comparisons against specific literal values:
    # require a nonempty value
    $valid = ($val ne "");
    # require a specific nonempty value
    $valid = ($val eq "abc");
    # require one of several values
    $valid = ($val eq "abc" || $val eq "def" || $val eq "xyz");
    # require value in particular range (1 to 10)
    $valid = ($val >= 1 && $val <= 10);
Most of those tests perform string comparisons. The last is a numeric comparison; however, a numeric comparison often is preceded by preliminary tests to verify first that the value doesn't contain non-numeric characters. Pattern testing, discussed in the next section, is one such way to do that.
String comparisons are case sensitive by default. To make a comparison case insensitive, convert both operands to the same lettercase:
    # require a specific nonempty value in case-insensitive fashion
    $valid = (lc ($val) eq lc ("AbC"));
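The comparisons above can be wrapped in small reusable routines. The following is a minimal sketch, not code from the recipes distribution: the function names in_range and is_one_of are hypothetical, and the digits-only pretest borrows a pattern of the kind discussed in the next section.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $val is a non-negative integer in the
# range from $min to $max. The digits-only pretest keeps non-numeric
# input away from the numeric comparison.
sub in_range
{
  my ($val, $min, $max) = @_;
  return 0 unless defined ($val) && $val =~ /^\d+$/;
  return ($val >= $min && $val <= $max) ? 1 : 0;
}

# Hypothetical helper: true if $val equals any of the listed strings,
# compared without regard to lettercase.
sub is_one_of
{
  my ($val, @choices) = @_;
  return 0 unless defined ($val);
  foreach my $choice (@choices)
  {
    return 1 if lc ($val) eq lc ($choice);
  }
  return 0;
}

foreach my $val ("7", "11", "7up", "")
{
  printf "%-6s in range 1 to 10: %s\n",
         "'$val'", in_range ($val, 1, 10) ? "yes" : "no";
}
foreach my $val ("ABC", "xyz", "abcd")
{
  printf "%-6s one of abc, def, xyz: %s\n",
         "'$val'", is_one_of ($val, "abc", "def", "xyz") ? "yes" : "no";
}
```

Without the pretest, a value such as 7up would be converted to the number 7 for the comparison (with a warning if warnings are enabled) and would pass the range test.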
10.23 Validation by Pattern Matching
10.23.1 Problem
You need to compare a value to a set of values that is difficult to specify literally without writing a really ugly expression.
10.23.2 Solution
Use pattern matching.
10.23.3 Discussion
Pattern matching is a powerful tool for validation because it allows you to test entire classes of values with a single expression. You can also use pattern tests to break up matched values into subparts for further individual testing, or in substitution operations to rewrite matched values. For example, you might break up a matched date into pieces so that you can verify that the month is in the range from 1 to 12 and the day is within the number of days in the month. Or you might use a substitution to reorder MM-DD-YY or DD-MM-YY values into YY-MM-DD format.
The next few sections describe how to use patterns to test for several types of values, but first let's take a quick tour of some general pattern-matching principles. The following discussion focuses on Perl's regular expression capabilities. Pattern matching in PHP and Python is similar, though you should consult the relevant documentation for any differences. For Java, the ORO pattern matching class library offers Perl-style pattern matching; Appendix A indicates where you can get it.
In Perl, the pattern constructor is /pat/:
    $it_matched = ($val =~ /pat/);     # pattern match
Put an i after the /pat/ constructor to make the pattern match case insensitive:
    $it_matched = ($val =~ /pat/i);    # case-insensitive match
To use a character other than slash, begin the constructor with m. This can be useful if the pattern itself contains slashes:
    $it_matched = ($val =~ m|pat|);    # alternate constructor character
To look for a non-match, replace the =~ operator with the !~ operator:
    $no_match = ($val !~ /pat/);       # negated pattern match
To perform a substitution in $val based on a pattern match, use s/pat/replacement/. If pat occurs within $val, it's replaced by replacement. To perform a case-insensitive match, put an i after the last slash. To perform a global substitution that replaces all instances of pat rather than just the first one, add a g after the last slash:
    $val =~ s/pat/replacement/;        # substitution
    $val =~ s/pat/replacement/i;       # case-insensitive substitution
    $val =~ s/pat/replacement/g;       # global substitution
    $val =~ s/pat/replacement/ig;      # case-insensitive and global
Here's a list of some of the special pattern elements available in Perl regular expressions:
Pattern      What the pattern matches
^            Beginning of string
$            End of string
.            Any character
\s, \S       Whitespace or non-whitespace character
\d, \D       Digit or non-digit character
\w, \W       Word (alphanumeric or underscore) or non-word character
[...]        Any character listed between the square brackets
[^...]       Any character not listed between the square brackets
p1|p2|p3     Alternation; matches any of the patterns p1, p2, or p3
*            Zero or more instances of preceding element
+            One or more instances of preceding element
{n}          n instances of preceding element
{m,n}        m through n instances of preceding element

Many of these pattern elements are the same as those available for MySQL's REGEXP regular expression operator. See Recipe 4.8.
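To see the match and substitution operators working together with a few of the pattern elements, here is a short sketch. The sample string and variable names are made up for illustration, and the use of | as the delimiter for s is an assumption carried over from the m| form; Perl accepts alternate delimiters for both operators.

```perl
use strict;
use warnings;

my $val = "Path: /usr/local/lib, /usr/lib";

# =~ tests for a match, !~ for a non-match; i ignores lettercase
my $it_matched = ($val =~ /^path/i) ? 1 : 0;    # 1: "Path" matches /^path/i
my $no_match   = ($val !~ /^\d+$/) ? 1 : 0;     # 1: value is not all digits

# m with | as the delimiter avoids escaping the slashes in the pattern
my $has_local = ($val =~ m|/usr/local/|) ? 1 : 0;

# Without g, only the first instance is replaced; with g, all of them.
# Copying $val first leaves the original string unchanged.
(my $first_only = $val) =~ s|/usr|/opt|;
(my $all = $val) =~ s|/usr|/opt|g;

print "$first_only\n";    # Path: /opt/local/lib, /usr/lib
print "$all\n";           # Path: /opt/local/lib, /opt/lib

# A few elements from the list, each anchored to the whole string
my $four_digits = ("2024" =~ /^\d{4}$/) ? 1 : 0;      # 1: exactly four digits
my $too_long    = ("20245" =~ /^\d{2,4}$/) ? 1 : 0;   # 0: five digits
my $word_chars  = ("x_1" =~ /^\w+$/) ? 1 : 0;         # 1: underscore is a word character
my $no_digits   = ("abc" =~ /^[^0-9]+$/) ? 1 : 0;     # 1: negated character class

print "$it_matched $no_match $has_local ",
      "$four_digits $too_long $word_chars $no_digits\n";
```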
To match a literal instance of a character that is special within patterns, such as *, ^, or $, precede it with a backslash. Similarly, to include a character within a character class construction that is special in character classes ([, ], or -), precede it with a backslash. To include a literal ^ in a character class, list it somewhere other than as the first character between the brackets.
Many of the validation patterns shown in the following sections are of the form /^pat$/. Beginning and ending a pattern with ^ and $ has the effect of requiring pat to match the entire string that you're testing. This is common in data validation contexts, because it's generally desirable to know that a pattern matches an entire input value, not just part of it. If you want to be sure that a value represents an integer, for example, it doesn't do you any good to know only that it contains an integer somewhere. This is not a hard-and-fast rule, however, and sometimes it's useful to perform a more relaxed test by omitting the ^ and $ characters as appropriate. For example, if you want to strip leading and trailing whitespace from a value, use one pattern anchored only to the beginning of the string, and another anchored only to the end:
    $val =~ s/^\s+//;    # trim leading whitespace
    $val =~ s/\s+$//;    # trim trailing whitespace
That's such a common operation, in fact, that it's a good candidate for being placed into a utility function. The Cookbook_Utils.pm file contains a function trim_whitespace() that performs both substitutions and returns the result:
    $val = trim_whitespace ($val);
To remember subsections of a string that is matched by a pattern, use parentheses around the relevant parts of the pattern. After a successful match, you can refer to the matched substrings using the variables $1, $2, and so forth:
    if ("abcdef" =~ /^(ab)(.*)$/)
    {
      $first_part = $1;    # this will be ab
      $the_rest = $2;      # this will be cdef
    }
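Remembered substrings make possible the date handling mentioned at the start of this section. The following sketch assumes two-digit MM-DD-YY input; the function names split_mmddyy and mmddyy_to_yymmdd are hypothetical, and the check of the day against the number of days in the month is left out to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper: break an MM-DD-YY string into its parts and
# verify that the month is in the range from 1 to 12. Returns
# (month, day, year) on success or an empty list on failure.
sub split_mmddyy
{
  my $val = shift;
  return () unless $val =~ /^(\d{2})-(\d{2})-(\d{2})$/;
  my ($month, $day, $year) = ($1, $2, $3);
  return () unless $month >= 1 && $month <= 12;
  return ($month, $day, $year);
}

# Hypothetical helper: reorder MM-DD-YY into YY-MM-DD by referring to
# the remembered parts in the replacement string. A value that does
# not match the pattern is returned unchanged.
sub mmddyy_to_yymmdd
{
  my $val = shift;
  $val =~ s/^(\d{2})-(\d{2})-(\d{2})$/$3-$1-$2/;
  return $val;
}

my @parts = split_mmddyy ("12-31-99");
print "parts: @parts\n";                      # parts: 12 31 99
print mmddyy_to_yymmdd ("12-31-99"), "\n";    # 99-12-31
```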
To indicate that an element within a pattern is optional, follow it by a ? character. To match values consisting of a sequence of digits, optionally beginning with a minus sign, and optionally ending with a period, use this pattern:
    /^-?\d+\.?$/
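A quick way to see why the anchors matter is to run the same pattern with and without them. This is only an illustrative sketch; the function names and test values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: the entire value must be digits, with an
# optional leading minus sign and an optional trailing period.
sub is_signed_digits
{
  my $val = shift;
  return ($val =~ /^-?\d+\.?$/) ? 1 : 0;
}

# The same pattern without ^ and $ succeeds if the value merely
# contains a run of digits somewhere.
sub contains_digits
{
  my $val = shift;
  return ($val =~ /-?\d+\.?/) ? 1 : 0;
}

foreach my $val ("42", "-42", "42.", "-", "4.2", "x42y")
{
  printf "%-5s anchored: %d  unanchored: %d\n",
         $val, is_signed_digits ($val), contains_digits ($val);
}
```

The values 4.2 and x42y pass the unanchored test but fail the anchored one, because each contains digits without consisting entirely of the permitted characters in the permitted order.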
You can also use parentheses to group alternations within a pattern. The following pattern matches time values in hh:mm format, optionally followed by AM or PM:
    /^\d{1,2}:\d{2}\s*(AM|PM)?$/i
The use of parentheses in that pattern also has the side-effect of remembering the optional part in $1. To suppress that side-effect, use (?:pat) instead:
    /^\d{1,2}:\d{2}\s*(?:AM|PM)?$/i
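The difference between the two forms shows up when you actually want the suffix. In the sketch below, the function names time_suffix and is_time are hypothetical. Note that the patterns check only the format of the value; they do not verify that the hour and minute are in range.

```perl
use strict;
use warnings;

# Hypothetical helper using the capturing form: returns the suffix in
# uppercase, an empty string if the value has no suffix, or undef if
# the value is not in hh:mm format. When the optional group does not
# take part in a successful match, $1 is undefined.
sub time_suffix
{
  my $val = shift;
  return undef unless $val =~ /^\d{1,2}:\d{2}\s*(AM|PM)?$/i;
  return defined ($1) ? uc ($1) : "";
}

# Hypothetical helper using the non-capturing form: a yes/no test
# that remembers nothing.
sub is_time
{
  my $val = shift;
  return ($val =~ /^\d{1,2}:\d{2}\s*(?:AM|PM)?$/i) ? 1 : 0;
}

foreach my $val ("9:30 pm", "12:00", "09:15AM", "12:0", "9:15 XM")
{
  my $suffix = time_suffix ($val);
  printf "%-8s valid: %d  suffix: %s\n",
         $val, is_time ($val), defined ($suffix) ? "'$suffix'" : "n/a";
}
```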
That's sufficient background in Perl pattern matching to allow construction of useful validation tests for several types of data values. The following sections provide patterns that can be used to test for broad content types, numbers, temporal values, and email addresses or URLs. The transfer directory of the recipes distribution contains a test_pat.pl script that reads input values, matches them against several patterns, and reports which patterns each value matches. The script is easily extensible, so you can use it as a test harness to try out your own patterns.
10.24 Using Patterns to Match Broad Content Types