For other types of validation, MySQL is intimately involved. If a field value is to be stored into an ENUM column, you can make sure the value is one of the legal enumeration values by checking the column definition with SHOW COLUMNS. Having described some of the kinds of web input validation you might want to carry out, I won't discuss them further here. These and other forms of validation testing are described in Chapter 10. That chapter is oriented largely toward bulk input validation, but the techniques discussed there apply to web programming as well, because processing form input or URL parameters is, in essence, performing a data import operation.

18.8 Using Web Input to Construct Queries

18.8.1 Problem

Input obtained over the Web cannot be trusted and should not be placed into a query without taking the proper precautions.

18.8.2 Solution

Sanitize data values by using placeholders or a quoting function.

18.8.3 Discussion

After you've extracted input parameter values and checked them to make sure they're valid, you're ready to use them to construct a query. This is actually the easy part, though it's necessary to take the proper precautions to avoid making a mistake that you'll regret. First, let's consider what can go wrong, then see how to prevent the problem.

Suppose you have a search form containing a keyword field that acts as a frontend to a simple search engine. When a user submits a keyword, you intend to use it to find matching records in a table by constructing a query like this:

    SELECT * FROM mytbl WHERE keyword = 'keyword_val'

Here, keyword_val represents the value entered by the user. If the value is something like eggplant, the resulting query is:

    SELECT * FROM mytbl WHERE keyword = 'eggplant'

The query returns all eggplant-matching records, presumably generating a small result set. But suppose the user is tricky and tries to subvert your script by entering the following value:

    eggplant' OR 'x'='x

In this case, the query becomes:

    SELECT * FROM mytbl WHERE keyword = 'eggplant' OR 'x'='x'

That query matches every record in the table. If the table is quite large, the input effectively becomes a form of denial-of-service attack, because it causes your system to divert resources away from legitimate requests into doing useless work. Likely results are:

• Extra load on the MySQL server
• Out-of-memory problems in your script as it tries to digest the result set received from MySQL
• Extra network bandwidth consumption as the script sends the results to the client

If your script generates a DELETE statement, the consequences of this kind of subversion can be much worse: your script might issue a query that empties a table completely, when you intended to allow it to delete only a single record at a time.
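The scenario just described is easy to reproduce. The following is a minimal sketch, not code from this recipe: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL so that it runs without a server, and the function name build_naive_query and the three sample rows are assumptions made up for illustration.

```python
import sqlite3

def build_naive_query(keyword_val):
    # UNSAFE (hypothetical helper): the user's value is pasted between
    # quotes with no escaping, exactly the mistake described above.
    return "SELECT * FROM mytbl WHERE keyword = '%s'" % keyword_val

# In-memory SQLite database standing in for a MySQL table named mytbl.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytbl (id INTEGER PRIMARY KEY, keyword TEXT)")
conn.executemany("INSERT INTO mytbl (keyword) VALUES (?)",
                 [("eggplant",), ("carrot",), ("turnip",)])

# An honest keyword matches only its own record.
honest_rows = conn.execute(build_naive_query("eggplant")).fetchall()

# The subversive value closes the quote and adds a condition that is
# always true, so every record in the table matches.
tricky_rows = conn.execute(build_naive_query("eggplant' OR 'x'='x")).fetchall()

print(len(honest_rows))   # 1
print(len(tricky_rows))   # 3
```

With only three rows the damage is trivial, but the same query against a large table returns the whole table.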
The implication is that providing a web interface to your database opens you up to certain forms of attack. However, you can prevent this kind of problem by means of a simple precaution that you should already be following: don't put data values literally into query strings. Use placeholders or an encoding function instead. For example, in Perl you can handle an input parameter like this using placeholders:

    $keyword = param ("keyword");
    $sth = $dbh->prepare ("SELECT * FROM mytbl WHERE keyword = ?");
    $sth->execute ($keyword);
    # ... fetch result set ...

Or like this using quote():

    $keyword = param ("keyword");
    $keyword = $dbh->quote ($keyword);
    $sth = $dbh->prepare ("SELECT * FROM mytbl WHERE keyword = $keyword");
    $sth->execute ();
    # ... fetch result set ...

Either way, if the user enters the subversive value, the query becomes:

    SELECT * FROM mytbl WHERE keyword = 'eggplant\' OR \'x\'=\'x'

The input is rendered harmless, and the result is that the query will match no records rather than all records, which is definitely a more suitable response to someone who's trying to break your script.

Placeholder and quoting techniques for PHP, Python, and Java are similar, and have been discussed in Recipe 2.7 and Recipe 2.8. For JSP pages written using the JSTL tag library, you can quote input parameter values using placeholders and the <sql:param> tag (Recipe 16.4). For example, to use the value of a form parameter named keyword in a SELECT statement, do this:

    <sql:query var="rs" dataSource="${conn}">
      SELECT * FROM mytbl WHERE keyword = ?
      <sql:param value="${param['keyword']}"/>
    </sql:query>

Placeholders and encoding functions apply only to SQL data values. One issue not addressed by them is how to handle web input used for other kinds of query elements such as the names of databases, tables, and columns. If you intend to insert such values into a query, you must insert them literally, which means you should check them first.
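For data values, the placeholder technique can be tried out in Python as well. This is a sketch under stated assumptions rather than the recipe's own code: it uses the standard sqlite3 module and an in-memory database in place of MySQL, and the function name find_by_keyword and the sample rows are hypothetical. SQLite's driver uses ? as the placeholder marker; MySQL drivers for Python commonly use %s instead, so check your driver's documentation.

```python
import sqlite3

def find_by_keyword(conn, keyword_val):
    # The value is passed separately from the SQL text, so the driver
    # binds it as data and it can never change the query's structure.
    cur = conn.execute(
        "SELECT id, keyword FROM mytbl WHERE keyword = ?", (keyword_val,))
    return cur.fetchall()

# In-memory SQLite database standing in for a MySQL table named mytbl.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytbl (id INTEGER PRIMARY KEY, keyword TEXT)")
conn.executemany("INSERT INTO mytbl (keyword) VALUES (?)",
                 [("eggplant",), ("carrot",), ("turnip",)])

safe_honest = find_by_keyword(conn, "eggplant")
safe_tricky = find_by_keyword(conn, "eggplant' OR 'x'='x")

print(safe_honest)   # [(1, 'eggplant')]
print(safe_tricky)   # []
```

The subversive value is compared as one literal string, quotes and all, so it matches nothing.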
If you construct a query such as the following, with the table name taken from web input, you should verify that tbl_name contains a reasonable value:

    SELECT * FROM tbl_name;

But what does "reasonable" mean? If you don't have tables containing strange characters in their names, it may be sufficient to make sure that tbl_name contains only alphanumeric characters or underscores. An alternative is to issue a SHOW TABLES query to make sure that the table name in question is in the database. This is more foolproof, at the cost of an additional query.

Another issue not covered by placeholder techniques involves a question of interpretation: if a form field is optional, what should you store in the database if the user leaves the field empty? Perhaps the value represents an empty string, or perhaps it should be interpreted as NULL. One way to resolve this question is to consult the column metadata. If the column can contain NULL values, then interpret an empty field as NULL. Otherwise, take an empty field to mean an empty string.

Try to Break Your Scripts

The discussion in this section has been phrased in terms of guarding your scripts against attack by other users. But it's not a bad idea to put yourself in the place of an attacker and adopt the mindset, "How can I break this application?" That is, consider whether there is some input you can submit that the application won't handle, and that will cause it to generate a malformed query. If you can cause it to misbehave, so can other people, either deliberately or accidentally. Be wary of bad input, and write your applications accordingly. It's better to be prepared than to just hope.
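The two checks that placeholders do not cover, validating a table name and deciding what an empty field means, can each be written as a small helper. The following Python sketch is one possible shape, not a definitive implementation: the function names, the known_tables argument (which you would fill from the output of SHOW TABLES), and the column_is_nullable flag (which you would derive from the column metadata) are all assumptions introduced for illustration.

```python
import re

# ASCII letters, digits, and underscores only; at least one character.
_IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_]+")

def is_reasonable_table_name(tbl_name, known_tables=None):
    """Return True if tbl_name looks safe to insert literally into a query.

    known_tables is optional; if given, it should hold the table names
    reported by the database, and tbl_name must be one of them.
    """
    if not isinstance(tbl_name, str):
        return False
    if not _IDENTIFIER_RE.fullmatch(tbl_name):
        return False
    if known_tables is not None and tbl_name not in known_tables:
        return False
    return True

def interpret_empty_field(value, column_is_nullable):
    """Map an empty form field to None (NULL) or '' per the column definition."""
    if value == "":
        return None if column_is_nullable else ""
    return value

print(is_reasonable_table_name("mytbl"))                    # True
print(is_reasonable_table_name("mytbl; DROP TABLE mytbl"))  # False
print(interpret_empty_field("", True))                      # None
```

Python database drivers generally bind None as NULL when it is passed through a placeholder, so the result of interpret_empty_field can be handed directly to a parameterized statement.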

18.8.4 See Also