
    </c:if>
    <c:if test="${empty host}">
        <c:set var="host">UNKNOWN</c:set>
    </c:if>
    <sql:update dataSource="${conn}">
        INSERT INTO hitlog (path, host) VALUES(?,?)
        <sql:param><%= request.getRequestURI() %></sql:param>
        <sql:param value="${host}" />
    </sql:update>

The hitlog table has the following useful properties:

• Access times are recorded automatically in the TIMESTAMP column t when you insert new records.
• By linking the path column to an AUTO_INCREMENT column hits, the counter values for a given page path increment automatically whenever you insert a new record for that path. The counters are maintained separately for each distinct path value. For more information on how multiple-column sequences work, see Recipe 11.15.
• There's no need to check whether the counter for a page already exists, because you insert a new row each time you record a hit for a page, not just for the first hit.
• If you want to determine the current counters for each page, select the record for each distinct path value that has the largest hits value:

    SELECT path, MAX(hits) FROM hitlog GROUP BY path;
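The per-path counter behavior described above can be modeled in a few lines of Python. This is only an illustration of the semantics, not a description of how MySQL implements them: the names record_hit and current_counters are hypothetical, and an in-memory list stands in for the hitlog table.

```python
def record_hit(hitlog, path, host):
    """Append a row whose hits value is one more than the largest
    hits value recorded so far for this path."""
    next_hits = max((h for p, h, _ in hitlog if p == path), default=0) + 1
    hitlog.append((path, next_hits, host))
    return next_hits


def current_counters(hitlog):
    """Mirror SELECT path, MAX(hits) FROM hitlog GROUP BY path."""
    counters = {}
    for path, hits, _ in hitlog:
        counters[path] = max(counters.get(path, 0), hits)
    return counters
```

Each call to record_hit adds a row rather than updating one, which is why no existence check is needed before the first hit on a page.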

18.15 Using MySQL for Apache Logging

18.15.1 Problem

You don't want to use MySQL to log accesses for just a few pages, as shown in Recipe 18.14. You want to log all page accesses, and you don't want to have to put logging actions in each page explicitly.

18.15.2 Solution

Tell Apache to log page accesses to MySQL.

18.15.3 Discussion

The uses for MySQL in a web context aren't limited just to page generation and processing. You can use it to help you run the web server itself. For example, most Apache servers are set up to log a record of web requests to a file. But it's also possible to send log records to a program instead, from which you can write the records wherever you like, such as to a database. With log records in a database rather than a flat file, the log becomes more highly structured and you can apply SQL analysis techniques to it.

Log file analysis tools may be written to provide some flexibility, but often this is a matter of deciding which summaries to display and which to suppress. It's more difficult to tell a tool to display information it wasn't built to provide. With log entries in a table, you gain additional flexibility. Want to see a particular report? Write the SQL statements that produce it. To display the report in a specific format, issue the queries from within an API and take advantage of your language's output production capabilities.

By handling log entry generation and storage using separate processes, you gain some additional flexibility. Some of the possibilities are to send logs from multiple web servers to the same MySQL server, or to send different logs generated by a given web server to different MySQL servers.

This section shows how to set up web request logging from Apache into MySQL and demonstrates some summary queries you may find useful.

18.15.4 Setting Up Database Logging

Apache logging is controlled by directives in the httpd.conf configuration file. For example, a typical logging setup uses LogFormat and CustomLog directives that look like this:

    LogFormat "%h %l %u %t \"%r\" %>s %b" common
    CustomLog /usr/local/apache/logs/access_log common

The LogFormat line defines a format for log records and gives it the nickname common. The CustomLog directive indicates that lines should be written in that format to the access_log file in Apache's logs directory. To set up logging to MySQL instead, use the following procedure: [4]

• Decide what values you want to record and set up a table that contains the appropriate columns.
• Write a program to read log lines from Apache and write them into the database.
• Set up a LogFormat line that defines how to write log lines in the format the program expects, and a CustomLog directive that tells Apache to write to the program rather than to a file.

[4] If you're using logging directives such as TransferLog rather than LogFormat and CustomLog, you'll need to adapt the instructions in this section.

Suppose you want to record the date and time of each request, the host that issued the request, the request method and URL pathname, the status code, the number of bytes transferred, and the user agent (typically a browser or spider name). A table that includes columns for these values can be created as follows:

    CREATE TABLE httpdlog
    (
        dt      DATETIME NOT NULL,              # request date
        host    VARCHAR(255) NOT NULL,          # client host
        method  VARCHAR(4) NOT NULL,            # request method (GET, PUT, etc.)
        url     VARCHAR(255) BINARY NOT NULL,   # URL path
        status  INT NOT NULL,                   # request status
        size    INT,                            # number of bytes transferred
        agent   VARCHAR(255)                    # user agent
    );

Note that VARCHAR(4) is wide enough for GET, PUT, POST, and HEAD but not for longer method names such as DELETE or OPTIONS; widen the column if your server receives those.

Most of the string columns use VARCHAR and are not case sensitive. The exception, url, is declared as a binary string, as is appropriate for a server running on a system with case-sensitive filenames.
If you're using a server where URL lettercase doesn't matter, you can omit the word BINARY.

The httpdlog table definition shown here doesn't include any indexes. You should add some, because otherwise any summary queries you run will slow down dramatically as the table becomes large. The choice of which columns to index will be based on the types of queries you intend to run to analyze the table contents. For example, queries to analyze the distribution of client host values will benefit from an index on the host column.

Next, you need a program to process log lines produced by Apache and insert them into the httpdlog table. The following script, httpdlog.pl, opens a connection to the MySQL server, then loops to read input lines. It parses each line into column values and inserts the result into the database. When Apache exits, it closes the pipe to the logging program. That causes httpdlog.pl to see end of file on its input, terminate the loop, disconnect from MySQL, and exit.

    #! /usr/bin/perl -w
    # httpdlog.pl - Log Apache requests to httpdlog table

    use strict;
    use lib qw(/usr/local/apache/lib/perl);
    use Cookbook;

    my $dbh = Cookbook::connect ();
    my $sth = $dbh->prepare (qq{
                INSERT INTO httpdlog (dt,host,method,url,status,size,agent)
                VALUES (?,?,?,?,?,?,?)
            });

    while (<>)      # loop reading input
    {
        chomp;
        my ($dt, $host, $method, $url, $status, $size, $agent)
            = split (/\t/, $_);
        # map "-" to NULL for some columns
        $size = undef if $size eq "-";
        $agent = undef if $agent eq "-";
        $sth->execute ($dt, $host, $method, $url, $status, $size, $agent);
    }

    $dbh->disconnect ();
    exit (0);

Install the httpdlog.pl script where you want Apache to look for it. On my system, the Apache root directory is /usr/local/apache, so /usr/local/apache/bin is a reasonable installation directory. The path to this directory will be needed shortly for constructing the CustomLog directive that instructs Apache to log to the script.
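The line-parsing step inside the httpdlog.pl loop can be looked at separately from the database work. The following is a minimal sketch of that step in Python rather than Perl, offered as an illustration and not as part of the recipe. The function name parse_log_line is hypothetical, and the field-count check is an addition that the Perl script does not perform.

```python
def parse_log_line(line):
    """Split one tab-delimited log line into the seven httpdlog
    column values: dt, host, method, url, status, size, agent."""
    fields = line.rstrip("\n").split("\t")
    # Added safeguard (not in httpdlog.pl): reject lines with the wrong shape
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    dt, host, method, url, status, size, agent = fields
    # Apache writes "-" for a missing value; use None so it is stored as NULL
    if size == "-":
        size = None
    if agent == "-":
        agent = None
    return (dt, host, method, url, status, size, agent)
```

The returned tuple has the same order as the placeholders in the INSERT statement, so it could be passed directly as the parameter list of a prepared statement.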
httpdlog.pl assumes that input lines contain httpdlog column values delimited by tabs (to make it easy to break apart input lines), so Apache must write log entries in a matching format. The LogFormat field specifiers to produce the appropriate values are as follows:

    %{%Y-%m-%d %H:%M:%S}t   The date and time of the request, in MySQL's DATETIME format
    %h                      The host from which the request originated
    %m                      The request method (GET, POST, and so forth)
    %U                      The URL path
    %>s                     The status code
    %b                      The number of bytes transferred
    %{User-Agent}i          The user agent

To define a logging format named mysql that produces these values with tabs in between, add the following LogFormat directive to your httpd.conf file:

    LogFormat "%{%Y-%m-%d %H:%M:%S}t\t%h\t%m\t%U\t%>s\t%b\t%{User-Agent}i" mysql

Most of the pieces are in place now. We have a log table, a program that writes to it, and a mysql format for producing log entries. All that remains is to tell Apache to write the entries to the httpdlog.pl script. However, until you know that the output format really is correct and that the program can process log entries properly, it's premature to tell Apache to log directly to the program. To make testing and debugging a bit easier, have Apache log mysql-format entries to a file instead. That way, you can look at the file to check the output format, and you can use it as input to httpdlog.pl to verify that the program works correctly. To instruct Apache to log lines in mysql format to the file test_log in Apache's logs directory, use this CustomLog directive:

    CustomLog /usr/local/apache/logs/test_log mysql

Then restart Apache to enable the new logging directives. After your web server receives a few requests, take a look at the test_log file. Verify that the contents are as you expect, then feed the file to httpdlog.pl.
If you're in Apache's logs directory and the bin and logs directories are both under the Apache root, the command looks like this:

    ../bin/httpdlog.pl test_log

After httpdlog.pl finishes, take a look at the httpdlog table to make sure that it looks correct. Once you're satisfied, tell Apache to send log entries directly to httpdlog.pl by modifying the CustomLog directive as follows:

    CustomLog "|/usr/local/apache/bin/httpdlog.pl" mysql

The | character at the beginning of the pathname tells Apache that httpdlog.pl is a program, not a file. Restart Apache and new entries should appear in the httpdlog table as visitors request pages from your site.

Nothing you have done to this point changes any logging you may have been doing originally. For example, if you were logging to an access_log file before, you still are now. Thus, Apache will be sending entries both to the original log file and to MySQL. If that's what you want, fine. Apache doesn't care if you log to multiple destinations. But you'll use more disk space if you do. To disable file logging, comment out your original CustomLog directive by placing a # character in front of it, then restart Apache.
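Before switching from test_log to the piped CustomLog directive, the visual inspection of test_log described earlier in this section can be supplemented with a small checker. The sketch below is written in Python and is an assumption about what is worth checking, not part of the recipe: the name check_test_log is hypothetical, and the three tests (field count, DATETIME-style timestamp, numeric status) are simply the properties the httpdlog table definition implies.

```python
from datetime import datetime


def check_test_log(lines):
    """Return (line_number, problem) pairs for lines that do not match
    the seven-field, tab-delimited mysql log format."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 7:
            problems.append((lineno, f"expected 7 fields, found {len(fields)}"))
            continue
        try:
            # First field must look like a DATETIME value
            datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append((lineno, "timestamp is not in DATETIME format"))
            continue
        # Fifth field is the status code, stored in an INT column
        if not fields[4].isdigit():
            problems.append((lineno, "status is not numeric"))
    return problems
```

Because the function accepts any iterable of lines, an open file object for test_log can be passed to it directly. An empty result means every line has the shape httpdlog.pl expects.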

18.15.5 Analyzing the Log File