</c:if>
<c:if test="${empty host}">
  <c:set var="host">UNKNOWN</c:set>
</c:if>
<sql:update dataSource="${conn}">
  INSERT INTO hitlog (path, host) VALUES(?,?)
  <sql:param><%= request.getRequestURI () %></sql:param>
  <sql:param value="${host}" />
</sql:update>
The hitlog table has the following useful properties:

• Access times are recorded automatically in the TIMESTAMP column t when you insert new records.

• By linking the path column to an AUTO_INCREMENT column hits, the counter values for a given page path increment automatically whenever you insert a new record for that path. The counters are maintained separately for each distinct path value. (For more information on how multiple-column sequences work, see Recipe 11.15.)

• There's no need to check whether the counter for a page already exists, because you insert a new row each time you record a hit for a page, not just for the first hit.

• If you want to determine the current counters for each page, select the record for each distinct path value that has the largest hits value:

    SELECT path, MAX(hits) FROM hitlog GROUP BY path;
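As a rough illustration of the counter query above, the following Python sketch runs it against an in-memory SQLite database standing in for MySQL. The table layout is reduced to the path, hits, and host columns, and the hits values are numbered by hand because SQLite has no per-path AUTO_INCREMENT sequence. The function names, sample paths, and host addresses are hypothetical.

```python
import sqlite3


def make_demo_hitlog():
    """Build an in-memory stand-in for the hitlog table (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hitlog (path TEXT NOT NULL, hits INTEGER NOT NULL, host TEXT)"
    )
    # In MySQL the hits values would be generated per path automatically;
    # here they are written out by hand to mimic that numbering.
    conn.executemany(
        "INSERT INTO hitlog (path, hits, host) VALUES (?,?,?)",
        [
            ("/index.jsp", 1, "10.0.0.1"),
            ("/index.jsp", 2, "10.0.0.2"),
            ("/about.jsp", 1, "10.0.0.1"),
            ("/index.jsp", 3, "UNKNOWN"),
        ],
    )
    return conn


def current_counters(conn):
    """Return a {path: current counter} mapping using the query from the text."""
    cursor = conn.execute("SELECT path, MAX(hits) FROM hitlog GROUP BY path")
    return dict(cursor.fetchall())
```

Calling current_counters(make_demo_hitlog()) yields one entry per distinct path, each holding the largest hits value recorded for that path.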
18.15 Using MySQL for Apache Logging
18.15.1 Problem
You don't want to use MySQL to log accesses for just a few pages, as shown in Recipe 18.14. You want to log all page accesses, and you don't want to have to put logging actions in each page explicitly.
18.15.2 Solution
Tell Apache to log page accesses to MySQL.
18.15.3 Discussion
The uses for MySQL in a web context aren't limited just to page generation and processing. You can use it to help you run the web server itself. For example, most Apache servers are set up to log a record of web requests to a file. But it's also possible to send log records to a program instead, from which you can write the records wherever you like, such as to a database. With log records in a database rather than a flat file, the log becomes more highly structured and you can apply SQL analysis techniques to it. Log file analysis tools may be written to provide some flexibility, but often this is a matter of deciding which summaries to display and which to suppress. It's more difficult to tell a tool to display information it wasn't built to provide. With log entries in a table, you gain additional flexibility. Want to see a particular report? Write the SQL statements that produce it. To display the report in a specific format, issue the queries from within an API and take advantage of your language's output production capabilities.

By handling log entry generation and storage using separate processes, you gain some additional flexibility. Some of the possibilities are to send logs from multiple web servers to the same MySQL server, or to send different logs generated by a given web server to different MySQL servers.

This section shows how to set up web request logging from Apache into MySQL and demonstrates some summary queries you may find useful.
18.15.4 Setting Up Database Logging
Apache logging is controlled by directives in the httpd.conf configuration file. For example, a typical logging setup uses LogFormat and CustomLog directives that look like this:

    LogFormat "%h %l %u %t \"%r\" %>s %b" common
    CustomLog /usr/local/apache/logs/access_log common

The LogFormat line defines a format for log records and gives it the nickname common. The CustomLog directive indicates that lines should be written in that format to the access_log file in Apache's logs directory. To set up logging to MySQL instead, use the following procedure:[4]

• Decide what values you want to record and set up a table that contains the appropriate columns.

• Write a program to read log lines from Apache and write them into the database.

• Set up a LogFormat line that defines how to write log lines in the format the program expects, and a CustomLog directive that tells Apache to write to the program rather than to a file.

[4] If you're using logging directives such as TransferLog rather than LogFormat and CustomLog, you'll need to adapt the instructions in this section.
Suppose you want to record the date and time of each request, the host that issued the request, the request method and URL pathname, the status code, the number of bytes transferred, and the user agent (typically a browser or spider name). A table that includes columns for these values can be created as follows:

    CREATE TABLE httpdlog
    (
        dt      DATETIME NOT NULL,              # request date
        host    VARCHAR(255) NOT NULL,          # client host
        method  VARCHAR(4) NOT NULL,            # request method (GET, PUT, etc.)
        url     VARCHAR(255) BINARY NOT NULL,   # URL path
        status  INT NOT NULL,                   # request status
        size    INT,                            # number of bytes transferred
        agent   VARCHAR(255)                    # user agent
    );
Most of the string columns use VARCHAR and are not case sensitive. The exception, url, is declared as a binary string, as is appropriate for a server running on a system with case-sensitive filenames. If you're using a server where URL lettercase doesn't matter, you can omit the word BINARY.

The httpdlog table definition shown here doesn't include any indexes. You should add some, because otherwise any summary queries you run will slow down dramatically as the table becomes large. The choice of which columns to index will be based on the types of queries you intend to run to analyze the table contents. For example, queries to analyze the distribution of client host values will benefit from an index on the host column.

Next, you need a program to process log lines produced by Apache and insert them into the httpdlog table. The following script, httpdlog.pl, opens a connection to the MySQL server, then loops to read input lines. It parses each line into column values and inserts the result into the database. When Apache exits, it closes the pipe to the logging program. That causes httpdlog.pl to see end of file on its input, terminate the loop, disconnect from MySQL, and exit.
    #!/usr/bin/perl -w
    # httpdlog.pl - Log Apache requests to httpdlog table

    use strict;
    use lib qw(/usr/local/apache/lib/perl);
    use Cookbook;

    my $dbh = Cookbook::connect ();
    my $sth = $dbh->prepare (qq{
                    INSERT INTO httpdlog (dt,host,method,url,status,size,agent)
                    VALUES (?,?,?,?,?,?,?)
                });

    while (<>)      # loop reading input
    {
        chomp;
        my ($dt, $host, $method, $url, $status, $size, $agent)
                                                = split (/\t/, $_);
        # map "-" to NULL for some columns
        $size = undef if $size eq "-";
        $agent = undef if $agent eq "-";
        $sth->execute ($dt, $host, $method, $url, $status, $size, $agent);
    }

    $dbh->disconnect ();
    exit (0);
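For readers more comfortable with Python, the parsing and NULL-mapping steps that httpdlog.pl performs can be sketched as follows. This is not the script the recipe installs. It is an approximation that uses the standard sqlite3 module with an in-memory table in place of the Cookbook connection to MySQL, and the function names (parse_log_line, store_lines, make_demo_connection) are made up for the example.

```python
import sqlite3

# Column order that the tab-delimited log lines are assumed to follow.
COLUMNS = ("dt", "host", "method", "url", "status", "size", "agent")


def parse_log_line(line):
    """Split one tab-delimited log line into a tuple of httpdlog values."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    # Apache writes "-" when a value is unavailable; store NULL instead.
    for name in ("size", "agent"):
        if row[name] == "-":
            row[name] = None
    return tuple(row[name] for name in COLUMNS)


def store_lines(conn, lines):
    """Insert each parsed line with a parameterized statement; return the count."""
    sql = (
        "INSERT INTO httpdlog (dt,host,method,url,status,size,agent) "
        "VALUES (?,?,?,?,?,?,?)"
    )
    count = 0
    for line in lines:
        conn.execute(sql, parse_log_line(line))
        count += 1
    conn.commit()
    return count


def make_demo_connection():
    """Create an in-memory SQLite table as a stand-in for the MySQL httpdlog table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE httpdlog (dt TEXT NOT NULL, host TEXT NOT NULL, "
        "method TEXT NOT NULL, url TEXT NOT NULL, status INTEGER NOT NULL, "
        "size INTEGER, agent TEXT)"
    )
    return conn
```

To mimic the piped setup, store_lines could be handed sys.stdin as its lines argument, so that the loop ends when the writer closes the pipe, just as the Perl loop does at end of file.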
Install the httpdlog.pl script where you want Apache to look for it. On my system, the Apache root directory is /usr/local/apache, so /usr/local/apache/bin is a reasonable installation directory. The path to this directory will be needed shortly for constructing the CustomLog directive that instructs Apache to log to the script.

httpdlog.pl assumes that input lines contain httpdlog column values delimited by tabs (to make it easy to break apart input lines), so Apache must write log entries in a matching format. The LogFormat field specifiers to produce the appropriate values are as follows:

%{%Y-%m-%d %H:%M:%S}t
    The date and time of the request, in MySQL's DATETIME format

%h
    The host from which the request originated

%m
    The request method (GET, POST, and so forth)

%U
    The URL path

%>s
    The status code

%b
    The number of bytes transferred

%{User-Agent}i
    The user agent
To define a logging format named mysql that produces these values with tabs in between, add the following LogFormat directive to your httpd.conf file:

    LogFormat "%{%Y-%m-%d %H:%M:%S}t\t%h\t%m\t%U\t%>s\t%b\t%{User-Agent}i" mysql

Most of the pieces are in place now. We have a log table, a program that writes to it, and a mysql format for producing log entries. All that remains is to tell Apache to write the entries to the httpdlog.pl script. However, until you know that the output format really is correct and that the program can process log entries properly, it's premature to tell Apache to log directly to the program. To make testing and debugging a bit easier, have Apache log mysql-format entries to a file instead. That way, you can look at the file to check the output format, and you can use it as input to httpdlog.pl to verify that the program works correctly. To instruct Apache to log lines in mysql format to the file test_log in Apache's logs directory, use this CustomLog directive:

    CustomLog /usr/local/apache/logs/test_log mysql

Then restart Apache to enable the new logging directives. After your web server receives a few requests, take a look at the test_log file. Verify that the contents are as you expect, then feed the file to httpdlog.pl. If you're in Apache's logs directory and the bin and logs directories are both under the Apache root, the command looks like this:

    % ../bin/httpdlog.pl test_log
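Before feeding test_log to httpdlog.pl, it can help to check the file mechanically as well as by eye. The following Python sketch is one possible check, not part of the recipe. It assumes the seven-field, tab-delimited mysql format described above, and the names check_log_line and check_log_file are hypothetical.

```python
from datetime import datetime

EXPECTED_FIELDS = 7                      # dt, host, method, url, status, size, agent
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"    # the layout MySQL accepts for DATETIME


def check_log_line(line):
    """Return a list of problems found in one log line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != EXPECTED_FIELDS:
        return [f"expected {EXPECTED_FIELDS} tab-separated fields, found {len(fields)}"]
    dt, host, method, url, status, size, agent = fields
    problems = []
    try:
        datetime.strptime(dt, DATETIME_FORMAT)
    except ValueError:
        problems.append(f"bad date/time: {dt!r}")
    if not status.isdigit():
        problems.append(f"non-numeric status: {status!r}")
    # "-" is legitimate here: it means no bytes were transferred.
    if size != "-" and not size.isdigit():
        problems.append(f"non-numeric size: {size!r}")
    return problems


def check_log_file(path):
    """Return (line number, problem) pairs for every bad line in the file."""
    report = []
    with open(path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            for problem in check_log_line(line):
                report.append((number, problem))
    return report
```

An empty list from check_log_file("test_log") suggests that the LogFormat directive is producing lines the loader can split. It says nothing about whether the values themselves are what you want to record, so a visual check of the file is still worthwhile.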
After httpdlog.pl finishes, take a look at the httpdlog table to make sure that it looks correct. Once you're satisfied, tell Apache to send log entries directly to httpdlog.pl by modifying the CustomLog directive as follows:

    CustomLog "|/usr/local/apache/bin/httpdlog.pl" mysql

The | character at the beginning of the pathname tells Apache that httpdlog.pl is a program, not a file. Restart Apache and new entries should appear in the httpdlog table as visitors request pages from your site.
Nothing you have done to this point changes any logging you may have been doing originally. For example, if you were logging to an access_log file before, you still are now. Thus, Apache will be sending entries both to the original log file and to MySQL. If that's what you want, fine. Apache doesn't care if you log to multiple destinations. But you'll use more disk space if you do. To disable file logging, comment out your original CustomLog directive by placing a # character in front of it, then restart Apache.
18.15.5 Analyzing the Log File