The clicksort.php script shown here does not do that. However, the recipes distribution contains a Perl counterpart script, clicksort.pl, that does perform this kind of check. Have a look at it if you want more information.

The cells in the rows following the header row contain the data values from the database table, displayed as static text. Empty cells are displayed using &nbsp; so that they display with the same border as nonempty cells (see Recipe 17.4).
18.13 Web Page Access Counting
18.13.1 Problem
You want to count the number of times a page has been accessed. This can be used to display a hit counter in the page. The same technique can be used to record other types of information as well, such as the number of times each of a set of banner ads has been served.
18.13.2 Solution
Implement a hit counter, keyed to the page you want to count.
18.13.3 Discussion
This section discusses access counting, using hit counters for the examples. Counters that display the number of times a web page has been accessed are not such a big thing as they used to be, presumably because page authors now realize that most visitors don't really care how popular a page is. Still, the general concept has application in several contexts. For example, if you're displaying banner ads in your pages (Recipe 17.8), you may be charging vendors by the number of times you serve their ads. To do so, you need to count the number of accesses for each one. You can adapt the technique shown in this section for purposes such as these.
There are several methods for writing a page that displays a count of the number of times it has been accessed. The most basic is to maintain the count in a file. When the page is requested, you open the file, read the count, increment it, write the new count back to the file, and display it in the page. This has the advantage of being easy to implement and the disadvantage that it requires a counter file for each page that includes a hit count. It also doesn't work properly if two clients access the page at the same time, unless you implement some kind of locking protocol in the file access procedure. It's possible to reduce counter file litter by keeping multiple counts in a single file, but that makes it more difficult to access particular values within the file, and it doesn't solve the simultaneous-access problem. In fact, it makes it worse, because a multiple-counter file has a higher likelihood of being accessed by multiple clients simultaneously than does a single-counter file. So you end up implementing storage and retrieval methods for processing the file contents, and locking protocols to keep multiple processes from interfering with each other.

Hmm... those sound suspiciously like the problems that MySQL already takes care of! Keeping the counts in the database centralizes them into a single table, SQL provides the storage and retrieval interface, and the locking problem goes away because MySQL serializes access to the table so that clients can't interfere with each other. Furthermore, depending on how you manage the counters, you may be able to update the counter and retrieve the new sequence value using a single query.

I'll assume that you want to log hits for more than one page. To do that, create a table that has one row for each page to be counted. This means it's necessary to have a unique identifier for each page, so that counters for different pages don't get mixed up. You could assign identifiers somehow, but it's easier just to use the page's path within your web tree. Web programming languages typically make this path easy to obtain; in fact, we've already discussed how to do so in Recipe 18.2. On that basis, you can create a hitcount table as follows:
    CREATE TABLE hitcount
    (
        path VARCHAR(255) BINARY NOT NULL,
        hits BIGINT UNSIGNED NOT NULL,
        PRIMARY KEY (path)
    );
This table definition involves some assumptions:

• The BINARY keyword in the path column definition makes the column values case sensitive. That's appropriate for a web platform where pathnames are case sensitive, such as most versions of Unix. For Windows or for HFS+ filesystems under Mac OS X, filenames are not case sensitive, so you'd omit BINARY from the definition.

• The path column has a maximum length of 255 characters, which limits you to page paths no longer than that. If you expect to require longer values, use a BLOB or TEXT type rather than VARCHAR. But in this case, you're still limited to indexing a maximum of the leftmost 255 characters of the column values, so you'd use a nonunique index rather than a PRIMARY KEY.

• The mechanism works for a single document tree, such as when your web server is used to serve pages for a single domain. If you institute a hit count mechanism on a host that serves multiple virtual domains, you may want to add a column for the domain name. This value is available in the SERVER_NAME value that Apache puts into your script's environment. In this case, the hitcount table index would include both the hostname and the page path.
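For the multiple-domain case, the counter key becomes a pair of values rather than a single path. The following is a minimal sketch of how a script might build that key, assuming a CGI-style environment in which the web server sets SERVER_NAME and SCRIPT_NAME. The page_key() helper is hypothetical (it is not part of the recipes distribution), and lowercasing the hostname is a design choice made here because hostnames are not case sensitive.

```python
import os

def page_key(environ=None):
    """Return a (host, path) pair identifying the current page.

    Hypothetical helper: assumes the web server places SERVER_NAME
    and SCRIPT_NAME in the script's environment, as Apache does for CGI.
    """
    if environ is None:
        environ = os.environ
    # Hostnames are case insensitive, so normalize to lowercase.
    host = environ.get("SERVER_NAME", "").lower()
    # Leave the path untouched; its case sensitivity depends on the platform.
    path = environ.get("SCRIPT_NAME", "")
    return host, path
```

With a key like this, the hitcount table would need a host column next to path, and its index would cover both columns.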
The general logic involved in hit counter maintenance is to increment the hits column of the record for a page, then retrieve the updated counter value. One way to do that is by using the following two queries:

    UPDATE hitcount SET hits = hits + 1 WHERE path = 'page path';
    SELECT hits FROM hitcount WHERE path = 'page path';

Unfortunately, if you use that approach, you may often not get the correct value. If several clients request the same page simultaneously, several UPDATE statements may be issued in close temporal proximity. Each client's SELECT statement then wouldn't necessarily get the hits value produced by its own UPDATE. This can be avoided by using a transaction or by locking the hitcount table, but that slows down hit counting. MySQL provides a solution that allows each client to retrieve its own count, no matter how many updates happen at the same time:

    UPDATE hitcount SET hits = LAST_INSERT_ID(hits+1) WHERE path = 'page path';
    SELECT LAST_INSERT_ID();
The basis for updating the count here is LAST_INSERT_ID(expr), which was discussed in Recipe 11.17. The UPDATE statement finds the relevant record and increments its counter value. The use of LAST_INSERT_ID(hits+1) rather than just hits+1 tells MySQL to treat the value as though it were an AUTO_INCREMENT value. This allows it to be retrieved in the second query using LAST_INSERT_ID(). The LAST_INSERT_ID() function returns a connection-specific value, so you always get back the value corresponding to the UPDATE issued on the same connection. In addition, the SELECT statement doesn't need to query a table, so it's very fast. A further efficiency may be gained by eliminating the SELECT query altogether, which is possible if your API provides a means for direct retrieval of the most recent sequence number. For example, in Perl, you can update the count and get the new value with a single query like this:

    $dbh->do (
        "UPDATE hitcount SET hits = LAST_INSERT_ID(hits+1) WHERE path = ?",
        undef, $page_path);
    $hits = $dbh->{mysql_insertid};
However, there's still a problem here. What if the page isn't listed in the hitcount table? In that case, the UPDATE statement finds no record to modify and you get a counter value of zero. You could deal with this problem by requiring that any page that includes a hit counter must be registered in the hitcount table before the page goes online. A friendlier alternative is to create a counter record automatically for any page that is found not to have one. That way, page designers can put counters in pages with no advance preparation. To make the counter mechanism easier to use, put the code in a utility function that takes a page path as its argument, handles the missing-record logic internally, and returns the count. Conceptually, the function acts like this:

    update the counter
    if the update modifies a row
        retrieve the new counter value
    else
        insert a record for the page with the count set to 1

The first time you request a count for a page, the update modifies no rows because the page won't be listed in the table yet. The function creates a new counter and returns a value of one. For each request thereafter, the update modifies the existing record for the page and the function returns successive access counts.
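The update-else-insert behavior just described can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module and an in-memory database purely as a stand-in. SQLite has no LAST_INSERT_ID(expr), so this version reads the count back with a separate SELECT, which is safe here only because a single connection is in use. It illustrates the control flow, not the concurrency-safe MySQL technique.

```python
import sqlite3

def get_hit_count(conn, page_path):
    """Increment the counter for page_path and return the new count."""
    cursor = conn.cursor()
    cursor.execute(
        "UPDATE hitcount SET hits = hits + 1 WHERE path = ?", (page_path,))
    if cursor.rowcount > 0:
        # The update modified a row: retrieve the new counter value.
        cursor.execute(
            "SELECT hits FROM hitcount WHERE path = ?", (page_path,))
        count = cursor.fetchone()[0]
    else:
        # No row yet: register the page with its count set to 1.
        cursor.execute(
            "INSERT OR IGNORE INTO hitcount (path, hits) VALUES (?, 1)",
            (page_path,))
        count = 1
    conn.commit()
    cursor.close()
    return count

# Stand-in table; the column types differ from the MySQL definition.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hitcount (path TEXT PRIMARY KEY, hits INTEGER NOT NULL)")

counts = [get_hit_count(conn, "/index.html") for _ in range(3)]
other = get_hit_count(conn, "/about.html")
print(counts, other)
```

The first request for each path takes the insert branch and returns one. Later requests for the same path take the update branch and return successive counts.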
In Perl, a hit-counting function might look like this, where the arguments are a database handle and the page path:

    sub get_hit_count
    {
    my ($dbh, $page_path) = @_;

        my $rows = $dbh->do (
            "UPDATE hitcount SET hits = LAST_INSERT_ID(hits+1) WHERE path = ?",
            undef, $page_path);
        return ($dbh->{mysql_insertid}) if $rows > 0;   # counter was incremented

        # If the page path wasn't listed in the table, register it and
        # initialize the count to one. Use IGNORE in case another client
        # tries same thing at the same time.
        $dbh->do ("INSERT IGNORE INTO hitcount (path,hits) VALUES(?,1)",
                  undef, $page_path);
        return (1);
    }
The CGI.pm script_name() function returns the local part of the URL, so you use get_hit_count() like this:

    my $hits = get_hit_count ($dbh, script_name ());
    print p ("This page has been accessed $hits times.");

The counting mechanism potentially involves multiple queries, and we haven't used a transactional approach, so the algorithm still has a race condition that can occur for the first access to a page. If multiple clients simultaneously request a page that is not yet listed in the hitcount table, each of them may issue the UPDATE query, find the page missing, and as a result issue the INSERT query to register the page and initialize the counter. The algorithm uses INSERT IGNORE to suppress errors if simultaneous invocations of the script attempt to initialize the counter for the same page, but the result is that they'll all get a count of one. Is it worth trying to fix this problem by using transactions or table locking? For hit counting, I'd say no. The slight loss of accuracy doesn't warrant the additional processing overhead. For a different application, the priority may be accuracy over efficiency, in which case you would opt for transactions to avoid losing a count.
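If accuracy on the first access does matter, one alternative to transactions is to fold the update and the insert into a single statement. The sketch below is not the approach used in this recipe. It assumes a MySQL version that supports INSERT ... ON DUPLICATE KEY UPDATE (4.1 or later) and a DB-API connection whose driver uses the %s parameter style, as MySQLdb does. The function name get_hit_count_upsert() is hypothetical.

```python
def get_hit_count_upsert(conn, page_path):
    """Register or increment the counter for page_path in one statement.

    Hypothetical variant: assumes MySQL 4.1 or later and a DB-API
    connection that uses the %s parameter style.
    """
    cursor = conn.cursor()
    # A new page is inserted with a count of 1; an existing page has its
    # count incremented. LAST_INSERT_ID(expr) records the resulting value
    # for this connection in either case.
    cursor.execute(
        """
        INSERT INTO hitcount (path, hits) VALUES (%s, LAST_INSERT_ID(1))
        ON DUPLICATE KEY UPDATE hits = LAST_INSERT_ID(hits + 1)
        """,
        (page_path,),
    )
    # Retrieve the connection-specific value without querying the table.
    cursor.execute("SELECT LAST_INSERT_ID()")
    (count,) = cursor.fetchone()
    cursor.close()
    conn.commit()   # needed for transactional tables when autocommit is off
    return count
```

Because the server handles the missing-row case inside one statement, simultaneous first requests for a page should receive distinct counts rather than all receiving one.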
A PHP version of the hit counter looks like this:

    function get_hit_count ($conn_id, $page_path)
    {
      $query = sprintf ("UPDATE hitcount SET hits = LAST_INSERT_ID(hits+1)
                         WHERE path = %s", sql_quote ($page_path));
      if (mysql_query ($query, $conn_id) && mysql_affected_rows ($conn_id) > 0)
        return (mysql_insert_id ($conn_id));

      # If the page path wasn't listed in the table, register it and
      # initialize the count to one. Use IGNORE in case another client
      # tries same thing at the same time.
      $query = sprintf ("INSERT IGNORE INTO hitcount (path,hits)
                         VALUES(%s,1)", sql_quote ($page_path));
      mysql_query ($query, $conn_id);
      return (1);
    }
To use it, call the get_self_path() function that returns the script pathname (see Recipe 18.2):

    $self_path = get_self_path ();
    $hits = get_hit_count ($conn_id, $self_path);
    print ("<p>This page has been accessed $hits times.</p>\n");
In Python, the function looks like this:

    def get_hit_count (conn, page_path):
        cursor = conn.cursor ()
        cursor.execute ("""
                UPDATE hitcount SET hits = LAST_INSERT_ID(hits+1)
                WHERE path = %s
            """, (page_path,))
        if cursor.rowcount > 0:     # a counter was incremented
            count = cursor.insert_id ()
            cursor.close ()
            return (count)

        # If the page path isn't listed in the table, register it and
        # initialize the count to one. Use IGNORE in case another client
        # tries same thing at the same time.
        cursor.execute ("""
                INSERT IGNORE INTO hitcount (path,hits) VALUES(%s,1)
            """, (page_path,))
        cursor.close ()
        return (1)
The function is used as follows:

    self_path = os.environ["SCRIPT_NAME"]
    count = get_hit_count (conn, self_path)
    print ("<p>This page has been accessed %d times.</p>" % count)
The recipes distribution includes demonstration hit counter scripts for Perl, PHP, and Python under the apache directory. A JSP version is under the tomcat directory. Install any of these in your web tree, invoke it a few times, and watch the count increase. First you'll need to create the hitcount table, as well as the hitlog table described in Recipe 18.14. Both tables can be created from the hits.sql script provided in the tables directory.
18.14 Web Page Access Logging