Analyzing the Log File

To define a logging format named mysql that produces these values with tabs in between, add the following LogFormat directive to your httpd.conf file:

    LogFormat "%{%Y-%m-%d %H:%M:%S}t\t%h\t%m\t%U\t%>s\t%b\t%{User-Agent}i" mysql

Most of the pieces are in place now. We have a log table, a program that writes to it, and a mysql format for producing log entries. All that remains is to tell Apache to write the entries to the httpdlog.pl script. However, until you know that the output format really is correct and that the program can process log entries properly, it's premature to tell Apache to log directly to the program. To make testing and debugging a bit easier, have Apache log mysql-format entries to a file instead. That way, you can look at the file to check the output format, and you can use it as input to httpdlog.pl to verify that the program works correctly. To instruct Apache to log lines in mysql format to the file test_log in Apache's log directory, use this CustomLog directive:

    CustomLog /usr/local/apache/logs/test_log mysql

Then restart Apache to enable the new logging directives. After your web server receives a few requests, take a look at the test_log file. Verify that the contents are as you expect, then feed the file to httpdlog.pl. If you're in Apache's logs directory and the bin and logs directories are both under the Apache root, the command looks like this:

    ../bin/httpdlog.pl test_log

After httpdlog.pl finishes, take a look at the httpdlog table to make sure that it looks correct. Once you're satisfied, tell Apache to send log entries directly to httpdlog.pl by modifying the CustomLog directive as follows:

    CustomLog "|/usr/local/apache/bin/httpdlog.pl" mysql

The | character at the beginning of the pathname tells Apache that httpdlog.pl is a program, not a file.
Restart Apache and new entries should appear in the httpdlog table as visitors request pages from your site.

Nothing you have done to this point changes any logging you may have been doing originally. For example, if you were logging to an access_log file before, you still are now. Thus, Apache will be sending entries both to the original log file and to MySQL. If that's what you want, fine. Apache doesn't care if you log to multiple destinations. But you'll use more disk space if you do. To disable file logging, comment out your original CustomLog directive by placing a # character in front of it, then restart Apache.
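
The manual check of test_log described above can also be done with a small script before the file is fed to httpdlog.pl. The following is a minimal Python sketch, separate from the recipe's Perl program. It assumes each line holds seven tab-separated fields in the order given by the LogFormat directive. The field names other than dt, host, url, and size, the helper names parse_line and check_log, and the sample lines are all illustrative assumptions.

```python
from datetime import datetime

# Field order follows the LogFormat directive; names are illustrative only.
FIELDS = ("dt", "host", "method", "url", "status", "size", "agent")


def parse_line(line):
    """Split one tab-delimited log line into a dict, or raise ValueError."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    rec = dict(zip(FIELDS, parts))
    # strptime raises ValueError if the timestamp is not Y-m-d H:M:S.
    rec["dt"] = datetime.strptime(rec["dt"], "%Y-%m-%d %H:%M:%S")
    rec["status"] = int(rec["status"])
    # Apache's %b writes "-" when no body bytes were sent.
    rec["size"] = None if rec["size"] == "-" else int(rec["size"])
    return rec


def check_log(lines):
    """Return (line_number, message) pairs for lines that fail to parse."""
    problems = []
    for n, line in enumerate(lines, start=1):
        try:
            parse_line(line)
        except ValueError as err:
            problems.append((n, str(err)))
    return problems


if __name__ == "__main__":
    # Made-up sample lines; in practice, pass open("test_log") instead.
    sample = [
        "2002-07-14 10:15:32\t127.0.0.1\tGET\t/index.html\t200\t1432\tMozilla/5.0\n",
        "2002-07-14 10:15:40\t127.0.0.1\tGET\t/index.html\t304\t-\tMozilla/5.0\n",
        "not a log line\n",
    ]
    for number, message in check_log(sample):
        print(f"line {number}: {message}")
```

An empty result from check_log suggests the format is at least structurally sound; it does not confirm that httpdlog.pl itself will load the rows correctly, so the look at the httpdlog table is still worth doing.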

18.15.5 Analyzing the Log File

Now that you have Apache logging into the database, what can you do with the information? That depends on what you want to know. Here are some examples that show the kinds of questions you can use MySQL to answer easily:

• How many records are in the request log?

    SELECT COUNT(*) FROM httpdlog;

• How many different client hosts have sent requests?

    SELECT COUNT(DISTINCT host) FROM httpdlog;

• How many different pages have clients requested?

    SELECT COUNT(DISTINCT url) FROM httpdlog;

• What are the ten most popular pages?

    SELECT url, COUNT(*) AS count FROM httpdlog
    GROUP BY url ORDER BY count DESC LIMIT 10;

• How many requests have been received for those useless, wretched favicon.ico files that certain browsers like to check for?

    SELECT COUNT(*) FROM httpdlog WHERE url LIKE '%/favicon.ico%';

• What is the range of dates spanned by the log?

    SELECT MIN(dt), MAX(dt) FROM httpdlog;

• How many requests have been received each day?

    SELECT FROM_DAYS(TO_DAYS(dt)) AS day, COUNT(*)
    FROM httpdlog GROUP BY day;

  Answering this question requires stripping off the time-of-day part from the dt values so that requests received on a given date can be grouped. The query does this using TO_DAYS() and FROM_DAYS() to convert DATETIME values to DATE values. However, if you intend to run a lot of queries that use just the date part of the dt values, it would be more efficient to create the httpdlog table with separate DATE and TIME columns, change the LogFormat directive to produce the date and time as separate output values, and modify httpdlog.pl accordingly. Then you can operate on the request dates directly without stripping off the time, and you can index the date column for even better performance.

• What is the hour-of-the-day request histogram?

    SELECT HOUR(dt) AS hour, COUNT(*) FROM httpdlog GROUP BY hour;

• What is the average number of requests received each day?

    SELECT COUNT(*)/(TO_DAYS(MAX(dt)) - TO_DAYS(MIN(dt)) + 1)
    FROM httpdlog;

  The numerator is the total number of requests in the table. The denominator is the number of days spanned by the records.

• What is the longest URL recorded in the table?

    SELECT MAX(LENGTH(url)) FROM httpdlog;

  If the url column is defined as VARCHAR(255) and this query produces a value of 255, it's likely that some URL values were too long to fit in the column and were truncated at the end. To avoid this, you can convert the column to BLOB or TEXT (depending on whether or not you want the values to be case sensitive). For example, if you want case-sensitive values up to 65,535 characters long, modify the url column as follows:

    ALTER TABLE httpdlog MODIFY url BLOB NOT NULL;

• What is the total number of bytes served and the average bytes per request?

    SELECT
        COUNT(size) AS requests,
        SUM(size) AS bytes,
        AVG(size) AS 'bytes/request'
    FROM httpdlog;

  The query uses COUNT(size) rather than COUNT(*) to count only those requests with a non-NULL size value. If a client requests a page twice, the server may respond to the second request by sending a header indicating that the page hasn't changed rather than by sending content. In this case, the log entry for the request will have NULL in the size column.

• How much traffic has there been for each kind of file (based on filename extension such as .html, .jpg, or .php)?

    SELECT
        SUBSTRING_INDEX(SUBSTRING_INDEX(url,'?',1),'.',-1) AS extension,
        COUNT(size) AS requests,
        SUM(size) AS bytes,
        AVG(size) AS 'bytes/request'
    FROM httpdlog
    WHERE url LIKE '%.%'
    GROUP BY extension;

  The WHERE clause selects only url values that have a period in them, to eliminate pathnames that name files that have no extension. To extract the extension values for the output column list, the inner SUBSTRING_INDEX() call strips off any parameter string at the right end of the URL and leaves the rest. This turns a value like /cgi-bin/script.pl?id=43 into /cgi-bin/script.pl. If the value has no parameter part, SUBSTRING_INDEX() returns the entire string. The outer SUBSTRING_INDEX() call strips everything up to and including the rightmost period from the result, leaving only the extension.
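
To see what the nested SUBSTRING_INDEX() calls do to individual URLs, the logic can be traced outside the database. The sketch below is Python, not MySQL. Its substring_index function is a hand-written approximation of the MySQL function, and extension is a hypothetical helper that mirrors the query's WHERE clause and output expression; neither name comes from the recipe.

```python
def substring_index(s, delim, count):
    """Approximate MySQL SUBSTRING_INDEX() for a non-empty delimiter.

    A positive count keeps everything left of the count-th delimiter
    (counting from the left); a negative count keeps everything right of
    the count-th delimiter (counting from the right). If the delimiter
    occurs fewer times than that, the whole string is returned.
    """
    if count == 0 or not delim:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def extension(url):
    """Mirror the query: skip URLs with no period, else extract the suffix."""
    if "." not in url:  # same effect as WHERE url LIKE '%.%'
        return None
    path = substring_index(url, "?", 1)    # drop any parameter string
    return substring_index(path, ".", -1)  # keep text after the last period


if __name__ == "__main__":
    # Made-up URLs for illustration.
    for url in ("/cgi-bin/script.pl?id=43", "/index.html", "/images/"):
        print(url, "->", extension(url))
```

One consequence of this logic: a URL whose only period sits in the parameter string, such as /cgi-bin/script?v=1.2, passes the period test but has no period left after the parameters are stripped, so the whole path comes back as the "extension". If such URLs appear in your log, a stricter filter may be needed.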

18.15.6 Other Logging Issues