Antilog
Antilog is a module to read and query Apache log files. The current version is not very sophisticated yet, but it can grow over time. One possible addition would be reading a series of log files and presenting the data much as it does now, but for multiple days. I'm not sure how efficient that would be with memory...

Usage is simple. Here's a quick example. Create an instance of RefLogReader:

>>> import antilog
>>> reflog = antilog.RefLogReader('access.log')
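For readers curious what reading a log file involves, here is a minimal sketch of how a single line could be turned into a record. This is not antilog's actual code. It assumes Apache's "combined" log format, and the names LogEntry, LOG_PATTERN, parse_line and read_log are hypothetical; only the field names url, referrer and return_code are taken from the examples on this page.

```python
import re
from collections import namedtuple

# Hypothetical record type. The fields 'url', 'referrer' and 'return_code'
# mirror the names used on this page; the other fields are assumptions.
LogEntry = namedtuple(
    "LogEntry",
    "host timestamp method url return_code size referrer user_agent",
)

# Assumes Apache's "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<return_code>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a LogEntry for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return LogEntry(**match.groupdict())


def read_log(lines):
    """Parse an iterable of lines, skipping those that do not match."""
    entries = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            entries.append(entry)
    return entries
```

All fields are kept as strings here, which is why a response code can be tested with startswith('4').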

How many records did it read?

>>> len(reflog.data)
8033

What are the top 5 files requested?

>>> d = reflog.get_top_n('url', 5)
>>> reflog.pprint(d)
1281 /weblog/totm.rss
854 /snakeheader.jpg
620 /images/sponsorme.png
561 /weblog/new.css
518 /weblog/valid-rss-bbulger.png

(Two important methods here: get_top_n and pprint.)

In much the same way, we can get the top 5 referrers:

>>> d = reflog.get_top_n('referrer', 5)
>>> reflog.pprint(d)
554 http://www.zephyrfalcon.org/weblog/
542 http://zephyrfalcon.org/weblog/arch_d7_2003_11_22.html
374 http://zephyrfalcon.org/weblog/arch_d7_2003_11_29.html
359 http://zephyrfalcon.org/weblog/
185 http://zephyrfalcon.org/weblog/index.html

However, this includes "referrals" from my own site... probably not what I want to see. To fix this, I can pass a function that filters unwanted URLs:

>>> def isnotlocal(url):
...     if url.startswith("http://"):
...         url = url[7:]
...     return not (url.startswith("zephyrfalcon.org") \
...                 or url.startswith("www.zephyrfalcon.org"))
...
>>> d = reflog.get_top_n('referrer', 5, filterfunc=isnotlocal)
>>> reflog.pprint(d)
124 http://www.pythonware.com/daily/
30 http://angra3594.fc2web.com/index.html
29 http://www.cafeconleche.org/
20 http://www.ibiblio.org/xml/
17 http://www.google.com/search?q=torrents&hl=en&lr=&ie=UTF-8...

Much better. Also note the get_unique_values method, which returns each distinct value of a field together with its count:

>>> reflog.get_unique_values('return_code')
[('200', 6134), ('206', 13), ('301', 5), ('304', 1706), ('404', 167),
('403', 1), ('401', 7)]

6134 requests were met with response code 200, 167 yielded a 404, etc. Note that this information can be used to create our own result set to pass to get_top_n:

>>> invalid = [e for e in reflog.data if e.return_code.startswith('4')]
>>> len(invalid)
175

We now have a list of invalid requests. Let's pass it to get_top_n:

>>> d = reflog.get_top_n('url', 5, entries=invalid)
>>> reflog.pprint(d)
144 /favicon.ico
6 /stats/
3 /weblog/arch_d7_2003_11_15.htm
3 /download/)
2 /weblog/arch_d7_2003_11_22.htm

Most of the invalid requests were for /favicon.ico.

Despite its limitations, I find antilog quite useful (as far as inspecting a daily log file goes). Suggestions for more useful methods are always welcome.
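
As a starting point for such suggestions, the counting done in the examples above can be sketched with collections.Counter from the standard library. This is only an illustration of the idea, not how antilog is implemented. Entry, top_n and unique_values are hypothetical names, and the sample records are made up for the demonstration.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a parsed log record (not antilog's own class).
Entry = namedtuple("Entry", "url referrer return_code")


def top_n(entries, field, n, filterfunc=None):
    """Return up to n (count, value) pairs for `field`, most frequent first."""
    values = (getattr(e, field) for e in entries)
    if filterfunc is not None:
        # Keep only the values the filter function accepts.
        values = (v for v in values if filterfunc(v))
    counts = Counter(values)
    return [(count, value) for value, count in counts.most_common(n)]


def unique_values(entries, field):
    """Return (value, count) pairs for every distinct value of `field`."""
    return list(Counter(getattr(e, field) for e in entries).items())


# Made-up sample data, for illustration only.
entries = [
    Entry("/favicon.ico", "-", "404"),
    Entry("/favicon.ico", "-", "404"),
    Entry("/weblog/totm.rss", "http://www.pythonware.com/daily/", "200"),
]

# Build a custom result set first, then rank within it.
invalid = [e for e in entries if e.return_code.startswith("4")]
for count, url in top_n(invalid, "url", 5):
    print(count, url)
```

Passing the list of entries explicitly keeps the ranking function independent of where the records came from, so the same function could be fed entries collected from several daily log files.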