how to integrate xml support in lire
Joost van Baal
joostvb at logreport.org
Thu Mar 15 20:47:36 CET 2001
Hi,
There's some discussion going on about how lire (formerly known
as 'lr') should support generating its data in xml format.
I've got some ideas about how this should be done, and would like to
state them here. Any comments are very welcome.
The "old" system works something like this:
logfile -> service2dlf -> dlffile ->
various small report scripts -> raw report
The small report scripts output something like this:
title per day traffic summary: number of messages per status
title
format s t n t
data 47 13.example.com 2 /33.example.com
data 47 13.example.com 1 /~vanbaal/cv/?N=A
data 47 13.example.com 1 /cgi-bin/man2html?whatis+1
data 47 13.example.com 1 /cgi-bin/man2html?void+4
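To make the raw format concrete, here is a minimal Python sketch of reading a raw report snippet like the one above. It assumes that fields are whitespace-separated, that the `format` line declares one letter per column (so the last column may contain spaces), and that a bare `title` line carries no text. `RawReport` and `parse_raw_report` are hypothetical names, not part of lire.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RawReport:
    title: str = ""
    columns: list[str] = field(default_factory=list)  # e.g. ["s", "t", "n", "t"]
    rows: list[list[str]] = field(default_factory=list)


def parse_raw_report(text: str) -> RawReport:
    """Parse a raw report snippet (fields assumed whitespace-separated)."""
    report = RawReport()
    for line in text.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword == "title":
            # Keep the first non-empty title; a bare "title" line is ignored
            # (what it means in lire is an assumption here).
            if rest.strip() and not report.title:
                report.title = rest.strip()
        elif keyword == "format":
            report.columns = rest.split()
        elif keyword == "data":
            # Split into as many fields as the format line declares, so the
            # last field keeps any embedded spaces.
            n = len(report.columns)
            report.rows.append(rest.split(maxsplit=n - 1) if n else rest.split())
    return report
```

Note how flat the result is: a title, a list of column types and a list of rows. There is no way to say that one row belongs to another.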
(It's saved as a .report.raw file.)
Then the system does:
raw report -> querycalc-tidy -> formatted report
(The situation is a bit different when the logs are
anonymized.)
The formatted report contains output like:
requested pages per clienthost, top 30, top 5 pages
crawler3.bos2.fast-search.net ............... 212
/~vanbaal/doc/autotools/autobook-1.0/aut 1
/~vanbaal/doc/autotools/autobook-1.0/a20 1
/doc/tetex-nonfree/texmf/latex/latex2e-h 1
rech172.insa-rennes.fr ...................... 10
/doc/texmf/help 2
/icons/unknown.gif 1
/icons/text.gif 1
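The layout of this formatted report can be approximated as below. The widths are inferred from the sample above (page paths cut to 40 characters, dot leaders ending at column 45) and are assumptions: querycalc-tidy's actual rules may differ, and `dot_leader` and `render_nested` are hypothetical helpers.

```python
def dot_leader(label: str, value: int, width: int = 45) -> str:
    """Pad "label " with dots up to `width` columns, then append the value."""
    padded = label + " "
    # Always emit at least three dots, even for very long labels.
    padded += "." * max(width - len(padded), 3)
    return f"{padded} {value}"


def render_nested(title, groups, path_width: int = 40) -> str:
    """Render (host, total, [(path, hits), ...]) groups like the sample."""
    lines = [title]
    for host, total, pages in groups:
        lines.append(dot_leader(host, total))
        for path, hits in pages:
            # Long paths are cut off, as in the sample above.
            lines.append(f"{path[:path_width]} {hits}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_nested(
        "requested pages per clienthost, top 30, top 5 pages",
        [("rech172.insa-rennes.fr", 10,
          [("/doc/texmf/help", 2), ("/icons/unknown.gif", 1)])],
    ))
```

The input here is already nested (hosts containing pages), which the flat raw format above has no direct way to express.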
I believe it would be wisest to get rid of the raw report format
and replace it with an xml file, since xml lets us represent richer
data structures (such as the nested per-clienthost page lists above)
than flat title/format/data lines. We'll gain flexibility by doing
this. If we choose to do this, all small report scripts (
report_delay
report_error
report_fromdomain
report_fromrelay
report_fromusersperfromdomain
report_perday
report_perhour
report_size
report_todomain
report_torelay
report_touserspertodomain
report_bytesperday
report_bytesperresult
report_clienthost
report_httpresult
report_httpresultperclient
report_pagesperclienthost
report_requestedpage
report_requestsperday
report_compressionperpage
report_compressionperfiletype
report_result
I believe) would need to be changed. They should generate small xml
files instead of the current raw report snippets.
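As a rough sketch of what a rewritten report script could emit, the following uses Python's standard `xml.etree.ElementTree` to write one nested group per client host. The element and attribute names (`report`, `title`, `group`, `entry`, `hits`, `total`) are hypothetical placeholders, not an agreed-upon schema.

```python
import xml.etree.ElementTree as ET


def pages_per_client_xml(title, groups):
    """Build a small xml document for a nested per-clienthost report.

    `groups` is a list of (host, total_hits, [(path, hits), ...]) tuples.
    All element and attribute names are hypothetical placeholders.
    """
    report = ET.Element("report")
    ET.SubElement(report, "title").text = title
    for host, total, pages in groups:
        group = ET.SubElement(report, "group", name=host, total=str(total))
        for path, hits in pages:
            ET.SubElement(group, "entry", hits=str(hits)).text = path
    # encoding="unicode" makes tostring() return str instead of bytes;
    # ElementTree escapes special characters such as & and < for us.
    return ET.tostring(report, encoding="unicode")


if __name__ == "__main__":
    print(pages_per_client_xml(
        "requested pages per clienthost, top 30, top 5 pages",
        [("rech172.insa-rennes.fr", 10,
          [("/doc/texmf/help", 2), ("/icons/unknown.gif", 1)])],
    ))
```

The nesting of pages inside their client host is explicit in the document, instead of being implied by row order.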
I don't think we should require the use of big xml parsers. People
who want to stick with the simple ascii reports should be able to do
so without a big xml parser. I doubt a solaris sysadmin would want
to install lire if she also had to install xalan or something
similar just to be able to use it.
We could write a parser for our own small xml files that does
_just_ enough to generate an ascii report. (I have no clue how
difficult this would be.) A less elegant solution would be to keep
the current raw reports and convert them to xml (I believe we have
something like this in place already). If we choose to do this, we
could keep our current querycalc-tidy. However, the system would
not gain any flexibility this way.
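One way to keep such a parser small, assuming the report scripts promise to write one element per line, is a line-oriented reader that understands only that restricted subset of xml. This is a sketch, not a general xml parser: it ignores closing tags and does not handle comments, CDATA, numeric character references or multi-line elements. The element names and the helpers `read_line` and `xml_to_ascii` are hypothetical.

```python
import re
from xml.sax.saxutils import unescape

# Restricted, hypothetical input format: one element per line, e.g.
#   <title>requested pages per clienthost</title>
#   <group name="rech172.insa-rennes.fr" total="10">
#   <entry hits="2">/doc/texmf/help</entry>
#   </group>
# Closing tags are skipped; comments, CDATA and multi-line elements are
# outside the supported subset.
LINE = re.compile(r'^\s*<(\w+)((?:\s+\w+="[^"]*")*)\s*>(?:(.*)</\1>)?\s*$')
ATTR = re.compile(r'(\w+)="([^"]*)"')
# unescape() handles &amp;, &lt; and &gt;; add the two quote entities.
QUOTES = {"&quot;": '"', "&apos;": "'"}


def read_line(line):
    """Return (tag, attributes, text) for an opening/one-line element, else None."""
    m = LINE.match(line)
    if m is None:
        return None  # closing tag or something outside the subset
    tag, attrs, text = m.groups()
    attrib = {k: unescape(v, QUOTES) for k, v in ATTR.findall(attrs)}
    return tag, attrib, None if text is None else unescape(text, QUOTES)


def xml_to_ascii(lines):
    """Turn the restricted xml into a plain ascii report."""
    out = []
    for line in lines:
        parsed = read_line(line)
        if parsed is None:
            continue
        tag, attrib, text = parsed
        if tag == "title":
            out.append(text)
        elif tag == "group":
            out.append(f"{attrib['name']} {attrib['total']}")
        elif tag == "entry":
            out.append(f"  {text} {attrib['hits']}")
    return "\n".join(out)
```

The trade-off is that every report script must stick to this one-element-per-line layout; anything richer would push us back towards a real parser.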
Is my story clear? Any comments?
Bye,
--
Joost