how to integrate xml support in lire

Joost van Baal joostvb at logreport.org
Thu Mar 15 20:47:36 CET 2001


Hi,

There's some discussion going on about the way lire (formerly known
as 'lr') should support generating its data in xml format.
I've got some ideas about how this should be done, and would like to
state them here.  Any comments are very welcome.

The "old" system works something like this:

logfile -> service2dlf -> dlffile -> 
             various small report scripts -> raw report

The small report scripts output something like:

title per day traffic summary: number of messages per status
title
format s t n t
data 47 13.example.com 2 /33.example.com
data 47 13.example.com 1 /~vanbaal/cv/?N=A
data 47 13.example.com 1 /cgi-bin/man2html?whatis+1
data 47 13.example.com 1 /cgi-bin/man2html?void+4

(It's saved as a .report.raw file.)
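To make the raw format concrete, here is a rough Python sketch of how
such a snippet could be read.  It only relies on what the sample above
shows (the 'title', 'format' and 'data' keywords); the meaning of the
format codes is not interpreted, and the assumption that each data line
holds one whitespace-separated field per format code is mine.  The
function name is made up for illustration.

```python
def parse_raw_report(text):
    """Split a raw report snippet into titles, format codes and data rows."""
    report = {"titles": [], "format": [], "rows": []}
    for line in text.splitlines():
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "title" and rest:
            # A bare "title" line carries no text and is skipped.
            report["titles"].append(rest)
        elif keyword == "format":
            report["format"] = rest.split()
        elif keyword == "data":
            # Assumption: one whitespace-separated field per format code.
            report["rows"].append(rest.split(None, len(report["format"]) - 1))
    return report


SAMPLE = """\
title per day traffic summary: number of messages per status
title
format s t n t
data 47 13.example.com 2 /33.example.com
data 47 13.example.com 1 /~vanbaal/cv/?N=A
"""

report = parse_raw_report(SAMPLE)
for row in report["rows"]:
    print(row)
```

A flat list of rows like this is all the raw format can carry, which
is the limitation discussed further on.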

Then the system does:

raw report -> querycalc-tidy -> formatted report

(The situation when the logs are anonymized is a bit
different.)

The formatted report features stuff like:

requested pages per clienthost, top 30, top 5 pages

  crawler3.bos2.fast-search.net ...............     212
    /~vanbaal/doc/autotools/autobook-1.0/aut          1
    /~vanbaal/doc/autotools/autobook-1.0/a20          1
    /doc/tetex-nonfree/texmf/latex/latex2e-h          1
  rech172.insa-rennes.fr ......................      10
    /doc/texmf/help                                   2
    /icons/unknown.gif                                1
    /icons/text.gif                                   1

I believe it would be wisest to get rid of the raw report format.
It could be replaced by an xml file, which gives us the possibility
to support richer data structures, so we'd gain flexibility.  If we
choose to do this, all small
report scripts (
  report_delay
  report_error
  report_fromdomain
  report_fromrelay
  report_fromusersperfromdomain
  report_perday
  report_perhour
  report_size
  report_todomain
  report_torelay
  report_touserspertodomain
  report_bytesperday
  report_bytesperresult
  report_clienthost
  report_httpresult
  report_httpresultperclient
  report_pagesperclienthost
  report_requestedpage
  report_requestsperday
  report_compressionperpage
  report_compressionperfiletype
  report_result
I believe) need to be changed.  They should generate small xml files,
instead of the current raw report snippets.  
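As an illustration of what such a small xml file might look like, here
is a Python sketch that builds one with the standard library.  The
element and attribute names ('report', 'host', 'page', 'requests') are
invented for this example; nothing like them has been agreed on.  The
data is taken from the formatted report shown earlier.

```python
import xml.etree.ElementTree as ET


def build_report(title, hosts):
    """Build a nested report tree; all tag and attribute names are hypothetical."""
    root = ET.Element("report")
    ET.SubElement(root, "title").text = title
    for name, total, pages in hosts:
        host_el = ET.SubElement(root, "host", name=name, requests=str(total))
        for path, count in pages:
            # Pages nest under their host, which a flat raw report cannot express.
            ET.SubElement(host_el, "page", requests=str(count)).text = path
    return root


root = build_report(
    "requested pages per clienthost, top 30, top 5 pages",
    [("rech172.insa-rennes.fr", 10,
      [("/doc/texmf/help", 2), ("/icons/unknown.gif", 1)])],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The nesting of pages under hosts is the kind of richer structure the
raw format has no room for.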

I don't think we should require the use of big xml parsers.  If
people want to stick with the simple ascii reports, they should be
able to do so without installing one.  I don't think a
solaris sysadmin would like to install lire if she had to
install xalan or something similar next to it just to be
able to use it.

We could write a parser for our own small xml files, which does
_just_ enough to generate an ascii report.  (I have no clue how
difficult this is.)  A less attractive solution would be to keep the
current raw reports, and convert these to xml (I believe we have
something like this in place now.)  If we chose to do this, we
could keep our current querycalc-tidy.  However, the system would
not gain in flexibility this way.
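To give a feel for the first option, here is a Python sketch of a
"just enough" reader.  It is not a real xml parser: it assumes our own
scripts write one element per line, in a fixed hypothetical layout
('report', 'title', 'host', 'page' are invented names), and it only
undoes the three basic escapes.  The column widths loosely mimic the
formatted report shown earlier.

```python
import re

# Hypothetical layout, one element per line; not an agreed format.
SAMPLE = """\
<report>
<title>requested pages per clienthost, top 30, top 5 pages</title>
<host name="rech172.insa-rennes.fr" requests="10">
<page requests="2">/doc/texmf/help</page>
<page requests="1">/icons/unknown.gif</page>
</host>
</report>
"""

TITLE = re.compile(r'<title>([^<]*)</title>')
HOST = re.compile(r'<host name="([^"]*)" requests="(\d+)">')
PAGE = re.compile(r'<page requests="(\d+)">([^<]*)</page>')


def unescape(text):
    # Undo only the basic escapes; "&amp;" goes last so "&amp;lt;" stays "&lt;".
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")


def render_ascii(xml_text):
    """Turn the restricted xml above into an ascii report."""
    out = []
    for raw in xml_text.splitlines():
        line = raw.strip()
        if (m := TITLE.fullmatch(line)):
            out += [unescape(m.group(1)), ""]
        elif (m := HOST.fullmatch(line)):
            name, count = unescape(m.group(1)), int(m.group(2))
            # Host name, dotted leader, right-aligned total.
            out.append(f"  {(name + ' ').ljust(45, '.')}{count:>8}")
        elif (m := PAGE.fullmatch(line)):
            count, path = int(m.group(1)), unescape(m.group(2))
            # Page paths are cut off at 40 characters.
            out.append(f"    {path[:40]:<40}{count:>11}")
        # Anything else (<report>, </host>, ...) is ignored.
    return "\n".join(out)


ascii_report = render_ascii(SAMPLE)
print(ascii_report)
```

Something this small only works as long as we control both the writer
and the reader of the files; anyone who wants more can still feed the
same files to a full xml toolchain.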

Is my story clear?  Any comments?

Bye,

-- 
Joost



