how to implement a lire dlf and report archive
Joost van Baal
joostvb at logreport.org
Sun May 27 18:22:28 CEST 2001
Hi,
I'm thinking about how to implement a lire dlf and report archive. (One
might even call it a datawarehouse, for extra buzzword bingo fun.)
The current ideas, after some discussion between me and Egon, are in
the TODO file. (You can see the latest one on
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/logreport/service/doc/TODO?rev=1.146&content-type=text/plain ).
The ideas are still somewhat unpolished. If you have any thoughts on
them, please share them. I'm planning to start implementing them in the
coming week. The current blurb in the TODO file about the archive is:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The archive should store files in .xml and .dlf format. It should reside
somewhere under /var/lib/lire/data. (The current lire .deb creates
/var/lib/lire .)
The variable KEEP is still used to decide whether tmpfiles are kept. The
to-be-introduced variable ARCHIVE indicates whether files should get
archived. If set, files are moved from TMPDIR to the archive. So, two kinds
of files are under consideration: those which are candidates for archiving
(depending on ARCHIVE or KEEP) and those which will never get stored in the
archive (kept in TMPDIR depending on KEEP).
 file is      variable   variable   file is
 candidate    KEEP       ARCHIVE    kept in
 for          is         is
 archive

 yes          set        set        archive
 yes          set        unset      archive
 yes          unset      set        archive
 yes          unset      unset      /dev/null
 no           set        set        TMPDIR
 no           set        unset      TMPDIR
 no           unset      set        /dev/null
 no           unset      unset      /dev/null
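
The table boils down to two rules. A minimal sketch in Python, assuming
KEEP and ARCHIVE are treated as plain booleans (set/unset); the function
name `destination` is made up for illustration and is not part of lire:

```python
def destination(candidate: bool, keep: bool, archive: bool) -> str:
    """Return where a file ends up, following the table above."""
    if candidate:
        # Archive candidates are stored if either variable is set.
        return "archive" if (keep or archive) else "/dev/null"
    # Non-candidates never reach the archive; KEEP alone decides.
    return "TMPDIR" if keep else "/dev/null"
```
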
Per kept file, we wanna be able to find out:
- filename
- service
- superservice
- timerange
- subject/hostname/fromaddress (maybe even complete mailheaders
of email message which contained the logfile)
- some external id (e.g. hostname, to be able to merge different reports
which report on the same thing)
- format (xml, log, report, or maybe even something else)
We use an 'LR_ID' to identify a job for the lire system, i.e. a received
email message or local logfile.
We use a 'REPORT_ID' to identify a report. One logfile could get split in
parts about e.g. different days. For each day, a separate report could get
generated. Other ways to split are possible (e.g. for logfiles which carry
lines about different hosts or even services.)
Perhaps it's wise to include an LR_ID in the generated report.
We could store meta information in an index file (e.g.
/var/lib/lire/data/meta/index), which could look like:
LR_ID-9871614364-1456 subject gelfand test
LR_ID-9871614364-1456 service email
LR_ID-9871614364-1456 time 2001050427
REPORT_ID-987161443426-234 time 20010527-20010528
REPORT_ID-98716144999-234 time 200105270104-200105282359
REPORT_ID-98716144988-261 time 200105
That is: idtag space key space value-with-possibly-embedded-spaces .
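
Reading such an index back could look roughly like this. It is only a
sketch of the "idtag space key space value" format described above; the
function name `parse_index` is hypothetical, and it assumes each idtag
carries at most one value per key:

```python
from collections import defaultdict

def parse_index(lines):
    """Map each idtag to a dict of its key/value pairs."""
    meta = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first two spaces only, so the value keeps its spaces.
        idtag, key, value = line.split(" ", 2)
        meta[idtag][key] = value
    return dict(meta)

sample = [
    "LR_ID-9871614364-1456 subject gelfand test",
    "LR_ID-9871614364-1456 service email",
    "REPORT_ID-987161443426-234 time 20010527-20010528",
]
index = parse_index(sample)
print(index["LR_ID-9871614364-1456"]["subject"])  # gelfand test
```
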
Perhaps we should think of some relational database model, and implement it
accordingly.
Time ranges should be UTC, in "almost human readable format":
yyyymm[dd[hh[mm[ss]]]][-yyyymm[dd[hh[mm[ss]]]]]
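
A possible check for this format, as a sketch: it validates only the
shape of the string (six digits plus up to four two-digit fields, with an
optional second timestamp after a dash), not whether the month, day or
hour values are in range. The name `parse_timerange` is made up for this
example:

```python
import re

# One timestamp: yyyymm followed by up to four optional two-digit fields.
_STAMP = r"\d{6}(?:\d{2}){0,4}"
_RANGE = re.compile(rf"({_STAMP})(?:-({_STAMP}))?")

def parse_timerange(text):
    """Return (start, end) strings; end is None for a single period."""
    match = _RANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid time range: {text!r}")
    return match.group(1), match.group(2)
```
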
The directory layout could be:
subservice (sub)reporttype
/var/lib/lire/data/report/xml/email/postfix/all/complete/extid/20010527-20010528
/var/lib/lire/data/report/html/
/var/lib/lire/data/report/ascii/
/var/lib/lire/data/email/raw/
/var/lib/lire/data/email/plain/
/var/lib/lire/data/log/dlf/www/apache/common/viewtype/extid/200105
                                             ^^^^^^^^
Where should different 'views' go? And filtered logs? E.g., currently we
have 'filter' and 'filter_messages' for email. These are filters from dlf
to dlf.
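
Whatever the final layout turns out to be, building the paths could be as
simple as joining components under the archive root. A sketch only: the
helper name `archive_path` is hypothetical, and the order and meaning of
the components are exactly the open questions above, so the function
deliberately does not fix them:

```python
import posixpath

ARCHIVE_ROOT = "/var/lib/lire/data"  # root proposed in the text

def archive_path(*components):
    """Join path components under the archive root, rejecting unsafe ones."""
    for part in components:
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"bad path component: {part!r}")
    return posixpath.join(ARCHIVE_ROOT, *components)

print(archive_path("log", "dlf", "www", "apache", "common",
                   "viewtype", "extid", "200105"))
```
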
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Bye,
Joost
--
Joost van Baal . . http://www.logreport.org/
. .
/^LogReport$/ . . joostvb at logreport.org