RFC: LogReport Architecture

Joost van Baal joostvb at logreport.org
Fri Jun 22 21:28:05 CEST 2001


Hi,

On Fri, Jun 22, 2001 at 01:38:47AM -0400, Francis J. Lacoste wrote:
> 
> Tonight, I have reviewed carefully the current LogReport architecture
> and written a little report. The first part discusses the way I understand
> the current LogReport Architecture. 

That's excellent!  I couldn't have written it more clearly myself.  It
indeed could very well get included in the User Manual.

<snip>
> 
> One thing I haven't discussed is the current archiving problem. But I think
> it's somewhat orthogonal to the issue.

I'm working on that now.  Please note, however, that I'd still like
to ship a potato lire.deb before Monday, so that Christoph can present it
with his TelemetryBox in Bordeaux.  So please try to be conservative
with commits this weekend.

> P.S. If you think that certain parts should be integrated in the User Manual
> or the Developer Manual, tell me and I can add them.
> 

> LogReport Architecture Discussion and Proposal
> 
> Architecture of LogReport
> -------------------------
> 
> LogReport generates various reports for various applications. Its
> processing is three-tiered: log processing, report generation and
> report formatting.
> 
> Log Processing Architecture
> ---------------------------
> 
> In the log processing tier, a logfile for a specific application
> (Postfix, Apache, Exim, Bind8) is translated to a common log format
> for its class of application. This format is called the Distilled Log
> Format or DLF for short. Each application class has a common DLF
> format. For example, Postfix, Exim and Sendmail all use the email DLF
> format. A common class of applications is called a "superservice" in
> LogReport. The currently defined superservices are dns, email, www and
> apachemodgzip.
> 
> LogReport transforms one DLF log into one report. The service
> (application) specific logfile is transformed into DLF format through
> the appropriate lr_<service>2dlf script.

These scripts are generally called <service>2dlf .

> The DLF format is currently a text database where each record takes
> one line and the fields are separated by a space.
> 
> The schema of the format for a specific superservice is defined in a
> dlf.cfg file which associates field positions with names. The schema is
> a shell script which defines environment variables.

There are still some unresolved issues with this.  Not all 2dlf
scripts use the description in dlf.cfg.  Of course they should, since
otherwise there is little point in having such a description.  There's
also a dlf.default, which gives the values to use in case no sane
information can be extracted from the log.  These ideas are not yet
fully implemented.
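
To make the idea concrete, here is a minimal sketch (in Python, purely
for illustration) of reading space-separated DLF records through a
position-to-name schema, with fallback values standing in for
dlf.default.  The field names, their order and the default values are
assumptions made up for this example; they are not taken from the real
dlf.cfg or dlf.default.

```python
# Hypothetical schema: field name -> 0-based position in the record.
# This mirrors what dlf.cfg is described as doing; the names are made up.
EMAIL_SCHEMA = {"time": 0, "from_domain": 1, "to_domain": 2, "size": 3}

# Hypothetical stand-in for dlf.default: values to use when a converter
# could not extract anything sane from the log.
EMAIL_DEFAULTS = {"time": "0", "from_domain": "-", "to_domain": "-", "size": "0"}


def parse_dlf_line(line, schema, defaults):
    """Split one space-separated DLF record into a dict keyed by field name."""
    parts = line.split()
    record = {}
    for name, position in schema.items():
        if position < len(parts):
            record[name] = parts[position]
        else:
            # Field missing from the record: fall back to the default.
            record[name] = defaults[name]
    return record


def read_dlf(lines, schema, defaults):
    """Parse every non-empty line of a DLF file."""
    return [parse_dlf_line(line, schema, defaults)
            for line in lines if line.strip()]
```

If every 2dlf script and every report went through one such schema, the
field positions would be defined in exactly one place.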

> Report Generation Architecture
> ------------------------------
> 
> LogReport generates one report per superservice DLF file it processes. A
> report contains several subreports where each describes a specific
> aspect of the superservice. For example, in the report for the email
> superservice, there is a "deliveries per to-domain, top 10" subreport
> that gives the top 10 domains to which mail was delivered. The
> generated report is in XML format which will be transformed to the
> appropriate output format (HTML, PDF, ASCII) in the Report Formatting
> tier.
> 
> The lr_dlf2xml program generates the report.

The lr_dlf2xml program generates the report, by cat-ing together
the various subreports.

> Each subreport is
> generated by an independent program. That program is run by lr_dlf2xml
> and outputs the appropriate subreport XML element.

The independent programs read the dlf file, or, optionally, a filtered
dlf file.  This optional filtering is done, since very often different
report scripts are interested in the same dlf records.  In such cases,
this filtering needs to be done only once for each dlf file.

> Adding a subreport means creating a new program which will output the
> correct subreport and adding it to the list of programs to be run by
> lr_dlf2xml. This is done by adding it to one of the
> $REPORTFILTERS_<filter>_FILE or $REPORTSCRIPTS files. 

The $REPORTFILTERS_<filter>_FILE and $REPORTSCRIPTS variables are
set in /etc/lire/<service>/defaults.

E.g., in the email case:

 REPORTFILTERS="filter filter_messages"
 REPORTSCRIPTS_filter_messages_FILE=/etc/lire/email/reportscripts_filter_messages

/etc/lire/email/reportscripts_filter_messages looks like

 report_perday
 report_perhour
 report_size
 report_delay
 report_error

filter_messages is a script which filters an email dlf file by
throwing away redundant lines about the same message.  The result is
exactly the info needed by the scripts listed in
reportscripts_filter_messages.

In the www case:

 REPORTSCRIPTS_FILE=/etc/lire/www/reportscripts

/etc/lire/www/reportscripts looks like

 report_clienthost
 report_requestedpage
 report_pagesperclienthost
 ...
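
The filter-once-then-report flow described above can be sketched as
follows (Python, for illustration only).  How filter_messages really
decides that a line is redundant is not spelled out here; the sketch
assumes a hypothetical "msgid" field and keeps the first record per
message.  The two report functions are likewise hypothetical
stand-ins, not the real report scripts.

```python
def filter_messages(records):
    """Keep the first record seen for each message.

    Assumption: records carry a hypothetical "msgid" field that
    identifies the message they belong to.
    """
    seen = set()
    kept = []
    for record in records:
        if record["msgid"] not in seen:
            seen.add(record["msgid"])
            kept.append(record)
    return kept


def report_total_size(records):
    """Hypothetical subreport: total size over all records."""
    return sum(int(record["size"]) for record in records)


def report_message_count(records):
    """Hypothetical subreport: number of records."""
    return len(records)


def run_filtered_reports(records, record_filter, reports):
    """Apply the filter once, then hand the same result to every report."""
    filtered = record_filter(records)
    return {report.__name__: report(filtered) for report in reports}
```

The point is only that the filtering cost is paid once per dlf file,
however many reports are listed for that filter.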

> 
> The subreport program can be written in Perl or shell (most are shell
> scripts) and can use any available tool such as sed, awk, sort, head,
> etc. There is a querycalc program which can be used to process simple
> queries on the DLF file, and there is lr_querycalc2xml, which can be
> used to transform the querycalc output into an XML table element.
> 
> Customization of the Report Generation
> --------------------------------------
> 
> The report generated by lr_dlf2xml can be customized in different
> ways. First, the order in which subreports are listed can be changed by
> modifying the order of the commands in $REPORTSCRIPTS_FILE (this is set
> in the defaults file specific to the superservice) and those of
> $REPORTFILTERS_<filter>_FILE. (But all filter subreports are always
> generated after the standard one.) To remove a specific subreport, you
> can comment out its name in the file.
> 
> Each subreport program can be customized through the use of command-line
> switches (specified in the $REPORTSCRIPTS_FILE or
> $REPORT_FILTERS_<filter>_FILE) or through a <subreport>.conf file
> which will be sourced before executing the program.
> 
> Report Formatting Architecture
> ------------------------------
> 
> The XML report can be formatted in various formats: PDF, HTML, ASCII.
> Additionally, charts (pie, line or bar) can be generated from it.
> 
> Limits of the Present Architecture
> ----------------------------------
> 
> I think that the biggest problem of the current architecture is in the
> report generation process. Adding a report and customizing reports are
> very tedious both for the developer and for the user. The other two
> interfaces are pretty good and seem extensible. 
> 
> The problem is that report information is scattered across lots of files
> which are linked through various indirections of environment
> variables. Also, customization of the subreports isn't standardized.
> This means that writing a GUI to make it easier for the user to
> customize the report would be very tedious because there isn't any
> meta-information about the available subreports (what parameters can
> be modified, what their meaning is, etc.). Each modification to a
> subreport or addition of a subreport would mean changes to the GUI.

The problem here is, I believe, that it's currently next to impossible
to automatically generate a report_whatidliketoknowtoday script.

> Proposal of Modifications to the Report Generation Architecture
> ---------------------------------------------------------------
> 
> One solution would be to replace the subreport-is-a-program paradigm
> with the use of a report specification language. Like the XML report
> format that we presently have, the report specification language would
> be XML based. It would contain, in a structured document, the
> description of the various subreports and the description and type of
> the various parameters that can be customized. 
> 
> The lr_dlf2xml program would be replaced by a program which reads the
> XML report specification and processes the DLF database accordingly to
> generate the XML report. All the information needed to understand a
> report would be self-contained in one file. (Of course, this file
> could be split up for convenience through the use of XML entities;
> reordering the report would only mean reordering the entities.)
> 
> This solution would make it easier to write new subreports, to develop
> a GUI to customize the report generation process and would also make
> it easier to localize. In this scheme, the xml-i18n-tool which can
> localize static XML files can be used to localize the report
> specification.
> 
> We should also note that modification of the report generation
> architecture is somewhat orthogonal to the use of SQL. We can easily
> imagine that the report specification processor uses SQL to build the
> report. Moreover, if we want to support an environment where there is no
> need to install a full-fledged SQL server, we could write a processor
> for the same report specification which works with ASCII DLF files
> (although it would probably be slower and more difficult to write).
> 
> There are some limits, though. First, this is a lot less flexible than
> the current approach where we have a Turing-complete language to
> generate the subreport (the complete Shell or Perl language). Each new
> type of question that we add would necessitate modification to the
> report specification language (and accordingly to the processor). At
> any one time, only a subset of SQL would be used. It is my feeling,
> however, that the number of new types of questions that are asked
> (aggregate value by key, aggregate value by time key, etc.) doesn't
> grow as fast as the number of subreports we have to write. So I think
> that the gain in expressivity, the reduction of complexity and the
> other advantages far outweigh the loss in flexibility.

I agree.  In my Mon, 18 Jun 2001 15:37:01 +0200 message to this list,
I wrote:

 The questions we're currently answering are questions like:

 Give me for each value of field <fieldname> this value, and the number
 of dlf records where the value of field <fieldname> is this value,
 sorted by number.

 Give me for each value of field <fieldname_string> this value, and the
 sum of the values of field <fieldname_number>, sorted by these sums.

 Give me for each pair of fields <fieldname_string_a>,
 <fieldname_string_b>, the number of dlf records, sorted by number.

 Give me for each triple of fields <fieldname_string_a>,
 <fieldname_string_b>, <fieldname_string_c>, these values, along
 with the value of field <fieldname_integer>, sorted by the value of
 <fieldname_integer>.

I don't think we'll have to answer drastically different
kinds of questions in the foreseeable future.
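
To show how few primitives these question shapes need, here is a
minimal sketch in Python (illustration only, not existing Lire code).
Records are assumed to be dicts keyed by field name; the function names
are made up, and sorting in descending order is an assumption, since
the questions above only say "sorted by number".

```python
from collections import Counter, defaultdict


def count_by(records, *fields):
    """Count records per distinct value (or tuple of values) of the
    given fields, most frequent first.  Covers the single-field and the
    pair-of-fields questions."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def sum_by(records, key_field, number_field):
    """Sum number_field per value of key_field, largest sum first."""
    sums = defaultdict(int)
    for r in records:
        sums[r[key_field]] += int(r[number_field])
    return sorted(sums.items(), key=lambda item: (-item[1], item[0]))


def list_sorted(records, key_fields, number_field):
    """List the values of key_fields along with number_field, sorted by
    number_field, largest first.  Covers the triple-of-fields question."""
    rows = [(tuple(r[f] for f in key_fields), int(r[number_field]))
            for r in records]
    return sorted(rows, key=lambda row: -row[1])
```

A "top 10" subreport is then just the first ten entries of one of
these results, e.g. count_by(records, "to_domain")[:10], where
"to_domain" is again a hypothetical field name.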


> Other Notes
> -----------
> 
> If we choose to go with something resembling that proposal, we will
> have to write that XML "language" and its processor using an XML
> processing library like XML::Grove, XML::Parser, XML::DOM or XML::SAX.
> There are several Perl template engines available, but none really fits
> the need. Most are there to handle HTML templates or are tightly
> integrated with mod_perl or a web server environment. Those where the
> syntax can be extended (like HTML::Embperl 2.0beta) are still very
> alpha. In fact, all the templating systems exist to mix *perl* code
> with fixed portions, whereas we don't want to mix *any* code in the XML.
> 
> Those could be useful if we decided to go midway: do not use an XML
> report specification but still drop the current scheme for an XML
> template with embedded SQL. This approach suffers from many of the same
> limitations as the previous one (meta-information is separated from
> subreport, more clutter, etc.).
> 
> Also, this ties us to SQL, which will make LogReport require a
> full-fledged SQL database, because the SQL::Statement module (which is
> used by the DBD::CSV and DBD::File drivers) supports neither aggregates
> (GROUP BY) nor functions (hour(timestamp)).
> 

I take this for granted.  I am not yet very familiar with the
various XML tools.
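
Just to have something concrete to discuss, the proposed report
specification and its processor could look roughly like the sketch
below.  It is written in Python with its standard XML parser, not with
any of the Perl modules named above, and the element names, attribute
names and the single "count" operation are all invented for this
example; no such specification language exists yet.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical report specification: every element and attribute name
# here is an assumption made up for illustration.
SPEC = """
<report superservice="email">
  <subreport id="deliveries-per-to-domain" operation="count"
             field="to_domain">
    <param name="limit" type="int" default="10"/>
  </subreport>
</report>
"""


def run_spec(spec_xml, records, overrides=None):
    """Evaluate each subreport in the specification against the records.

    overrides maps a subreport id to {parameter name: value}; this is
    the kind of meta-information a GUI could read from the specification.
    """
    overrides = overrides or {}
    results = {}
    for sub in ET.fromstring(spec_xml).findall("subreport"):
        params = {p.get("name"): p.get("default")
                  for p in sub.findall("param")}
        params.update(overrides.get(sub.get("id"), {}))
        if sub.get("operation") != "count":
            raise ValueError("unsupported operation: %s"
                             % sub.get("operation"))
        counts = Counter(r[sub.get("field")] for r in records)
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        results[sub.get("id")] = ranked[:int(params["limit"])]
    return results
```

Each new type of question would add one branch to the processor,
which is exactly the loss of flexibility discussed above.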

BTW, is there a Perl module which makes a simple ASCII file accessible
via SQL statements?  I know there's a module which makes such a file
accessible as a database.  If there is such a thing, I'd like to
see it, and see it answering a simple SQL statement.
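
As an illustration of what "a simple SQL statement over an ASCII file"
could look like without a database server, here is a sketch that loads
space-separated lines into an in-memory SQLite table.  Note that this
is Python's sqlite3 module used as a stand-in, not a Perl module, and
the column names are hypothetical.

```python
import sqlite3


def load_dlf(lines, columns):
    """Load space-separated records into an in-memory SQLite table "dlf".

    Assumptions: every non-empty line has exactly len(columns) fields,
    and the column names come from a trusted schema (they are pasted
    into the CREATE TABLE statement as-is).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dlf (%s)" % ", ".join(columns))
    placeholders = ", ".join("?" for _ in columns)
    rows = (line.split() for line in lines if line.strip())
    conn.executemany("INSERT INTO dlf VALUES (%s)" % placeholders, rows)
    return conn


# The kind of aggregate query a "top N" subreport needs.
TOP_DOMAINS = (
    "SELECT to_domain, COUNT(*) AS n FROM dlf "
    "GROUP BY to_domain ORDER BY n DESC, to_domain"
)
```

Usage would be along the lines of
load_dlf(open(path), columns).execute(TOP_DOMAINS).fetchall(), where
path and columns are whatever the dlf file and its schema provide.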

As I said before, I'd very much like to keep lire portable.  Another
nice thing would be to keep supporting platforms with only very few
tools installed.  However, of course, if this turns out to be too
limiting, we can always reconsider this.

Bye,

Joost

