new dlf query language: sql or xml/xsl?

Joost van Baal joostvb at logreport.org
Mon Jun 18 15:37:01 CEST 2001


On Thu, Jun 14, 2001 at 09:08:21AM -0400, Francis J. Lacoste wrote:
> 
> BTW, I thought a little about moving querycalc to SQL and I'm not
> sure it's a good idea. I've done much SQL (SQL based e-commerce
> application was the other big things at iNsu for which I developped
> Apache::iNcom, a web application framework) and SQL isn't a very 
> expressive reporting language. For example, one of the most important
> to do in reports is aggregates report (i.e. number of emails per hour, 
> per day of week, etc.). And this is very complicated to do in SQL. It
> takes on QUERY for every period because the language can't express 
> SELECT timestamp, sum(email_count) GROUP BY timestamp.hour . You have to
> do SELECT sum(email_count) WHERE timestamp < 01:00 and timestamp < 01:59, 
> and iterate for each period.

Currently, we use dlf_addtm(1) to handle this.  Once a dlf is extended
with these extra timestamps, I guess building an sql query will be much
simpler.

> I think we would be better to use a XML format (something like XSL 
> especially targeted at reporting). A program would be responsible 
> to turn the XML meta-report into a XML report. This could also helps
> with internationalization of the report. (Translate only the XML meta
> report, xml-i18n-tools could help for that.) If the XML meta-report 
> "language" is expressive enough, it would become very easy to add reports
> to system. 
> 
> For example, the previous example could become like
> 
> <aggregate-subreport start="00:00" end="24:00" period="1h">
>  <title>Email by <period/></title>
>  <aggregate-field>timestamp</aggregate-field>
>  <sum>email_count</sum>
>  <avg>email_size</sum> 
> </aggregate-subreport>
> 
> The attributes could also be specified at runtime. And I'm sure people
> will find it easier to learn that simple reporting language than SQL.

I don't know how difficult it would be to answer a query like the one
you give as an example.  (But I'm afraid it's far more difficult than
answering a query in sql syntax.)  Are there tools available for these
kind of things?

BTW, another issue is: currently we're reading the same dlf about as
many times as the number of subreports we're generating.  (This is not
exactly true: for email reports we have 'filter' and 'filter_messages',
see cvs-sourceforge/logreport/service/*/etc/reportscripts* ).

We should exploit these filters, and use them to gain performance.

The questions we're currently answering are questions like:

Give me for each value of field <fieldname> this value, and the number
of dlf records where the value of field <fieldname> is this value,
sorted by number.

Give me for each value of field <fieldname_string> this value, and the
sum of the values of field <fieldname_number>, sorted by these sums.

Give me for each pair of fields <fieldname_string_a>,
<fieldname_string_b>, the number of dlf records, sorted by number.

Give me for each triple of fields <fieldname_string_a>,
<fieldname_string_b>, <fieldname_string_c>, these values, along
with the value of field <fieldname_integer>, sorted by the value of
<fieldname_integer>.

Note that we currently create fields like 20010611 (yyyymmdd) and 23
(hh) for timerelated reports.

For email, after filtering the dlf, the occerence of a dlf record
indicates a _successful_ delivery.  (Before filtering, it means any
delivery attempt.)

When thinking about a different query language, it's useful to keep this
in mind.

Bye,

Joost


-- 
To UNSUBSCRIBE, email to development-request at logreport.org with a subject of 
"unsubscribe". Trouble? Send an email with subject "help" to 
development-request at logreport.org



More information about the Development mailing list