new dlf query language: sql or xml/xsl?
Joost van Baal
joostvb at logreport.org
Mon Jun 18 15:37:01 CEST 2001
On Thu, Jun 14, 2001 at 09:08:21AM -0400, Francis J. Lacoste wrote:
>
> BTW, I thought a little about moving querycalc to SQL and I'm not
> sure it's a good idea. I've done much SQL (SQL based e-commerce
> application was the other big things at iNsu for which I developped
> Apache::iNcom, a web application framework) and SQL isn't a very
> expressive reporting language. For example, one of the most important
> to do in reports is aggregates report (i.e. number of emails per hour,
> per day of week, etc.). And this is very complicated to do in SQL. It
> takes on QUERY for every period because the language can't express
> SELECT timestamp, sum(email_count) GROUP BY timestamp.hour . You have to
> do SELECT sum(email_count) WHERE timestamp < 01:00 and timestamp < 01:59,
> and iterate for each period.
Currently, we use dlf_addtm(1) to handle this. Once a dlf is extended
with these extra timestamps, I guess building an sql query will be much
simpler.
> I think we would be better to use a XML format (something like XSL
> especially targeted at reporting). A program would be responsible
> to turn the XML meta-report into a XML report. This could also helps
> with internationalization of the report. (Translate only the XML meta
> report, xml-i18n-tools could help for that.) If the XML meta-report
> "language" is expressive enough, it would become very easy to add reports
> to system.
>
> For example, the previous example could become like
>
> <aggregate-subreport start="00:00" end="24:00" period="1h">
> <title>Email by <period/></title>
> <aggregate-field>timestamp</aggregate-field>
> <sum>email_count</sum>
> <avg>email_size</sum>
> </aggregate-subreport>
>
> The attributes could also be specified at runtime. And I'm sure people
> will find it easier to learn that simple reporting language than SQL.
I don't know how difficult it would be to answer a query like the one
you give as an example. (But I'm afraid it's far more difficult than
answering a query in sql syntax.) Are there tools available for these
kind of things?
BTW, another issue is: currently we're reading the same dlf about as
many times as the number of subreports we're generating. (This is not
exactly true: for email reports we have 'filter' and 'filter_messages',
see cvs-sourceforge/logreport/service/*/etc/reportscripts* ).
We should exploit these filters, and use them to gain performance.
The questions we're currently answering are questions like:
Give me for each value of field <fieldname> this value, and the number
of dlf records where the value of field <fieldname> is this value,
sorted by number.
Give me for each value of field <fieldname_string> this value, and the
sum of the values of field <fieldname_number>, sorted by these sums.
Give me for each pair of fields <fieldname_string_a>,
<fieldname_string_b>, the number of dlf records, sorted by number.
Give me for each triple of fields <fieldname_string_a>,
<fieldname_string_b>, <fieldname_string_c>, these values, along
with the value of field <fieldname_integer>, sorted by the value of
<fieldname_integer>.
Note that we currently create fields like 20010611 (yyyymmdd) and 23
(hh) for timerelated reports.
For email, after filtering the dlf, the occerence of a dlf record
indicates a _successful_ delivery. (Before filtering, it means any
delivery attempt.)
When thinking about a different query language, it's useful to keep this
in mind.
Bye,
Joost
--
To UNSUBSCRIBE, email to development-request at logreport.org with a subject of
"unsubscribe". Trouble? Send an email with subject "help" to
development-request at logreport.org
More information about the Development
mailing list