Input log written to $TMPDIR

Wytze van der Raay wytze at nlnet.nl
Wed Dec 29 08:39:32 CET 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Joost van Baal wrote:
| Hi Raymond,
|
| On Tue, Dec 28, 2004 at 12:08:25AM +0100, Raymond Page wrote:
|
|>I'm curious why the log that I provide as input to lr_log2report
|>is written to $TMPDIR.  If the log file is being recorded to a
|>dlf, can't it just process the log in place and write only to the
|>temporary dlf store?
|>
|>I ask because I'm processing gigabyte sendmail logs.  Having a 1GB
|>log file written decompressed to tempspace and then written into
|>the SQLite database seems like somewhere there's extra work being
|>done which is using a lot of disk space.
|
|
| It seems copying the raw logfile to $TMPDIR/logfile is done in
| &Lire::LrCommand::handle_logfile .  This was introduced 2004-08-30,
| after Lire 1.5, with the reimplementation in Perl of lots of small
| stand-alone utilities (lr_check_prereq, lr_dlf2xml, lr_inflate,
| lr_log2xml, lr_store, lr_xml2ascii, lr_xml2chart, ...).
|
| &Lire::LrCommand::import_log calls Lire::ImportJob, which is passed a
| file ( pattern => $self->{'_logfile'} ); see Lire::ImportJob(3pm).
|
| Hrm...
|
| Could you try to give your uncompressed logfile as an argument to
| lr_log2report, and not pass it via STDIN?  I believe passing just the
| filename makes Lire skip the copy-to-tmpfile step.

This reminds me of a discussion I had a while ago, in another context,
with Francis Lacoste, the main developer of this code. Here is what
Francis noted at the time (September 2, 2004):

|>The amount of /tmp space used by this job was about 2.2 GB (the sqlite
|>stuff I presume).
|
| Yes, it could be related to SQLite which will use temporary space to hold
| the queries result. But be aware also that in Lire 1.5, the log file is
| replicated several times.
|
| 1- Since lr_log2report reads its log file from stdin, it saves it in a
|    temporary file. That is 350M, which stays there until the end of the run.
|
| 2- While it is imported into the temporary DlfStore, the Dlf data is
|    written to a temporary file which is read back into the DlfStore. At
|    this stage, we are at 3x350M. At the end of that phase, the temporary
|    DLF file will be removed.
|
| 3- The email superservice has one analyser. In Lire 1.5, the extended
|    schema table stores the extended field plus all the fields of the
|    original table, so we are again doubling the required space.
|
| 4- It means that SQLite can probably account for half or less of the
|    temporary space.
|
| How Lire 2.0 improves things:
|
| 1) lr_log2report can now take the log filename as a parameter, so there
|    is no need to copy it into a temporary file (unless it is compressed).
|
| 2) Converting the old 2dlf script to the new perl module DlfConverter API
|    would eliminate the temporary DLF file used by the adapter. This is not
|    part of 2.0
|
| 3) The extended data table now only contains the extended data, so there
|    is no doubling of the data for each extended schema (this will mainly
|    benefit the space requirements of generating www records). A SQL join
|    is now used when necessary.
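
To make Francis's Lire 1.5 accounting concrete, here is a rough
back-of-the-envelope sketch in Python. It is only a model of the numbered
points above, not anything taken from Lire itself. The function name is
made up, and it assumes that each copy (stdin copy, temporary DLF file,
DlfStore, extended table) is about the size of the raw log, which is what
"3x350M" suggests but is not stated exactly.

```python
def lire15_tmp_estimate(log_mb: float) -> dict:
    """Rough temp-space model for Lire 1.5 (hypothetical helper).

    Assumption: every copy is roughly the size of the raw log.
    SQLite's own temporary query space is NOT included.
    """
    stdin_copy = log_mb      # point 1: log read from stdin is saved to a temp file
    temp_dlf = log_mb        # point 2: temporary DLF file, removed after import
    dlf_store = log_mb       # point 2: the temporary DlfStore itself
    extended_table = log_mb  # point 3: assumed to roughly duplicate the store

    return {
        # stdin copy + temporary DLF file + DlfStore ("3x350M")
        "during_import_mb": stdin_copy + temp_dlf + dlf_store,
        # temporary DLF file is gone, extended table has been added
        "after_analysis_mb": stdin_copy + dlf_store + extended_table,
    }


if __name__ == "__main__":
    print(lire15_tmp_estimate(350))
```

For the 350M log in Francis's example this gives about 1050M at either
stage, which fits his remark that SQLite probably accounts for half or
less of the roughly 2.2 GB observed.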

Using a non-compressed logfile on the command line (rather than through
stdin) is apparently the way to minimize the amount of $TMPDIR space
used by Lire 2.0.
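
For compressed logs, one possible workaround is to decompress the file
yourself onto a disk that has room, and then hand the resulting filename
to lr_log2report. The Python sketch below is only an illustration: the
function name and the scratch directory are made up, and it assumes
gzip-compressed logs with a ".gz" suffix.

```python
import gzip
import shutil
from pathlib import Path


def plain_log_path(log: Path, scratch_dir: Path) -> Path:
    """Return the path of an uncompressed log file (hypothetical helper).

    A log that does not end in ".gz" is returned unchanged. A ".gz" log is
    streamed into scratch_dir, so the decompressed copy lands on a disk of
    your choosing instead of in $TMPDIR.
    """
    if log.suffix != ".gz":
        return log
    target = scratch_dir / log.stem  # "maillog.gz" -> "maillog"
    with gzip.open(log, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, not all in memory
    return target
```

The returned path is what you would give to lr_log2report as its logfile
argument; see lr_log2report(1) for the exact command line of your Lire
version.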

Regards,
- -- wytze
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB0l80qs+zhiEbbu8RApfeAKDqj2PIm0P9GR7NDs4aA6JCADjZ3ACcDGcA
nawocDrDJ0iGpIDCvsl+kHs=
=EjkF
-----END PGP SIGNATURE-----
