Input log written to $TMPDIR
Raymond Page
pagerc at ufl.edu
Wed Dec 29 08:48:12 CET 2004
Wytze,
That discussion is really helpful. Thanks a lot for sending it to
me. Perhaps this could be posted on the website as a FAQ or
something?
--
Raymond Page
On Wed Dec 29 02:39:32 EST 2004, Wytze van der Raay
<wytze at NLnet.nl> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Joost van Baal wrote:
> | Hi Raymond,
> |
> | On Tue, Dec 28, 2004 at 12:08:25AM +0100, Raymond Page wrote:
> |
> |>I'm curious why the log that I provide as input to lr_log2report
> |>is written to $TMPDIR. If the log file is being recorded to a
> |>dlf, can't it just process the log in place and write only to the
> |>temporary dlf store?
> |>
> |>I ask because I'm processing gigabyte sendmail logs. Having a 1GB
> |>log file written decompressed to tempspace and then written into
> |>the SQLite database seems like somewhere there's extra work being
> |>done which is using a lot of disk space.
> |
> |
> | It seems copying the raw logfile to $TMPDIR/logfile is done in
> | &Lire::LrCommand::handle_logfile. This was introduced 2004-08-30,
> | after Lire 1.5, with the reimplementation in Perl of lots of small
> | stand-alone utilities (lr_check_prereq, lr_dlf2xml, lr_inflate,
> | lr_log2xml, lr_store, lr_xml2ascii, lr_xml2chart, ...).
> |
> | &Lire::LrCommand::import_log calls Lire::ImportJob, which is passed a
> | file ( pattern => $self->{'_logfile'} ); see Lire::ImportJob(3pm).
> |
> | Hrm...
> |
> | Could you try to give your uncompressed logfile as an argument to
> | lr_log2report, and not pass it via STDIN? I believe passing just the
> | filename makes Lire skip the copy-to-tmpfile step.
>
> This reminds me of a discussion with Francis Lacoste, the main developer
> of this code, a while ago in some other context. Here is what Francis
> noted at that time (September 2, 2004):
>
> |>The amount of /tmp space used by this job was about 2.2 GB (the sqlite
> |>stuff I presume).
> |
> | Yes, it could be related to SQLite, which will use temporary space to
> | hold the query results. But be aware also that in Lire 1.5, the log
> | file is replicated several times.
> |
> | 1- Since lr_log2report read its log file from stdin, it saved it in a
> |    temporary file. That is 350 MB, which stays there until the end of
> |    the run.
> |
> | 2- While it is imported into the temporary DlfStore, the DLF data is
> |    written to a temporary file which is read back into the DlfStore.
> |    At this stage, we are at 3 x 350 MB. At the end of that phase, the
> |    temporary DLF file will be removed.
> |
> | 3- The email superservice has one analyser. In Lire 1.5, the extended
> |    schema table stores the extended fields plus all the fields of the
> |    original table. So we are again doubling the required space.
> |
> | 4- It means that SQLite can probably account for half or less of the
> |    temporary space.
> |
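
As a rough check on the numbers above, here is a minimal Python sketch of the
import-phase arithmetic. It assumes, as the "3 x 350" figure implies, that the
stdin copy, the temporary DLF file and the DlfStore are each about the size of
the log. The function name and its parameters are made up for illustration and
are not part of Lire.

```python
def import_phase_tmp_mb(log_mb, copied_from_stdin):
    """Rough temporary space (in MB) in use during the import phase.

    Assumption: each of the three items below is about as large as the log.
    """
    stdin_copy = log_mb if copied_from_stdin else 0  # step 1: copy of stdin
    temp_dlf_file = log_mb                           # step 2: temporary DLF file
    dlf_store = log_mb                               # step 2: temporary DlfStore
    return stdin_copy + temp_dlf_file + dlf_store


print(import_phase_tmp_mb(350, copied_from_stdin=True))   # 1050
print(import_phase_tmp_mb(350, copied_from_stdin=False))  # 700
```

Under these assumptions, skipping the stdin copy saves one log-sized file for
the whole run. The later analysis phase adds more on top of this and is not
modelled here.
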
> | How Lire 2.0 improves things:
> |
> | 1) lr_log2report can now take the log filename as a parameter, so there
> |    is no need to copy it into a temporary file (unless it is compressed).
> |
> | 2) Converting the old 2dlf script to the new Perl module DlfConverter API
> |    would eliminate the temporary DLF file used by the adapter. This is not
> |    part of 2.0.
> |
> | 3) The extended data table now only contains the extended data, so there
> |    is no doubling of the data for each extended schema (this will mainly
> |    benefit the space requirements of generating www records). A SQL join
> |    is now used when necessary.
>
> Using a non-compressed logfile on the command line (rather than through
> stdin) is apparently the way to minimize the amount of $TMPDIR space
> used by Lire 2.0.
>
> Regards,
> - -- wytze
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.5 (MingW32)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQFB0l80qs+zhiEbbu8RApfeAKDqj2PIm0P9GR7NDs4aA6JCADjZ3ACcDGcA
> nawocDrDJ0iGpIDCvsl+kHs=
> =EjkF
> -----END PGP SIGNATURE-----
>
>
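
To illustrate why input on stdin forces a copy while a filename does not, here
is a small Python sketch. It is not Lire's Perl code; `open_log` and its
parameters are hypothetical. The idea is only that a pipe cannot be handed on
by name, so it has to be spooled to a temporary file first, whereas a named
file can be opened in place.

```python
import shutil
import sys
import tempfile


def open_log(path=None, stdin=None):
    """Return (file_object, was_copied) for a log given by name or on stdin."""
    if path is not None:
        # A named file can be read where it is: no extra temporary space.
        return open(path, "rb"), False

    # No filename: spool the stream into a temporary file. The tempfile
    # module honours $TMPDIR, so the copy lands there.
    source = stdin if stdin is not None else sys.stdin.buffer
    spool = tempfile.NamedTemporaryFile(prefix="logfile-")
    shutil.copyfileobj(source, spool)
    spool.seek(0)
    return spool, True
```

With a 1GB log on stdin, the spooled copy alone costs about 1GB of $TMPDIR,
which matches what I am seeing.
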