A program that gives a more comprehensive view of the SMTP traffic and other flows

Arnaud Taddei Arnaud.Taddei at sun.com
Sun Mar 24 23:03:29 CET 2002


As you can see from the attached image 
	Orange-smtp-flow-model.jpg 

we tried to represent the flows of traffic coming from some
specific ranges of IP addresses and machine names. 

In this case we have a three-tier mail architecture, and we need to
make sure that we understand the flows at each block and between all the
clients and servers.

So on each of these arrows we should be able to read:
- the production flow: how many From we received, how many Rcpt we sent,
- the pathological flow: how many connections we rejected and at which
position in the SMTP dialog:
	- we can reject by client IP address
		- block at EHLO level
	- we can reject by sender
		- block at MAIL FROM
	- we can reject by recipient
		- block at RCPT TO
	- we can block relay attempts
		- block after RCPT TO but taking MAIL FROM into account
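
A tally of the pathological flow could be kept per peer and per reject position. The following is only a sketch: the (peer, stage) input shape, the peer labels and the 'RELAY' stage label are assumptions made for illustration, and the sample events are made up, not taken from a real log.

```perl
use strict;
use warnings;

# The four reject positions listed above, in dialog order.
# 'RELAY' is a made-up label for "relay attempt blocked after RCPT TO".
my @stages = ('EHLO', 'MAIL FROM', 'RCPT TO', 'RELAY');

# Made-up reject events as (peer, stage) pairs; a real log would have
# to be parsed into this shape first.
my @rejects = (
    ['Rest of the World', 'RCPT TO'],
    ['Rest of the World', 'RELAY'],
    ['GPRS',              'MAIL FROM'],
    ['Rest of the World', 'RELAY'],
);

my %count;    # $count{peer}{stage} = number of rejects
foreach my $r (@rejects) {
    my ($peer, $stage) = @$r;
    $count{$peer}{$stage}++;
}

# One line per peer, one counter per reject position.
foreach my $peer (sort keys %count) {
    printf "%-20s", $peer;
    foreach my $stage (@stages) {
        printf "  %s=%d", $stage, $count{$peer}{$stage} || 0;
    }
    print "\n";
}
```

Keeping the stage as a second hash key means a new reject position only needs a new entry in @stages.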

So we could at least get the production flow more or less correct. We
then built a program that would read the DLF file and would compare the
IP addresses and client names with a table built by hand by the system
administrator.
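
The comparison against a hand-built table can be sketched as follows. This is a simplified illustration rather than the attached code: it assumes the table holds "key:cluster name" entries, where the key is a host name, a full IP address or an IP prefix ending in a dot, and lookup() is a hypothetical helper name. Unlike the attached package, it does not treat a '-' host specially.

```perl
use strict;
use warnings;

# Sample entries in the assumed "key:cluster name" format.
my @lines = (
    '192.168.40.101:Webmail',
    'smtp.iorange.ch:Service Mail Hub (iWeb)',
    '154.15.51.:Fixed IP Customers',
    '10.13.:GPRS',
);

# Build the lookup table from the entries.
my %host2cluster;
foreach my $line (@lines) {
    my ($key, $name) = split(/:/, $line, 2);
    $host2cluster{$key} = $name;
}

# lookup() is a hypothetical helper: exact match first, then the
# three-octet prefix, then the two-octet prefix, then a catch-all.
sub lookup {
    my ($host) = @_;
    return $host2cluster{$host} if exists $host2cluster{$host};
    if ($host =~ /^(\d+\.\d+\.)(\d+\.)\d+$/) {
        my ($two, $three) = ($1, "$1$2");
        return $host2cluster{$three} if exists $host2cluster{$three};
        return $host2cluster{$two}   if exists $host2cluster{$two};
    }
    return 'Rest of the World';
}

foreach my $host ('192.168.40.101', '154.15.51.20', '10.13.4.7', 'mail.example.org') {
    printf "%-20s --> %s\n", $host, lookup($host);
}
```

With these entries the four hosts map to Webmail, Fixed IP Customers, GPRS and Rest of the World respectively.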

We wrote a quick and dirty program called 
	smtp-flows 

which is short and attached. It calls the
	Logreport::Hosts

Perl package, which is attached too. This program reads a file called 
	$HOME/.lire/etc/clusters

which looks like this (I removed some of the lines on purpose)
	127.0.0.1:DMZ Orange Mail Service
	192.168.30.2:Service Mail Hub (iWeb)
	192.168.20.2:Service Mail Hub (iWeb)
	192.168.40.101:Webmail
	192.168.40.103:Webmail Wap
	localhost:DMZ Orange Mail Service
	smtp.iorange.ch:Service Mail Hub (iWeb)
	smtp.orangemail.ch:Service Mail Hub (iWeb)
	212.215.1.67:Orange Corporate
	154.15.51.:Fixed IP Customers
	213.55.133.:HSCSD
	10.13.:GPRS
	10.14.:GPRS

and when you cat a DLF file into smtp-flows you typically end up with:

> cat <DLFFILE> | smtp-flows
..............

From Table (with number of recipients)
======================================
SMTP Peer                          Total # Rcpt     # From

-                                           267        266
DMZ Orange Mail Service                     142        142
Fixed IP Customers                          669        539
GPRS                                         15         10
Orange Corporate                             23         23
Rest of the World                         11276      11116
Service Mail Hub (iWeb)                    1661       1643
---------------------------------------------------
TOTAL                                     14053      13739

To Table 
=========

-                                          6008
DMZ Orange Mail Service                       1
Rest of the World                          2569
Service Mail Hub (iWeb)                    5475
---------------------------------------------------
TOTAL                                     14053
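
The two columns of the From Table differ because the attached script counts every input line as one recipient, while "# From" counts each queue id only once per peer. A minimal sketch of that distinction, using made-up (queue id, cluster) pairs and a hash of hashes for the uniqueness test (the attached script uses a list and grep instead):

```perl
use strict;
use warnings;

# Made-up (queue id, source cluster) pairs, one per input line.
my @records = (
    ['q1', 'GPRS'],
    ['q1', 'GPRS'],      # same message, second recipient
    ['q2', 'GPRS'],
    ['q3', 'Webmail'],
);

my (%rcpt, %seen);
foreach my $r (@records) {
    my ($qid, $cluster) = @$r;
    $rcpt{$cluster}++;            # one line = one recipient
    $seen{$cluster}{$qid} = 1;    # hash keys keep each queue id once
}

# Number of distinct queue ids (messages) per cluster.
my %from;
foreach my $cluster (keys %seen) {
    $from{$cluster} = scalar keys %{ $seen{$cluster} };
}

foreach my $cluster (sort keys %rcpt) {
    printf "%-30s  %15d   %8d\n", $cluster, $rcpt{$cluster}, $from{$cluster};
}
```

Here GPRS ends up with 3 recipients for 2 messages, which is the same kind of gap as between 669 and 539 for Fixed IP Customers in the report above.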



So this approach shows that one can reassemble a view which is much more
comprehensible to IT managers, marketing and other troops and which
gives a lot of information on the real usage. We now know the
activity from the GPRS or Fixed IP lines, etc., which is key business
information. 

This of course means that a clusters file has to be defined at each
level of the infrastructure, but it shows how simple this is and how
useful such a report is.

Then Arnaud Gaillard zoomed in on each of these flows and showed the
evolution of some of them over time, for a one-month period. Thus
we could take a drill-down approach and better understand the dynamics
of the site.

Now among the actions to do are:
1) incorporate that analysis into logreport (ok, I really have to learn
the analyser and the way you compute things - sic)
2) detail the pathological flow more (which is probably what we read
from the '-' category in the above report)
3) make sure that we have more flexibility to import external 'clusters'
files depending on which log we are looking at and from which role in the
architecture
4) give a way to aggregate such outputs over time and across several
machines in the same area of the architecture
etc.
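
For point 4, aggregation could be as simple as summing per-cluster counters from several runs. This is only a sketch of one possible approach: merge_counts() is a hypothetical helper, and the counters for the two machines are made up.

```perl
use strict;
use warnings;

# merge_counts() is a hypothetical helper: it sums per-cluster counters
# from any number of reports (hash references) into one table.
sub merge_counts {
    my (@reports) = @_;
    my %total;
    foreach my $report (@reports) {
        foreach my $cluster (keys %$report) {
            $total{$cluster} += $report->{$cluster};
        }
    }
    return \%total;
}

# Made-up counters for two machines in the same tier.
my $mx1 = { 'GPRS' => 15, 'Orange Corporate' => 23 };
my $mx2 = { 'GPRS' => 4,  'Webmail'          => 7  };

my $all = merge_counts($mx1, $mx2);
foreach my $cluster (sort keys %$all) {
    printf "%-30s  %15d\n", $cluster, $all->{$cluster};
}
```

The same helper would work for aggregating over time if each report covered one day. Note that it only suits additive counters such as "Total # Rcpt"; unique-message counts would need the queue ids themselves to be merged.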

Let me know what you think about it.

A++
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Orange-smtp-flow-model.jpg
Type: image/jpeg
Size: 165569 bytes
Desc: not available
Url : http://lists.logreport.org/pipermail/development/attachments/20020324/83044b11/attachment.jpg 
-------------- next part --------------
#!/usr/bin/perl

# Hard-coded location of Logreport/Hosts.pm; adjust to your own setup.
use lib "/homedir/m/march/lib";
use Logreport::Hosts;


$| = 1;
$clusters="$ENV{'HOME'}/.lire/etc/clusters";

$H2C = &Logreport::Hosts::Load_Clusters($clusters);

while(<STDIN>) {
    $lines++;
    chomp;
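    @fields = split(/ /,$_);

    # Map the source peer (field 7) and the destination peer (field 13)
    # to their cluster names.
    $f_host = &Logreport::Hosts::Host2Cluster($fields[7],$H2C);
    #printf "%-30s --> %-30s\n", $fields[7], $f_host;
    $t_host = &Logreport::Hosts::Host2Cluster($fields[13],$H2C);
    #printf "%-30s --> %-30s\n", $fields[13], $t_host;

    # Each input line is counted as one recipient on both sides.
    $F_IP{$f_host}++;
    $T_IP{$t_host}++;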

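    # Record each queue id only once per source cluster. Compare with
    # 'eq' so that a queue id that is a substring of another one, or
    # that contains regex metacharacters, is not taken for a match.
    $qid = $fields[2];
    @l = grep { $_ eq $qid } @{$FU_IP{$f_host}};

    if ($#l < 0) {
	push(@{$FU_IP{$f_host}},$qid);
    }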

    if ($lines =~ /000$/) {
	print ".";
    }
}

print "\n\n";
print "From Table (with number of recipients)\n";
print "======================================\n";
printf "%-30s  %15s   %8s\n", "SMTP Peer", "Total # Rcpt", "# From";
print "\n";

foreach $ip (sort keys %F_IP) {
    $c = $c + $F_IP{$ip};
    @f = @{$FU_IP{$ip}};
    $g = $#f +1;
    $o = $o + $g;
    printf "%-30s  %15d   %8d\n", $ip, $F_IP{$ip}, $g;
}
print "---------------------------------------------------\n";
printf "%-30s  %15d   %8d\n", 'TOTAL', $c, $o;


print "\n";
print "To Table \n";
print "=========\n";
print "\n";

foreach $ip (sort keys %T_IP) {
    $d = $d + $T_IP{$ip};
    printf "%-30s  %15d\n", $ip, $T_IP{$ip};
}
print "---------------------------------------------------\n";
printf "%-30s  %15d\n", 'TOTAL', $d;










-------------- next part --------------
package Logreport::Hosts;

=pod

=head1 NAME

Logreport::Hosts - Perl Module that allows manipulations on hosts for log analysers

=head1 SYNOPSIS
;

=head1 DESCRIPTION

This package intends to offer nice functions for mappings between IP addresses and hosts as well as hosts 'clusters'

=head1 FILES

Files to review

=head1 SEE ALSO

Other resources

=head1 COPYRIGHT

Copyright Sun - 2002

=head1 VERSION

$Revision: 0.1 $

=head1 DATE

$Date: 2000/06/27 09:04:00 $

=head1 AUTHOR

Arnaud Taddei <Arnaud.Taddei at sun.com>

=cut


use strict;


=pod

=head1 FUNCTIONS

=cut


=pod

=head2 Load_Clusters

=cut

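sub Load_Clusters {
    my($clusters) = @_;
    my(%host2cluster, @cluster);

    # Each line of the clusters file is "<host or IP prefix>:<cluster name>".
    # If the file cannot be opened, warn and return an empty mapping.
    open(O,$clusters) || do {
	warn "$clusters is not readable: $!";
	return \%host2cluster;
    };
    while(<O>) {
	chomp;
	next unless /:/;		# skip blank or malformed lines
	@cluster = split(/:/,$_,2);	# the cluster name may contain ':'
	$host2cluster{$cluster[0]} = $cluster[1];
    }
    close(O);
    return \%host2cluster;
}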

=pod

=head2 Host2Cluster

=cut

sub Host2Cluster {
    my($host,$h2c,$mode) = @_;
    my(%h2c) = %$h2c;
    my($orig_host) = $host;

    # Look for an exact match first (host name or full IP address), then
    # fall back to the three-octet and the two-octet IP prefixes.

    if (defined $h2c{$host}) {
	return $h2c{$host};
    } elsif ($host eq '-') {
	return '-';
    } elsif ($host =~ /(\d+\.\d+\.\d+\.)\d+/) {
	$host = $1;
	if (defined $h2c{$host}) {
	    return $h2c{$host};
	} elsif ($host =~ /(\d+\.\d+\.)\d+/) {
	    $host = $1;
	    if (defined $h2c{$host}) {
		return $h2c{$host};
	    }
	}
    }
    if (defined $mode && $mode eq 'transparent') {
	return $orig_host;
    } else {
	return "Rest of the World";
    }
}

1;


