Re: [SLUG] Bogofilter and spam/ham files

From: Derek Glidden (dglidden@illusionary.com)
Date: Wed Apr 02 2003 - 11:37:42 EST


On Tue, 2003-04-01 at 23:27, Paul M Foster wrote:
> > I haven't really had the problem. My mail server is an Athlon 700,
> > though.
> >
>
> Ah, well, my machine is not worthy to lick the surface mount components
> of your video card, then. Perhaps not even worthy to feel the breeze
> from your CPU fan. <sniffle>

One of the few, uh, advantages of spending most of your disposable
income on computer hardware, I guess.

On the other hand, it also means you don't find a girl and get married
until you're 30. :)
 
> > Looking things up in the db files should be quite speedy.
>
> It's gone from, say, 1/20 of a second when first installed to almost a
> second now per email, with a lot of disk thrashing. Better than
> spambouncer or junkfilter's performance, but not as perky as it used to
> be.

Thrashing like that sounds like a lack of RAM. I bet when you first set
it up, the db files fit entirely into cache, and lookups were speedy.
Now they've grown large enough that the cache can't hold the whole thing
at once, and bogofilter probably has to build much larger data
structures to hold everything since there's so much data in the dbs
now.
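
A rough way to sanity-check the "does it still fit in cache" theory is to
compare the combined size of the db files against the memory that is free
or already used as page cache. The sketch below is only an illustration:
it assumes a Linux box with /proc/meminfo, the function names are made up
for this example, and "MemFree + Cached" is a crude estimate rather than
an exact measure of what the kernel will keep cached.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> bytes."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if not parts or not parts[0].isdigit():
            continue
        value = int(parts[0])
        # Most fields are reported in kB; convert those to bytes.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[name.strip()] = value
    return info


def total_size(paths):
    """Sum the on-disk sizes of the given files, skipping missing ones."""
    return sum(os.path.getsize(p) for p in paths if os.path.isfile(p))


def fits_in_cache(db_bytes, meminfo):
    """Crude heuristic (an assumption, not a kernel guarantee):
    the db files "fit" if they are no larger than free memory plus
    memory currently used as page cache."""
    room = meminfo.get("MemFree", 0) + meminfo.get("Cached", 0)
    return db_bytes <= room
```

To try it, read /proc/meminfo into parse_meminfo() and pass total_size()
the paths of your own wordlist db files (wherever your bogofilter setup
keeps them). If fits_in_cache() comes back False, the slowdown is at least
consistent with the db files having outgrown the cache.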

Run "vmstat" while you're processing mail and see whether some of the
thrashing is swapping rather than just plain disk I/O with no caching.
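
If you'd rather not eyeball the columns, here is a small sketch that reads
vmstat output and labels each sample. It assumes the usual Linux procps
layout, where the header row names the si/so (swap in/out) and bi/bo
(blocks in/out) columns; the function names are invented for this example,
and other platforms lay out vmstat differently.

```python
import subprocess


def parse_vmstat(text):
    """Turn vmstat output into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column-name row is the one that mentions si and so.
        if "si" in fields and "so" in fields:
            header = fields
            continue
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def classify(sample):
    """Label one sample: swap traffic, plain disk I/O, or neither."""
    if sample["si"] > 0 or sample["so"] > 0:
        return "swapping"
    if sample["bi"] > 0 or sample["bo"] > 0:
        return "disk I/O only"
    return "quiet"


def sample_vmstat(interval=1, count=5):
    """Run vmstat and return its text output (requires vmstat on PATH)."""
    result = subprocess.run(
        ["vmstat", str(interval), str(count)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Run [classify(r) for r in parse_vmstat(sample_vmstat())] while mail is
being filtered. Ignore the first sample, since vmstat's first line reports
averages since boot. Samples labelled "swapping" point at a RAM shortage;
"disk I/O only" points at db reads that simply aren't being cached.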
 
> I could readily see that the spam/ham files would grow indefinitely
> unhindered, so the point of my question was whether this was a mutually
> experienced problem, and if any bogofilter-coded solution was planned,
> known, or in the works. I guess not.

I think the fact that they're using DB files to begin with is pretty
much it. Berkeley DB is designed to be small, fast, efficient and all
the other things you'd need for embedded work. Short of coming up with
some sort of quantum storage, I don't think they can do a whole lot
more.

Unfortunately, with the way bogofilter works, the tradeoff for more
accuracy is that you're just gonna need to keep growing your wordlists,
and the bigger they get, the larger the application footprint will be
while running.

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#!/usr/bin/perl -w
$_='while(read+STDIN,$_,2048){$a=29;$b=73;$c=142;$t=255;@t=map
{$_%16or$t^=$c^=($m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;
$t^=(72,@z=(64,72,$a^=12*($_%16-2?0:$m&17)),$b^=$_%64?12:0,@z)
[$_%8]}(16..271);if((@a=unx"C*",$_)[20]&48){$h=5;$_=unxb24,join
"",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$d=
unxV,xb25,$_;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=$t&($d
>>12^$d>>4^$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*
8^$q<<6))<<9,$_=$t[$_]^(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]}
print+x"C*",@a}';s/x/pack+/g;eval 

usage: qrpff 153 2 8 105 225 < /mnt/dvd/VOB_FILENAME \
    | extract_mpeg2 | mpeg2dec -

http://www.cs.cmu.edu/~dst/DeCSS/Gallery/ http://www.eff.org/ http://www.anti-dmca.org/


