Re: [SLUG] Bogofilter and spam/ham files

From: Derek Glidden (dglidden@illusionary.com)
Date: Tue Apr 01 2003 - 11:07:39 EST


On Mon, 2003-03-31 at 23:57, Paul M Foster wrote:
> Anyone using bogofilter out there, who's noticed that the spam and ham
> files grow huge, without an apparent end in sight? My ham file
> (goodlist.db) is 15M and my spam file (spamlist.db) is 9M. Anyone know
> if there are plans to fix this?

I've been using bogofilter for some months now.

You'd expect those two files to be big and to keep growing, because
they hold the lists of words and the statistics used for matching
"spam" and "non-spam."

Any time a "spam" message contains a word bogofilter hasn't seen
before, that word gets added to the spamlist file. Any time a
"non-spam" message contains a word it hasn't seen before, that word
gets added to the goodlist file. Each list also keeps statistics, such
as how often each word has been seen.
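
A rough sketch of that bookkeeping in Python, in case it helps to see
why the lists only ever grow. This is an illustration under simplifying
assumptions, not bogofilter's actual code or on-disk format: the two
Counter objects stand in for goodlist.db and spamlist.db, and the names
tokenize and register are made up for the example.

```python
import re
from collections import Counter

# Hypothetical in-memory stand-ins for goodlist.db and spamlist.db.
goodlist = Counter()
spamlist = Counter()

def tokenize(message):
    """Lowercase the message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9'$-]+", message.lower())

def register(message, wordlist):
    """Count every token; return how many previously unseen words were added."""
    before = len(wordlist)
    wordlist.update(tokenize(message))
    return len(wordlist) - before

# Two similar spam messages: the second one adds only one new word.
new_spam_1 = register("Buy cheap pills now", spamlist)
new_spam_2 = register("Buy cheap pills today", spamlist)

# A "real" message tends to bring in more new vocabulary.
new_good = register("Meeting notes for the SLUG install fest", goodlist)

print(new_spam_1, new_spam_2, new_good)
print(len(spamlist), len(goodlist), spamlist["buy"])
```

Repeated words only bump a counter, while each new word adds an entry,
so a list grows with the variety of the mail it sees rather than with
the number of messages.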

My goodlist is 22M and my spamlist is 3M, which is very telling: it
shows that the spam I get mostly says the same thing over and over,
compared to the "real" email I get. (My ratio is probably close to 1:4
or 1:5 spam to non-spam.)

That growth is a good thing. The more words in each list, the more
accurate the filtering is.

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#!/usr/bin/perl -w
$_='while(read+STDIN,$_,2048){$a=29;$b=73;$c=142;$t=255;@t=map
{$_%16or$t^=$c^=($m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;
$t^=(72,@z=(64,72,$a^=12*($_%16-2?0:$m&17)),$b^=$_%64?12:0,@z)
[$_%8]}(16..271);if((@a=unx"C*",$_)[20]&48){$h=5;$_=unxb24,join
"",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$d=
unxV,xb25,$_;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=$t&($d
>>12^$d>>4^$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*
8^$q<<6))<<9,$_=$t[$_]^(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]}
print+x"C*",@a}';s/x/pack+/g;eval 

usage: qrpff 153 2 8 105 225 < /mnt/dvd/VOB_FILENAME \
    | extract_mpeg2 | mpeg2dec -

http://www.cs.cmu.edu/~dst/DeCSS/Gallery/ http://www.eff.org/ http://www.anti-dmca.org/


