Re: [SLUG] Bogofilter and spam/ham files

From: Derek Glidden (dglidden@illusionary.com)
Date: Wed Apr 02 2003 - 11:39:00 EST


On Wed, 2003-04-02 at 05:29, Levi Bard wrote:
> It seems to me that a tool like this should have an option for setting a limit on the size of the definitions files, with the ideal implementation keeping track of the "best and worst" entries in each db, (possibly ranked by percentage of false-positives vs number of matches) and gradually refining them (e.g. every time it comes up with a new filter, check whether it's better than the worst filter, if so replace it). This would add a little overhead too, but nothing like matching every email against two 200MB dbs full of expressions.
>
> Levi
>
> It has occurred to me that this reply serves no purpose but to outline some imaginary feature that will likely never be implemented, but oh well...

Yep, because of the way bogofilter works, "best" and "worst" are
relative to each email, which is why you have to keep ALL the words
for it to work.
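
A rough illustration of why, as a simplified Graham-style sketch and not
bogofilter's actual code: it assumes the classifier scores each message
from the few tokens whose spam probability sits farthest from a neutral
0.5. The word list, the probabilities, the 0.4 default for unknown words,
and the function names are all made up for the example.

```python
# Simplified Graham-style scoring sketch; NOT bogofilter's actual code.
# All probabilities below are invented illustration values.

def most_interesting(tokens, spam_prob, n=2, unknown=0.4):
    """Return the n distinct tokens whose spam probability is farthest from 0.5."""
    distinct = sorted(set(tokens))
    return sorted(
        distinct,
        key=lambda t: abs(spam_prob.get(t, unknown) - 0.5),
        reverse=True,
    )[:n]


def score(tokens, spam_prob, n=2, unknown=0.4):
    """Combine the n most interesting tokens with the naive Bayes product rule."""
    p_spam = 1.0
    p_ham = 1.0
    for t in most_interesting(tokens, spam_prob, n, unknown):
        p = spam_prob.get(t, unknown)
        p_spam *= p
        p_ham *= 1.0 - p
    return p_spam / (p_spam + p_ham)


# Hypothetical word list: token -> estimated probability that it indicates spam.
spam_prob = {
    "free": 0.90,
    "click": 0.80,
    "meeting": 0.30,
    "kernel": 0.02,
    "patch": 0.05,
}

msg_a = ["free", "click", "meeting"]
msg_b = ["kernel", "patch", "free", "click", "meeting"]
msg_c = ["meeting", "hello"]  # "hello" is not in the word list

top_a = most_interesting(msg_a, spam_prob)       # "free" makes the cut here
top_b = most_interesting(msg_b, spam_prob)       # ...but not here
top_c = most_interesting(msg_c, spam_prob, n=1)  # a weak word can still be the best one

score_a = score(msg_a, spam_prob)
score_b = score(msg_b, spam_prob)
```

In this toy run "free" is one of the deciding words for msg_a but gets
outranked in msg_b, and a fairly bland word like "meeting" ends up as the
strongest evidence in msg_c. Under these assumptions, there is no fixed
set of "worst" entries you could safely prune ahead of time.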

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#!/usr/bin/perl -w
$_='while(read+STDIN,$_,2048){$a=29;$b=73;$c=142;$t=255;@t=map
{$_%16or$t^=$c^=($m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;
$t^=(72,@z=(64,72,$a^=12*($_%16-2?0:$m&17)),$b^=$_%64?12:0,@z)
[$_%8]}(16..271);if((@a=unx"C*",$_)[20]&48){$h=5;$_=unxb24,join
"",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$d=
unxV,xb25,$_;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=$t&($d
>>12^$d>>4^$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*
8^$q<<6))<<9,$_=$t[$_]^(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]}
print+x"C*",@a}';s/x/pack+/g;eval 

usage: qrpff 153 2 8 105 225 < /mnt/dvd/VOB_FILENAME \
    | extract_mpeg2 | mpeg2dec -

http://www.cs.cmu.edu/~dst/DeCSS/Gallery/
http://www.eff.org/
http://www.anti-dmca.org/


