Re: [SLUG] Bogofilter and spam/ham files

From: Paul M Foster (paulf@quillandmouse.com)
Date: Tue Apr 01 2003 - 18:07:58 EST


On Tue, Apr 01, 2003 at 11:07:39AM -0500, Derek Glidden wrote:

> On Mon, 2003-03-31 at 23:57, Paul M Foster wrote:
> > Anyone using bogofilter out there, who's noticed that the spam and ham
> > files grow huge, without an apparent end in sight? My ham file
> > (goodlist.db) is 15M and my spam file (spamlist.db) is 9M. Anyone know
> > if there are plans to fix this?
>
> I've been using bogofilter for some months now.
>
> You'd expect those two files to get big and grow larger because those
> are the files that hold the lists of words and statistics for matching
> "spam" and "non-spam."
>
> Anytime there's a word in a "spam" message that it hasn't seen before,
> it gets added to the spamlist file. Anytime there's a word in a
> "non-spam" message that it hasn't seen before, it gets added to the
> goodlist file. Each entry also carries statistics, such as how often
> the word has been seen.
>
> My goodlist is 22M and my spamlist is 3M. (Which is very telling. It
> shows that the spam I get mostly says the same thing, compared with
> the "real" email I get. And my ratio is probably close to 1:4 or 1:5
> spam to non-spam.)
>
> It's a good thing. The more words in each list, the more accurate it
> is.

This I know. The problem is that at some point (for me, now), it starts
to slow down receipt of mail because the files that have to be read are
so large. What happens when your files get to be 50M?
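
To make the growth pattern concrete, here is a minimal sketch of the
bookkeeping described above. It is not bogofilter's code: the WordLists
class, the tokenize() helper and the in-memory Counter tables are all
assumptions standing in for the on-disk spamlist.db and goodlist.db.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word tokens (a deliberate simplification)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class WordLists:
    """Hypothetical stand-in for the spamlist/goodlist pair: token -> times seen."""

    def __init__(self):
        self.spam = Counter()
        self.good = Counter()

    def train(self, text, is_spam):
        # A token seen before only has its count bumped;
        # a token never seen before creates a new entry, so the table grows.
        target = self.spam if is_spam else self.good
        target.update(tokenize(text))

    def sizes(self):
        """Return (distinct spam tokens, distinct good tokens)."""
        return len(self.spam), len(self.good)


lists = WordLists()
lists.train("Buy cheap pills now", is_spam=True)
lists.train("Buy cheap watches now", is_spam=True)
lists.train("Meeting moved to Thursday, bring the bogofilter notes", is_spam=False)
lists.train("Lunch on Thursday? The new kernel finally compiles", is_spam=False)

print(lists.sizes())  # (5, 14)
```

The two spam messages share most of their words, so the spam table ends
up with only 5 entries, while the more varied good mail produces 14.
Nothing in this sketch ever removes an entry, which is why the tables
can only grow as new words arrive.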

Paul


