Re: [SLUG] Is there an archive for the list?

From: thor_consulting@yahoo.com
Date: Fri Sep 12 2003 - 09:07:45 EDT


Ed, sorry I missed the original post.

Many moons ago I did this type of thing with Perl (Perl was made for this
type of data mining).

I wrote a general-purpose data-mining tool that works with everything from
positional and tagged text data to dynamic-format data.

Our database used proper noun (PN) and key phrase (KP) indices along with
tf-idf (term frequency-inverse document frequency) ranking to improve search
results.
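The post names tf-idf ranking but gives no formula. Below is a minimal sketch of one common textbook variant: term frequency (count / document length) times log(N / document frequency), summed over the query terms. It assumes a naive whitespace tokenizer, and the names `tokenize`, `tf_idf_scores` and `rank` are hypothetical. It is not the Syracuse engine's actual scoring, and the PN/KP indices are not modeled.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()[]<>"

def tokenize(text):
    """Naive tokenizer: split on whitespace, strip punctuation, lowercase."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def tf_idf_scores(query, docs):
    """Score each doc against the query as a sum of tf * idf over query terms."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter()  # number of docs containing each term
    for toks in tokenized:
        df.update(set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero on empty docs
        score = 0.0
        for term in query_terms:
            if df[term]:  # terms absent from every doc contribute nothing
                tf = counts[term] / total
                idf = math.log(n_docs / df[term])
                score += tf * idf
        scores.append(score)
    return scores

def rank(query, docs):
    """Return doc indices, best match first."""
    scores = tf_idf_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

With this variant a term that appears in every document gets idf = log(1) = 0, so very common words drop out of the ranking on their own.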

The search engine was developed at Syracuse University, and it's at the core
of IBM's patent search system and part of the CIA's repertoire.

Very cool stuff, but I digress.

It might be overkill, but let me know if you want to use Perl to do the mining.

thor

mailto:thor_consulting@yahoo.com
http://www.geocities.com/thor_consulting/
----- Original Message -----
From: "Michael Manchester" <mchester@yahoo.com>
To: <slug@nks.net>
Sent: Friday, September 12, 2003 06:25
Subject: Re: [SLUG] Is there an archive for the list?

> Ed;
> Sounds interesting. That might be fun to work on. As
> long as there are no deadlines. I get enough of those
> at work :) You can sign me up for the front end. I'm
> thinking a browser front end. What do you think?
> Mike M.
> --- Ed Centanni <ecentan1@tampabay.rr.com> wrote:
> > I have a personal mail archive of the SLUG list that dates from
> > 11/30/1999 to the present. The file size is 77,130,688 bytes.
> > Realizing what a great knowledge-base it represents and desiring to
> > make use of it, I started a small project to create a searchable index
> > database of ALL aspects of the SLUG list. I call it "emine", short for
> > "E-mail Data Mining".
> >
> > It's a python script that parses a standard unix-style email mailbox
> > file and builds an SQL database of string lengths and file offset
> > pointers into the mail file. The database doesn't contain the actual
> > email, it just stores the location and size of everything in the
> > mailbox file. The index database is normalized and has 9 related tables
> > that contain information for headertypes, mimetypes, mailboxes,
> > messages, headers, words, attachments, word locations, and phrases from
> > 2 to 4 words.
> >
> > It's a work in progress. At the moment it can populate all the tables
> > except words, word locations, and phrases. It shouldn't take me more
> > than a few evenings to finish that up. It would need a user friendly
> > frontend to be useful as a search engine but that shouldn't be rocket
> > science once the database is fully populated. The front end would get
> > user input, query the database, fseek to the offset(s) in the mailbox
> > file and output the results.
> >
> > If this seems interesting to any of you, I'm willing to put it up as
> > an open source project for anyone to work on and use. If you know of a
> > similar project already available I'd like to know.
> >
> > Ed.
> >
> >
> > Michael Manchester wrote:
> >
> > > I thought at one time there was an archive of the list
> > > or at least talk about having an archive of the list.
> > >
> > > Mike M.
> > >
> > > =====
> > <snip>
> >
> >
> >
> > -----------------------------------------------------------------------
> > This list is provided as an unmoderated internet service by Networked
> > Knowledge Systems (NKS). Views and opinions expressed in messages
> > posted are those of the author and do not necessarily reflect the
> > official policy or position of NKS or any of its employees.
>
>
> =====
> ---------------------------------
> The requirements said
> "Windows 95/98/NT or better"
> So I installed Linux
> ---------------------------------
>

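Ed's description of emine (quoted above) centers on storing only byte offsets and lengths into the mbox file, then having the front end fseek to them. emine's own code isn't shown, so this is just a minimal Python sketch of that idea using the standard `sqlite3` module. The single `messages` table and the `index_mbox` / `fetch_message` names are hypothetical and much simpler than emine's nine-table schema. It assumes the classic mbox convention that every line starting with "From " opens a new message (body lines escaped as ">From "), and it does not handle folded Subject headers.

```python
import sqlite3

def index_mbox(mbox_path, db):
    """Store the byte offset and length of each message in a Unix mbox.

    Only locations and sizes (plus the Subject header, for display) go into
    the database; the mail itself stays in the mbox file.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY, mbox TEXT,"
        " byte_offset INTEGER, byte_length INTEGER, subject TEXT)"
    )
    insert = ("INSERT INTO messages (mbox, byte_offset, byte_length, subject)"
              " VALUES (?, ?, ?, ?)")
    start = subject = None
    in_headers = False
    pos = 0  # byte position of the current line
    with open(mbox_path, "rb") as f:  # binary mode so positions are byte offsets
        for line in f:
            # Assumption: every line starting with "From " opens a new message.
            if line.startswith(b"From "):
                if start is not None:
                    db.execute(insert, (mbox_path, start, pos - start, subject))
                start, subject, in_headers = pos, None, True
            elif in_headers:
                if line.strip() == b"":
                    in_headers = False  # blank line ends the header block
                elif line.lower().startswith(b"subject:"):
                    subject = line[8:].strip().decode("utf-8", "replace")
            pos += len(line)
    if start is not None:  # flush the last message
        db.execute(insert, (mbox_path, start, pos - start, subject))
    db.commit()

def fetch_message(mbox_path, byte_offset, byte_length):
    """Seek straight to a stored offset and read one raw message back."""
    with open(mbox_path, "rb") as f:
        f.seek(byte_offset)
        return f.read(byte_length)
```

A front end would then run a query such as `SELECT byte_offset, byte_length FROM messages WHERE subject LIKE ?` and pass each row to `fetch_message`, which is the "query the database, fseek to the offset(s)" step Ed describes.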

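Ed's list of tables (quoted above) includes word locations and phrases of 2 to 4 words, which were the parts not yet populated. Here is a minimal Python sketch of how those rows might be produced from one message body. It assumes a simple regex tokenizer and records character offsets relative to the text passed in (not byte offsets into the mbox file). The names `word_locations` and `phrase_locations` are hypothetical, and sentence boundaries are ignored, so phrases can span two sentences.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of letters, digits or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")

def word_locations(text):
    """Return (lowercased word, character offset) for every word in text."""
    return [(m.group().lower(), m.start()) for m in WORD_RE.finditer(text)]

def phrase_locations(text, min_words=2, max_words=4):
    """Map every phrase of min_words..max_words consecutive words to the
    character offsets where it starts. Sentence boundaries are ignored."""
    words = word_locations(text)
    found = defaultdict(list)
    for i, (_, start) in enumerate(words):
        for n in range(min_words, max_words + 1):
            if i + n > len(words):
                break  # not enough words left for a phrase this long
            phrase = " ".join(w for w, _pos in words[i:i + n])
            found[phrase].append(start)
    return dict(found)
```

For example, in "Use Perl for data mining. Use Perl again." the phrase "use perl" is recorded at offsets 0 and 26. A message of k words yields at most 3k phrase rows, which is worth keeping in mind for a 77 MB archive.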


This archive was generated by hypermail 2.1.3 : Fri Aug 01 2014 - 20:33:59 EDT