Where would I find ht/dig? I was looking at wordindex at http://wordindex.sourceforge.net/ and it says it can index just about any file. It is considered beta, though.
Brett Simpson
Internet Administrator for Hillsborough County
(813) 301-7144
simpsonb@hillsboroughcounty.org
>>> herrold@owlriver.com 11/30/01 09:46PM >>>
On Fri, 30 Nov 2001, Ronan Heffernan wrote:
> Brett Simpson wrote:
> Does anyone know of any Open Source Document imaging
> software that has the following capabilities?
> Full text search of PDF files.
ht/dig will index and provide a search interface for the
full text of well-formed PDFs.
> Web based document management.
Feature set becomes an issue: it is too easy to extend it
without limit or prioritization. Without more details, it
is hard to gauge the needs and requirements the questioner has
in mind. The Cobalt Qubes do some of this on Linux, but not
some of the things, easy and harder, that I might think of including:
1. Assign a unique permanent ID number to a document
2. Retain a permanent changelog of everyone who has viewed a
document, and implement an electronic filing cabinet
3. Enforce ACLs on index results display (implies an ACL
system -- see last para on this topic)
4. Configure and apply a document retention policy against
each document
5. Replicate the documents database (on demand or in toto)
between multiple servers, potentially at multiple sites
6. Add scanner, and fax, and 'directory watcher' acquire and
resend interfaces
7. Add an OCR initial decode, and proofread scheduling,
annotation, markup and final release system
8. Provide web, X, text, and command-line interfaces,
permitting scripting
... It might be fun to get YOUR feature list -- where YOU is
every group member, and decompose and implement them as a
collaborative project.
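
To make items 1 through 4 a little more concrete, here is a rough Python sketch of what a minimal "filing cabinet" might look like. Everything in it is hypothetical and for illustration only: the names Document, Cabinet, readers and retain_days are assumptions, not part of ht/dig, the Cobalt Qube, or any existing package, and the substring match merely stands in for a real full-text index.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Document:
    """Hypothetical record covering items 1-4 of the feature list."""
    title: str
    text: str
    readers: set        # item 3: users allowed to see this document (the ACL)
    created: date
    retain_days: int    # item 4: retention period, in days
    # item 1: unique permanent ID, assigned once at creation
    doc_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # item 2: append-only log of (user, date) views
    view_log: list = field(default_factory=list)

    def expired(self, today):
        return today > self.created + timedelta(days=self.retain_days)


class Cabinet:
    """In-memory stand-in for the electronic filing cabinet."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        self.docs[doc.doc_id] = doc
        return doc.doc_id

    def search(self, user, word):
        # Naive substring match; results are filtered by ACL so a user
        # never sees hits for documents they may not read.
        return [d.doc_id for d in self.docs.values()
                if user in d.readers and word.lower() in d.text.lower()]

    def view(self, user, doc_id, today):
        doc = self.docs[doc_id]
        if user not in doc.readers:
            raise PermissionError(f"{user} may not view {doc_id}")
        doc.view_log.append((user, today))
        return doc.text

    def purge_expired(self, today):
        # Apply the retention policy; return the IDs that were removed.
        expired = [i for i, d in self.docs.items() if d.expired(today)]
        for i in expired:
            del self.docs[i]
        return expired
```

A real system would of course keep the log and the documents in a database rather than in memory, which is where item 5 (replication) starts to matter.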
--------------------
9. A submission/voting/comment system built as a set of PHP web
pages, assigning and maintaining each participant's or commenter's
identity credentials, and a database backend can be made
available if someone wants to step up to spearhead the task
(including writing at least working drafts of those tools
mentioned in this paragraph).
Contact me offlist if interested
-- Russ Herrold