Searching the Digest

From: Kevin Rose <vladt_at_interaccess.com>
Date: Mon, 20 Jul 1998 22:22:02 -0500 (CDT)


Nick Hollingsworth <NickH_at_compans.com> wrote:

>It would be a major resource if anyone could provide an undigested
>archive of all the messages from the various incarnations and
>versions of the mailing list under one search engine. Hopefully
>the thread structure could be maintained too - much like what
>Deja News offers for newsgroups.

One does exist. I built it eight(?) months ago. It's sort of slow, but it is free. The thread structure is more or less there, but you cannot count on it.

As an exercise in setting up MicroQuish's Index Server, I built an indexed and searchable archive that I keep up to date. You can reach it at chmeee.pronetsolutions.com\gd. It is generally one day to a week out of date, as I had to use the licensed version of cron on another server. I need to manually run a couple of files to keep the system up to date, and I only do that when I'm in the office and have time. The summaries that IIS's Index Server provides are not very good, but it is free. Netscape Enterprise did a better job, but we don't own a license to it.

If anyone wants to set up their own version, I can send you the message archives. Most of them are in HTML format, though the really old ones are not.

Before you decide to do this, realize its size. For example, the archive of all the digests, including the current one, is maybe 120MB in size, plus more than 60MB for the index. Posting this on a commercial server will cost you a small fortune, and it requires access to internal mail structures, cron, and the perl interpreter. If you have a T1-connected NT or UNIX box with a few hundred megs that no one will miss, you can get the scripts for it from me (NT) or Loren (UNIX).
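
As a rough check before committing a machine to this, the sizes above can be added up and compared with the free space on the target drive. What follows is only a sketch in Python, not part of the NT or UNIX scripts mentioned above; the function names, the 20% headroom figure, and the path checked are all illustrative assumptions.

```python
import shutil

ARCHIVE_MB = 120  # approximate size of all the digests, per the figures above
INDEX_MB = 60     # approximate lower bound for the search index, per the figures above


def required_mb(archive_mb, index_mb, headroom=0.2):
    """Return the space to budget, in MB, with some headroom for growth.

    The 20% default headroom is an assumption, not a measured figure.
    """
    return (archive_mb + index_mb) * (1 + headroom)


def has_room(path, needed_mb):
    """Return True if the drive holding `path` has at least `needed_mb` free."""
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mb >= needed_mb


if __name__ == "__main__":
    needed = required_mb(ARCHIVE_MB, INDEX_MB)
    print(f"Budget about {needed:.0f} MB")
    print("Enough room here:", has_room(".", needed))
```

Since the index figure is "more than 60MB", treat the result as a floor rather than an exact requirement.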

Kevin

