Re: [SLUG] Web Page Storage

From: Kwan Lowe (kwan@digitalhermit.com)
Date: Thu May 11 2006 - 12:01:37 EDT

Next message: Eben King: "RE: [SLUG] Firefox"
Previous message: S0TL: "[SLUG] Web Page Storage"
In reply to: S0TL: "[SLUG] Web Page Storage"
Next in thread: Eben King: "Re: [SLUG] Web Page Storage"
Reply: Eben King: "Re: [SLUG] Web Page Storage"
Reply: S0TL: "Re: [SLUG] Web Page Storage"

> So my question is how can one save what is open on a website say as a MS
> Office .doc or OpenOffice OO file which is searchable, has the same
> information, and does not take 10 minutes or so per page to do? [I am assuming
> of course that the website is something like HTML not something like Adobe
> Acrobat.]

If you're mainly concerned about the text of the document, I'd suggest using wget or
curl to pull the web page, then converting the HTML to a text document with
links. E.g.: wget -O file.html URL; links -dump file.html > file.txt
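
For more than a handful of pages, the same two steps can be wrapped in a small
script. This is only a rough sketch: the function names html_to_text and
save_page are made up for illustration, and the lynx and sed fallbacks are my
own additions for machines without links. The sed fallback is crude (it only
strips tags that open and close on one line).

```shell
#!/bin/sh
# Sketch only: html_to_text and save_page are hypothetical helper names.

# html_to_text FILE.html : print a plain-text rendering of a local HTML file.
html_to_text() {
    if command -v links >/dev/null 2>&1; then
        links -dump "$1"
    elif command -v lynx >/dev/null 2>&1; then
        # Assumed fallback: lynx, without the trailing list of link URLs.
        lynx -dump -nolist "$1"
    else
        # Assumed last resort: strip single-line tags, keep the rest as-is.
        sed -e 's/<[^>]*>//g' "$1"
    fi
}

# save_page URL OUT.txt : fetch URL and write its text rendering to OUT.txt.
save_page() {
    tmpdir=$(mktemp -d) || return 1
    page="$tmpdir/page.html"   # .html suffix so the browser treats it as HTML
    if command -v wget >/dev/null 2>&1; then
        wget -q -O "$page" "$1"
    else
        curl -fsSL -o "$page" "$1"
    fi
    status=$?
    if [ "$status" -eq 0 ]; then
        html_to_text "$page" > "$2"
        status=$?
    fi
    rm -rf "$tmpdir"
    return "$status"
}

# Example (URL is a placeholder):
#   save_page http://www.example.com/ example.txt
```

The resulting .txt files can be searched with grep, or opened in OpenOffice if
you really want them in a word processor.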

-- 
* The Digital Hermit   http://www.digitalhermit.com
* Unix and Linux Solutions   kwan@digitalhermit.com
-----------------------------------------------------------------------
This list is provided as an unmoderated internet service by Networked
Knowledge Systems (NKS).  Views and opinions expressed in messages
posted are those of the author and do not necessarily reflect the
official policy or position of NKS or any of its employees.


This archive was generated by hypermail 2.1.3 : Fri Aug 01 2014 - 18:49:24 EDT