I find that preloading pages generally isn't a good idea - it usually
sucks down more bandwidth than it saves, though it does have its uses.
It can be worth doing when certain pages are VERY popular (the recent
Mars landing springs to mind), or when you have spare bandwidth after
hours but your daytime bandwidth isn't coping. Note that some web admins
get irritated if you download hundreds of megabytes from their servers
every day just in case it gets used. Also, if your cache has a small
disk, downloading the whole of www.netscape.com will probably fill it
completely, and since only a small fraction of the pages on that site
are actually used, it effectively means you are purging the contents of
your cache every night, which is a very bad idea.
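If you do take the after-hours route, a cron entry will take care of
the scheduling. A minimal sketch, assuming the script further down is
saved as /usr/local/bin/preload-pages (a made-up path) and that 2am is
a quiet time on your link:

# min hour dom mon dow  command
0     2    *    *   *   /usr/local/bin/preload-pages >> /var/log/preload.log 2>&1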
Software to do the download:
I would suggest using 'wget', which is available from your nearest
GNU mirror or from ftp.is.co.za, since you won't have to modify it to
do the download. You can simply use the --delete-after option and set
the appropriate environment variables to get it to use your cache
server:
#!/bin/tcsh
# Point wget at the cache's HTTP port (3128 is Squid's default).
setenv http_proxy http://proxy.mine.com:3128/
setenv ftp_proxy http://proxy.mine.com:3128/
# DON'T RUN THIS AS ROOT!
# Bail out if the scratch directory is missing, so the rm below can't
# run somewhere unintended.
cd /tmp/junk-url/ || exit 1
rm -rf *
wget -r -nd --delete-after http://some.site/
The above should do what you want. Note that ftp_proxy is set so that
you can also download the contents of a well-used FTP site through the
cache. The wget options used here are described in the wget info file
which comes with the wget tar file; you should probably look through
the available options and decide whether you want to add others.
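For example, if you are worried about hammering the remote server, a
sketch like the following limits how deep and how fast wget recurses;
the depth, wait time and FTP site are just placeholders:

#!/bin/tcsh
setenv http_proxy http://proxy.mine.com:3128/
setenv ftp_proxy http://proxy.mine.com:3128/
cd /tmp/junk-url/ || exit 1
rm -rf *
# -l 2 : only recurse two levels deep
# -w 5 : wait 5 seconds between requests, to be kinder to the server
# -np  : never ascend to the parent directory
wget -r -l 2 -w 5 -np -nd --delete-after ftp://ftp.some.site/pub/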