Note: I am no longer doing any new development of wwwoffle.
I feel I have reached a dead end: the basic design of wwwoffle severely limits
me in pursuing new directions.
I have started a completely new alternative project.
WWWOFFLE is a simple proxy server written by Andrew M. Bishop with special
features for use with dial-up internet connections.
The official WWWOFFLE homepage by the author can be found at http://www.gedanken.demon.co.uk/wwwoffle/.
I've been using WWWOFFLE on Linux for a number of years now, and I've found
it to be very useful in keeping my phone costs down. I have a permanent connection
to the Internet now, but I still use WWWOFFLE to share the internet connection
on my home LAN, to filter out ads and other undesirable stuff, and to keep
backup copies of the webpages I've visited.
I've come to know WWWOFFLE so intimately now that I find it hard to switch to another proxy, even though there are plenty of alternatives.
As with almost any piece of software, WWWOFFLE has its quirks. Because
WWWOFFLE is licensed under the GPL and the source is freely available, I was
able to fix most of the features that I found defective or lacking.
The WWWOFFLE source is nicely structured and quite readable, and I found hacking WWWOFFLE quite enjoyable, even addictive to a certain degree.
During the course of time I made numerous changes to the WWWOFFLE source, fixing bugs, adding a few features, and sometimes just modifying the code to suit my personal taste. Even though I like the overall structure of WWWOFFLE, I disagree with a number of the implementation details and I've changed most of the things I didn't like.
Out of respect for the free-software community, I used to make my modifications available via this webpage as a patch file. Currently the latest code is available from a git repository. (The old patches are still available from this page.)
To compile my code, you need to have git installed on your system, which shouldn't be a problem because it is available as a package on almost every Linux distribution. This is the command you need to get the code from gitorious.org:
git clone git://gitorious.org/wwwoffle/wwwoffle-par.git wwwoffle-par
After changing into the wwwoffle-par directory, you can run ./configure and make just as you would with the original source code. Note: my current code breaks IPv6 functionality. Because IPv6 is enabled by default in version 2.9, you will need to run configure with the option that disables IPv6 (check the output of ./configure --help for its exact name).
Important: Before you can start wwwoffled, you must make some changes to the CensorHeader section of the configuration file and convert the U* files in the cache to a new database file. The details can be found in the file README.par in the source directory (but you can also download this file separately).
Here's a summary of the most important modifications I've made (in reverse chronological order):
Support for persistent ("keep-alive") connections, both to the client and to remote servers. This can be significantly more efficient because it avoids the overhead of setting up a new connection for every request. Persistent connections can be enabled with the configuration option allow-keep-alive = yes. Of course support must also be enabled in the browser, but most browsers have this enabled by default. See README.par for more details.
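For reference, here is a sketch of how such an entry might look in wwwoffle.conf. The section placement is my assumption, not taken from the source; README.par documents where allow-keep-alive really belongs.

```
# Hypothetical wwwoffle.conf fragment -- the section name is assumed,
# see README.par for the real location of allow-keep-alive.
Options
{
 allow-keep-alive = yes
}
```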
A new configuration option keep-cache-if-header-matches. This option is similar to keep-cache-if-not-found, but examines the header lines (instead of the status code) of the reply from the remote server to determine whether the new page is less desirable than the already cached one. I use this option to prevent a certain site that requires a subscription from overwriting a good copy of an article with a login page. Unfortunately, it is not as generally useful as I had hoped, because few sites of this type give clues in the reply header that can be used to tell which replies contain the full article text and which contain the login page.
A single database file urlhashtable instead of the many small U* files. This reduces the number of files in the WWWOFFLE cache by almost 50%. Having many small files is quite inefficient: on my system each U* file occupies 4K of disk space, so 78587 U* files occupy 307MB. My urlhashtable file uses less than 7MB! (The file will appear to be much larger, but actually occupies only a very modest amount of disk space. See README.par for details.)
Because the wwwoffle-ls2 utility uses a separate lookup table to make pattern matching of URLs significantly more efficient, I got the idea to implement this directly in WWWOFFLE. My implementation uses a single file which is mmapped into an area of address space that is shared between all WWWOFFLE processes. As a result, purging with "use-url = yes" is now much faster (by as much as a factor of 4 on my system). I also believe that storing the contents of webpages while online has become somewhat faster (because only one file per webpage needs to be written instead of two), but I haven't actually done any measurements to verify this. Reading webpages stored in the cache will probably not be significantly faster.
A new configuration option replacement-meta-refresh-time. See README.par for more information.
A new configuration option always-use-etag. Certain server farms spoil the usefulness of Etags by issuing different Etags for the same content, causing WWWOFFLE to needlessly download the same page or image repeatedly, even though the content hasn't changed at all. Setting always-use-etag = no can remedy this problem by forcing WWWOFFLE to base conditional requests only on the Last-Modified time, if it can be considered a strong validator.
The official WWWOFFLE now has a similar option validate-with-etag, but I still use Marc Boucher's implementation of this feature.
A new configuration option keep-cache-if-not-found. This is useful for preventing old cached versions of pages from being overwritten by error messages from a web server. An implementation of this feature is also available as a separate small patch file from this page.
A new option cache-control-no-cache that can be used in the offline section of the configuration file. This option works similarly to pragma-no-cache and can be used to reduce the number of outgoing requests generated when you hit the reload button of your web browser while offline.
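A sketch of how the offline section might then look, alongside the existing pragma-no-cache option (the exact section layout is my assumption; check README.par and wwwoffle.conf):

```
# Hypothetical wwwoffle.conf fragment for the offline section.
OfflineOptions
{
 pragma-no-cache        = yes
 cache-control-no-cache = yes
}
```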
A new configuration option session-cookies-only. When enabled, WWWOFFLE strips the expires field from Set-Cookie: server headers. Most browsers will not store such cookies permanently and will forget them between sessions.
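To illustrate with a made-up cookie, stripping the expires field turns a permanent cookie into a session cookie:

```
Before:  Set-Cookie: id=abc123; expires=Fri, 31 Dec 2027 23:59:59 GMT; path=/
After:   Set-Cookie: id=abc123; path=/
```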
A more detailed list of the changes I've made can be found in the README.par file in the source directory. At the end of this file you can also find my email address should you wish to contact me.
PS: If you are looking for a caching DNS proxy, you might also be interested in my pdnsd webpage.