15.5. Cache Communication Protocols
When we discussed proxying and HTTP, we also discussed caching, which is one of the primary uses of web proxies. Caching is very important as a way of speeding up transfers and reducing the amount of data transferred across crowded links. Once cache servers are set up, the next logical step is to use multiple cache servers and have them coordinate operations. A lot of active development is going on, and it's not at all clear what protocol is going to win out in the long run.
15.5.1. Internet Cache Protocol (ICP)
ICP is the oldest of the cache management protocols in current use and is supported by the largest number of caches, including Netscape Proxy, Harvest, and Squid. The principle behind ICP is that cache servers operate independently, but when a cache server gets a request for a document that it does not have cached, it asks other cache servers for the document, and retrieves the document from its source only if no other cache server has the document. ICP has a number of drawbacks: it requires a considerable amount of communication between caches, it slows down document retrieval, it provides no security or authentication, and it searches the cache based only on URL, not on document header information, which may cause it to return incorrect document versions. On the other hand, it has the noticeable advantage of being both publicly documented (in IETF RFCs 2186 and 2187, which are informational documents) and in widespread use.
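The query step described above can be sketched in a few lines of Python. This is a minimal illustration of the ICP version 2 message layout from RFC 2186 (a 20-byte header followed by an opcode-specific payload), not a complete client: the function names are hypothetical, the address fields are simply left as zeros, and nothing is actually sent on the network.

```python
import struct

# Opcode and version values as given in RFC 2186.
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3
ICP_VERSION = 2

# Header fields: opcode, version, message length, request number,
# options, option data, sender host address (20 bytes in total).
ICP_HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(url, request_number):
    """Build an ICP_OP_QUERY datagram asking a peer whether it has `url`."""
    # Query payload: 4-byte requester host address (zeroed in this sketch),
    # followed by the NUL-terminated URL.
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(
        ICP_OP_QUERY, ICP_VERSION, length, request_number, 0, 0, b"\x00" * 4
    )
    return header + payload


def parse_icp_reply(datagram):
    """Return (opcode, request_number, url) from a HIT or MISS reply."""
    opcode, _version, length, request_number, _opts, _optdata, _sender = (
        ICP_HEADER.unpack_from(datagram)
    )
    # HIT and MISS replies carry only the NUL-terminated URL as payload.
    url = datagram[ICP_HEADER.size:length].rstrip(b"\x00").decode("ascii")
    return opcode, request_number, url
```

A cache that receives ICP_OP_HIT from a peer would then fetch the document from that peer over HTTP; if every peer answers ICP_OP_MISS (or fails to answer in time), it goes to the origin server. Note that the lookup key is the URL alone, which is the limitation mentioned above.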
15.5.1.1. Packet filtering characteristics of ICP
ICP normally uses UDP; the port number is configurable but defaults to 3130. ICP can also be run over TCP, again at any port. Caches exchange the documents themselves via HTTP. The port used for HTTP is also configurable, but it defaults to 3128.
3128 is the standard port number for intercache HTTP servers, but some servers run on different port numbers.
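As a rough sketch of what these characteristics mean for a packet filter, the defaults above can be written down as a small rule table. The `Rule` type and `match_rule` function are hypothetical names used only for illustration; a real filter would also match on source and destination addresses, and the ports must be adjusted if your caches are configured differently.

```python
from typing import NamedTuple, Optional, Sequence


class Rule(NamedTuple):
    protocol: str   # "udp" or "tcp"
    dst_port: int
    purpose: str


# Default ports from the text; both are configurable on the caches.
DEFAULT_ICP_RULES = (
    Rule("udp", 3130, "ICP queries and replies"),
    Rule("tcp", 3128, "HTTP document transfer between caches"),
)


def match_rule(protocol: str, dst_port: int,
               rules: Sequence[Rule] = DEFAULT_ICP_RULES) -> Optional[Rule]:
    """Return the first rule matching the packet, or None if nothing matches."""
    for rule in rules:
        if rule.protocol == protocol.lower() and rule.dst_port == dst_port:
            return rule
    return None
```

For example, `match_rule("UDP", 3130)` returns the ICP rule, while `match_rule("tcp", 80)` returns `None`, meaning the traffic is not covered by the inter-cache rules.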
15.5.1.2. Proxying characteristics of ICP
ICP, like SMTP and NNTP, is a self-proxying protocol, one that allows for queries to be passed from server to server. In general, if you are configuring ICP in a firewall environment, you will use this facility and set all internal cache servers to peer with a cache server that's part of the firewall and serves as a proxy.
Since ICP is a straightforward query-and-response protocol that can be run over TCP as well as UDP, it would also be possible to proxy it through a proxy system like SOCKS; the only difficulty is that you would end up with a one-way relationship, since the external cache would not be able to send queries to the internal cache. This would slow down performance without providing any more security than self-proxying does, and no current implementations support it.
15.5.1.3. Network address translation characteristics of ICP
ICP does contain embedded IP addresses, but they aren't actually used for anything. It will work without problems through network address translation systems, as long as you configure a static translation (to allow for requests from other peers) and don't mind the fact that the internal address will be visible to anybody watching traffic.
15.5.2. Cache Array Routing Protocol (CARP)
CARP uses a completely different approach. Rather than having caches communicate with each other, CARP does load balancing between multiple cache servers by having a client or a proxy server use different caches for different requests, depending on the URL being requested and published information about the cache servers. The information about available cache servers is distributed through HTTP, so CARP adds no extra protocol complexity. For both packet filtering and proxying, CARP is identical to other uses of HTTP. However, CARP does have difficulties with network address translation, since the documents it uses are guaranteed to have IP addresses in them (the addresses of the cache servers). Netscape and Microsoft both support CARP as well as ICP.
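The idea of choosing a cache from the URL, with no traffic between the caches themselves, can be illustrated with a short sketch. This is not the hash function that CARP actually specifies, and it ignores per-server load factors; it simply scores every (cache, URL) pair with a general-purpose hash and picks the highest score. The function names and the cache names in the usage note are hypothetical.

```python
import hashlib
from typing import Sequence


def _score(url: str, cache: str) -> int:
    """Combine the cache name and URL into a deterministic score."""
    digest = hashlib.sha256(f"{cache}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(url: str, caches: Sequence[str]) -> str:
    """Pick the cache with the highest score for this URL.

    Every client holding the same cache list makes the same choice,
    so each URL ends up cached on exactly one server.
    """
    if not caches:
        raise ValueError("no cache servers configured")
    return max(caches, key=lambda cache: _score(url, cache))
```

A client holding the list `["cache-a.example", "cache-b.example", "cache-c.example"]` would call `pick_cache(url, caches)` for each request. One useful property of this style of selection is that removing a server only reassigns the URLs that were mapped to that server; the others stay where they were.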
15.5.3. Web Cache Coordination Protocol (WCCP)
WCCP is a protocol developed by Cisco that takes a third, completely different approach. In order to use WCCP, you need a router that is placed so that it can intercept all HTTP traffic that should be handled by your cache servers. The router will detect any packet addressed to TCP port 80 at any destination and redirect the packet to a cache server. The cache server then replies directly to the requestor as if the request had been received normally. WCCP is used for communication between the router and the cache servers, so that the router knows what cache servers are currently running, what load each one is running under, and which URLs should be directed to which servers, and can appropriately balance traffic.
15.5.3.1. Packet filtering characteristics of WCCP
WCCP uses UDP at port 2048. In addition, routers that use WCCP redirect HTTP traffic to cache servers by encapsulating it in GRE packets (GRE is a form of IP over IP, discussed in Chapter 4, "Packets and Protocols"). WCCP uses GRE protocol type hexadecimal 883E. Note that neither UDP nor GRE uses ACK bits.
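To make the two kinds of WCCP traffic concrete, here is a minimal sketch of how a filter might recognize them. It assumes the caller has already parsed the IP header and hands over the IP protocol number and the bytes that follow it; the function names are hypothetical. GRE is IP protocol 47, and the protocol type is the second 16-bit field of the GRE header.

```python
import struct

IPPROTO_UDP = 17
IPPROTO_GRE = 47
WCCP_CONTROL_PORT = 2048          # router <-> cache control traffic
WCCP_GRE_PROTOCOL_TYPE = 0x883E   # redirected HTTP traffic


def is_wccp_control(ip_protocol: int, dst_port: int) -> bool:
    """True for WCCP control messages (UDP to port 2048)."""
    return ip_protocol == IPPROTO_UDP and dst_port == WCCP_CONTROL_PORT


def is_wccp_redirect(ip_protocol: int, ip_payload: bytes) -> bool:
    """True for GRE packets whose protocol type marks them as WCCP redirects."""
    if ip_protocol != IPPROTO_GRE or len(ip_payload) < 4:
        return False
    # GRE header starts with 16 bits of flags/version, then the protocol type.
    _flags_and_version, protocol_type = struct.unpack_from("!HH", ip_payload)
    return protocol_type == WCCP_GRE_PROTOCOL_TYPE
```

Because neither check can rely on ACK bits, a filter built along these lines has to decide purely on protocol, port, and (in practice) the addresses of the router and cache servers.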
15.5.3.2. Proxying characteristics of WCCP
Because WCCP uses both UDP and GRE, it is going to be difficult to proxy. Although UDP proxies have become relatively common, GRE is still unknown territory for proxy servers.
15.5.4. Summary of Recommendations for Cache Communication Protocols
Copyright © 2002 O'Reilly & Associates. All rights reserved.