Planning a Web Caching System

This chapter describes the Cisco Cache Engine and the requirements and considerations for using the engine. Before you install the Cisco Cache Engine, take time to carefully plan the placement of Cache Engines on your networks. This helps ensure that you receive the maximum benefit of caching web traffic.

This chapter covers these topics:

Introducing the Cisco Cache Engine

The Cisco Cache Engine works with a router running Cisco IOS software with the Web Cache Control Protocol (WCCP). WCCP redirects HTTP traffic (traffic that uses TCP port 80) to a Cache Engine; the Cache Engine then manages the web request. See "Web Cache Control Protocol" for more information on WCCP.

Thus, the Cache Engine works in tandem with a router to handle web traffic. This traffic includes user requests to view pages and graphics (objects) on World Wide Web servers, whether internal or external to your network, and the replies to those requests.

When a user requests an object from a web server, the router first sends the request to a Cache Engine. If the Cache Engine has a copy of the requested object in storage, the Cache Engine sends that object to the user. Otherwise, the Cache Engine gets the requested object from the web server, stores a copy of the object (caches it), and simultaneously forwards the object on to the user.

By caching web objects in this manner, the Cisco Cache Engine can speed the satisfaction of user requests if more than one user wants to access the same objects. This also reduces the amount of traffic between your network and the Internet, potentially improving your overall network performance and optimizing your bandwidth usage.

Understanding Cache Engine Characteristics

Table 1-1 lists the characteristics of the Cache Engine.


Table 1-1: Cache Engine Characteristics

Characteristic                               Model CE2050
-------------------------------------------  -------------------------------------------
Cache storage                                24 GB
Maximum number of concurrent TCP sessions    2000
Dimensions                                   Height: 5.21 in (13.23 cm)
                                             Width: 16.82 in (42.72 cm)
                                             Depth: 17.5 in (44.45 cm)
Weight                                       32 pounds (14.5 kg)
Power                                        Auto-switching from low range: 90-135 VAC
                                             Auto-switching from high range: 180-270 VAC
                                             Frequency: 47-63 Hz
                                             Maximum power: 253 Watts
Current                                      115-VAC Input, Full Load: 4.2 A maximum
                                             230-VAC Input, Full Load: 0.5 A maximum
Console port                                 DB-9 connector
100BaseTX/10BaseT Ethernet autosensing port  RJ-45 connector
Temperature                                  Operating: 23 to 113°F (-5 to 45°C)
                                             Non-operating: -13 to 158°F (-25 to 70°C)
Operating humidity                           5 to 95%, noncondensing

2000 TCP Sessions

The maximum number of concurrent TCP sessions for each Cache Engine limits the amount of traffic the Cache Engine handles at any given time. Version 1.7 of the Cache Engine has an upper limit of 2000. This limit ensures that the Cache Engine does not get overloaded and impair network performance. Once the limit is reached, the Cache Engine accepts and queues up requests 2001 through 2050.

The Cache Engine does not acknowledge requests over 2050. This causes the browser to pause and then resend the request in an attempt to find an available connection. The second connection attempt may result in either a place in the queue or a connection. In the unlikely event that neither a place in the queue nor a connection can be granted after three such attempts by the browser, the request times out, and a message is returned indicating that the server is either busy or unreachable.
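The admission behavior described in this section can be modeled in a few lines. The following is a minimal sketch, not the Cache Engine's implementation: the function name admit and its return labels are hypothetical, and only the limits (2000 active sessions, 50 queued requests) come from the text.

```python
MAX_ACTIVE = 2000   # concurrent TCP sessions served (from the text)
MAX_QUEUED = 50     # requests 2001 through 2050 are queued (from the text)


def admit(active: int, queued: int) -> str:
    """Classify a new request given the current active and queued counts.

    Returns "serve", "queue", or "ignore" (hypothetical labels).
    "ignore" models the engine not acknowledging the request, which
    leaves the browser to pause and resend it.
    """
    if active < MAX_ACTIVE:
        return "serve"
    if queued < MAX_QUEUED:
        return "queue"
    return "ignore"


print(admit(1999, 0))   # serve
print(admit(2000, 49))  # queue
print(admit(2000, 50))  # ignore
```

The browser-side retries (up to three attempts before the request times out) are outside this sketch.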

The number of users that the 2000 session limit can accommodate depends on several factors:

Understanding Cache Engine Hardware

In order to attach a Cache Engine to your network, your network must meet these requirements:

  You can also use the EthernetConfig console command to verify that your Ethernet link works properly. Connect your Cache Engine to its hub, switch, or router. Use one of the commands below, and then ping your system:

In addition, you must attach a console to the Cache Engine in order to initially configure the engine. This can be a normal VT100-style console, or a laptop or desktop computer that can attach to the engine's serial port. Once you have completed the basic configuration, you manage the Cache Engine using a web browser from any machine (see "Connecting to the Cache Engine Management Interface").

A single router can support a cache farm of up to 32 Cache Engines.

Understanding How the Cisco Cache Engine Works

This section describes what the Cisco Cache Engine and WCCP do with web traffic. It also explains the concepts you need to understand in order to plan an effective caching strategy. Figure 1-1 shows an example network containing several routers, Cache Engines, web servers, and users. Subsequent sections refer back to this graphic to illustrate the described concepts.


Figure 1-1: Network Containing Cache Engines


What Does the Cisco Cache Engine Do?

The Cisco Cache Engine works with a router to handle web traffic. When the router receives a request that uses TCP port 80 (typically HTTP, or web, traffic), the router transparently redirects the request to a Cisco Cache Engine. The Cache Engine then attempts to satisfy the request from its own storage.

If the requested object is already in storage, the Cache Engine returns the stored object to the user. As far as the user knows, the object came directly from the web server. By fulfilling the request from its own storage, the Cache Engine eliminates the need to send the request to the Internet and receive data from the Internet, thus freeing your Internet connection for other traffic. The user might also get the requested object more quickly than possible from the Internet.

If the Cache Engine does not already have the requested object in storage, or if the object is expired, the Cache Engine retrieves the object from the requested web server. While sending the object to the user, the Cache Engine caches the object (stores a copy of the object on its own disk drives), so that subsequent requests for the object, whether from this user or from another user, can be satisfied from the engine's storage. Refer to the HTTP 1.1 RFC for additional information on cache control directives.
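The hit, miss, and expired cases described above can be summarized in a short model. This is an illustrative sketch under stated assumptions, not the engine's code: the names handle_request, Store, and fake_origin are hypothetical, and storage is modeled as an in-memory dictionary rather than disk drives.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# url -> (object body, expiry time in seconds since the epoch, or None)
Store = Dict[str, Tuple[bytes, Optional[float]]]


def handle_request(url: str, store: Store,
                   fetch: Callable[[str], Tuple[bytes, Optional[float]]],
                   now: Optional[float] = None) -> Tuple[bytes, str]:
    """Serve url from the store if a fresh copy exists, else from the server."""
    now = time.time() if now is None else now
    entry = store.get(url)
    if entry is not None:
        body, expires = entry
        if expires is None or now < expires:
            return body, "hit"          # satisfied from storage
    body, expires = fetch(url)          # missing or expired: go to the server
    store[url] = (body, expires)        # keep a copy for later requests
    return body, "miss"


def fake_origin(url: str) -> Tuple[bytes, Optional[float]]:
    """Stand-in for the web server; the object expires at t=100."""
    return b"<html>page</html>", 100.0


store: Store = {}
print(handle_request("http://www.cisco.com/", store, fake_origin, now=0.0)[1])    # miss
print(handle_request("http://www.cisco.com/", store, fake_origin, now=50.0)[1])   # hit
print(handle_request("http://www.cisco.com/", store, fake_origin, now=150.0)[1])  # miss
```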

Thus, most web traffic, whether going out to the Internet or coming back from the Internet, gets routed to the Cache Engine, and the Cache Engine manages the communication between the user and the Internet. Because the Cache Engine sees all web traffic, you can configure the Cache Engine to prevent users from accessing certain servers (for example, sites with undesirable photographs).

Why Use the WCCP?

In order for a router to use the Cache Engines, it must know that there is an attached Cache Engine. When you turn on a fully-configured Cache Engine, it announces to the router that it is up and ready to handle web traffic. The router, in turn, must respond to the Cache Engine with the information the Cache Engine requires to operate correctly.

In order for the router to respond to the Cache Engine's messages, the router must be running WCCP. This protocol defines the messages that are used for communication between the Cache Engine and the router. If you do not enable the protocol on the router, the router cannot use any engines.

The Cache Engine can also operate in proxy mode, accepting requests from manually configured browsers, without WCCP enabled on the router. However, we do not recommend using the Cache Engine in proxy mode because its performance is optimized in transparent mode. See "Enabling Cache Support on the Router" and "Web Cache Control Protocol" for information on starting WCCP on the router.

What is the Home Router for a Cache Engine?

Because a Cache Engine works with a router, each Cache Engine belongs to a specific router. The router to which the Cache Engine belongs is called the engine's home router. Although the Cache Engine does not have to be on a subnet directly attached to the router in order for the router to be home to the engine, it is usually better for network traffic if the Cache Engine is kept close to the router.

In Figure 1-1, these are the home routers and their Cache Engines:

What is a Cache Farm?

Each Cache Engine is an independent unit: you can add or remove a Cache Engine from a network with little impact on other engines on the network. It is the home router that maintains awareness of the engines attached to it, and it is the home router that decides which Cache Engine is sent a specific web request.

Because the home router determines how each Cache Engine is used, you can attach more than one Cache Engine to a single router. All Cache Engines attached to a single router form a cache farm. These engines do not have to be on the same subnet: engines on different subnets can form a cache farm. For example, in Figure 1-1, these are the home routers and their cache farms:


Note: The Cache Engines divide the Internet address space into 256 groups, and they tell the home router the addresses that each Cache Engine will cover. As Cache Engines are added to, or removed from, a cache farm, the engines dynamically redistribute these groups evenly across the engines in the farm. For example, each Cache Engine in a 2-Cache Engine farm would get 128 groups; if there are 3 engines, each would get 85 or 86 groups. This calculation is based not just on the first octet of the IP address, but on the full destination IP address.

Assigning addresses by group (bucket) helps ensure that most valid cached data is unaffected when the buckets are redistributed as Cache Engines are added to or removed from the farm.
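The arithmetic in the note above can be checked with a small sketch. The even split of 256 groups comes from the text; the function names and the XOR-based bucket_for hash are assumptions made for illustration only, since this chapter does not specify the actual hash or the order in which groups are assigned.

```python
import ipaddress
from collections import Counter
from typing import List

NUM_BUCKETS = 256  # number of address groups (from the text)


def assign_buckets(engines: List[str]) -> List[str]:
    """Return a 256-entry table mapping bucket index to engine name."""
    if not engines:
        raise ValueError("at least one engine is required")
    base, extra = divmod(NUM_BUCKETS, len(engines))
    table: List[str] = []
    for i, engine in enumerate(engines):
        # The first `extra` engines take one additional bucket each.
        table.extend([engine] * (base + (1 if i < extra else 0)))
    return table


def bucket_for(dest_ip: str) -> int:
    """Placeholder hash over all four octets (an assumption, not the real hash)."""
    bucket = 0
    for octet in ipaddress.IPv4Address(dest_ip).packed:
        bucket ^= octet
    return bucket


print(sorted(Counter(assign_buckets(["ce1", "ce2"])).values()))         # [128, 128]
print(sorted(Counter(assign_buckets(["ce1", "ce2", "ce3"])).values()))  # [85, 85, 86]
```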

What is a Cache Hierarchy?

Every home router redirects web traffic to its attached Cache Engines, and the engines handle the network traffic between users and web servers. Because each home router redirects its web traffic, if the web traffic goes through more than one home router, more than one cache farm examines a user's web request. The relationship between these cache farms forms a cache hierarchy.

For example, in Figure 1-1, if user PC-2 requests a page from the Internet web server www.cisco.com, the request follows this path:

    1. Home router 3 redirects the request to a Cache Engine in farm F2. (This assumes that web traffic is redirected on the interface between router 3 and router 2.)

    2. If the Cache Engine in F2 has a copy of the page, it returns the page to PC-2, and is finished processing the request.

    3. Otherwise, the Cache Engine in F2 sends the request back to router 3, which sends it to router 2.

    4. Router 2 sends the request directly to router 1, because router 2 does not have a cache farm.

    5. Router 1 sends the request to a Cache Engine in farm F1. (This assumes that web traffic is redirected on the interface between router 1 and the Internet.)

    6. If the Cache Engine in farm F1 has a copy of the page, it returns it to the Cache Engine in F2 that is handling PC-2's request. When the page reaches router 3, the router sends it to the Cache Engine in F2. The Cache Engine in F2 puts a copy of the page in its storage while sending the page on to PC-2.

    7. Otherwise, the Cache Engine in F1 sends the request back to router 1, which sends it on the Internet to eventually arrive at www.cisco.com. On the return trip, the page is sent to the Cache Engine in F1, which keeps a copy of the page in storage before sending it on to eventually reach the Cache Engine in F2, which also saves a copy of the page as it sends the page to PC-2.

In this example, the cache farms F1 and F2 form a hierarchy. First F2 is checked, then F1 is checked, then finally the actual web server is contacted for the requested page. However, there is not a strict relationship between farms in the hierarchy; the hierarchy is established by the relationship of the user to the eventual source of the page. For example, if user PC-3 tries to access a page on server 3, farm F1 is checked before farm F2: this is the reverse of the hierarchy for user PC-2. (This assumes that web traffic is redirected on the interfaces between router 1 and router 2, and router 3 and server 3.)
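The lookup order in a cache hierarchy can be sketched as a walk along the farms on the path from the user to the server. This is a simplified model under stated assumptions: each farm is represented by a single dictionary, and the names fetch_through_farms and origin are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def fetch_through_farms(url: str,
                        farms_on_path: List[Tuple[str, Dict[str, bytes]]],
                        origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Check each farm in path order; fall back to the web server.

    Every farm that missed stores a copy as the page travels back.
    Returns the page and the name of whatever supplied it.
    """
    missed: List[Dict[str, bytes]] = []
    for name, store in farms_on_path:
        if url in store:
            body, source = store[url], name
            break
        missed.append(store)
    else:
        body, source = origin(url), "origin"
    for store in missed:
        store[url] = body
    return body, source


def origin(url: str) -> bytes:
    """Stand-in for the actual web server."""
    return b"page"


f1: Dict[str, bytes] = {}
f2: Dict[str, bytes] = {}
pc2_path = [("F2", f2), ("F1", f1)]   # PC-2 to the Internet: F2 first, then F1
print(fetch_through_farms("http://www.cisco.com/", pc2_path, origin)[1])  # origin
print(fetch_through_farms("http://www.cisco.com/", pc2_path, origin)[1])  # F2
```

For user PC-3 requesting a page on server 3, the same function would be called with the path reversed (F1 first, then F2), which is why the hierarchy depends on the relationship of the user to the source of the page.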

When you design your web caching system, keep in mind the hierarchical relationship between your cache farms. Try to attach the cache farms to the key routers that control your network, to ensure the widest benefit of web caching.


Note: You can also use the Internet Cache Protocol (ICP), described in Chapter 3, to build cache hierarchies.

Are the Caches Ever Bypassed?

Because the Cache Engine works with a router, the Cache Engine only sees port 80 traffic that goes through the router. Thus, if a web request does not require routing through a home router, that request is not handled by a Cache Engine. For example, in Figure 1-1, if user PC-1 requests a page from server 1, that request does not get handled by any Cache Engine. PC-1 and server 1 are on the same subnet, so routing is not required for communication between these machines.

On the other hand, if user PC-3 requests a page from server 1, that request is handled by the engines in farm F1. Even though PC-3 and server 1 are on subnets attached to the same router, they are on different subnets, so routing through the router is required for communication. (This assumes that web traffic is redirected on the interface between router 1 and server 1.)

You can force traffic to bypass the caches by not redirecting web traffic on the interface connected to the web server. Thus, you could prevent the caching of internal web servers by not redirecting web traffic on the interfaces for the network segments that contain the servers.

What Does Not Get Cached?

The Cisco Cache Engine only caches data that uses TCP port 80. Therefore, if a web site is set up to use a different port, that web traffic does not go through the Cache Engines. For example, a secure web server normally does not use port 80, so data from a secure web server is normally never cached. In addition, the Cache Engine does not cache data that contains HTTP codes stating not to cache the object and it does not cache generated object data (for example, generated objects created by CGI scripts). The Cache Engine also does not cache authenticated data. Similarly, FTP traffic does not get cached.
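The exclusions listed above can be expressed as a single predicate. This is a hedged sketch, not the Cache Engine's actual logic: the function name is hypothetical, and the specific signals used here (the Authorization request header for authenticated data, Cache-Control and Pragma response headers for "do not cache" markings, and "/cgi-bin/" or "?" in the URL for generated objects) are assumptions chosen for illustration.

```python
from typing import Mapping


def is_cacheable(port: int, url: str,
                 request_headers: Mapping[str, str],
                 response_headers: Mapping[str, str]) -> bool:
    """Return True if the response could be cached under the rules above."""
    if port != 80:
        return False                      # only TCP port 80 traffic is redirected
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v.lower() for k, v in response_headers.items()}
    if "authorization" in req:
        return False                      # assumed marker of authenticated data
    cache_control = resp.get("cache-control", "")
    if any(tok in cache_control for tok in ("no-cache", "no-store", "private")):
        return False                      # server says not to cache
    if "no-cache" in resp.get("pragma", ""):
        return False                      # HTTP 1.0 style marking
    if "/cgi-bin/" in url or "?" in url:
        return False                      # assumed heuristic for generated objects
    return True


print(is_cacheable(80, "http://example.com/logo.gif", {}, {}))                       # True
print(is_cacheable(443, "http://example.com/logo.gif", {}, {}))                      # False
print(is_cacheable(80, "http://example.com/q.html", {}, {"Pragma": "no-cache"}))     # False
```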

You can further limit what gets cached by setting up the Cache Engines to disallow access to certain web sites (for example, sites that contain undesirable pictures). These web sites not only are not cached, but the Cache Engines prevent users from accessing them at all.

What Happens if a Cache Engine Stops Working?

A Cache Engine might stop working for any number of reasons, from hardware failure to network failure to power failure. If a Cache Engine no longer responds to the router, the router automatically stops directing traffic to the failing Cache Engine. If there are other engines attached to the router, the router continues using those engines, and reapportions the Internet address space evenly among the remaining engines. Otherwise, the router does not redirect web traffic.

Thus, a failing Cache Engine is mostly transparent to your users: the router automatically redirects traffic around the failed engine(s), so that the disappearance of the Cache Engine does not cause serious problems. However, no new connections are sent to the Cache Engine and current connections (up to 2000) are disabled when the Cache Engine fails.

How is Time-Sensitive Data Handled?

Some data that you retrieve from the web is time sensitive. For example, real-time stock quotes change from second to second. The server administrator can set caching parameters for the data on the server. If the server from which the data is retrieved is using the HTTP 1.1 protocol, the server administrator can explicitly identify how long the data should be cached. If the server is using the HTTP 1.0 protocol, the server administrator can identify the data as 'do not cache,' but cannot set specific expiration limits for the data. In either case, the Cache Engines do not cache data marked 'do not cache,' and follow the HTTP 1.1 parameters for explicitly set caching parameters.

If the server administrator has not set caching parameters otherwise, the data is cached. Users can ensure their data is fresh by clicking the web browser's Refresh button. When a user clicks Refresh, the Cache Engine also refreshes the data that is in the cache.
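The rules in this section can be sketched as a function that derives how long an object may be served from cache. This is a simplified illustration under stated assumptions, not the engine's algorithm: the function name is hypothetical, Cache-Control max-age and Expires are taken as the HTTP 1.1 way to set an explicit lifetime, and Pragma: no-cache as the HTTP 1.0 'do not cache' marking.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str]) -> Optional[float]:
    """Return seconds the object may be cached.

    0.0 means 'do not cache'; None means the server set no caching
    parameters, in which case the text says the data is cached.
    """
    h = {k.lower(): v for k, v in headers.items()}
    cache_control = h.get("cache-control", "").lower()
    directives = [d.strip() for d in cache_control.split(",") if d.strip()]
    if "no-cache" in directives or "no-store" in directives:
        return 0.0
    if "no-cache" in h.get("pragma", "").lower():
        return 0.0
    for d in directives:
        if d.startswith("max-age="):
            try:
                return max(0.0, float(int(d.split("=", 1)[1])))
            except ValueError:
                break  # malformed max-age; fall back to Expires or the default
    if "expires" in h and "date" in h:
        try:
            delta = parsedate_to_datetime(h["expires"]) - parsedate_to_datetime(h["date"])
        except (TypeError, ValueError):
            return 0.0  # unparseable dates are treated as already expired
        return max(0.0, delta.total_seconds())
    return None


print(freshness_lifetime({"Cache-Control": "max-age=60"}))  # 60.0
print(freshness_lifetime({"Pragma": "no-cache"}))           # 0.0
print(freshness_lifetime({}))                               # None
```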

How Long Are Objects Stored?

A Cache Engine stores an object no longer than the HTTP 1.1 or 1.0 caching specifications require. For example, if a server administrator marks an object as expiring after a specific time, the Cache Engine must revalidate the object after that time.

For HTTP 1.0 objects, you can adjust how long the object is stored by using the Freshness settings described in "Working with Tuning Options".

The Cache Engine may delete an object before an expiration time or date is met if the Cache Engine runs out of storage for newly cached pages. Thus, how long a specific object stays in storage may depend as much on the quantity of data that is cached as it does on the specific caching parameters associated with the object, and this can change from day to day.

Planning a Caching Strategy

Before you install your Cache Engine, first develop a plan for how you want to use web caches. This helps ensure that you get the benefits you expect from the Cache Engine.

This section covers the issues you might consider when planning your caching strategy. Whether you address all of these issues is up to you: the Cache Engine does not depend on an installation's caching strategy for it to work properly. The issues addressed in this section only help you maximize the benefit of your investment in your Cache Engine.

In planning your caching strategy, address these questions and issues:

In addition, look through these sections to get ideas for how to deploy the Cache Engines in your network:

Which Routers Should be Home Routers for a Cache Engine?

A good place to start a cache farm is the router that contains your Internet connection. This ensures that all user requests get handled by a Cache Engine before going outside of your business. If you have more than one router that connects to the Internet, create a cache farm for each.

Other likely places for a cache farm are routers that connect remote offices to the main office. This allows your intranet web servers to be cached at the remote office, reducing the traffic on the lines connecting the remote office to your main network.

For Internet Service Providers (ISPs), placing a cache farm at the router in your Points of Presence (POPs) can help reduce the traffic between your main site and each POP.

In general, any router that connects users to another location through a slower line can benefit from caching.

Which Subnets Should Include a Cache Engine?

Because the router redirects all HTTP traffic to the Cache Engines, any network segment that contains the Cache Engines should experience an increase in network traffic. If all engines are on the same network segment, that segment's traffic increases by your current amount of HTTP traffic, plus whatever HTTP traffic the engines must themselves generate in order to accommodate requests for which they do not have cached copies of the page. For example, if half of your users' HTTP requests can be satisfied from a Cache Engine, the network segment's traffic should increase by 150% of your current HTTP traffic load.
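The 150% figure follows from simple arithmetic, shown here as a worked sketch. The function name and the example load of 10 Mbps are hypothetical; the model assumes all Cache Engines share one network segment, as in the text.

```python
def cache_segment_traffic(http_load: float, hit_rate: float) -> float:
    """Added traffic on the Cache Engines' segment, in the units of http_load.

    http_load: current HTTP traffic redirected to the engines.
    hit_rate:  fraction of requests satisfied from cache (0.0 to 1.0).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    redirected = http_load                    # every request reaches an engine
    refetched = http_load * (1.0 - hit_rate)  # misses are fetched by the engines
    return redirected + refetched


# Half of the requests are hits: 10 + (10 * 0.5) = 15, or 150% of the HTTP load.
print(cache_segment_traffic(10.0, 0.5))  # 15.0
```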

Consider these recommendations when deciding where to place a Cache Engine. The optimal configuration is listed first and is also described in the section, "Sample Configurations" in "Web Cache Control Protocol."

  However, if you have a small network, this setup can perform acceptably (see "Cache Engines and Router on Same Network Interface" for an example of this setup).

How Many Cache Engines Do You Need for Each Home Router?

Each Cache Engine has a maximum number of concurrent sessions that it can handle. You can translate this maximum number of sessions into an expected number of users that the Cache Engine can support. The number of users that can be supported by one Cache Engine, then, depends on:

    1. The number of sessions opened by the web browser.

    2. The percentage of users you expect to be accessing the web at any given moment.

Different web browsers handle sessions differently: some browsers allow the user to set the number of sessions that they should use. A good starting point, however, is to assume 4 sessions will be used for each web page requested.
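As a rough planning aid, the two factors above can be combined into an estimate of the user population one Cache Engine can support. This is a back-of-the-envelope sketch: the function name and the 10 percent concurrency figure are assumptions for illustration, and it assumes each active user loads one page at a time.

```python
def supported_users(max_sessions: int = 2000,
                    sessions_per_page: int = 4,
                    percent_active: int = 10) -> int:
    """Estimate the total users one Cache Engine can support.

    max_sessions:      concurrent TCP session limit (2000 in the text).
    sessions_per_page: sessions a browser opens per page (4 as a starting point).
    percent_active:    percentage of users on the web at any given moment
                       (assumed value; measure this for your own users).
    """
    if sessions_per_page < 1 or not 1 <= percent_active <= 100:
        raise ValueError("invalid planning parameters")
    concurrent_page_loads = max_sessions // sessions_per_page
    return concurrent_page_loads * 100 // percent_active


print(supported_users())                    # 5000
print(supported_users(percent_active=25))   # 2000
```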

You may need to experiment with your user community to identify the peak usage periods and how much concurrent traffic they generate in order to create cache farms with sufficient processing power. Once you have a cache farm in place, you can use the Cache Engine's status and log pages to determine if the cache farm is an efficient size.

Using the status page (see "Savings"), look at the % utilization figures. If they are consistently high, between 90 and 100 percent, for all or most of the Cache Engines in the cache farm, the cache farm may be a bottleneck on the network. Similarly, the event log (see "Events") may show a high number of warnings for cache overutilization. If you consistently see very high utilization ratios, consider adding additional Cache Engines to the cache farm.

Reducing Web Traffic vs. Saving WAN Costs

The Cache Engine can accomplish two main goals for your network:

Although these goals are compatible, the relative importance of the goals can affect how you deploy the Cache Engines.

Consider the example in Figure 1-2. In this example, three remote offices are connected in a WAN with the main office. Web traffic from the remote offices must go through the main office before reaching the Internet. If there is a lot of web traffic coming from these remote offices, placing a cache farm at each remote office can reduce your WAN costs, because some of the traffic can be satisfied with data stored in the cache farm. The configuration pictured here also has the benefit of reducing overall traffic sent to the Internet.

If, however, you do not have much web traffic coming from the remote offices (for example, if the remote offices were sales offices where the employees are frequently on the road or using the telephone instead of the Internet), there may be no significant benefit in placing cache farms at the remote offices. However, you can still reduce web traffic to the Internet by placing a cache farm at the main office.

Carefully consider the type of network traffic generated at each of the offices in your network. Some offices may benefit from having a cache farm, whereas other offices might see only a marginal improvement in performance and cost savings.


Figure 1-2: Reducing Web Traffic vs. Saving WAN Costs


Examples of Internet Service Provider (ISP) Configurations

This section shows some recommended configurations for Internet Service Provider (ISP) systems. Although there are many ways in which you can deploy the Cache Engine, we recommend that you follow the examples in these sections:

Overview of Cache Farms in an ISP Network

Figure 1-3 shows a broad view of an ISP network. In an ISP network, you may have the dual goal of speeding web traffic (thus improving customer satisfaction) and reducing WAN costs. If that is your goal, then it is effective to place cache farms at each point in your network where there is a WAN connection to another site.

In this figure, you would place a cache farm at all places marked A, B, or C. As you go up in network size from POPs to your larger offices, increase the size of your cache farms. For example, you might place a single Cache Engine at your POPs (location A), but several Cache Engines at the B locations, and your largest cache farms at the C locations.


Figure 1-3: Overview of Cache Farms in an ISP Network


Detailed View of Cache Farms in an ISP Network

Figure 1-4 shows a detailed view of part of an ISP network. This example shows the details of the POP network and its connection to a larger site. The cache farm connected to the POP router is on a separate network connection from the AS5300 machines. You must enable WCCP on the POP router, and redirect web traffic on the interface connecting the POP to the main office.

Likewise, there are cache farms at each of the routers in the main office that accept traffic from the POPs. You must enable WCCP on each of these routers, and redirect the web traffic on the interface that is connected to the network in the main office (interfaces A, B, and C). Because there is a lot of traffic going through the router at the main office that is connected to the Internet, it is best not to attach a cache farm to that router. Keep the cache farms at the entry points to your network.


Figure 1-4: Detailed View of Cache Farms in an ISP Network


Examples of Enterprise Configurations

This section shows some recommended configurations for enterprise networks. Although there are many ways in which you can deploy the Cache Engine, we recommend that you follow these examples. The recommendations are described in the following sections, from most preferred to least preferred:

Choose an example appropriate for your existing network configuration.

Cache Engines on a Separate Network Interface

Because a cache farm increases the amount of traffic on the line to which the Cache Engines are attached, we recommend that you isolate the Cache Engines on a dedicated 100BaseT network interface. Place 3 to 6 Cache Engines on a single network interface. If you have more than 6 Cache Engines in a cache farm, use a separate network interface for every 3 to 6 Cache Engines. This helps ensure that the network interface does not get overloaded during peak usage of the Internet.
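The grouping guideline above can be turned into a small planning calculation. This is only one possible way to apply the 3-to-6 guideline; the function name is hypothetical, and the choice to use the fewest interfaces and spread the engines evenly is an assumption.

```python
from typing import List


def engines_per_interface(num_engines: int, max_per_interface: int = 6) -> List[int]:
    """Split a cache farm across the fewest interfaces, as evenly as possible.

    Returns the number of Cache Engines to place on each interface.
    """
    if num_engines < 1:
        raise ValueError("num_engines must be at least 1")
    interfaces = -(-num_engines // max_per_interface)   # ceiling division
    base, extra = divmod(num_engines, interfaces)
    return [base + 1] * extra + [base] * (interfaces - extra)


print(engines_per_interface(4))    # [4]
print(engines_per_interface(7))    # [4, 3]
print(engines_per_interface(13))   # [5, 4, 4]
```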

Figure 1-5 shows a basic configuration using a single router. In this example, enable WCCP on the router, and redirect web traffic that goes out the network interface to the Internet. For more information, see the section "Sample Network Configuration 3" in "Web Cache Control Protocol."


Figure 1-5: Cache Engines on Separate Interface, Single Router


Figure 1-6 shows a more complex example, where the router has several switches attached to it. Each switch also can have Cache Engines attached to it. Ideally, these switches should also have a router blade (such as a Catalyst 5000 with a router blade). If the switch has a router blade, the traffic between router A and the switches is reduced, because you can configure the engines attached to the switches to use the router in the switches as the home router rather than router A. If router A is the home router, then requests must reach router A before being redirected back down the network to the engines attached to the switches.

In this example of switches with router blades, enable WCCP on the router blade in the switches, and redirect web traffic that goes out the interface to router A. You must also enable WCCP on router A, and redirect web traffic on the interface to the Internet.


Figure 1-6: Cache Engines on Separate Interface Using Router and Switches


Cache Engines on a Switch

If you have a small office setup, an ideal design is to attach the Cache Engines to a switch with a router blade, such as a Catalyst 5000 with a router blade.

Figure 1-7 shows this setup. You must enable WCCP on the router in the Catalyst 5000, and redirect web traffic on the interface connected to the Internet.


Figure 1-7: Cache Engines on a Switch with a Router Blade


Figure 1-8 shows a setup using a separate switch (one that does not have a router blade) and router. The line between the router and switch can become a bottleneck, because all web traffic must first reach the router before it can be redirected back down the network to the Cache Engines. This setup, however, creates less network traffic than a router-hub setup (where the switch in the diagram is replaced by a hub).


Figure 1-8: Cache Engines on a Switch with Separate Router


Cache Engines and Router on Same Network Interface

If you have a very small office, you can consider adding a router to your existing network that will act as the home router for the Cache Engines. This is sometimes referred to as a 'router on a stick.' Only use this setup if you have one Cache Engine and a reasonably small number of users, and your main router does not have Cisco IOS software with WCCP, because this setup increases the amount of traffic on your network.

Figure 1-9 shows an example of adding a home router to an existing network. In this example, you must enable WCCP on router 2, and redirect web traffic on the interface connecting to router 1 from router 2. Then, make router 2 the default gateway for all of the client machines on the network, and router 1 the default gateway for router 2. Thus, all traffic on this network first goes through router 2 before going to router 1. Unless you have a small network, this setup may not improve your network performance. Router 2 can become a bottleneck if it does not have sufficient processing power to handle the traffic on your network. For additional information, see the section "Sample Configuration 1" in "Web Cache Control Protocol."

Although this setup can improve performance on small networks, it is not an optimal setup. It breaks the rule of thumb that you should not put a Cache Engine on an interface whose web traffic is being redirected. In most cases, we recommend you use one of the setups described in "Cache Engines on a Separate Network Interface" or "Cache Engines on a Switch."


Figure 1-9: Cache Engines and Router on Same Network Interface (Router on a Stick)


Previewing Cache Engine Management

Once you install a Cache Engine and initially configure it using a console, you manage the Cache Engine through an ordinary web browser on any machine. See "Connecting to the Cache Engine Management Interface" in "Installing the Cache Engines." The Cache Engine's web interface allows you to adjust settings, monitor system performance, and take remedial actions. See "Managing the Cache Engine" for more details.

The web browser you use must be able to handle HTML forms and must be Java-enabled; for example, Netscape Navigator 2.0 or Microsoft Internet Explorer 3.0.


Posted: Sat Sep 28 02:30:36 PDT 2002
All contents are Copyright © 1992--2002 Cisco Systems, Inc. All rights reserved.