
Planning a Web Caching System

This chapter describes the Cisco Cache Engine and the requirements and considerations for using the engine. Before you install the Cisco Cache Engine, take time to carefully plan the placement of Cache Engines on your networks. This helps ensure that you receive the maximum benefit of caching web traffic.

This chapter covers these topics:

Introducing the Cisco Cache Engine

The Cisco Cache Engine works with a router running Cisco IOS software with the Web Cache Control Protocol. The Web Cache Control Protocol redirects HTTP traffic (web traffic, or to be specific, traffic that uses TCP port 80) to a Cache Engine; the Cache Engine then manages the web request.

Thus, the Cache Engine works in tandem with a router to handle web traffic. This traffic includes user requests to view pages and graphics on World Wide Web servers, whether internal or external to your network, and the replies to those requests.

When a user requests a page from a web server, the router first sends the request to a Cache Engine. If the Cache Engine has a copy of the requested page in storage, the engine sends the user that page. Otherwise, the engine gets the requested page and the objects on that page from the web server, stores a copy of the page and its objects (caches them), and simultaneously forwards the page and objects on to the user.
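The hit-or-miss flow just described can be sketched in Python. All names here (the dictionary cache, handle_request, fetch_from_origin) are illustrative only; they are not the Cache Engine's actual implementation, which this guide does not describe.

```python
# Illustrative sketch of the redirect-and-cache flow: serve from storage
# on a hit; fetch, store, and forward on a miss. Names are hypothetical.

cache = {}  # URL -> page content held in the engine's storage

def fetch_from_origin(url):
    # Stand-in for retrieving the page and its objects from the web server.
    return "<html>page for %s</html>" % url

def handle_request(url):
    """Return the page and whether it was a cache hit or miss."""
    if url in cache:
        return cache[url], "hit"
    page = fetch_from_origin(url)
    cache[url] = page  # cache the page while forwarding it to the user
    return page, "miss"
```

The first request for a page is a miss and populates the cache; any later request for the same page, from any user, is a hit.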

By caching web objects in this manner, the Cisco Cache Engine can speed the satisfaction of user requests if more than one user wants to access the same page. This also reduces the amount of traffic between your network and the Internet, potentially improving your overall network performance and reducing your bandwidth requirements.

Understanding Cache Engine Characteristics

Table 1-1 lists the characteristics of the Cache Engine.


Table 1-1: Cache Engine Characteristics

  Characteristic                               Model CE2050
  -------------------------------------------  ------------------------------------------
  Cache storage                                24 GB
  Maximum number of concurrent TCP sessions    900
  Dimensions                                   Height: 5.21 in (13.23 cm)
                                               Width: 16.82 in (42.72 cm)
                                               Depth: 17.5 in (44.45 cm)
  Weight                                       32 pounds (14.5 kg)
  Power                                        Auto-switching from low range: 90-135 VAC
                                               Auto-switching from high range: 180-270 VAC
                                               Frequency: 47-63 Hz
                                               Maximum power: 253 Watts
  Current                                      115-VAC input, full load: 4.2 A maximum
                                               230-VAC input, full load: 0.5 A maximum
  Console port                                 DB-9 connector
  100BaseTX/10BaseT Ethernet autosensing port  RJ-45 connector
  Temperature                                  Operating: 23 to 113°F (-5 to 45°C)
                                               Non-operating: -13 to 158°F (-25 to 70°C)
  Operating humidity                           5 to 95%, noncondensing

The maximum number of concurrent TCP sessions for each engine limits the amount of traffic the engine handles at any given time. This limit ensures that the engine does not become overloaded and impair network performance. Once the limit is reached, users receive a message indicating that the server is either busy or unreachable. If they try again, the number of active sessions may have dropped enough to allow the connection.
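The session limit behaves like a simple admission gate: connections are accepted until the limit is reached, and each closed session frees a slot. The following sketch illustrates that behavior; the class and method names are hypothetical, not part of the Cache Engine.

```python
MAX_SESSIONS = 900  # CE2050 concurrent TCP session limit from Table 1-1

class SessionGate:
    """Illustrative admission gate modeling the concurrent-session limit."""

    def __init__(self, limit=MAX_SESSIONS):
        self.limit = limit
        self.active = 0

    def open_session(self):
        """Accept a new session, or refuse it when the engine is at capacity."""
        if self.active >= self.limit:
            return False  # the user sees a "busy or unreachable" message
        self.active += 1
        return True

    def close_session(self):
        """A finished session frees a slot for a retry to succeed."""
        self.active -= 1
```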

The number of users that this session limit can accommodate depends on several factors:

Understanding Cache Engine Hardware Requirements

In order to attach a Cache Engine to your network, your network must meet these requirements:

In addition, you must attach a console to the Cache Engine in order to initially configure the engine. This can be a normal VT100-style console, or a laptop or desktop computer that can attach to the engine's serial port. Once you have completed the basic configuration, you manage the Cache Engine using a web browser from any machine on your intranet.

A single router can support a cache farm of up to 32 Cache Engines.

Understanding What the Cisco Cache Engine Does

This section describes what the Cisco Cache Engine and the Web Cache Control Protocol do with web traffic, and explains the concepts you need to understand in order to plan an effective caching strategy. Figure 1-1 shows an example network containing several routers, Cache Engines, web servers, and users. Subsequent sections refer back to this graphic to illustrate the described concepts.


Figure 1-1: Network Containing Cache Engines

What Does the Cisco Cache Engine Do?

The Cisco Cache Engine works with a router to handle web traffic. When the router receives a request that uses TCP port 80 (typically HTTP, or web, traffic), the router transparently redirects the request to a Cisco Cache Engine. The Cache Engine then attempts to satisfy the request from its own storage.

If the requested page is already in storage, the Cache Engine returns the stored page to the user. As far as the user knows, the page came directly from the web server. By fulfilling the request from its own storage, the Cache Engine eliminates the need to send the request to the Internet and receive data from the Internet, thus freeing your Internet connection for other traffic. The user might also get the requested page more quickly than possible from the Internet.

If the Cache Engine does not already have the requested page in storage, the engine retrieves the page from the requested web server. While sending the request to the user, the engine caches the page (stores a copy of the page on its own disk drives), so that subsequent requests for the page, whether from this user or from another user, can be satisfied from the engine's storage.

Thus, all web traffic, whether going out to the Internet or coming back from the Internet, gets routed to the Cache Engine, and the engine manages the communication between the user and the Internet. Because the engine sees all web traffic, you can configure the engine to prevent users from accessing certain servers (for example, sites with undesirable photographs). You can also tell the engine to not cache certain types of objects (for example, Java applets).

What is the Web Cache Control Protocol?

In order for a router to use the Cache Engines, it must know that there is an attached Cache Engine. When you turn on a fully configured Cache Engine, it announces to the router that it is up and ready to handle web traffic. The router, in turn, must respond to the Cache Engine with the information the engine requires to operate correctly.

In order for the router to respond to the Cache Engine's messages, the router must be running the Web Cache Control Protocol. This protocol defines the messages that are used for communication between the Cache Engine and the router. If you do not enable the protocol on the router, the router cannot use any attached engines.

See "Enable Cache Support on the Router" in Chapter 2 for information on starting the Web Cache Control Protocol on the router.

What is the Home Router for a Cache Engine?

Because a Cache Engine works with a router, each Cache Engine belongs to a specific router. The router to which the Cache Engine belongs is called the engine's home router. Although the Cache Engine does not have to be on a subnet directly attached to the router in order for the router to be home to the engine, it is usually better for network traffic if the engine is kept close to the router.

In Figure 1-1, these are the home routers and their Cache Engines:

What is a Cache Farm?

Each Cache Engine is an independent unit: you can add or remove a Cache Engine from a network with little impact on other engines on the network. It is the home router that maintains awareness of the engines attached to it, and it is the home router that decides which engine is sent a specific web request.

Because the home router determines how each Cache Engine is used, you can attach more than one engine to a single router. All Cache Engines attached to a single router form a cache farm. These engines do not have to be on the same subnet: engines on different subnets can form a cache farm. For example, in Figure 1-1, these are the home routers and their cache farms:


Note The Cache Engines divide the Internet address space into 256 groups, and they tell the home router the addresses that each engine will cover. As Cache Engines are added to, or removed from, a cache farm, the engines dynamically redistribute these groups evenly across the engines in the farm. For example, each engine in a 2-engine farm would get 128 groups; if there are 3 engines, each would get 85 or 86 groups.
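The even split described in the note can be sketched as follows. The function distribute is an illustrative helper showing only the arithmetic of dividing 256 groups among N engines, not the actual redistribution protocol.

```python
BUCKETS = 256  # the engines divide the Internet address space into 256 groups

def distribute(num_engines):
    """Return the number of address groups each engine in the farm covers,
    split as evenly as possible across the farm."""
    base, extra = divmod(BUCKETS, num_engines)
    # The first `extra` engines each take one additional group.
    return [base + (1 if i < extra else 0) for i in range(num_engines)]
```

For a 2-engine farm this yields 128 groups per engine; for a 3-engine farm, one engine gets 86 groups and the other two get 85, matching the note's example.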

What is a Cache Hierarchy?

Every home router redirects web traffic to its attached Cache Engines, and the engines handle the network traffic between users and web servers. Because each home router redirects its web traffic, if the web traffic goes through more than one home router, more than one cache farm examines a user's web request. The relationship between these cache farms forms a cache hierarchy.

For example, in Figure 1-1, if user PC-2 requests a page from the Internet web server www.cisco.com, the request follows this path:

    1. Home router 3 redirects the request to a Cache Engine in farm F2. (This assumes that web traffic is redirected on the interface between router 3 and router 2.)

    2. If the engine in F2 has a copy of the page, it returns the page to PC-2, and is finished processing the request.

    3. Otherwise, the engine in F2 sends the request back to router 3, which sends it to router 2.

    4. Router 2 sends the request directly to router 1, because router 2 does not have a cache farm.

    5. Router 1 sends the request to an engine in farm F1. (This assumes that web traffic is redirected on the interface between router 1 and the Internet.)

    6. If the engine in farm F1 has a copy of the page, it returns it to the engine in F2 that is handling PC-2's request. When the page reaches router 3, the router sends it to the engine in F2. The engine in F2 puts a copy of the page in its storage while sending the page on to PC-2.

    7. Otherwise, the engine in F1 sends the request back to router 1, which sends it on the Internet to eventually arrive at www.cisco.com. On the return trip, the page is sent to the engine in F1, which keeps a copy of the page in storage before sending it on to eventually reach the engine in F2, which also saves a copy of the page as it sends the page to PC-2.

In this example, the cache farms F1 and F2 form a hierarchy. First F2 is checked, then F1 is checked, then finally the actual web server is contacted for the requested page. However, there is not a strict relationship between farms in the hierarchy; the hierarchy is established by the relationship of the user to the eventual source of the page. For example, if user PC-3 tries to access a page on server 3, farm F1 is checked before farm F2: this is the reverse of the hierarchy for user PC-2. (This assumes that web traffic is redirected on the interfaces between router 1 and router 2, and router 3 and server 3.)
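The lookup order in steps 1 through 7 can be sketched as follows, with each cache farm modeled as a plain dictionary and the farms listed nearest-first from the user's point of view. This is an illustration of the hierarchy's behavior, not the Cache Engine's implementation; all names are hypothetical.

```python
def lookup(url, farms, origin_fetch):
    """Check each farm in order (nearest the user first). On a miss
    everywhere, fetch from the origin server; on the way back, every
    farm that missed stores a copy of the page."""
    missed = []
    for farm in farms:
        if url in farm:
            page = farm[url]
            break
        missed.append(farm)
    else:
        page = origin_fetch(url)  # no farm had it; go to the web server
    for farm in missed:
        farm[url] = page  # each farm caches a copy as the page passes through
    return page
```

In the PC-2 example, farm F2 is checked first and farm F1 second; a hit in F1 still leaves a copy in F2 as the page travels back to the user.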

When you design your web caching system, keep in mind the hierarchical relationship between your cache farms. Try to attach the cache farms to the key routers that control your network, to ensure the widest benefit of web caching.

Are the Caches Ever Bypassed?

Because the Cache Engine works with a router, the engine only sees traffic that goes through the router.

Thus, if a web request does not require routing through a home router, that request is not handled by a Cache Engine.

For example, in Figure 1-1, if user PC-1 requests a page from server 1, that request does not get handled by any Cache Engine. PC-1 and server 1 are on the same subnet, so routing is not required for communication between these machines.

On the other hand, if user PC-3 requests a page from server 1, that request is handled by the engines in farm F1. Even though PC-3 and server 1 are on subnets attached to the same router, they are on different subnets, so routing through the router is required for communication. (This assumes that web traffic is redirected on the interface between router 1 and server 1.)

You can force traffic to bypass the caches by not redirecting web traffic on the interface connected to the web server. Thus, you could prevent the caching of internal web servers by not redirecting web traffic on the interfaces for the network segments that contain the servers.

What Does Not Get Cached?

The Cisco Cache Engine only caches data that uses TCP port 80. Therefore, if a web site is set up to use a different port, that web traffic does not go through the Cache Engines. For example, a secure web server normally does not use port 80, so data from a secure web server is normally never cached.

Similarly, FTP traffic does not get cached.

You can further limit what gets cached by:

What Happens if a Cache Engine Stops Working?

A Cache Engine might stop working for any number of reasons, from hardware failure to network failure to power failure. If a Cache Engine no longer responds to the router, the router automatically stops directing traffic to the failing Cache Engine. If there are other engines attached to the router, the router continues using those engines, and reapportions the Internet address space evenly among the remaining engines. Otherwise, the router does not redirect web traffic.

Thus, a failing Cache Engine is transparent to your users: the router automatically redirects traffic around the failing engine, so that the disappearance of the engine does not cause serious problems.

How is Time-Sensitive Data Handled?

Some data that you retrieve from the web is time sensitive. For example, real-time stock quotes change from second to second. The server administrator can set caching parameters for such data on the server. If the server uses the HTTP 1.1 protocol, the administrator can explicitly specify how long the data should be cached. If the server uses the HTTP 1.0 protocol, the administrator can mark the data as 'do not cache,' but cannot set specific expiration limits. In either case, the Cache Engines never cache data marked 'do not cache,' and they honor any explicitly set HTTP 1.1 caching parameters.
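The 'do not cache' markings correspond to standard HTTP response headers: Pragma: no-cache in HTTP 1.0, and the Cache-Control header in HTTP 1.1. A rough, illustrative check might look like the sketch below; this is not the engine's actual logic, and real HTTP caching rules have many more cases.

```python
def is_cacheable(headers):
    """Rough sketch of the 'do not cache' checks described in the text,
    given a dict of HTTP response headers. Illustrative only."""
    pragma = headers.get("Pragma", "").lower()
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-cache" in pragma:  # HTTP 1.0 style marking
        return False
    if "no-cache" in cache_control or "no-store" in cache_control:
        return False  # HTTP 1.1 style marking
    return True
```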

If the server administrator has not set caching parameters otherwise, the data is cached. Users can ensure their data is fresh by clicking the web browser's Refresh button. When a user clicks Refresh, the Cache Engine also refreshes the data that is in the cache.

How Long Are Pages and Objects Stored?

A Cache Engine stores a web page or object no longer than the HTTP 1.1 or 1.0 caching specifications require. For example, if a server administrator marks a page as expiring after a specific time, the Cache Engine deletes the page from storage when that time has expired.

For HTTP 1.0 objects, you can adjust how long the object is kept in storage by using the Freshness Factors in the Nerd Knobs. These freshness factors do not apply to HTTP 1.1 objects, however, because server administrators using HTTP 1.1 have extensive control over caching characteristics, whereas HTTP 1.0 provides only limited control.

The Cache Engine may delete a page or object before an expiration time or date is met if the engine runs out of storage for newly cached pages. Thus, how long a specific object stays in storage depends as much on the quantity of data that is cached as it does on the specific caching parameters associated with the object, and this can change from day to day.
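The guide states that objects can be deleted before their expiration when storage runs low, but it does not name the replacement policy the engine uses. The sketch below assumes a least-recently-used policy purely for illustration; the class and its capacity measure (object count rather than bytes) are simplifications.

```python
from collections import OrderedDict

class BoundedCache:
    """Illustrative cache that evicts early when storage fills up.
    LRU replacement is an assumption, not documented engine behavior."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # oldest-used entries first

    def put(self, url, page):
        if url in self.store:
            self.store.move_to_end(url)
        self.store[url] = page
        if len(self.store) > self.capacity:
            # Evict the least recently used object, even if it has not
            # yet reached its expiration time.
            self.store.popitem(last=False)

    def get(self, url):
        if url not in self.store:
            return None
        self.store.move_to_end(url)  # mark as recently used
        return self.store[url]
```

This illustrates why an object's lifetime in storage depends on the overall quantity of cached data, not just its own caching parameters.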

Planning a Caching Strategy

Before you install your Cache Engine, first develop a plan for how you want to use web caches on your intranet. This helps ensure that you get the benefits you expect from the Cache Engine.

This section covers the issues you might consider when planning your caching strategy. Whether you address all of these issues is up to you: the Cache Engine does not depend on an installation's caching strategy for it to work properly. The issues addressed in this section only help you maximize the benefit of your investment in your Cache Engine.

In planning your caching strategy, address these questions and issues:

In addition, look through these sections to get ideas for how to deploy the Cache Engines in your network:

Which Routers Should be Home Routers for a Cache Engine?

A good place to start a cache farm is the router that contains your Internet connection. This ensures that all user requests get handled by a Cache Engine before going outside of your business. If you have more than one router that connects to the Internet, create a cache farm for each.

Other likely places for a cache farm are routers that connect remote offices to the main office. This allows your intranet web servers to be cached at the remote office, reducing the traffic on the lines connecting the remote office to your main network.

For Internet Service Providers (ISPs), placing a cache farm at the router in your Points of Presence (POPs) can help reduce the traffic between your main site and each POP.

In general, any router that connects users to another location through a slower line can benefit from caching.

Which Subnets Should Include a Cache Engine?

Because the router redirects all HTTP traffic to the Cache Engines, any network segment that contains the Cache Engines experiences an increase in network traffic. If all engines are on the same network segment, that segment's traffic increases by your current amount of HTTP traffic, plus whatever HTTP traffic the engines themselves generate to satisfy requests for pages they have not cached. For example, if half of your users' HTTP requests can be satisfied from a Cache Engine, the network segment's traffic should increase by 150% of your current HTTP traffic load.
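The 150% figure follows from adding the redirected client traffic (100% of current HTTP load) to the miss traffic the engines generate toward origin servers (the uncached fraction). As a sketch, with an illustrative function name of our own:

```python
def segment_traffic_multiplier(hit_rate):
    """Estimated traffic on the Cache Engines' segment, as a multiple of
    the current HTTP load: all client requests are redirected there (1.0),
    plus the engines' own fetches for cache misses (1 - hit_rate)."""
    return 1.0 + (1.0 - hit_rate)
```

A 50% hit rate gives a multiplier of 1.5, the 150% figure above; a perfect cache would still carry 100% of the HTTP load on the segment.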

Consider these recommendations when deciding where to place a Cache Engine:

How Many Cache Engines Do You Need for Each Home Router?

Each Cache Engine has a maximum number of concurrent sessions that it can handle. You can translate this maximum number of sessions into an expected number of users that the engine can support. The number of users that can be supported by one Cache Engine, then, depends on:

    1. The number of sessions opened by the web browser.

    2. The percentage of users you expect to be accessing the web at any given moment.

Different web browsers handle sessions differently: some browsers allow the user to set the number of sessions to use, while others try to open as many sessions as there are objects on a requested web page. A good starting point, however, is to assume that 4 sessions are used for each web page requested.

If you estimate that 10% of your users will be accessing the web concurrently, you can determine the number of users by dividing the session limit for the Cache Engine (S) by the product of the number of sessions per browser (B) and the percentage of concurrent users expressed as a fraction (C):

    Users = S / (B x C)

Thus, if you assume 10% of your users are concurrently using the web, and their browsers open an average of 4 sessions, one CE2050 Cache Engine should handle 2250 users (900/(4*.1)). If only 5% of the users concurrently use the web, one CE2050 should handle 4500 users (900/(4*.05)).
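The sizing arithmetic above can be captured in a small helper; the function name is ours, and the inputs are the estimates described in the text.

```python
def supported_users(session_limit, sessions_per_browser, concurrency):
    """Estimated number of users one Cache Engine can support:
    S / (B x C), where C is the concurrent fraction of users."""
    return session_limit / (sessions_per_browser * concurrency)
```

With the CE2050's limit of 900 sessions, 4 sessions per browser, and 10% concurrency, this gives 2250 users; at 5% concurrency, 4500 users, matching the figures above.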

You may need to experiment with your user community to identify the peak usage periods and how much concurrent traffic they generate in order to create cache farms with sufficient processing power. Once you have a cache farm in place, you can use the Cache Engine's status and log pages to determine if the cache farm is an efficient size.

Using the status page (see "Monitoring Cache Engine Performance" in Chapter 3), look at the % utilization figures. If they are consistently high, between 90 and 100 percent, for all or most of the Cache Engines in the cache farm, the cache farm may be a bottleneck on the network. Similarly, the event log (see "Reading the Event Log" in Chapter 3) may show a high number of warnings for cache overutilization. If you consistently see very high utilization ratios, consider adding additional Cache Engines to the cache farm.

Reducing Web Traffic vs. Saving WAN Costs

The Cache Engine can accomplish two main goals for your network:

Although these goals are compatible, the relative importance of the goals can affect how you deploy the Cache Engines.

Consider the example in Figure 1-2. In this example, three remote offices are connected in a WAN with the main office. Web traffic from the remote offices must go through the main office before reaching the Internet. If there is a lot of web traffic coming from these remote offices, placing a cache farm at each remote office can reduce your WAN costs, because some of the traffic can be satisfied with data stored in the cache farm. The configuration pictured here also has the benefit of reducing overall traffic sent to the Internet.

If, however, you do not have much web traffic coming from the remote offices (for example, if the remote offices were sales offices where the employees are frequently on the road or using the telephone instead of the Internet), there may be no significant benefit in placing cache farms at the remote offices. However, you can still reduce web traffic to the Internet by placing a cache farm at the main office.

Carefully consider the type of network traffic generated at each of the offices in your network. Some offices may benefit from having a cache farm, whereas other offices might see only a marginal improvement in performance and cost savings.


Figure 1-2: Reducing Web Traffic vs. Saving WAN Costs

Examples of Internet Service Provider (ISP) Configurations

This section shows some recommended configurations for Internet Service Provider (ISP) systems. Although there are many ways in which you can deploy the Cache Engine, we recommend that you follow these examples:

Overview of Cache Farms in an ISP Network

Figure 1-3 shows a broad view of an ISP network. In an ISP network, you may have the dual goal of speeding web traffic (thus improving customer satisfaction) and reducing WAN costs. If that is your goal, then it is effective to place cache farms at each point in your network where there is a WAN connection to another site.


Figure 1-3: Overview of Cache Farms in an ISP Network

In this figure, you would place a cache farm at all places marked A, B, or C. As you go up in network size from POPs to your larger offices, increase the size of your cache farms. For example, you might place a single Cache Engine at your POPs (location A), but several Cache Engines at the B locations, and your largest cache farms at the C locations.

Detailed View of Cache Farms in an ISP Network

Figure 1-4 shows a detailed view of part of an ISP network. This example shows the details of the POP network and its connection to a larger site. The cache farm connected to the POP router is on a separate network connection from the AS5200 machines. You must enable the Web Cache Control Protocol on the POP router, and redirect web traffic on the interface connecting the POP to the main office.

Likewise, there are cache farms at each of the routers in the main office that accept traffic from the POPs. You must enable the Web Cache Control Protocol on each of these routers, and redirect the web traffic on the interface that is connected to the network in the main office (interfaces A, B, and C). Because there is a lot of traffic going through the router at the main office that is connected to the Internet, it is best not to attach a cache farm to that router. Keep the cache farms at the entry points to your network.


Figure 1-4: Detailed View of Cache Farms in an ISP Network

Examples of Enterprise Configurations

This section shows some recommended configurations for enterprise networks. Although there are many ways in which you can deploy the Cache Engine, we recommend that you follow these examples. In order from most to least preferred, the recommendations are:

    1. Cache Engines on a Separate Network Interface

    2. Cache Engines on a Switch

    3. Cache Engines and Router on Same Network Interface

Choose an example appropriate for your existing network configuration.

Cache Engines on a Separate Network Interface

Because a cache farm increases the amount of traffic on the line to which the Cache Engines are attached, we recommend that you isolate the Cache Engines on a dedicated 100BaseT network interface. Place 3 to 6 Cache Engines on a single network interface. If you have more than 6 Cache Engines in a cache farm, use a separate network interface for every 3 to 6 Cache Engines. This helps ensure that the network interface does not get overloaded during peak usage of the Internet.

Figure 1-5 shows a basic configuration using a single router. In this example, enable the Web Cache Control Protocol on the router, and redirect web traffic that goes out the network interface to the Internet.


Figure 1-5: Cache Engines on Separate Interface, Single Router

Figure 1-6 shows a more complex example, where the router has several switches attached to it. Each switch also can have Cache Engines attached to it. Ideally, these switches should also have a router blade (such as a Catalyst 5000 with a router blade). If the switch has a router blade, the traffic between router A and the switches is reduced, because you can configure the engines attached to the switches to use the router in the switches as the home router rather than router A. If router A is the home router, then requests must reach router A before being redirected back down the network to the engines attached to the switches.

In this example of switches with router blades, enable the Web Cache Control Protocol on the router blade in the switches, and redirect web traffic that goes out the interface to router A. You must also enable the Web Cache Control Protocol on router A, and redirect web traffic on the interface to the Internet.


Figure 1-6: Cache Engines on Separate Interface Using Router and Switches

Cache Engines on a Switch

If you have a small office setup, an ideal design is to attach the Cache Engines to a switch with a router blade, such as a Catalyst 5000 with a router blade.

Figure 1-7 shows this setup. You must enable the Web Cache Control Protocol on the router in the Catalyst 5000, and redirect web traffic on the interface connected to the Internet.


Figure 1-7: Cache Engines on a Switch with a Router Blade

Figure 1-8 shows a setup using a separate switch (one that does not have a router blade) and router. The line between the router and switch can become a bottleneck, because all web traffic must first reach the router before it can be redirected back down the network to the Cache Engines. This setup, however, creates less network traffic than a router-hub setup (where the switch in the diagram is replaced by a hub).


Figure 1-8: Cache Engines on a Switch with Separate Router

Cache Engines and Router on Same Network Interface

If you have a very small office, you can consider adding a router to your existing network that will act as the home router for the Cache Engines. This is sometimes referred to as a 'router on a stick.' Only use this setup if you have one Cache Engine and a reasonably small number of users, and your main router does not have Cisco IOS software with the Web Cache Control Protocol, because this setup increases the amount of traffic on your network.

Figure 1-9 shows an example of adding a home router to an existing network. In this example, you must enable the Web Cache Control Protocol on router 2, and redirect web traffic on the interface connecting to router 1 from router 2. Then, make router 2 the default gateway for all of the client machines on the network, and router 1 the default gateway for router 2.


Figure 1-9: Cache Engines and Router on Same Network Interface (Router on a Stick)

Thus, all traffic on this network first goes through router 2 before going to router 1. Unless you have a small network, this setup may not improve your network performance. Router 2 can become a bottleneck if it does not have sufficient processing power to handle the traffic on your network.

Although this setup can improve performance on small networks, it is not an optimal setup. It breaks the rule of thumb that you should not put a Cache Engine on an interface whose web traffic is being redirected. In most cases, we recommend you use one of the setups described in "Cache Engines on a Separate Network Interface" or "Cache Engines on a Switch."

Overview of Cache Engine Management

Once you install a Cache Engine and initially configure it using a console, you manage the Cache Engine through an ordinary web browser on any machine on your intranet. The Cache Engine's web pages allow you to adjust settings, monitor system performance, and take remedial actions. The pages also include the complete text of this guide, so that you always have access to complete information for the machine.

The web browser you use must be able to handle HTML forms and must be Java-enabled; for example, Netscape Navigator 2.0 or Microsoft Internet Explorer 3.0.

Connecting to the Cache Engine Management Interface

To connect to the Cache Engine's management interface:


Step 1   Start a web browser on a machine that has access to the network on which the Cache Engine resides.

Step 2   Open the URL for the engine, using the URL given to you during basic configuration (as described in "Initialize the Cache Engine Configuration" in Chapter 2). For example, http://192.168.20.121:8001. You must include the port number in the URL. You are prompted for a user name and password.

Step 3   Enter a correct user name and password. The Cache Engine returns the initial management page, which contains links to other management pages.

If you forget your password, you must have another administrator reset your password, or use the admin account. If you cannot use another account to reset your password, you must reset the password from the Cache Engine console. See "Updating the Basic Configuration" in Chapter 3.

Using the Management Interface

Use the buttons at the top and the bottom of the management pages to move between the various pages. These buttons provide access to user Accounts, system Status, the event Logs, the URL Filters, and the Nerd Knobs.

The pages in the management interface are divided into groups of related parameters. To change a parameter within a group, enter the desired value of the parameter, and click the action button within the group.

For example, Figure 1-10 shows an example of the Create New User group on the Accounts page. In this figure, the parameters are:

The action button is Create, and is in the lower right corner of the group. If you can set a parameter in a group, the action button for the group is in the lower right corner of the group. If the group contains only display information, there is no action button. Click the ? button in the upper right corner of the group to get help on the group.


Figure 1-10: Example of Management Interface (Create New User Group)

Understanding the Scope of Cache Engine Parameters

To simplify the administration of a cache farm, several of the operating parameters that you set through the management interface for a Cache Engine apply to all machines in the cache farm to which the Cache Engine belongs. Thus, when you need to change settings, you only have to log in to one Cache Engine in the cache farm.

Changing these parameters changes the settings for all machines in a cache farm:

All other parameters only apply to the specific machine to which you are logged in. If you want to change those parameters, you must connect to each Cache Engine in turn.


Posted: Sat Sep 28 01:12:04 PDT 2002
All contents are Copyright © 1992--2002 Cisco Systems, Inc. All rights reserved.
Important Notices and Privacy Statement.