12.5. Scalability

Moving a web site from one machine serving a few test requests to an industrial-strength site capable of serving the full flood of web demand may not be a simple matter.

12.5.1. Performance

A busy site will have performance issues, which boil down to the question: "Are we serving the maximum number of customers at the minimum cost?"

12.5.1.1. Tools

Under Unix, you can see how resources are being used with the utilities top, vmstat, swapinfo, iostat, and their friends. (See Essential System Administration, by Aeleen Frisch [O'Reilly, 2002].)

12.5.1.2. Apache's mod_info

mod_info displays the configuration of the running server, including the loaded modules and their directives, which helps when monitoring and diagnosing httpd. See Chapter 10.

12.5.1.3. Bandwidth

Your own hardware may be working wonderfully while the site is strangled by bandwidth limitations between you and the Web backbone. You can make a rough estimate of the bandwidth you need by multiplying the number of transactions per second by the number of bytes transferred per transaction, making allowance for the substantial HTTP headers that go with each web page. (Multiply the result by 8 to get bits per second, the unit in which network links are usually rated.) Having done that, check what is actually happening with a utility like ipfm from http://www.via.ecp.fr/~tibob/ipfm/:

    HOST               IN         OUT        TOTAL
    host1.domain.com   12345      6666684    6679029
    host2.domain.com   1232314    12345      1244659
    host3.domain.com   6645632    123        6645755
    ...

Alternatively, use cricket (http://cricket.sourceforge.net/) to produce pretty graphs.

12.5.1.4. Load balancing

mod_backhand is free load-balancing software, covered later in this chapter. For commercial (and expensive) products, look on the Web for ServerIron, BigIP, and LoadDirector.

12.5.1.5. Image server, text server

The amount of RAM at your disposal limits the number of copies of Apache (as httpd or httpsd) that you can run, and that in turn limits the number of simultaneous clients you can serve. You can reduce the size of some of the httpd instances by running a cut-down version for images, PDF files, or text alongside a big version for scripts.
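The two back-of-the-envelope calculations in this section, bandwidth as transactions per second times bytes per transaction, and the number of httpd processes that fit in RAM, can be sketched in a few lines of Python. Every figure here (a 300-byte header allowance, 20 MB for a script-capable httpd, 4 MB for a cut-down one, 256 MB reserved for the operating system and other services) is a hypothetical assumption for illustration, not a measurement; substitute the process sizes that top or vmstat report on your own machine.

```python
"""Back-of-the-envelope capacity planning for a web server (illustrative sketch).

All default figures are hypothetical assumptions, not measurements.
"""

# Assumed allowance for the HTTP headers sent with each response, in bytes.
ASSUMED_HEADER_BYTES = 300


def required_bandwidth_bps(transactions_per_sec, body_bytes,
                           header_bytes=ASSUMED_HEADER_BYTES):
    """Bits per second needed: transactions/s x bytes per transaction x 8."""
    bytes_per_transaction = body_bytes + header_bytes
    return transactions_per_sec * bytes_per_transaction * 8


def max_processes(ram_mb, reserved_mb, per_process_mb):
    """Number of same-sized httpd processes that fit in the RAM left over
    after reserving memory for the OS and other services."""
    return max(0, int((ram_mb - reserved_mb) // per_process_mb))


def lean_processes_left(ram_mb, reserved_mb, fat_mb, fat_count, lean_mb):
    """Static-content processes that still fit once the script-capable
    server is capped at fat_count processes of fat_mb each."""
    remaining = ram_mb - reserved_mb - fat_mb * fat_count
    return max(0, int(remaining // lean_mb))


if __name__ == "__main__":
    # Hypothetical load: 50 pages per second, 20,000-byte page bodies.
    bps = required_bandwidth_bps(50, 20_000)
    print(f"bandwidth needed: {bps / 1e6:.2f} Mbit/s")

    # Hypothetical machine: 1024 MB of RAM, 256 MB reserved for everything else.
    ram, reserved = 1024, 256
    print("one big server only:", max_processes(ram, reserved, 20))
    print("static processes beside 10 script ones:",
          lean_processes_left(ram, reserved, 20, 10, 4))
```

With these assumed numbers, 50 pages per second needs about 8.1 Mbit/s, and 1 GB of RAM holds only 38 script-capable processes; capping the script server at 10 processes instead leaves room for 142 lean static-content processes alongside it.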
What usually makes the difference in size is the need to load a scripting language such as Perl or PHP into httpd. Because these keep modules and variables in memory between requests, such servers tend to consume far more RAM than servers that serve only static pages and images. The usual answer is to run two copies of Apache, one for the static content and one for the scripts. Each copy has to bind to a different IP address and port combination, of course, and the number of instances of the dynamic one usually has to be limited to avoid thrashing.

12.5.2. Shared Versus Replicated DBs

You may want to speed up database access by replicating your database across several machines so that they can serve clients independently. Replication is easy if the data is static: catalogs, texts, libraries of images, and so on. Replication is hard if the database is updated often, as it would be with active clients. However, you can sidestep replication by dividing your client database into chunks (for instance, by surname: A-D, E-G, and so on), each served by a single machine. To increase speed, you divide it into smaller chunks and add more hardware.

Copyright © 2003 O'Reilly & Associates. All rights reserved.
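The surname partitioning described in 12.5.2 amounts to a lookup from a client's surname to the one machine that owns that range of letters. Below is a minimal Python sketch of that lookup, using the standard bisect module to find the right range; the host names and letter boundaries are hypothetical, chosen only to illustrate the idea, and a real site would pick boundaries that spread its clients evenly.

```python
import bisect

# Hypothetical partition table: (first letter of the range, host that serves it).
# Each range runs from its start letter up to the letter before the next start.
PARTITIONS = [
    ("A", "db1.example.com"),  # A-D
    ("E", "db2.example.com"),  # E-G
    ("H", "db3.example.com"),  # H-M
    ("N", "db4.example.com"),  # N-S
    ("T", "db5.example.com"),  # T-Z (and any non-ASCII initials)
]
_STARTS = [start for start, _ in PARTITIONS]


def host_for_surname(surname):
    """Return the database host responsible for this surname's range."""
    if not surname or not surname[0].isalpha():
        raise ValueError(f"cannot partition surname {surname!r}")
    initial = surname[0].upper()
    # bisect_right gives the position after the last start letter <= initial,
    # so subtracting 1 selects the range that contains the initial.
    index = bisect.bisect_right(_STARTS, initial) - 1
    return PARTITIONS[index][1]
```

To add hardware, you split a range (say, H-M into H-J and K-M), add a row to the table, and move the affected records to the new machine; lookups for every other range are unchanged.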