2.2. The World Wide Web
These days, the World Wide Web has become so popular that many people
think it is the Internet. If you aren't on the Web, you
aren't anybody. Unfortunately, although the Web is based
primarily on a single protocol (HTTP), web sites often use a wide
variety of protocols, downloadable code, and plug-ins, which have a
wide variety of security implications. It has become impossible to
reliably configure a browser so that you can always read everything
on every web site; it has always been insecure to do so.
Many people confuse the
functions and origins of the Web, Netscape, Microsoft Internet
Explorer, HTTP, and HTML, and the terminology used to refer to these
distinct entities has become muddy. Some of the muddiness was
introduced intentionally; web browsers attempt to provide a seamless
interface to a wide variety of information through a wide variety of
mechanisms, and blurring the distinctions makes it easier to use, if
more difficult to comprehend. Here is a quick summary of what the
individual entities are about:
- The Web
- The collection of HTTP servers (see the description of HTTP that follows) on the Internet. The Web is responsible, in large part, for
the recent explosion in Internet activity. It is based on concepts
developed at the European Particle Physics Laboratory (CERN) in
Geneva, Switzerland, by Tim Berners-Lee and others. Much of the
ground-breaking work on web clients was done at the National Center
for Supercomputing Applications (NCSA) at the University of Illinois
in Urbana-Champaign. Many organizations and individuals are
developing web client and server software these days, and many more
are using these technologies for a huge range of purposes. The
Internet Engineering Task Force (IETF) is currently responsible for
maintaining the HTTP standard, and the World Wide Web Consortium
(W3C) is developing successors to HTML (see Appendix A, "Resources", for more information about these organizations). Nobody "controls" the Web, however, much as nobody "controls" the Internet.
- HTTP
- The primary application protocol that underlies the Web: it provides users access to the files that make up the Web. These files might be
in many different formats (text, graphics, audio, video, etc.), but
the format used to provide the links between files on the Web is the
HyperText Markup Language (HTML).
- HTML
- A standardized page description language for creating web pages. It provides basic document-formatting capabilities (including the
ability to include graphics) and allows you to specify hypertext
links to other servers and files.
- Netscape Navigator and Microsoft Internet Explorer
- Commonly known as "Netscape" and "Explorer", these commercial
products are web browsers (they let you read documents via HTTP and
other protocols). There are hundreds of other web browsers, including
Lynx, Opera, Slurp, Go!Zilla, and perlWWW, but most estimates show
that the vast majority of web users are using Netscape or Explorer.
HTTP is only one protocol used by web browsers; web browsers
typically also can use at least the FTP, NNTP, SMTP, and POP
protocols. Some of them also can use other protocols like WAIS,
Gopher, and IMAP. Thus, when users say "we want Explorer"
or "we want Netscape", what they really mean, from a
protocol level, is that they want access to the HTTP servers that
make up the Web, and probably to associated servers running other
protocols that the web browsers can use (for instance, FTP, SMTP,
and/or NNTP).
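To make the protocol-level view concrete, here is a minimal Python sketch that takes a list of URLs users want to reach and reports which schemes and ports are involved. The function name, the example hostnames, and the small port table are assumptions made for illustration; the table lists only the well-known default ports for a few of the protocols named above.

```python
from urllib.parse import urlsplit

# Well-known default ports for a few URL schemes a browser may handle.
# This table is deliberately incomplete; it is an illustration only.
DEFAULT_PORTS = {
    "http": 80,
    "ftp": 21,
    "nntp": 119,
    "gopher": 70,
}


def required_services(urls):
    """Return the set of (scheme, port) pairs needed to fetch the given URLs."""
    services = set()
    for url in urls:
        parts = urlsplit(url)
        # Use an explicit port if the URL has one, else the scheme's default.
        port = parts.port or DEFAULT_PORTS.get(parts.scheme)
        services.add((parts.scheme, port))
    return services


if __name__ == "__main__":
    wanted = [
        "http://www.example.com/index.html",
        "ftp://ftp.example.com:2121/pub/",
        "nntp://news.example.com/comp.security.firewalls/1",
    ]
    for scheme, port in sorted(required_services(wanted)):
        print(scheme, port)
```

A request for "a web browser" can therefore turn into several distinct services, each of which needs its own decision.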
2.2.1. Web Client Security Issues
Web browsers are fantastically popular,
and for good reason. They provide a rich graphical interface to an
immense number of Internet resources. Information and services that
were unavailable or expert-only before are now easily accessible. In
Silicon Valley, you can use the Web to have dinner delivered without
leaving your computer except to answer the door. It's hard to
get a feel for the Web without experiencing it; it covers the full
range of everything you can do with a computer, from the mundane to
the sublime with a major side trip into the ridiculous.
Unfortunately, web browsers and servers are hard to secure. The
usefulness of the Web is in large part based on its flexibility, but
that flexibility makes control difficult. Just as it's easier
to transfer and execute the right program from a web browser than
from FTP, it's easier to transfer and execute a malicious one.
Web browsers depend on external programs, generically called
viewers (even if they play sounds instead of
showing pictures), to deal with data types that the browsers
themselves don't understand. (The browsers generally understand
basic data types such as HTML, plain text, and JPEG and GIF
graphics.) Netscape and Explorer now support a mechanism (designed to
replace external viewers) that allows third parties to produce
plug-ins that can be downloaded to become an
integrated and seamless extension to the web browser. You should be
very careful about which viewers and plug-ins you configure or
download. A viewer or plug-in runs on your computers as if it were
one of your users while taking commands from an external source, so
you don't want one that can do dangerous things. You should also
warn users not to download plug-ins, add viewers, or change viewer
configurations based on advice from strangers.
In addition, most browsers
understand one or more extension systems (Java, JavaScript,
or ActiveX, for instance). These systems make the browsers more
powerful and more flexible, but they also introduce new problems.
Whereas HTML is primarily a text-formatting language, with a few
extensions for hypertext linking, the extension systems provide many
more capabilities; they can do anything you can do with a traditional
programming language. Their designers recognize that this creates
security problems. Traditionally, when you get a new program you know
that you are receiving a program, and you know where it came from and
whether you trust it. If you buy a program at a computer store, you
know that the company that produced it had to go to the trouble of
printing up the packaging and convincing the computer store to buy it
and put it up for sale. This is probably too much trouble for an
attacker to go to, and it leaves a trail that's hard to cover
up. If you decide to download a program, you don't have as much
evidence about it, but you have some. If a program arrives on your
machine invisibly when you decide to look at something else, you have
almost no information about where it came from and what sort of trust
you should give it.
The designers of JavaScript, VBScript,
Java, and ActiveX took different approaches to this problem.
JavaScript and VBScript are simply supposed to be unable to do
anything dangerous; the languages do not have commands for writing
files, for instance, or general-purpose extension mechanisms. Java
uses what's called a "sandbox" approach. Java does
contain commands that could be dangerous, and general-purpose
extension mechanisms, but the Java interpreter is supposed to prevent
an untrusted program from doing anything unfortunate, or at least ask
you before it does anything dangerous. For instance, a Java program
running inside the sandbox cannot write or read files without
notification. Unfortunately, there have been implementation problems
with Java, and various ways have been found to do operations that are
supposed to be impossible.
In any case, a program that can't do anything dangerous has
difficulty doing anything interesting. Children get tired of playing
in a sandbox relatively young, and so do programmers.
ActiveX, instead of trying to limit a
program's abilities, tries to make sure that you know where the
program comes from and can simply avoid running programs you
don't trust. This is done via digital signatures; before an
ActiveX program runs, a browser will display signature information
that identifies the provider of the program, and you can decide
whether or not you trust that provider. Unfortunately, it is
difficult to make good decisions about whether or not to trust a
program with nothing more than the name of the program's
source. Is "Jeff's Software Hut" trustworthy? Can
you be sure that the program you got from them doesn't send
them all the data on your hard disk?
As time goes by, people are providing newer, more flexible models of
security that allow you to indicate different levels of trust for
different sources. New versions of Java are introducing digital
signatures and allowing you to decide that programs with specific
signatures can do specific unsafe operations. Similarly, new versions
of ActiveX are allowing you to limit which ActiveX operations are
available to programs. There is a long way to go before the two
models come together, and there will be real problems even then. Even
if you don't have to decide to trust Jeff's Software Hut
completely or not at all, you still have to make a decision about
what level of trust to give them, and you still won't have much
data to make it with. What if Jeff's Software Hut is a vendor
you've worked with for years, and suddenly something comes
around from Jeff's Software House? Is that the same people,
upgrading their image, or is that somebody using their reputation?
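The idea of granting different levels of trust to different signers can be sketched as a simple lookup table. This is not how Java or ActiveX actually store their policies; the table, the operation name, and the function are hypothetical and exist only to show where the hard decision remains.

```python
# Hypothetical policy table: signer name -> operations that signer's
# code is allowed to perform. Real systems use their own formats.
POLICY = {
    "Jeff's Software Hut": {"read_files"},
}


def allowed_operations(signer):
    """Return the operations granted to code from signer; unknown signers get none."""
    return POLICY.get(signer, set())


if __name__ == "__main__":
    for name in ("Jeff's Software Hut", "Jeff's Software House"):
        print(name, "->", sorted(allowed_operations(name)))
```

An exact-match table treats "Jeff's Software House" as a stranger. That is the safe default, but it does not answer the question of whether it is the same vendor; somebody still has to decide what goes into the table.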
Because programs in extension systems are generally embedded inside
HTML documents, it is difficult for firewalls to filter them out
without introducing other problems. For further discussion of
extension systems, see Chapter 15, "The World Wide Web".
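One reason filtering is hard is that code can appear in several places within an HTML document, not just in one easily recognized element. The following Python sketch uses the standard library's HTML parser to report a few of those places. The class name, the function name, and the list of tags are assumptions chosen for illustration, and the check is far from complete; a real filter would have to cope with many more encodings and tricks.

```python
from html.parser import HTMLParser

# Tags commonly used to embed programs; an illustrative, incomplete list.
ACTIVE_TAGS = {"script", "applet", "object", "embed"}


class ActiveContentFinder(HTMLParser):
    """Collect places in an HTML document where code may be embedded."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag and attribute names in lowercase.
        if tag in ACTIVE_TAGS:
            self.findings.append(("tag", tag))
        for name, value in attrs:
            if name.startswith("on"):
                # Event-handler attributes such as onclick carry script.
                self.findings.append(("handler", name))
            elif value and value.strip().lower().startswith("javascript:"):
                # Script can also hide inside a URL-valued attribute.
                self.findings.append(("url", name))


def find_active_content(html):
    """Return a list of (kind, name) pairs for suspicious constructs."""
    finder = ActiveContentFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

Removing everything such a scan finds would also break many legitimate pages, which is the "other problems" side of the trade-off.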
Because an HTML document can easily link to documents on other
servers, it's easy for people to become confused about exactly
who is responsible for a given document. "Frames" (where
the external web page takes up only part of the display) are
particularly bad in this respect. New users may not notice when they
go from internal documents at your site to external ones. This has
two unfortunate consequences. First, they may trust external
documents inappropriately (because they think they're internal
documents). Second, they may blame the internal web maintainers for
the sins of the world. People who understand the Web tend to find
this hard to believe, but it's a common misconception:
it's the dark side of having a very smooth transition between
sites. Take care to educate users, and attempt to make clear what
data is internal and what data is external.
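One way to help make the distinction visible is to have internal pages label links that leave the site. A minimal Python sketch of the underlying check follows; the function name, the hostnames, and the idea of keeping a set of internal hosts are assumptions for illustration.

```python
from urllib.parse import urljoin, urlsplit


def is_internal(link, page_url, internal_hosts):
    """Return True if link, resolved against page_url, points at an internal host."""
    # Relative links inherit the host of the page they appear on.
    host = urlsplit(urljoin(page_url, link)).hostname
    return host is not None and host in internal_hosts


if __name__ == "__main__":
    hosts = {"intranet.example.com"}
    page = "http://intranet.example.com/index.html"
    for link in ("/hr/policy.html", "http://www.example.org/news.html"):
        label = "internal" if is_internal(link, page, hosts) else "EXTERNAL"
        print(label, link)
```

A page generator could use a check like this to mark external links, though labels are a supplement to user education and not a replacement for it.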
2.2.2. Web Server Security Issues
When you run a web server, you are allowing anybody who can reach your
machine to send commands to it. If the web server is configured to
provide only HTML files, the commands it will obey are quite limited.
However, they may still be more than you'd expect; for
instance, many people assume that outsiders can't retrieve files unless
there are explicit links to them, which is generally false. You
should assume that if the web server program is capable of reading a
file, it is capable of providing that file to a remote user. Files
that should not be public should at least be protected by file
permissions, and should, if possible, be placed outside of the web
server's accessible area (preferably by moving them off the
machine altogether).
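As a rough first check on a Unix system, you can list the regular files under the server's document tree whose permission bits allow reading by "other". The Python sketch below does this; the function name is an assumption, and so is the premise that the web server runs as an unprivileged user that falls into the "other" class. It inspects permission bits only and does not prove what a particular server will actually serve.

```python
import os
import stat


def world_readable_files(root):
    """Yield paths of regular files under root that are readable by 'other'."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symbolic links are not followed.
            mode = os.lstat(path).st_mode
            if stat.S_ISREG(mode) and mode & stat.S_IROTH:
                yield path
```

Anything this reports that is not meant to be public is a candidate for tighter permissions or, better, for removal from the machine.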
Most web servers, however, provide services beyond merely handing out
HTML files. For instance, many of them come with administrative
servers, allowing you to reconfigure the server itself from a web
browser. If you can configure the server from a web browser, so can
anybody else who can reach it; be sure to do the initial
configuration in a trusted environment. If you are building or
installing a web server, be sure to read the installation
instructions. It is also worth checking the security resources
mentioned in Appendix A, "Resources", for known problems.
Web servers can also call external programs in a variety of ways. You
can get external programs from vendors, either as programs that will
run separately or as plug-ins that will run as part of the web
server, and you can write your own programs in a variety of different
languages and using a variety of different tools. These programs are
relatively easy to write but very difficult to secure, because they
can receive arbitrary commands from external people. You should treat
all programs run from the web server, no matter who wrote them or
what they're called, with the same caution you would treat a
new server of any kind. The web server does not provide any
significant protection to these programs. A large number of
third-party server extensions originally ship with security flaws,
generally caused by the assumption that input to them is always going
to come from well-behaved forms. This is not a safe assumption; there
is no guarantee that people are going to use your forms and your web
pages to access your web server. They can send any data they like to
it.
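A program run from the web server should therefore check every input itself instead of relying on the form that was supposed to produce it. The Python sketch below shows one common approach, an allow-list check that accepts only values matching a strict pattern. The field (an account name), the pattern, and the function name are hypothetical.

```python
import re

# Hypothetical rule: an account name is 1 to 32 letters, digits,
# underscores, or hyphens. Anything else is rejected outright.
ACCOUNT_NAME = re.compile(r"[A-Za-z0-9_-]{1,32}")


def parse_account_name(raw):
    """Return raw if it matches the allow-list pattern; raise ValueError otherwise."""
    # fullmatch requires the whole string to match, so trailing
    # newlines or appended shell metacharacters are refused.
    if not isinstance(raw, str) or ACCOUNT_NAME.fullmatch(raw) is None:
        raise ValueError("rejected account name")
    return raw
```

Describing what is allowed and refusing everything else tends to be safer than trying to enumerate dangerous characters, because the program does not need to anticipate every hostile input in advance.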
A number of software (and hardware) products are now appearing with
embedded web servers that provide a convenient graphical
configuration interface. These products should be carefully
configured if they are running on systems that can be accessed by
outsiders. In general, their default configurations are
insecure.