[Chapter 20] 20.4 The Data-Tainting Security Model

20.4 The Data-Tainting Security Model

The security model adopted by Navigator 2.0 and 3.0 is functional, but suffers from a number of problems. As we've seen, the "identify and hobble" approach is not very good at identifying security holes in the first place, and in a complex system like Navigator plus JavaScript, security holes can be difficult to find. Furthermore, hobbling JavaScript reduces the functionality available to developers. Some hobbles, while essential for security, end up breaking perfectly good scripts that pose no security threat and that ran correctly on earlier versions of the browser.

The hobble that prevents one script from reading the contents of a window from another server is a particularly draconian example. This hobble means that I cannot write a debugger program in JavaScript and post it on my web site for other developers to use on their own JavaScript programs. Developers would have to go through the extra step of downloading the debugging script and installing it on their own site, so that it can successfully examine the properties of the documents to be debugged. Similarly, this hobble prevents the creation of JavaScript programs that "crawl" the Web, recursively following links from a given starting page.

Because of the problems with hobbles, and with the theoretical underpinnings of security through hobbling, the developers at Netscape have created an entirely new security model. This new model is experimental in Navigator 3.0, and may be enabled by the end user through a procedure outlined later in this section. The new security model is theoretically much stronger, and should be a big advance for JavaScript security if it is enabled by default in Navigator 4.0. The following subsections explain this new model. Be aware in advance that this is a confusing model and can be difficult to understand.

Data Tainting in Theory

Let's back up a bit and reconsider the security problem we are worried about in the first place. For the most part, the problem is that private data may be sent across the Web by malicious JavaScript programs. The hobbling approach to security generally patches this problem by preventing JavaScript programs from accessing private data. Unfortunately, this approach rules out non-malicious JavaScript programs that would like to use that private data without exporting it. One such program, for example, might be a navigation aid that generates a list of all the links from a web page and displays them in a separate window or frame.

Instead of preventing scripts from reading private data, a better approach would be to prevent them from exporting it, since this is what we are trying to prevent in the first place. If we could do this, then we could lift most of the hobbles that were detailed in the sections above. (We'd still need some hobbles, to prevent a program from closing windows it didn't open, for example.) Unfortunately, preventing the export of private data can be tricky to do, because not only must we prevent a script from exporting private data directly, be we must also prevent it from exporting data derived, in any way, from private data. If you think through the implications, you can see that keeping track of the data that must not be exported could be a very difficult proposition.

This is where the concept of data tainting comes in. The idea is that all JavaScript data values are given a flag. This flag indicates if the value is "tainted" (private) or not. Tainted values will be allowed to be exported only in certain very restricted ways. Untainted values can be exported arbitrarily. But any value, regardless of taint, can be manipulated by the program, which is a big improvement over the heavy-handed measures required by the hobbling approach. As the term "tainted" implies, any data derived from tainted data will itself be tainted. If a tainted string is added to a non-tainted string, the resulting string is tainted. If a tainted value is passed to a function, then the return value of the function is tainted. If a string is tainted, then any substring of the string is also tainted.

Theoretically, the data-tainting model is a strong one, and it has been proven practical in the Perl programming language. With a careful and rigorous implementation of tainting, Navigator will be able to prevent private data, or any modified version of private data from being incorrectly exported by a JavaScript program. Because data tainting is a uniform security model that covers all possible exports of data, we can also trust its security much further than we would trust the "identify a hole and patch it with a hobble" model.

Data Tainting in JavaScript

To really understand the data-tainting security model in JavaScript, you must understand what the taint flag indicates. In fact, this "flag" is better described as an "accumulator" because there are many possible types of taint, and any value can be tainted in more than one way. Entries in the history array, for example, are tainted in a way that indicates "this is private data and must not be exported in any way." On the other hand, in a document loaded from server.xyz.com data values in an HTML form are tainted in a way that indicates "this data belongs to server.xyz.com, and it must not be exported anywhere except to that server". When taint propagates from a tainted value to a derived value, this meaning propagates with it, of course.

As we can see, tainting does not prevent all tainted data from being exported; it merely prevents it from being exported to a server that does not already "own" it. Furthermore, tainting does not even absolutely prevent data from being sent where it shouldn't be; it only prevents it from automatically being sent there. Whenever an attempt to export data violates the tainting rules, the user will be prompted with a dialog box asking them whether the export should be allowed. If they so choose, they can allow the export.

Consider how this might work. If a malicious script tries to export the URLs contained in the History object, JavaScript will see that these values are tainted in a way that does not allow them to be exported in any way, and will not allow the export. On the other hand, when a web page contains an HTML form, the user input values will be tainted in such a way that allows them to be exported back to the server form which the form was loaded. But if a malicious script running in another window attempts to spy on that HTML form and makes copies of the user's input, those copied values will still carry a taint value that identifies them as belonging to their original server. If the malicious scripts attempts to export them to its own malicious server, the attempt will fail because the taint values indicate that that server does not own that data.

It is not only data values that can carry taint. JavaScript functions and methods can carry taint as well. If a function or method is tainted, then its return value will automatically be tainted, regardless of the taintedness of its arguments. For example, the toString() method of the Location object and of the Text and Textarea objects are tainted because these methods return data that is private.

Functions are actually just another datatype in JavaScript, so it is not surprising that they can carry taint. What is surprising is that JavaScript programs themselves can become tainted. If a tainted value is used in an expression that is tested as part of an if, while or for statement, then the script itself must carry taint. If not, it would be easy to "launder" taint from a value with code like the following:

// b is a tainted Boolean value that we want to export
if (b == true) newb = true;
else newb = false;
// Now newb has the same value as b, but is not tainted, so we could 
// export it if this script itself did not become tainted in the process.

When a script becomes tainted, the window that contains it "accumulates" the same taint values, with the same meanings, that data values do. If a window carries taint, it will not be allowed to export data to a server unless the script's taint code and the data's taint code both indicate that they belong to the server.

In addition to understanding the different types of taint that are possible, you should also understand just what is meant by "exporting" data. In general terms, this means sending data over the Net. In practical terms, it occurs when a form is submitted in any way, or when a new URL is requested in any way. It is obvious that form submission exports data, but is less obvious that requesting a new document exports data. Bear in mind though that arbitrary data can be encoded into a URL following a question mark or hash sign (#). Also, the file and path of a URL can encode information.

While the data-tainting model is relatively straightforward on the surface, a working implementation requires careful attention to detail. JavaScript propagates taint through the strings of code passed to the eval() and setTimeout() functions, for example, so that you cannot untaint a value simply by converting it to a string of JavaScript code and executing that code later. Similarly, JavaScript propagates taint through the document.write() method so that a script can't launder tainted values by writing them out into a new script in a new window. For the same reason, JavaScript propagates taint through javascript: URLs, and prevents tainted strings from being stored in cookies. JavaScript also prevents data from being laundered through LiveConnect. In Navigator 3.0, this happens in a heavy-handed way: all data retrieved from Java is automatically tainted.

Enabling Data Tainting in Navigator 3.0

As noted above, the data-tainting security model is experimental in Navigator 3.0, and is not enabled by default. It is expected to be the default security model in version 4.0 of Navigator, however. If you want to try using data tainting with Navigator 3.0, you must enable it by setting an environment variable before starting Navigator. On Unix systems, do this with the following command in csh:

setenv NS_ENABLE_TAINT 1

On Windows platforms, enable taint with a set command in the autoexec.bat file or in NT user settings:

set NS_ENABLE_TAINT=1

And on the Macintosh, use the resource editor to edit the resource with type "Envi" and number 128 in the Netscape application. Modify this resource by removing the two slashes (//) before the NS_ENABLE_TAINT at the end of the string.

Note that if you enable this security model, you may find that many more scripts than you expect produce taint violations, and you'll spend a lot of time responding to dialogs that ask you to confirm form submissions or new page requests. One of the main reasons that tainting was not enabled in Navigator 3.0 was that the user interface to support it well was not yet ready. Thus, for Navigator 4.0, we can hope to see a smoother UI that does not ask as many questions.

Values Tainted by Default

Table 20.1 lists the object properties and methods that are tainted by default. The taint() and untaint() functions that will be introduced below allow you to modify these defaults.

Table 20.1: JavaScript Properties and Methods That Are Tainted by Default
Object	Tainted Properties and Methods
Document	`cookie`, `domain`, `forms[]`, `lastModified`, `links[]`, `location`, `referrer`, `title`, `URL`
Form	`action`
All Form input elements: Button, Checkbox, FileUpload, Hidden, Password, Radio, Reset, Select, Submit, Text, Textarea	`checked`, `defaultChecked`, `defaultValue`, `name`, `selectedIndex`, `toString()`, `value`
History	`current`, `next`, `previous`, `toString()`, all array elements[1]
Location, Link, Area	`hash`, `host`, `hostname`, `href`, `pathname`, `port`, `protocol`, `search`, `toString()`
Option	`defaultSelected`, `selected`, `text`, `value`
Window	`defaultStatus`, `status`
Footnotes: [1] Note that History properties belong to the browser, not the server, and thus have a different taint value.

The taint() and untaint() Functions

Table 20.1 shows the object properties and methods that are tainted by default in Navigator 3.0. This list is not the final word on tainting. If a script would like to prevent other data it owns from being exported, it may taint that data with the taint() method. Similarly, if a script would like to relax the data-tainting rules in order to allow information it owns to be exported more freely, it can remove its taint from a value with the untaint() method.

There are some important things to note about these functions. First, both taint() and untaint() return a tainted or untainted copy of primitive vales or a tainted or untainted reference to objects and arrays. In JavaScript, taint is carried by references to objects, not by the objects themselves. So when you untaint an object, what you are really doing is untainting a reference to that object, not the object itself. The object's value may be exported through the untainted reference but not through the tainted reference.

The second point to note is that a script can use untaint() only to remove its own taint from a value. If a value X carries taint that identifies it as owned by server A, then a script running in a document from server B may call untaint() on value X but will not succeed in removing server A's taint, and will not be able to export that value to server B.

Finally, if taint() and untaint() are called with no argument, then they add and remove taint from the script rather than from a particular object. Again, a script can only remove its own taint from itself: if a script from server A has tainted itself by examining tainted data owned by server B, then server A cannot remove that taint from itself.