17.1 Software Management Systems

A software management system is a set of tools and procedures for keeping track of which versions of which software you have installed, and whether any local changes have been made to the software or its configuration files. Without such a system, it is impossible to know whether a piece of software needs to be updated or what local changes have been made and need to be preserved after the update. Using some software management system to keep up to date is essential for security purposes, and useful for non-security upgrades as well.

Fortunately, nearly all Unix systems provide some form of software management for the core components of the operating system and the applications distributed with it. The most common approaches are managing packages (precompiled executables and supporting files) and managing the software source code from which executables can be compiled and installed.

Mirror, Mirror

Whether you use packages or source code, you need to get the files from somewhere. Vendors typically make their applications available on the Internet—through the Web or through an anonymous FTP site. With very popular operating systems or applications, however, a single web site or FTP site often can't keep up with the demand to download it, so many software vendors arrange to have other sites serve as mirrors for their site. Users are encouraged to download the software from the mirror site closest (in network geography) to them. In principle, all of the software on the vendor's site is replicated to each mirror site on a regular (often daily) basis.

Mirror sites provide an important security benefit: they make the availability of software more reliable through redundancy. On the other hand, mirror sites also create some security concerns:

The administrators of the mirror site control their local copies of the software, and may have the ability to corrupt it, replace it with a Trojaned version, and so on. You must trust not only the vendor but also the administrators of the mirror site.
If the vendor distributes digital signatures along with the software (for example, detached PGP signatures with source code archives, or GnuPG signatures in rpm files), you will have added confidence that you're receiving the software as released by the vendor, as long as you acquire the vendor's public key directly, not through the mirror! Some update systems automatically check signatures before an update is applied.

Note that several software vendors distribute MD5 checksums along with their software packages. MD5 checksums are useful for ensuring that the file was downloaded correctly. But MD5 checksums distributed with a program will not protect you from hostile code because an attacker who can replace the software package with a Trojaned version may also be able to replace the MD5 checksum with a checksum that will match the Trojan.
Even if you trust the mirror, daily updating may not be fast enough. If a critical security patch is released, you may not have time to wait 24 hours for your local mirror to be updated. In these cases, there is no substitute for downloading the patch directly from the vendor as soon as possible.
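
The caveat about checksums can be made concrete with a short Python sketch. The package contents and variable names here are hypothetical, and SHA-256 stands in for MD5. The sketch does not implement digital signatures; it only shows that a checksum fetched from the same place as the package proves nothing about tampering, while a value obtained through a separate trusted channel does catch it.

```python
import hashlib


def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of data under the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def matches(data: bytes, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Report whether data hashes to the expected hex digest."""
    return digest(data, algorithm) == expected_hex


# Hypothetical package contents, invented for this illustration.
genuine = b"genuine package contents"
trojaned = b"trojaned package contents"

# Digest obtained directly from the vendor (assumed trustworthy here).
vendor_digest = digest(genuine)

# A hostile mirror replaces both the package and its checksum file.
mirror_package = trojaned
mirror_digest = digest(trojaned)

# The check against the mirror's own checksum passes and proves nothing.
checked_against_mirror = matches(mirror_package, mirror_digest)

# The check against the vendor's value fails, exposing the substitution.
checked_against_vendor = matches(mirror_package, vendor_digest)

print("mirror checksum accepted:", checked_against_mirror)
print("vendor checksum accepted:", checked_against_vendor)
```

A real signature check goes further than this: the signature can be distributed by the mirror itself, because only the holder of the vendor's private key can produce it.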

Using a mirror site is thus a trade-off between the convenience of a high-speed download when you want it and the risk of extending your trust to a third party.

17.1.1 Package-Based Systems

A typical package file is a file containing a set of executable programs, already compiled, along with any supporting files such as libraries, default configuration files, and documentation. Under most packaging systems, the package also contains some metadata, such as:

Version information for the software it contains
Information about compatible operating system versions or hardware architectures
Lists of other packages that the package requires
Lists of other packages with which the package conflicts
Lists of which included files are configuration files (or are otherwise likely to be changed by users once installed)
Commands to run before, during, or after the included files are installed

The other important component of a package-based system is a database containing information about which versions of which packages have been installed on the system.
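
How the metadata and the installed-package database work together can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the format or logic of rpm, dpkg, or any other real package manager: the Package class, its field names, and the package names are all invented for illustration, and the "database" is just a dictionary mapping package names to versions.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical package metadata, mirroring the list above."""
    name: str
    version: str
    requires: list[str] = field(default_factory=list)
    conflicts: list[str] = field(default_factory=list)
    config_files: list[str] = field(default_factory=list)


def install_problems(pkg: Package, installed: dict[str, str]) -> list[str]:
    """List the reasons pkg cannot be installed, given the database."""
    problems = []
    for dep in pkg.requires:
        if dep not in installed:
            problems.append(f"missing required package: {dep}")
    for other in pkg.conflicts:
        if other in installed:
            problems.append(f"conflicts with installed package: {other}")
    return problems


def install(pkg: Package, installed: dict[str, str]) -> None:
    """Record pkg in the database, refusing if any check fails."""
    problems = install_problems(pkg, installed)
    if problems:
        raise ValueError("; ".join(problems))
    installed[pkg.name] = pkg.version


# The database of installed packages: name -> version (made-up entries).
installed = {"libssl": "1.0", "sendmail": "8.12"}

webserver = Package("webserver", "2.0", requires=["libssl"])
othermail = Package("othermail", "1.0", requires=["libdb"],
                    conflicts=["sendmail"])

install(webserver, installed)                      # dependencies satisfied
blocked = install_problems(othermail, installed)   # two problems reported
```

Real systems also compare version numbers, run the package's install scripts, and record which files belong to which package; all of that is omitted here.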

Package-based systems are easy to use: with a simple command or two, a system administrator can install new software or upgrade her current software when a new or patched version is released. Because the packaged executables are already compiled for the target operating system and hardware platform, the administrator doesn't have to spend time building (and maybe even porting) the application.

On the other hand, packages are compiled to work on the typical installation of the operating system, and not necessarily on your installation. If you need to tune your applications to work with some special piece of hardware, adapt them to an unusual authentication system, or simply compile them with an atypical configuration setting, source code will likely be more useful to you. This is often the case with the kernel, for example.

Commercial Unix distributions that don't provide source code are obvious candidates for package-based management. For example, Solaris 2.x provides the pkgadd, pkgrm, pkginfo, and showrev commands (and others) for adding, removing, and querying packages from the shell, and admintool for managing software graphically.

Package management isn't only for commercial Unix. Free software Unix distributions also provide package management systems to make it easier for system administrators to keep the system up to date. Several Linux distributions have adopted the RPM Package Manager (RPM) system.^[3] This system uses a single command, rpm, for all of its package management functions. Debian GNU/Linux uses an alternative package management system called dpkg. The BSD-based Unix systems focus on source-based updates, but also provide a collection of precompiled packages that are managed with the pkg_add, pkg_delete, and pkg_info commands.

^[3] In its early days, RPM stood for "Red Hat Package Manager," but the name has since been changed to reflect its popularity on other distributions of Linux as well.

17.1.2 Source-Based Systems

In contrast to package-based systems, source-based systems focus on helping the system administrator maintain an up-to-date copy of the operating system's or application's source code, from which new executables can be compiled and installed. Source-based management has its own special convenience: a source-based update comes in only a single version, as opposed to compiled packages, which must be separately compiled and packaged for each architecture or operating system on which the software runs. Source-based systems can also be particularly useful when it's necessary to make local source code changes.

From a security standpoint, building packages from source code can be a mixed blessing. On the one hand, you are free to inspect the source code and determine if there are any lurking bugs or Trojan horses. In practice, such inspection is difficult and rarely done,^[4] but the option exists. On the other hand, if an attacker can get access to your source code, it is not terribly difficult for the attacker to add a Trojan horse of her own! To avoid this problem, you need to be sure both that the source code you are compiling is for a reliable system and that you have the genuine source code.^[5]

^[4] Moreover, source inspection is rarely done correctly! Knowing how to program is not the same as knowing how to audit code for security problems.

^[5] Even so, there are an increasing number of cases in which source code distribution was successfully attacked and a Trojan horse was incorporated into code that was subsequently distributed. Ironically, two of these cases involved security-related software. In one case, a Trojan horse was incorporated into the source code for the tcpwrappers suite of programs. In another case, a Trojan horse was incorporated into the makefiles that build OpenSSH.

17.1.2.1 Source code and patches

The simplest approach to source management is to keep application source code available on the system and recompile it whenever it's changed. Most Unix systems use the /usr/src and /usr/local/src hierarchies to store source code for distributed and third-party software, respectively. When a patch to an application is released, it typically takes the form of a patch diff, a file that describes which lines in the old version should be changed, removed, or added in order to produce the new version. The diff program produces these files, and the patch program is used to apply them to an old version to create the new version. After patching the source code, the system administrator recompiles and reinstalls the application.
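
The relationship between a diff and the patched result can be illustrated in Python with the standard difflib module. This is only a sketch of the idea, not a reimplementation of the diff and patch programs: make_patch and apply_patch are hypothetical helpers that use a simple list of edits rather than the real patch file format, and the C fragment is invented for the example. The unified_diff call is included to show roughly what a human-readable diff looks like.

```python
import difflib

# Old and new versions of a made-up source file, one string per line.
old = [
    "int main(void) {\n",
    "    char buf[16];\n",
    "    gets(buf);\n",
    "    return 0;\n",
    "}\n",
]
new = [
    "int main(void) {\n",
    "    char buf[16];\n",
    "    fgets(buf, sizeof buf, stdin);\n",
    "    return 0;\n",
    "}\n",
]

# Human-readable view of the change, in unified diff style.
patch_text = "".join(
    difflib.unified_diff(old, new, fromfile="a/main.c", tofile="b/main.c")
)


def make_patch(old_lines, new_lines):
    """Return an edit script: (start, end, replacement) spans against old."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    return [
        (i1, i2, new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_patch(old_lines, patch):
    """Apply an edit script to old_lines and return the new version."""
    result = list(old_lines)
    # Work from the end backward so earlier line indexes stay valid.
    for start, end, replacement in reversed(patch):
        result[start:end] = replacement
    return result


patch = make_patch(old, new)
rebuilt = apply_patch(old, patch)

print(patch_text)
```

As with a real patch file, the edit script carries only the changed lines, so it is far smaller than the source tree it updates.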

Source Packages

Although most users of Linux distributions that use the rpm or dpkg package management systems may never use one, both systems support packages containing source code rather than precompiled executables. Source packages install themselves in a special location, and include metadata that describes how to automatically compile and install the application from the source.

For example, FreeBSD and related versions of Unix distribute many applications in their ports collection. An application in the ports collection consists of the original source code from the application's author, along with a set of patches that have been applied to better integrate the application into the BSD environment. The makefiles included in the ports system automatically build the application, install it, and then register the application's files with the BSD pkg_add command.

This approach is widely used for maintaining third-party software on FreeBSD systems.

17.1.2.2 CVS

Another approach to source management is to store the source code on a server using a source code versioning system such as the Concurrent Versions System (CVS), and configure the server to allow anonymous client connections. Users who want to update their source code to the latest release use the CVS program to "check out" the latest patched version from the remote server's repository. The updated code can then be compiled and installed.

An advantage of CVS is that the system makes it easy for sites to maintain their own local modifications to an otherwise large and unwieldy system. CVS will detect the local modifications and reapply them each time a new version of the source code is downloaded.
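
The reasoning behind preserving local modifications can be sketched as a three-way comparison. The Python below is a deliberately simplified illustration and not CVS's actual algorithm: CVS merges changes line by line within a file and flags conflicts, whereas this hypothetical plan_update function only decides, file by file, what should happen. The filenames and contents are invented.

```python
def plan_update(base, local, upstream):
    """Choose a per-file action from three versions of a source tree.

    base     -- contents as of the last checkout
    local    -- the working copy, possibly with local changes
    upstream -- the new version in the repository
    Each argument maps a filename to its contents.
    """
    plan = {}
    for path in sorted(set(base) | set(local) | set(upstream)):
        b, l, u = base.get(path), local.get(path), upstream.get(path)
        if l == u:
            plan[path] = "up to date"      # nothing to do
        elif l == b:
            plan[path] = "take upstream"   # only the repository changed
        elif u == b:
            plan[path] = "keep local"      # only the local copy changed
        else:
            plan[path] = "merge"           # both changed; combine by hand or tool
    return plan


base = {
    "Makefile": "CC=cc\n",
    "main.c": "v1\n",
    "config.h": "#define PORT 25\n",
}
local = {
    "Makefile": "CC=gcc\n",              # local change only
    "main.c": "v1\n",                    # untouched locally
    "config.h": "#define PORT 2525\n",   # changed locally...
}
upstream = {
    "Makefile": "CC=cc\n",
    "main.c": "v2\n",                    # changed in the repository
    "config.h": "#define PORT 587\n",    # ...and in the repository
}

plan = plan_update(base, local, upstream)
for path, action in plan.items():
    print(f"{path}: {action}")
```

The key point is that the comparison needs the base version as well as the two current ones; without it, there is no way to tell whose change a difference represents.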

FreeBSD, NetBSD, and OpenBSD use CVS to distribute and maintain their core operating system software. In addition, tens of thousands of open source software projects maintain CVS servers of their own, or are hosted at sites such as sourceforge.net that provide CVS repositories.