System Features (Running Linux, 4th Edition)

1.4. System Features

Linux supports most of the features found in other implementations of Unix, plus quite a few not found elsewhere. This section provides a nickel tour of the Linux kernel features.

1.4.1. A Note on Linux Version Numbers

One potentially confusing aspect of Linux for newcomers is the way in which different pieces of software are assigned a version number. When you first approach Linux, chances are you'll be looking at a CD-ROM distribution, such as "Red Hat Version 7.1" or "SuSE Linux Version 6.0." It's important to understand that these version numbers only relate to the particular distribution (which is a prepackaged version of Linux along with tons of free application packages, usually sold on CD-ROM). Therefore, the version number assigned by Red Hat, SuSE, or Debian might not have anything to do with the individual version numbers of the software in that distribution.

The Linux kernel, as well as each application, component, library, or software package in a Linux distribution, generally has its own version number. For example, you might be using gcc Version 2.96, as well as the XFree86 GUI Version 4.0.3. As you can guess, the higher the version number, the newer the software is. If you install Linux in the form of a distribution (such as Red Hat or SuSE), all of this is simplified for you since the latest version of each package is usually included in the distribution, and the distribution vendors make sure that the software on a particular distribution works together.

The Linux kernel has a peculiar version numbering scheme with which you should be familiar. As mentioned before, the kernel is the core operating system itself, responsible for managing all the hardware resources in your machine — such as disks, network interfaces, memory, and so on. Unlike Windows systems, the Linux kernel doesn't include any application-level libraries or GUIs. In some sense, as a user you will never interact with the kernel directly, but rather through interfaces, such as the shell or the GUI (more on this later).

However, many people still consider the Linux kernel version to be the version of the "entire system," which is somewhat misleading. Someone might say, "I'm running kernel Version 2.5.12," but this doesn't mean much if everything else on the system is years out of date.

The Linux kernel versioning system works as follows. At any given time, there are two "latest" versions of the kernel out there (meaning available for download from the Internet) — the "stable" and "development" releases. The stable release is meant for most Linux users who aren't interested in hacking on bleeding-edge experimental features, but who need a stable, working system that isn't changing underneath them from day to day. The development release, on the other hand, changes very rapidly as new features are added and tested by developers across the Internet. Changes to the stable release consist mostly of bug fixes and security patches, while changes to the development release can be anything from major new kernel subsystems to minor tweaks in a device driver for added performance. The Linux developers don't guarantee that the development kernel version will work for everyone, but they do maintain the stable version with the intention of making it run well everywhere.

The stable kernel release has an even minor version number (such as 2.4), while the development release has an odd minor version number (such as 2.5). Note that the current development kernel always has a minor version number exactly one greater than the current stable kernel. So, when the current stable kernel is 2.6, the current development kernel will be 2.7. (Unless, of course, Linus decides to bump the kernel version to 3.0, in which case the development version will be 3.1, naturally.)

Each kernel version has a third "patch-level" version number associated with it, such as 2.4.19 or 2.5.12. The patch level specifies the particular revision of that kernel version, with higher numbers specifying newer revisions. As of the time of this writing in November 2002, the latest stable kernel is 2.4.19 and the current development kernel is 2.5.45.
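As a rough illustration of the numbering scheme just described, the following Python sketch splits a kernel version string into its parts and applies the even/odd rule. The function names are invented for this example, and the rule is assumed to hold only for kernels numbered under the scheme described in this section.

```python
def parse_kernel_version(version):
    """Split a string such as "2.4.19" into (major, minor, patch_level).

    The patch level is None when the string has only two components.
    """
    parts = [int(piece) for piece in version.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError("expected MAJOR.MINOR or MAJOR.MINOR.PATCH: %r" % version)
    major, minor = parts[0], parts[1]
    patch_level = parts[2] if len(parts) == 3 else None
    return major, minor, patch_level


def release_series(version):
    """Return "stable" for an even minor number, "development" for an odd one."""
    _, minor, _ = parse_kernel_version(version)
    return "stable" if minor % 2 == 0 else "development"


if __name__ == "__main__":
    # The two kernels named in the text above.
    for version in ("2.4.19", "2.5.45"):
        print(version, "is a", release_series(version), "kernel")
```

Comparing the tuples returned by parse_kernel_version is also a simple way to tell which of two revisions is newer, since Python compares tuples element by element.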

1.4.2. A Bag of Features

Linux is a complete multitasking, multiuser operating system (as are all other versions of Unix). This means that many users can be logged onto the same machine at once, running multiple programs simultaneously. Linux also supports multiprocessor systems (such as dual-Pentium motherboards), with support for up to 32 processors in a system, which is great for high-performance servers and scientific applications.

The Linux system is mostly compatible with a number of Unix standards (inasmuch as Unix has standards) on the source level, including IEEE POSIX.1, System V, and BSD features. Linux was developed with source portability in mind: therefore, you will probably find features in the Linux system that are shared across multiple Unix implementations. A great deal of free Unix software available on the Internet and elsewhere compiles on Linux out of the box.

If you have some Unix background, you may be interested in some other specific internal features of Linux, including POSIX job control (used by shells such as the C shell, csh, and by bash), pseudoterminals (pty devices), and support for national or customized keyboards using dynamically loadable keyboard drivers. Linux also supports virtual consoles, which allow you to switch between multiple login sessions from the system console in text mode. Users of the screen program will find the Linux virtual console implementation familiar (although nearly all users make use of a GUI desktop instead).

Linux can quite happily coexist on a system that has other operating systems installed, such as Windows 95/98, Windows NT/2000/XP, Mac OS, or other versions of Unix. The Linux Loader (LILO) and the GRand Unified Bootloader (GRUB) allow you to select which operating system to start at boot time, and Linux is compatible with other bootloaders as well (such as the one found in Windows 2000).

Linux can run on a wide range of CPU architectures, including the Intel x86 (from the '386 and '486 through the whole Pentium line), Itanium, SPARC/UltraSPARC, ARM, PA-RISC, Alpha, PowerPC, MIPS, m68k, and IBM 370/390 mainframes. Linux has also been ported to a number of embedded processors, and stripped-down versions have been built for various PDAs, including the PalmPilot and Compaq iPaq. In the other direction, Linux is being considered for top-of-the-line computers as well. In April 2002, Hewlett-Packard announced that it was going to release a supercomputer with Linux as the operating system. A large number of scalable clusters (supercomputers built out of arrays of PCs) run Linux as well.

Linux supports various filesystem types for storing data. Some filesystems, such as the Second Extended Filesystem (ext2fs), have been developed specifically for Linux. Other Unix filesystem types, such as the Minix-1 and Xenix filesystems, are also supported. The Windows NTFS (Windows 2000 and NT), VFAT (Windows 95/98), and FAT (MS-DOS) filesystems have been implemented as well, allowing you to access Windows files directly. Support is included for Macintosh, OS/2, and Amiga filesystems as well. The ISO 9660 CD-ROM filesystem type, which reads all standard formats of CD-ROMs, is also supported. We'll talk more about filesystems in Chapter 3 and Chapter 5.

Networking support is one of the greatest strengths of Linux, both in terms of functionality and performance. Linux provides a complete implementation of TCP/IP networking. This includes device drivers for many popular Ethernet cards, PPP and SLIP (allowing you to access a TCP/IP network via a serial connection or modem), Parallel Line Internet Protocol (PLIP), and the Network File System (NFS). Linux also supports the modern IPv6 protocol suite, and many other protocols including DHCP, AppleTalk, IrDA, DECnet, and even AX.25 for packet radio networks. The complete range of TCP/IP clients and services is supported, such as FTP, Telnet, NNTP, and Simple Mail Transfer Protocol (SMTP). The Linux kernel includes complete network firewall support, allowing you to configure any Linux machine as a firewall (which screens network packets, preventing unauthorized access to an intranet, for example). It is widely held that networking performance under Linux is superior to that of other operating systems. We'll talk more about networking in Chapter 15.

1.4.3. Kernel

The kernel is the guts of the operating system itself; it's the code that controls the interface between user programs and hardware devices, the scheduling of processes to achieve multitasking, and many other aspects of the system. The kernel is not a separate process running on the system. Instead, you can think of the kernel as a set of routines, constantly in memory, to which every process has access. Kernel routines can be called in a number of ways. One direct method to utilize the kernel is for a process to execute a system call, which is a function that causes the kernel to execute some code on behalf of the process. For example, the read system call will read data from a file descriptor. To the programmer, this looks like any other C function, but in actuality the code for read is contained within the kernel.
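To make the idea of a system call concrete, here is a minimal Python sketch, assuming a Unix-like system. Python's os.read and os.write are thin wrappers around the read and write system calls, so each call below hands control to the kernel, which does the work on behalf of the process. The function name is invented for this example, and the message is assumed to be small enough to fit in the pipe's buffer.

```python
import os


def read_through_pipe(message):
    """Write bytes into a pipe, then fetch them back with the read system call."""
    # The kernel creates the pipe and hands back two file descriptors.
    read_fd, write_fd = os.pipe()
    try:
        # write system call: the kernel copies the bytes into the pipe.
        os.write(write_fd, message)
        # read system call: the kernel copies them back out of the pipe.
        return os.read(read_fd, len(message))
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(read_through_pipe(b"hello from the kernel"))
```

To the caller, os.read looks like any other function, just as read does to the C programmer; the code that actually moves the data lives in the kernel.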

Kernel code is also executed in other situations. For example, when a hardware device issues an interrupt, the interrupt handler is found within the kernel. When a process takes an action that requires it to wait for results, the kernel steps in and puts the process to sleep, scheduling another process in its place. Similarly, the kernel switches control between processes rapidly, using the clock interrupt (and other means) to trigger a switch from one process to another. This is basically how multitasking is accomplished.

The Linux kernel is known as a monolithic kernel, in that all core functions and device drivers are part of the kernel proper. Some operating systems employ a microkernel architecture whereby device drivers and other components (such as filesystems and memory management code) are not part of the kernel; rather, they are treated as independent services or regular user applications. There are advantages and disadvantages to both designs: the monolithic architecture is more common among Unix implementations and is the one employed by classic kernels, such as System V and BSD. Linux does support loadable device drivers (which can be loaded and unloaded from memory through user commands); this is the subject of Section 7.5 in Chapter 7.

The Linux kernel on Intel platforms is developed to use the special protected-mode features of the Intel x86 processors (starting with the 80386 and moving on up to the current Pentium 4). In particular, Linux makes use of the protected-mode descriptor-based memory management paradigm and many of the other advanced features of these processors. Anyone familiar with x86 protected-mode programming knows that these processors were designed for a multitasking system such as Unix (the x86 was actually inspired by Multics). Linux exploits this functionality.

Like most modern operating systems, Linux is a multiprocessor operating system: it supports systems with more than one CPU on the motherboard. This feature allows different programs to run on different CPUs at the same time (or "in parallel"). Linux also supports threads, a common programming technique that allows a single program to create multiple "threads of control" that share data in memory. Linux supports several kernel-level and user-level thread packages, and Linux's kernel threads run on multiple CPUs, taking advantage of true hardware parallelism. The Linux kernel threads package is compliant with the POSIX 1003.1c standard.
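The notion of several "threads of control" sharing data in memory can be sketched in a few lines of Python. This is an illustration only: the function name is invented, and the standard CPython interpreter may keep these threads from executing Python code truly in parallel, so the example demonstrates shared memory rather than multiprocessor speedup.

```python
import threading


def count_with_threads(num_threads=4, increments=1000):
    """Have several threads update one counter that they all share."""
    shared = {"counter": 0}          # one object in memory, visible to every thread
    lock = threading.Lock()          # keeps two threads from updating at once

    def worker():
        for _ in range(increments):
            with lock:
                shared["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()                # wait for every thread to finish
    return shared["counter"]


if __name__ == "__main__":
    print(count_with_threads())
```

Because all the threads see the same counter, the lock is what guarantees that no increment is lost when two threads try to update it at the same moment.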

The Linux kernel supports demand-paged loaded executables. That is, only those segments of a program that are actually used are read into memory from disk. Also, if multiple instances of a program are running at once, only one copy of the program code will be in memory. Executables use dynamically linked shared libraries, meaning that executables share common library code in a single library file found on disk. This allows executable files to occupy much less space on disk. This also means that a single copy of the library code is held in memory at one time, thus reducing overall memory usage. There are also statically linked libraries for those who wish to maintain "complete" executables without the need for shared libraries to be in place. Because Linux shared libraries are dynamically linked at runtime, programmers can replace modules of the libraries with their own routines.
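One way to see runtime dynamic linking at work is to look up a routine in a shared library from a running program. The Python sketch below does this with the standard ctypes module, assuming a Unix-like system on which the C library is already mapped into the process; the function name is invented for this example.

```python
import ctypes
import os


def getpid_via_shared_library():
    """Call the C library's getpid() through a handle obtained at runtime."""
    # Passing None asks the dynamic loader for the libraries already mapped
    # into this process, which is assumed to include the C library.
    process_libraries = ctypes.CDLL(None)
    # The symbol is looked up by name only when it is first used.
    return process_libraries.getpid()


if __name__ == "__main__":
    # Both routes should report the same process ID.
    print(getpid_via_shared_library(), os.getpid())
```

The library code reached here is the same copy used by every other program on the system that links against it, which is the saving in disk and memory described above.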

In order to make the best use of the system's memory, Linux implements so-called virtual memory with disk paging. That is, a certain amount of swap space [1] can be allocated on disk. When applications require more physical memory than is actually installed in the machine, the kernel will swap inactive pages of memory out to disk. (A page is simply the unit of memory allocation used by the operating system; on most architectures, it's 4 KB in size.) When those pages are accessed again, they will be read from disk back into main memory. This feature allows the system to run larger applications and support more users at once. Of course, swap is no substitute for physical RAM; it's much slower to read pages from disk than from memory.

[1] If you are a real OS geek, you will note that swap space is inappropriately named: entire processes are not swapped, but rather individual pages of memory are paged out. While in some cases entire processes will be swapped out, this is not generally the case. The term "swap space" originates from the early days of Linux; technically, the space should be called "paging space."
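Since memory is handed out a page at a time, the number of pages a request occupies is its size rounded up to a whole number of pages. The following Python sketch shows the arithmetic; the function name is invented for this example, and mmap.PAGESIZE is used as the page size of whatever machine runs the code.

```python
import mmap


def pages_needed(num_bytes, page_size=mmap.PAGESIZE):
    """Return how many whole pages are needed to hold num_bytes."""
    if num_bytes < 0:
        raise ValueError("size must not be negative")
    # Round up: even one byte past a page boundary occupies another page.
    return (num_bytes + page_size - 1) // page_size


if __name__ == "__main__":
    print("page size on this machine:", mmap.PAGESIZE, "bytes")
    print("pages for 10,000 bytes:", pages_needed(10000))
```

With a 4 KB page, for example, a request for 4,097 bytes occupies two pages, and it is whole pages like these that the kernel moves to and from the swap space.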

The Linux kernel keeps portions of recently accessed files in memory, to avoid accessing the (relatively slow) disk any more than necessary. The kernel uses all the free memory in the system for caching disk accesses, so when the system is lightly loaded a large number of files can be accessed rapidly from memory. When user applications require a greater amount of physical memory, the size of the disk cache is reduced. In this way physical memory is never left unused.

To facilitate debugging, the Linux kernel generates a core dump of a program that performs an illegal operation, such as accessing an invalid memory location. The core dump, which appears as a file called core in the directory in which the program was running, allows the programmer to determine the cause of the crash. We'll talk about the use of core dumps for debugging in Section 14.1.2 in Chapter 14.
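One practical detail not covered above: on Unix-like systems, whether a core file is actually written depends on a per-process resource limit on core file size, and a limit of zero suppresses the dump. The Python sketch below, which assumes a Unix-like system and uses invented function names, reads that limit with the standard resource module.

```python
import resource


def core_dump_limit():
    """Return the soft limit on core file size in bytes, or None if unlimited."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_CORE)
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit


def core_dumps_enabled():
    """Report whether the current limit allows any core file to be written."""
    return core_dump_limit() != 0


if __name__ == "__main__":
    print("core dumps enabled:", core_dumps_enabled())
```

If this reports that core dumps are disabled, no core file will appear when a program crashes until the limit is raised.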