2.2. Filesystem Differences
We'll start with a quick review of the native filesystems for each of our target operating systems. Some of this may be old news to you, especially if you have significant experience with a particular operating system. Still, it is worth your while to pay careful attention to the differences between the filesystems (especially the ones you don't know) if you intend to write Perl code that works on multiple platforms.
2.2.1. Unix
All modern Unix variants ship with a native filesystem with semantics that resemble those of their common ancestor, the Berkeley Fast File System. Different vendors have extended their filesystem implementations in different ways (e.g., Solaris adds Access Control Lists for better security, Digital Unix ships a spiffy transaction-based filesystem called advfs, etc.). We'll be writing code aimed at the lowest common denominator to allow it to work across different Unix platforms.
The top, or root, of a Unix filesystem is indicated by a forward slash (/). To uniquely identify a file or directory in a Unix filesystem, we construct a path starting with a slash and then add directories, separating them with forward slashes, as we descend deeper into the filesystem. The final component of this path is the desired directory or filename. Directory and filenames in modern Unix variants are case sensitive. Almost all ASCII characters can be used in these names if you are crafty enough, but sticking to alphanumeric characters and some limited punctuation will save you hassle later.
2.2.2. Microsoft Windows NT/2000
Windows NT (Version 4.0 as of this writing) ships with two supported filesystems: File Allocation Table (FAT) and NT FileSystem (NTFS). Windows 2000 adds FAT32 to the NT family; FAT32 is an improved version of FAT that allows for larger partitions and smaller cluster sizes.
Windows NT uses an extended version of the basic FAT filesystems found in DOS. Before we look at the extended version, it is important to understand the foibles of the basic FAT filesystem. In basic or real-mode FAT filesystems, filenames conform to the 8.3 specification. This means that file and directory names begin with no more than eight characters, followed by a period (or "dot," as it is spoken) and a suffix of up to three characters. Unlike Unix, where a period in a filename has no special meaning, basic FAT filesystems allow only a single period, which serves as an enforced separator between the filename and its extension or suffix.
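To make the 8.3 rule concrete, here is a minimal sketch of a check that follows the description above literally: one to eight characters, a single period, and a suffix of one to three characters. The helper name is_8dot3 is hypothetical, and the set of permitted characters is a simplifying assumption on our part (real FAT allows some additional punctuation).

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $name fits the literal 8.3 description
# (1-8 characters, one period, 1-3 character suffix), 0 otherwise.
# ASSUMPTION: the character class below is simplified for illustration.
sub is_8dot3 {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z0-9_~-]{1,8}\.[A-Za-z0-9_~-]{1,3}\z/ ? 1 : 0;
}

for my $name ('RESUME.DOC', 'DOWNLO~1.TMP', 'resume.final.doc',
              'Downloaded Program Files') {
    printf "%-26s %s\n", $name, is_8dot3($name) ? 'fits 8.3' : 'does not fit';
}
```

The second period in resume.final.doc and the spaces and length of Downloaded Program Files are what cause those two names to be rejected.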
Real-mode FAT was later enhanced in a version called VFAT or protected-mode FAT. This is roughly the version that Windows NT and Windows 2000 support. VFAT hides all of the name restrictions from the user. Longer filenames without separators are provided by a very creative hack. VFAT uses a chain of standard file/directory name slots to transparently shoehorn extended filename support into the basic FAT filesystem structure. For compatibility, every file and directory name can still be accessed using a special 8.3-conforming DOS alias. For instance, the directory called Downloaded Program Files is also available as DOWNLO~1.
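There are four key differences between a VFAT and a Unix filesystem:
1. FAT filesystems are case-insensitive. Under Unix, an attempt to open a file using the wrong case fails; under FAT or VFAT, it succeeds.
2. FAT uses the backslash (\) rather than the forward slash as its path separator. This matters to the Perl programmer because the backslash is also Perl's quoting character, so backslashes in double-quoted paths must themselves be escaped.
3. FAT files and directories carry special flags called attributes, such as Read-only and System.
4. The root of a FAT filesystem is specified starting with the drive letter and a colon (e.g., c:\) rather than with a bare slash.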
FAT32 and NTFS filesystems have the same semantics as VFAT. They share the same support for long filenames and use the same root designator. NTFS is slightly more sophisticated in its name support because it allows these names to be specified using Unicode. Unicode is a multibyte character encoding scheme that can be used to represent all of the characters of all of the written languages on the planet.
NTFS also has some functional differences that distinguish it from the other Windows NT/2000 and basic Unix filesystems. NTFS supports the notion of an Access Control List (ACL). ACLs provide a fine-grained permission mechanism for file and directory access. Later on in this chapter we will write some code to take advantage of some of these differences.
Before we move on to another operating system, it is important to at least mention the Universal Naming Convention. UNC is a convention for locating things (files and directories in our case) in a networked environment. Instead of the drive letter and a colon preceding an absolute path, the drive letter: part is replaced with \\server\sharename. This convention suffers from the same Perl backslash syntax clash we saw a moment ago. As a result, it is not uncommon to see a set of leaning toothpicks like this:
$path = "\\\\server\\sharename\\directory\\file";
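One possible way to thin out the toothpicks is sketched below. It relies on Perl's single-quote rule, under which a backslash is special only before another backslash or the closing quote. The server, share, directory, and file names are placeholders carried over from the example above.

```perl
use strict;
use warnings;

# Double quotes: every backslash in the path must be doubled.
my $double = "\\\\server\\sharename\\directory\\file";

# Single quotes: only the leading pair needs doubling, because '\\'
# collapses to one backslash while '\s', '\d', and '\f' are left alone.
my $single = '\\\\server\sharename\directory\file';

# Building the path from its components keeps the escaping in one place.
my $joined = '\\\\' . join('\\', qw(server sharename directory file));

print "$_\n" for $double, $single, $joined;
```

All three variables end up holding the same string. Note that writing '\\server\sharename' in single quotes would quietly lose one of the two leading backslashes.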
2.2.3. MacOS
Despite its GUI-centric approach, the MacOS Hierarchical File System (HFS) also lets users specify textual pathnames, albeit with a few twists. Absolute pathnames are specified using the following form: Drive/Volume Name:Folder:Folder:Folder:FileName. A specification with no colons refers to a file in the current directory.
Unlike the two previous operating systems, HFS paths are considered absolute if they do not begin with their path separator (:). An HFS path that begins with a colon is a relative path. One subtle difference that sets MacOS paths apart from the other operating systems is the number of separators you need to use when pointing to objects higher up in the directory hierarchy. For instance, under Unix, you would use ../../../FileName to get to a file three levels higher than the current directory. Under MacOS, you would use four separators (i.e., ::::FileName), because you must include a reference to the current directory in addition to the three previous levels.
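The two HFS rules just described can be sketched in a few lines of Perl. Both helper names (hfs_is_absolute and hfs_up) are hypothetical, and the code works purely on strings, so it makes no assumptions about the platform it runs on.

```perl
use strict;
use warnings;

# Hypothetical helper: an HFS path is absolute if it contains a colon
# but does not begin with one. A name with no colons at all refers to
# a file in the current directory, so it is treated as relative.
sub hfs_is_absolute {
    my ($path) = @_;
    my $absolute = ($path =~ /:/ && $path !~ /\A:/);
    return $absolute ? 1 : 0;
}

# Hypothetical helper: build a relative HFS path to $name located
# $levels directories above the current one. The first colon stands for
# the current directory; each additional colon climbs one level.
sub hfs_up {
    my ($levels, $name) = @_;
    return (':' x ($levels + 1)) . $name;
}

print hfs_up(3, 'FileName'), "\n";    # ::::FileName
print hfs_is_absolute('Volume:Folder:FileName') ? "absolute\n" : "relative\n";
print hfs_is_absolute(':Folder:FileName')       ? "absolute\n" : "relative\n";
```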
File and directory names are limited to 31 characters under HFS. As of MacOS Version 8.1, an alternative volume format called MacOS Extended Format or HFS+ was introduced to allow for 255 Unicode character filenames. Although the HFS+ filesystem allows these long names, MacOS does not yet support them as of this writing.
A more significant departure from the previous two operating systems (at least from a Perl programming point of view) is MacOS's use of the "fork" idiom for its file storage. Each file is said to have a data fork and a resource fork. The former holds the data part of the file, while the latter contains a variety of different resources. These resources can include executable code (in the case of a program), user interface specifications (dialog boxes, fonts, etc.), or any other components a programmer wishes to define. Though we won't be dealing with forks per se in this chapter, MacPerl does have facilities for reading and writing to both forks.
Each file in the HFS filesystem also has two special tags, creator and type, that allow the OS to identify which application created that file and what kind of file it is purported to be. These tags play the same role as extensions used in the FAT filesystem (e.g., .doc or .exe). Later in this chapter we'll briefly show how to use the type/creator tags to your advantage.
2.2.4. Filesystem Differences Summary
Table 2-1 summarizes all of the differences we just discussed along with a few more items of interest.
Table 2-1. Filesystem Comparison
2.2.5. Dealing with Filesystem Differences from Perl
Perl can help you write code that takes most of these filesystem quirks into account. It ships with a module called File::Spec to hide some of the differences between the filesystems. For instance, if we pass in the components of a path to the catfile method like so:
use File::Spec;
$path = File::Spec->catfile("home","cindy","docs","resume.doc");
then $path is set to home\cindy\docs\resume.doc on a Windows NT/2000 system, while on a Unix system it becomes home/cindy/docs/resume.doc, and so on. File::Spec also has methods like curdir and updir that return the punctuation necessary to describe the current and parent directories (e.g., "." and ".."). The methods in this module give you an abstract way to construct and manipulate your path specifications. If you'd prefer not to have to write your code using an object-oriented syntax, the module File::Spec::Functions provides a more direct route to the methods found in File::Spec.
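As a brief sketch of the functional interface, the following uses File::Spec::Functions for the current platform. It also calls the platform-specific classes File::Spec::Unix and File::Spec::Win32 directly, which is our own choice for illustration: it lets us preview another system's conventions without running there.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile curdir updir);
use File::Spec::Unix;
use File::Spec::Win32;

# Functional interface: follows the conventions of the platform we run on.
my $path   = catfile('home', 'cindy', 'docs', 'resume.doc');
my $here   = curdir();
my $parent = catfile(updir(), 'resume.doc');

# Platform-specific classes: the same call, with fixed conventions.
my $unix  = File::Spec::Unix->catfile('home', 'cindy', 'docs', 'resume.doc');
my $win32 = File::Spec::Win32->catfile('home', 'cindy', 'docs', 'resume.doc');

print "$_\n" for $path, $here, $parent, $unix, $win32;
```

The first three values depend on where the script runs, while $unix and $win32 always hold home/cindy/docs/resume.doc and home\cindy\docs\resume.doc respectively.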
Copyright © 2001 O'Reilly & Associates. All rights reserved.