1.6.1. Directories
Let's review the most important concepts about directories.
The fact that directories can contain other directories leads
to a hierarchical structure, more popularly
known as a tree, for all files on a Unix system.
Figure 1-2
shows part of a typical directory tree;
ovals are regular files and rectangles are directories.
Figure 1-2. A tree of directories and files
The top of the tree is a directory called "root"
that has no name on the system.[7]
All files can be named by expressing their location on the
system relative to root; such names are built by listing
all the directory names (in order from root), separated
by slashes (/), followed by the file's name. This way
of naming files is called a full (or absolute)
pathname.
For example, say
there is a file called memo in the directory
fred, which is in the directory home, which
is in the root directory. This file's full pathname
is /home/fred/memo.
1.6.1.1. The working directory
Of course, it's annoying to have to use full pathnames
whenever you need to specify a file, so there is also the
concept of the working directory
(sometimes called the current directory), which is the
directory you are "in" at any given time. If you give a
pathname with no leading slash, the
location of the file is worked out relative
to the working directory. Such pathnames are called
relative pathnames; you'll use them much more often
than full pathnames.
When you log in to the system, your working directory is
initially set to a special directory called your home
(or login) directory. System administrators often
set up the system so that everyone's home directory name
is the same as their login name, and all home directories
are contained in a common directory under root.
It is now common practice to use /home as the
top directory for home directories.
For example, /home/billr is a typical home directory.
If this is your working directory
and you give the command lp memo, the system looks
for the file memo in /home/billr. If you have
a directory called bob in your home directory, and it
contains the file statrpt, you can print statrpt with
the command lp bob/statrpt.
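To see that a relative pathname and a full pathname can name the same file, here is a minimal sketch that rebuilds the layout above under a scratch directory. The scratch location (created with mktemp, which is widely available but not guaranteed everywhere) and the file contents are assumptions made for illustration, and cat stands in for lp so that nothing is sent to a printer.

```shell
# Scratch copy of the example layout; nothing outside $top is touched.
top=$(mktemp -d)
mkdir -p "$top/home/billr/bob"
echo "status report" > "$top/home/billr/bob/statrpt"

cd "$top/home/billr"                         # make billr the working directory
relative=$(cat bob/statrpt)                  # relative to the working directory
full=$(cat "$top/home/billr/bob/statrpt")    # full pathname, works from anywhere

echo "relative: $relative"
echo "full:     $full"
```

Both commands read the same file; only the way of naming it differs.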
1.6.1.2. Tilde notation
As you can well imagine, home directories occur often in
pathnames. Although many systems are organized so that all
home directories have a common parent (such as /home),
you should not have to rely on that being
the case, nor should you even have to know what the absolute
pathname of someone's home directory is.
Therefore, the Korn shell has a way of abbreviating home
directories: just precede the name of the user with a tilde (~).
For example, you could refer to the file memo in user
fred's home directory as ~fred/memo.
This is an absolute pathname, so it doesn't matter
what your working directory is when you use it. If fred's home
directory has a subdirectory called bob and the file
is in there instead, you can use ~fred/bob/memo as its name.
Even more conveniently, a tilde by itself refers to your own home
directory. You can refer to a file called notes in your home
directory as ~/notes (note the
difference between that and ~notes, which the shell would
try to interpret as user notes's
home directory). If notes is in your bob subdirectory,
you can call it ~/bob/notes. This notation is
handiest when your working directory is not in your home directory
tree, e.g., when it's some "system" directory like /tmp.
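The following sketch shows the tilde forms in action. It assumes only that a user named root exists, which is true on nearly every Unix system; the variable names are made up for the example.

```shell
own=~            # a tilde by itself: your own home directory
notes=~/notes    # the file notes in your home directory
other=~root      # home directory of user root (assumes that user exists)

echo "$own"
echo "$notes"
echo "$other"
```

The file notes does not have to exist for the shell to build its name; tilde expansion only rewrites the text of the pathname.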
1.6.1.3. Changing working directories
If you want to change your working directory, use the command cd.
If you don't remember your working directory, the command
pwd tells the shell to print it.
cd takes as argument the name of the directory you
want to become your working directory. It can be relative
to your current directory, it can contain a tilde, or it can
be absolute (starting with
a slash). If you omit the argument, cd changes to your
home directory (i.e., it's the same as cd ~).
Table 1-1
gives some sample cd commands. Each command
assumes that your working directory is /home/billr
just before the command is executed, and that your directory structure looks like
Figure 1-2.
Table 1-1. Sample cd commands
Command | New working directory
cd bob | /home/billr/bob
cd bob/dave | /home/billr/bob/dave
cd ~/bob/dave | /home/billr/bob/dave
cd /usr/lib | /usr/lib
cd .. | /home
cd ../pete | /home/pete
cd ~pete | /home/pete
cd billr pete | /home/pete
cd illr arry | /home/barry
The first four are straightforward. The next two use a special
directory called .. (two dots, pronounced "dot dot"),
which means "parent of this directory."
Every directory has one of these; it's a
universal way to get to the directory above the current
one in the hierarchy -- which is called the parent directory.
Each directory also has the special directory .
(single dot), which just means "this directory."
Thus, cd . effectively does nothing.
Both . and .. are actually
special hidden files
in each directory that point to the directory itself and to its
parent directory, respectively. The root directory is its own parent.
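As a rough illustration of relative cd commands and of the . and .. entries, the sketch below recreates part of the example tree under a scratch directory and walks around in it. The scratch directory and the variable names are assumptions for the example; the directory names mirror those used above.

```shell
top=$(cd "$(mktemp -d)" && pwd -P)    # scratch directory, physical path
mkdir -p "$top/home/billr/bob/dave" "$top/home/pete"

cd "$top/home/billr"
cd bob/dave;  in_dave=$PWD       # relative pathname: .../home/billr/bob/dave
cd ..;        in_bob=$PWD        # parent of dave:    .../home/billr/bob
cd ../..;     in_home=$PWD       # up two levels:     .../home
cd ./pete;    in_pete=$PWD       # . means "this directory": .../home/pete
cd .;         still_pete=$PWD    # effectively does nothing

echo "$in_dave"
echo "$in_bob"
echo "$in_home"
echo "$in_pete"
echo "$still_pete"
```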
The last two examples in the table use a new form of the cd
command, which is not included in most Bourne shells. The
form is cd old new. It takes the full pathname of
the current working directory and tries to find the string
old in it. If it finds the string,
it substitutes new
and changes to the resulting directory.
In the first of the
two examples, the shell substitutes pete for billr in the
current directory name and makes the result the new current
directory. The last example shows that the substitution need
not be a complete filename: substituting arry for illr
in /home/billr yields /home/barry.
(If the old string can't be found in the current directory
name, the shell prints an error message.)
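In a shell that lacks the two-argument form, the same substitution can be approximated with standard parameter expansion. The function name cdsub and the scratch tree below are hypothetical; this is a sketch of the idea under those assumptions, not a description of how the Korn shell implements it.

```shell
# cdsub OLD NEW: replace the first occurrence of OLD in $PWD with NEW, then cd.
# Hypothetical helper; in the Korn shell, plain "cd OLD NEW" does this already.
cdsub() {
    case $PWD in
    *"$1"*)
        before=${PWD%%"$1"*}    # text before the first OLD
        after=${PWD#*"$1"}      # text after the first OLD
        cd "$before$2$after"
        ;;
    *)
        echo "cdsub: $1 not found in $PWD" >&2
        return 1
        ;;
    esac
}

top=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$top/home/billr" "$top/home/pete" "$top/home/barry"

cd "$top/home/billr"
cdsub billr pete;  first=$PWD     # .../home/pete
cd "$top/home/billr"
cdsub illr arry;   second=$PWD    # .../home/barry

echo "$first"
echo "$second"
```

As with the built-in form, the helper reports an error and leaves the working directory alone when the old string does not appear in the current directory name.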
Another feature of the Korn shell's cd command is
the form cd -, which changes to whatever directory you
were in before the current one. For example, if you start out
in /usr/lib, type cd
without an argument
to go to your home directory, and then type
cd -, you will
be back in /usr/lib.
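Here is a small sketch of that round trip. It uses a scratch tree instead of the real /usr/lib and home directory, so the directory names under $top are stand-ins chosen for the example.

```shell
top=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$top/usr/lib" "$top/home/billr"

cd "$top/usr/lib"
cd "$top/home/billr"     # stands in for a plain "cd" to the home directory
previous=$OLDPWD         # the shell remembers where we were: .../usr/lib
cd - > /dev/null         # go back there (cd - also prints the new directory)
back=$PWD

echo "$previous"
echo "$back"
```

Typing cd - a second time would return to the stand-in home directory, so the command is a convenient way to alternate between two directories.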
1.6.1.4. Symbolic links to directories
Modern Unix systems provide symbolic links.
Symbolic links (sometimes called soft links)
provide a kind of "shortcut" to files in a different
part of the system's file hierarchy. You can make a symbolic link to either a file or
a directory, using either full or relative pathnames. When you access a file or
directory via a symbolic link, Unix "follows the link" to the real
file or directory.
Symbolic links to directories can generate surprising behavior.
To explain why, let's start by assuming that you're using the regular Bourne shell,
sh.[8]
Now,
suppose that we and user fred are working together on a project,
and the primary directory for the project is under his home directory,
say /home/fred/projects/important/wonderprog.
That's a fairly long pathname to have to type, even if using the tilde notation
(which we can't in the Bourne shell, but that's another story).
To make life easier, let's create a symbolic link to the wonderprog
directory in our home directory:
$ sh                        Use the Bourne shell
$ cd                        Make sure we're in our home directory
$ pwd                       Show where we are
/home/billr
                            Create the symbolic link
$ ln -s /home/fred/projects/important/wonderprog wonderprog
Now, when we type cd wonderprog, we end up in
/home/fred/projects/important/wonderprog:
$ cd wonderprog
$ pwd
/home/fred/projects/important/wonderprog
After working for a while adding important new features[9]
to wonderprog,
we remember that we need to update the .profile file in our home directory.
No problem: just cd back there and start work on the file, by
looking at it first with more.
$ cd ..                     Go back up one level
$ more .profile             Look at .profile
.profile: No such file or directory
What happened?
The cd .. didn't take us back the way we came. Instead, it
went up one level in the physical filesystem hierarchy:
$ pwd
/home/fred/projects/important
This is the "gotcha" with symbolic links; the logical view of the filesystem
hierarchy presented by a symbolic link to a directory breaks down to the underlying
physical reality when you cd to the parent directory.
The Korn shell works differently. It understands symbolic links and, by default,
always presents you with a logical view of the filesystem.
Not only is cd built into the shell, but so is pwd.
Both commands accept the same two options: -L, to perform logical
operations (the default), and -P, to perform the operations
on the actual directories.
Let's start over in the Korn shell:
$ cd wonderprog ; pwd                         cd through the symbolic link
/home/billr/wonderprog                        Answer is logical location
$ pwd -P                                      What is the physical location?
/home/fred/projects/important/wonderprog      Answer is physical location
$ cd .. ; pwd                                 Go back up one level
/home/billr                                   Traversal was again logical
$ cd -P wonderprog; pwd                       Do a physical cd
/home/fred/projects/important/wonderprog      Logical now equals physical
$ cd .. ; pwd                                 Go back up one level
/home/fred/projects/important                 Logical still equals physical
As shown,
the -P option to cd and pwd
lets you "get around" the Korn shell's default use of logical positioning.
Most of the time, though, logical positioning is exactly what you want.
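The difference between logical and physical traversal can be reproduced safely in a scratch directory. The sketch below assumes a shell whose cd and pwd accept -L and -P (the Korn shell does, and POSIX specifies both options); the tree under $top mimics the one in the text and the variable names are made up for the example.

```shell
top=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$top/home/fred/projects/important/wonderprog" "$top/home/billr"
cd "$top/home/billr"
ln -s "$top/home/fred/projects/important/wonderprog" wonderprog

cd wonderprog              # logical cd through the link (the default)
logical=$(pwd -L)          # .../home/billr/wonderprog
physical=$(pwd -P)         # .../home/fred/projects/important/wonderprog
cd ..
up_logical=$(pwd -P)       # .../home/billr

cd -P wonderprog           # physical cd: the link is resolved first
cd ..
up_physical=$(pwd -P)      # .../home/fred/projects/important

echo "$logical"
echo "$physical"
echo "$up_logical"
echo "$up_physical"
```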
NOTE:
The shell sets the PWD and OLDPWD
variables correspondingly whenever you do a cd;
the results of typing pwd and
print $PWD should always be the same.
As a side note that rounds out the discussion, Unix systems also provide
"hard links" (or just plain links) to files. Each name for a file is called
a link; all hard links refer to the same data on disk, and if the file
is changed by one name, that change is seen when looking at it from a different name.
Hard links have certain restrictions, which symbolic links overcome.
(See ln(1) for more information.)
However, you cannot make hard links to directories, so symbolic links are all
that matter for cd and pwd.
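A short sketch makes the "same data, two names" behavior of hard links concrete. The filenames memo and memo2 and the scratch directory are assumptions for the example, and it presumes the scratch directory lives on a filesystem that supports hard links.

```shell
top=$(mktemp -d)
cd "$top"
echo "first draft" > memo
ln memo memo2              # hard link: a second name for the same file
echo "revised" >> memo2    # change the file through one name...
via_memo=$(cat memo)       # ...and read it back through the other
via_memo2=$(cat memo2)

echo "$via_memo"
```

Both names show the appended line, because there is only one file on disk.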
1.6.2. Filenames and Wildcards
Sometimes you need to run a command on more than one file at a time.
The most common example of such a command is
ls, which lists information about files. In its simplest
form, without options or arguments, it lists the names of all
files in the working directory except special hidden files,
whose names begin with a dot (.).
If you give ls filename arguments,
it will list those files, which is sort of silly: if your
current directory has the files bob and fred in it,
and you type ls bob fred, the system will simply parrot
the filenames back at you.
Actually, ls is more often used with options that tell it
to list information about the files, like the -l
(long) option,
which tells ls to list the file's owner, group, size, time of
last modification, and other information, or -a (all), which
also lists the hidden files described above.
But sometimes you
want to verify the existence of a certain group of files without
having to know all of their names; for example, if you design web pages,
you might want to see which files
in your current directory have names that end in .html.
Filenames are so important in Unix that the shell provides a
built-in way to specify the pattern of a set of filenames
without having to know all of the names themselves.
You can
use special characters, called wildcards, in filenames
to turn them into patterns. We'll show the three basic
types of wildcards that all major Unix shells support, and we'll
save the Korn shell's set of advanced wildcard operators for
Chapter 4.
Table 1-2 lists the basic wildcards.
Table 1-2. Basic wildcards
Wildcard | Matches
? | Any single character
* | Any string of characters
[set] | Any character in set
[!set] | Any character not in set
The ? wildcard matches any single character, so that
if your directory contains the files program.c,
program.log, and
program.o, then the expression program.? matches
program.c and program.o but
not program.log.
The asterisk (*) is more powerful and far more
widely used; it matches
any string of characters. The expression
program.*
will match all three files in the previous paragraph;
web designers
can use the expression
*.html to match their input files.[10]
Table 1-3
should give you a better idea of
how the asterisk works. Assume that you have the files bob,
darlene, dave, ed,
frank, and fred
in your working directory.
Notice that * can stand for nothing: both
*ed and
*e* match
ed.
Also notice that the last example shows what the shell
does if it can't match anything: it just leaves the string
with the wildcard untouched.
Table 1-3. Using the * wildcard
Expression | Yields
fr* | frank fred
*ed | ed fred
b* | bob
*e* | darlene dave ed fred
*r* | darlene frank fred
* | bob darlene dave ed frank fred
d*e | darlene dave
g* | g*
Files are kept within directories in an unspecified order;
the shell sorts the results of each wildcard expansion.
(On some systems, the sorting may follow an ordering that is
appropriate to the system's locale, but that is different from
the underlying machine collating order.
Unix traditionalists can use export LANG=C
to get the behavior they're used to.)
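The entries in Table 1-3 are easy to try for yourself. The sketch below creates the six example files in a scratch directory (an assumption made so that no real files are involved) and uses echo to show what the shell turns each expression into.

```shell
top=$(mktemp -d)
cd "$top"
touch bob darlene dave ed frank fred

echo fr*     # frank fred
echo *ed     # ed fred
echo *e*     # darlene dave ed fred
echo d*e     # darlene dave
echo g*      # g* (no match, so the pattern is left untouched)
```

Using echo is a harmless way to preview a wildcard expansion before handing the same pattern to a command that changes or removes files.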
The remaining wildcard is the set construct.
A set is a list of characters (e.g., abc),
an inclusive range (e.g., a-z), or some combination of the two.
If you want the dash character to be part of a list, just
list it first or last.
Table 1-4
(which assumes an ASCII environment)
should explain things more clearly.
Table 1-4. Using the set construct wildcards
Expression | Matches
[abc] | a, b, or c
[.,;] | Period, comma, or semicolon
[-_] | Dash or underscore
[a-c] | a, b, or c
[a-z] | All lowercase letters
[!0-9] | All non-digits
[0-9!] | All digits and exclamation point
[a-zA-Z] | All lower- and uppercase letters
[a-zA-Z0-9_-] | All letters, all digits, underscore, and dash
In the original wildcard example, program.[co] and
program.[a-z] both match
program.c and program.o,
but not program.log.
An exclamation point after the left bracket lets you
"negate" a set.
For example, [!.;] matches any character
except period and semicolon; [!a-zA-Z] matches any
character that isn't a letter.
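Here is a brief sketch that exercises ? and the set construct on the three program files from the earlier example. The scratch directory is an assumption made so the files can be created freely.

```shell
top=$(mktemp -d)
cd "$top"
touch program.c program.log program.o

echo program.?        # program.c program.o
echo program.[co]     # program.c program.o
echo program.[a-z]    # program.c program.o
echo program.[!c]     # program.o
echo program.*        # program.c program.log program.o
```

Note that a set, like ?, stands for exactly one character, which is why none of the bracketed patterns match program.log.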
The range notation is handy, but you shouldn't make too many
assumptions about what characters are included in a range.
It's generally safe to use a range for uppercase letters, lowercase
letters, digits, or any subranges thereof
(e.g., [f-q], [2-6]).
Don't use ranges
on punctuation characters or mixed-case letters: e.g.,
[a-Z] and [A-z] should not be trusted to include all of the
letters and nothing more. The problem is that such ranges are
not entirely portable between different types of computers.[11]
Another problem is that modern systems support different locales,
which are ways of describing how the local character set works.
In most countries, the default locale's character set is different from that
of plain ASCII.
In Chapter 4,
we show you how to use POSIX bracket expressions to denote
letters, digits, punctuation, and other kinds of characters in
a portable fashion.
The process of matching expressions containing
wildcards to filenames is called wildcard expansion.
This is just one of several steps the shell takes when
reading and processing a command line; another that we have already
seen is tilde expansion, where tildes are replaced with
home directories where applicable. We'll see others in
later chapters, and the full details of the process are
enumerated in Chapter 7.
However, it's important to
be aware that the commands that you run see only the
results of wildcard expansion.
(Indeed, this is true of all expansions.)
That is, they just see a list of arguments, and they have
no knowledge of how those arguments came into being. For example, if you type
ls fr*
and your files
are as described earlier, then the shell expands the command
line to ls frank fred and invokes the
command ls
with arguments frank and fred.
If you type ls g*,
then (because there is no match) ls will be given
the literal string g* and will complain with the error message,
g* not found.[12]
(The actual message is likely to vary from system to system.)
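One way to convince yourself that a command sees only the expanded arguments is to write a tiny command that reports what it was given. The function count below is a hypothetical helper invented for this sketch, and the files are created in a scratch directory.

```shell
# count: hypothetical helper that reports how many arguments it received.
count() {
    echo "$# argument(s): $*"
}

top=$(mktemp -d)
cd "$top"
touch bob darlene dave ed frank fred

count fr*    # 2 argument(s): frank fred
count g*     # 1 argument(s): g*
```

In the first call, count never sees the asterisk; in the second, it receives the unmatched pattern as one literal argument, just as ls did.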
Here is another example that should help you understand
why this is important.
Suppose you are a C programmer.
This just means that you deal with files whose names end
in .c (programs, a.k.a. source files), .h
(header files for programs), and .o
(object code files that aren't human-readable), as well as
other files.
Let's say you want to list all
source, object, and header files in your working directory. The command
ls *.[cho] does the trick.
The shell expands *.[cho] to
all files whose names end in a period followed
by a c, h, or o
and passes the resulting list to ls as
arguments.
In other words, ls will see the filenames
just as if they were all typed in individually -- but notice
that we assumed no knowledge of the actual filenames
whatsoever! We let the wildcards do the work.
As you gain experience with the shell, reflect on what
life would be like without wildcards. Pretty miserable,
we would say.
A final note about wildcards. You can set the variable
FIGNORE to a shell pattern describing
filenames to ignore during pattern matching.
(The full pattern capabilities of the shell are described later, in
Chapter 4.)
For example,
emacs saves backup versions of files by appending a
~ to the original filename.
Often, you don't need to see these files.
To ignore them, you
might add the following to your .profile file:
export FIGNORE='*~'
As with wildcard expansion, the test against FIGNORE
applies to all components of a pathname, not just the final one.