
14.2. Programming Tools

Along with languages and compilers, there is a plethora of programming tools out there, including libraries, interface builders, debuggers, and other utilities to aid the programming process. In this section, we'll talk about some of the most interesting of these tools, to let you know what's out there.

14.2.1. Debuggers

There are several interactive debuggers available for Linux. The de facto standard debugger is gdb, which we just covered in detail.

In addition to gdb, there are several other debuggers, each with features very similar to gdb. xxgdb is a version of gdb with an X Window System interface similar to that found on the xdbx debugger on other Unix systems. There are several panes in the xxgdb debugger's window. One pane looks like the regular gdb text interface, allowing you to input commands manually to interact with the system. Another pane automatically displays the current source file along with a marker displaying the current line. You can use the source pane to set and select breakpoints, browse the source, and so on, while typing commands directly to gdb. A number of buttons are provided on the xxgdb window as well, providing quick access to frequently used commands, such as step, next, and so on. Given the buttons, you can use the mouse in conjunction with the keyboard to debug your program within an easy-to-use X interface.

Another debugger similar to xxgdb is UPS, an X-based debugger that has been ported to a number of Unix platforms. UPS is much simpler than xxgdb and doesn't provide the same features, but it is a good debugger nonetheless and has a less demanding learning curve than gdb. It is adequate for most applications and straightforward debugging needs.

Two other graphical frontends for gdb deserve mention. DDD, the Data Display Debugger, has the same features as xxgdb with a nicer, Motif user interface. In addition, it can display structures and classes in a graphical manner, which is especially useful if you want to explore the data structures of an unknown program. kdbg comes from the KDE project and--in addition to the features that xxgdb provides--is fully integrated into the KDE desktop.

14.2.2. Profiling and Performance Tools

There are several utilities out there that allow you to monitor and rate the performance of your program. These tools help you locate bottlenecks in your code--places where performance is lacking. These tools also give you a rundown on the call structure of your program, indicating what functions are called, from where, and how often. (Everything you ever wanted to know about your program, but were afraid to ask.)

gprof is a profiling utility that gives you a detailed listing of the running statistics for your program, including how often each function was called, from where, the total amount of time that each function required, and so forth.

In order to use gprof with a program, you must compile the program using the -pg option with gcc. This adds profiling information to the object file and links the executable with standard libraries that have profiling information enabled.

Having compiled the program to profile with -pg, simply run it. If it exits normally, the file gmon.out will be written to the working directory of the program. This file contains profiling information for that run and can be used with gprof to display a table of statistics.

As an example, let's take a program called getstat, which gathers statistics about an image file. First, we compile getstat with -pg, and run it:

papaya$ getstat image11.pgm > stats.dat 
papaya$ ls -l gmon.out 
-rw-------   1 mdw      mdw         54448 Feb  5 17:00 gmon.out 
papaya$
Indeed, the profiling information was written to gmon.out.

To examine the profiling data, we run gprof and give it the name of the executable and the profiling file gmon.out:

papaya$ gprof getstat gmon.out
If you do not specify the name of the profiling file, gprof assumes the name gmon.out. It also assumes the executable name a.out if you do not specify that, either.

gprof output is rather verbose, so you may want to redirect it to a file or pipe it through a pager. It comes in two parts. The first part is the "flat profile," which gives a one-line entry for each function, listing the percentage of time spent in that function, the time (in seconds) used to execute that function, the number of calls to the function, and other information. For example:

Each sample counts as 0.01 seconds. 
  %   cumulative   self              self     total            
 time   seconds   seconds    calls  ms/call  ms/call  name     
 45.11     27.49    27.49       41   670.51   903.13  GetComponent 
 16.25     37.40     9.91                             mcount 
 10.72     43.93     6.54  1811863     0.00     0.00  Push 
 10.33     50.23     6.30  1811863     0.00     0.00  Pop 
  5.87     53.81     3.58       40    89.50   247.06  stackstats 
  4.92     56.81     3.00  1811863     0.00     0.00  TrimNeighbors
If any of the fields are blank in the output, gprof was unable to determine any further information about that function. This is usually caused by parts of the code that were not compiled with the -pg option; for example, if you call routines in nonstandard libraries that haven't been compiled with -pg, gprof won't be able to gather much information about those routines. In the previous output, the function mcount probably hasn't been compiled with profiling enabled.

As we can see, 45.11% of the total running time was spent in the function GetComponent--which amounts to 27.49 seconds. But is this because GetComponent is horribly inefficient or because GetComponent itself called many other slow functions? The functions Push and Pop were called many times during execution: could they be the culprits?[52]

[52]Always a possibility where this author's code is concerned!

The second part of the gprof report can help us here. It gives a detailed "call graph" describing which functions called other functions and how many times they were called. For example:

index % time    self  children    called     name 
                                                 <spontaneous> 
[1]     92.7    0.00   47.30                 start [1] 
                0.01   47.29       1/1           main [2] 
                0.00    0.00       1/2           on_exit [53] 
                0.00    0.00       1/1           exit [172]
The first column of the call graph is the index: a unique number given to every function, allowing you to find other functions in the graph. Here, the first function, start, is called implicitly when the program begins. start required 92.7% of the total running time (47.30 seconds), including its children, but required very little time to run itself. This is because start is the parent of all other functions in the program, including main; it makes sense that start plus its children requires that percentage of time.

The call graph normally displays the children as well as the parents of each function in the graph. Here, we can see that start called the functions main, on_exit, and exit (listed below the line for start). However, there are no parents (normally listed above start); instead, we see the ominous word <spontaneous>. This means that gprof was unable to determine the parent function of start; more than likely because start was not called from within the program itself but kicked off by the operating system.

Skipping down to the entry for GetComponent, our function under suspicion, we see the following:

index % time    self  children    called     name 
                0.67    0.23       1/41          GetFirstComponent [12] 
               26.82    9.30      40/41          GetNextComponent [5] 
[4]     72.6   27.49    9.54      41         GetComponent [4] 
                6.54    0.00 1811863/1811863     Push [7] 
                3.00    0.00 1811863/1811863     TrimNeighbors [9] 
                0.00    0.00       1/1           InitStack [54]
The parent functions of GetComponent were GetFirstComponent and GetNextComponent, and its children were Push, TrimNeighbors, and InitStack. As we can see, GetComponent was called 41 times--one time from GetFirstComponent and 40 times from GetNextComponent. The gprof output contains notes that describe the report in more detail.

GetComponent itself requires over 27.49 seconds to run; only 9.54 seconds are spent executing the children of GetComponent (including the many calls to Push and TrimNeighbors!). So it looks as though GetComponent and possibly its parent GetNextComponent need some tuning; the oft-called Push function is not the sole cause of the problem.

gprof also keeps track of recursive calls and "cycles" of called functions and indicates the amount of time required for each call. Of course, using gprof effectively requires that all code to be profiled is compiled with the -pg option. It also requires a knowledge of the program you're attempting to profile; gprof can only tell you so much about what's going on. It's up to the programmer to optimize inefficient code.

  One last note about gprof: running it on a program that calls only a few functions--and runs very quickly--may not give you meaningful results. The units used for timing execution are usually rather coarse--maybe one-hundredth of a second--and if many functions in your program run more quickly than that, gprof will be unable to distinguish between their respective running times (rounding them to the nearest hundredth of a second). In order to get good profiling information, you may need to run your program under unusual circumstances--for example, giving it an unusually large data set to churn on, as in the previous example.

If gprof is more than you need, calls is a program that displays a tree of all function calls in your C source code. It can be useful either for generating an index of all called functions or for producing a high-level, hierarchical report of the structure of a program.

Use of calls is simple: you tell it the names of the source files to map out, and a function-call tree is displayed. For example:

papaya$ calls scan.c 
    1   level1 [scan.c] 
    2           getid [scan.c] 
    3                   getc 
    4                   eatwhite [scan.c] 
    5                           getc 
    6                           ungetc 
    7                   strcmp 
    8           eatwhite [see line 4] 
    9           balance [scan.c] 
   10                   eatwhite [see line 4]
By default, calls lists only one instance of each called function at each level of the tree (so that if printf is called five times in a given function, it is listed only once). The -a switch prints all instances. calls has several other options as well; using calls -h gives you a summary.

14.2.3. Using strace

strace is a tool that displays the system calls being executed by a running program.[53] This can be extremely useful for real-time monitoring of a program's activity, although it does take some knowledge of programming at the system-call level. For example, when the library routine printf is used within a program, strace displays information only about the underlying write system call when it is executed. Also, strace can be quite verbose: many system calls are executed within a program that the programmer may not be aware of. However, strace is a good way to quickly determine the cause for a program crash or other strange failure.

[53]Debian users may find the ltrace package useful as well. It's a library call tracer that tracks all library calls, not just calls to the kernel; users of other distributions can download the latest version of the source at ftp://ftp.debian.org/debian/dists/unstable/main/source/utils/.

Take the "Hello, World!" program given earlier in the chapter. Running strace on the executable hello gives us:

papaya$ strace hello 
execve("./hello", ["hello"], [/* 49 vars */]) = 0
mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,\
 -1, 0) = 0x40007000
mprotect(0x40000000, 20881, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
mprotect(0x8048000, 4922, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
stat("/etc/ld.so.cache", {st_mode=S_IFREG|0644, st_size=18612,\
 ...}) = 0
open("/etc/ld.so.cache", O_RDONLY)      = 3
mmap(0, 18612, PROT_READ, MAP_SHARED, 3, 0) = 0x40008000
close(3)                                = 0
stat("/etc/ld.so.preload", 0xbffff52c)  = -1 ENOENT (No such\
 file or directory)
open("/usr/local/KDE/lib/libc.so.5", O_RDONLY) = -1 ENOENT (No\
 such file or directory)
open("/usr/local/qt/lib/libc.so.5", O_RDONLY) = -1 ENOENT (No\
 such file or directory)
open("/lib/libc.so.5", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3"..., 4096) = 4096
mmap(0, 770048, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = \
0x4000d000
mmap(0x4000d000, 538959, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_\
FIXED, 3, 0) = 0x4000d000
mmap(0x40091000, 21564, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_\
FIXED, 3, 0x83000) = 0x40091000
mmap(0x40097000, 204584, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_\
FIXED|MAP_ANONYMOUS, -1, 0) = 0x40097000
close(3)                                = 0
mprotect(0x4000d000, 538959, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
munmap(0x40008000, 18612)               = 0
mprotect(0x8048000, 4922, PROT_READ|PROT_EXEC) = 0
mprotect(0x4000d000, 538959, PROT_READ|PROT_EXEC) = 0
mprotect(0x40000000, 20881, PROT_READ|PROT_EXEC) = 0
personality(PER_LINUX)                  = 0
geteuid()                               = 501
getuid()                                = 501
getgid()                                = 100
getegid()                               = 100
fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(3, 10), ...}) = 0
mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,\
 -1, 0) = 0x40008000
ioctl(1, TCGETS, {B9600 opost isig icanon echo ...}) = 0
write(1, "Hello World!\n", 13Hello World!
)          = 13
_exit(0)                                = ?
papaya$
This may be much more than you expected to see from a simple program. Let's walk through it, briefly, to explain what's going on.

The first call execve starts the program itself. All the mmap, mprotect, and munmap calls come from the kernel's memory management and are not really interesting here. In the three consecutive open calls, the loader is looking for the C library and finds it on the third try. The library header is then read and the library mapped into memory. After a few more memory-management operations and the calls to getuid, geteuid, getgid, and getegid, which retrieve the rights of the process, there is a call to ioctl. The ioctl is the result of a tcgetattr library call, which the program uses to retrieve the terminal attributes before attempting to write to the terminal. Finally, the write call prints our friendly message to the terminal and exit ends the program.


strace sends its output to standard error, so you can redirect it to a file separately from the actual output of the program (usually on standard output). As you can see, strace tells you not only the names of the system calls, but also their parameters (expressed as well-known constant names, if possible, instead of just numerics) and return values.
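
Separating the trace from the program's own output is a one-liner; strace also has a -o option that writes the trace straight to a file. (The traced command and file names below are made up for illustration.)

```shell
# Capture the trace separately from the program's normal output.
# "/bin/echo hello" stands in for the program under investigation.
strace /bin/echo hello 2> trace.log   # trace (on stderr) goes to trace.log
strace -o trace2.log /bin/echo hello  # same effect via the -o option
grep -c execve trace.log              # every trace begins with execve
```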

14.2.4. make and imake

We have already introduced make, the project manager used to compile projects, among other things. One problem with make is that makefiles aren't always easy to write. When large projects are involved, writing a makefile with cases for each kind of source file can be tedious. Even with the built-in make defaults, this is often more work than should be necessary.

One solution is to use imake, an extension to make based on the use of the C preprocessor. imake is simply a makefile generator: you write an Imakefile that imake converts to a robust makefile. imake is used by programs in the X Window System distribution but is not limited to use by X applications.

We should note at this point that imake can simplify the process of writing makefiles, especially for compiling C programs. However, make is more generally applicable than imake for this task. For example, you can use make to automatically format documents using groff or TeX. In this case, you need the flexibility of make alone, and imake may not be the best solution.

Here is a sample Imakefile that builds two programs, laplacian and getstat. At the top of the Imakefile, options for the entire compilation are specified (imake has its own defaults for these, but they aren't always useful). Following that, variables are defined for each program to be compiled, and the imake macros AllTarget and NormalProgramTarget create makefile rules for compiling these programs:

# Linker options: 
LDOPTIONS = -L/usr/local/lib -L../lib  
# The C compiler to use: 
CC = gcc 
# Flags to be used with gcc: 
CFLAGS = -I. -I$(HOME)/include -g  
# Local and system libraries to link against: 
LOCAL_LIBRARIES = -lvistuff 
SYS_LIBRARIES = -lm 
 
# Specify the sources in the SRCS variable, and the corresponding object 
# files in the variable LAP_OBJS. 
SRCS = laplacian.c laplacian-main.c 
LAP_OBJS = laplacian.o laplacian-main.o 

# Create rules for building laplacian. 
AllTarget(laplacian) 
NormalProgramTarget(laplacian,$(LAP_OBJS),,$(LOCAL_LIBRARIES),\
$(SYS_LIBRARIES)) 
 
# Do the same thing for getstat. Note that SRCS can be redefined for each 
# target, but LAP_OBJS can't, so we use a unique name for each target. 
SRCS = getstat.c getstat-main.c component.c 
GS_OBJS = getstat.o getstat-main.o component.o 

AllTarget(getstat) 
NormalProgramTarget(getstat,$(GS_OBJS),,$(LOCAL_LIBRARIES),\
$(SYS_LIBRARIES))

Note that we must use a different variable for the object files for each target, although SRCS can be redefined for each.

In order to translate the Imakefile into a makefile, use the command xmkmf. xmkmf will simply run imake with the options to do the translation correctly, using the default imake macros (such as AllTarget and NormalProgramTarget). You can then issue make to compile the program:

papaya$ xmkmf 
mv Makefile Makefile.bak 
imake -DUseInstalled -I/usr/X386/lib/X11/config 
papaya$
If you want to use your own imake macros, you can invoke imake by hand using the appropriate options. The imake and xmkmf manual pages should fill in the gaps. Software Portability with imake by Paul DuBois is another guide to the system.

If you find imake too complex for your taste, other "makefile makers" are available as well, such as ICmake, which generates makefiles using a macro language similar to C.

If you have compiled software packages yourself, you will often have encountered compilation instructions telling you to run a provided script called configure. Such scripts are produced by a makefile generator called autoconf, which is often used together with another program called automake. autoconf and automake are not easy to use, but they give you far more flexibility than imake, ICmake, and the other makefile generators. Unfortunately, the use of autoconf is well beyond the scope of this book. If you're interested, get yourself a copy from the GNU archives and start reading the documentation.
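
The usual ritual looks like the following. (The configure script here is a toy stand-in written by hand; a real autoconf-generated script runs dozens of feature tests before emitting the Makefile.)

```shell
# A toy stand-in for an autoconf-generated configure script: probe the
# system, then emit a Makefile.  Real scripts are far more thorough.
cat > configure <<'EOF'
#!/bin/sh
echo "checking for a C compiler... $(command -v cc || echo no)"
printf 'all: ; @echo built\n' > Makefile
EOF
chmod +x configure
./configure      # generates the Makefile
make             # builds using it; real packages add "make install"
```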

14.2.5. Using Checker

Checker is a replacement for the various memory-allocation routines, such as malloc, realloc, and free, used by C programs. It provides smarter memory-allocation procedures and code to detect illegal memory accesses and common faults, such as attempting to free a block of memory more than once. Checker displays detailed error messages if your program attempts any kind of hazardous memory access, helping you to catch segmentation faults in your program before they happen. It can also detect memory leaks--for example, places in the code where new memory is malloc'd without being free'd after use.

Checker is not just a replacement for malloc and friends. It also inserts code into your program to verify all memory reads and writes. It is very robust and therefore somewhat slower than the regular malloc routines. Checker is meant to be used during program development and testing; once all potential memory-corrupting bugs have been fixed, you can link your program with the standard libraries.

For example, take the following program, which allocates some memory and attempts to do various nasty things with it:

#include <stdlib.h>   /* for malloc() */

int main() {
  char *thememory, ch;

  thememory = (char *)malloc(10 * sizeof(char));

  ch = thememory[1];    /* Attempt to read uninitialized memory */
  thememory[12] = ' ';  /* Attempt to write after the block */
  ch = thememory[-2];   /* Attempt to read before the block */
  return 0;
}
We simply compile this program with the -lchecker option, which links it with the Checker libraries. Upon running it, we get the following error messages (among others):
From Checker: 
        Memory access error 
        When Reading at address 0x10033 
        inside the heap 
        1 bytes after the begin of the block 

From Checker: 
        Memory access error 
        When Writing at address 0x1003e 
        inside the heap 
        2 bytes after the end of the block 
From Checker: 
        Memory access error 
        When Reading at address 0x10030 
        inside the heap 
        2 bytes before the begin of the block
For each memory violation, Checker reports an error and gives us information on what happened. The actual Checker error messages include information on where the program is executing as well as where the memory block was allocated. You can coax even more information out of Checker if you wish, and, along with a debugger such as gdb, you can pinpoint problems easily.[54]

[54]We have edited the output somewhat in order to remove extraneous information and to increase readability for the purpose of the example.

Checker also provides a garbage collector and detector you can call from within your program. In brief, the garbage detector informs you of any memory leaks: places where a function malloc'd a block of memory but forgot to free it before returning. The garbage collector routine walks through the heap and cleans up the results of these leaks. You can also call the garbage collector and detector manually when running the program from within gdb (as gdb allows you to directly call functions during execution).

14.2.6. Interface Building Tools

A number of applications and libraries let you easily generate a user interface for your applications under the X Window System. If you do not want to bother with the complexity of the X programming interface, using one of these simple interface-building tools may be the answer for you. There are also tools for producing a text-based interface for programs that don't require X.

The classic X programming model has attempted to be as general as possible, providing only the bare minimum of interface restrictions and assumptions. This generality allows programmers to build their own interface from scratch, as the core X libraries don't make any assumptions about the interface in advance. The X Toolkit Intrinsics (Xt) provides a rudimentary set of interface widgets (such as simple buttons, scrollbars, and the like), as well as a general interface for writing your own widgets if necessary. Unfortunately this can require a great deal of work for programmers who would rather use a set of pre-made interface routines. A number of Xt widget sets and programming libraries are available for Linux, all of which make the user interface easier to program.

In addition, the commercial Motif library and widget set is available from several vendors for an inexpensive single-user license fee. Also available is the XView library and widget interface, which is another alternative to using Xt for building interfaces under X. XView and Motif are two sets of X-based programming libraries that in some ways are easier to program than the X Toolkit Intrinsics. Many applications are available that utilize Motif and XView, such as XVhelp (a system for generating interactive hypertext help for your program). Binaries statically linked with Motif may be distributed freely and used by people who don't own Motif itself.

Before you start developing with XView or Motif, a word of caution is in order. XView, once a commercial product of Sun Microsystems, has been dropped by its developers and is no longer maintained. Also, while some people like its look, programs written with XView look quite nonstandard. Motif, on the other hand, is still being actively developed (albeit rather slowly), but it has problems of its own. First, programming with Motif can be frustrating: it is difficult, error prone, and cumbersome, because the Motif API was not designed according to modern GUI API design principles. Also, Motif programs tend to run very slowly.

There are other widget sets and interface libraries for X as well, such as:

Xaw3D

A modified version of the standard Athena widget set which provides a 3D, Motif-like look and feel

Qt

A C++ GUI toolkit written by the Norwegian company Troll Tech

GTK

A C GUI toolkit that was originally written for the image manipulation program GIMP

Many people complain that the Athena widgets are too plain in appearance. Xaw3D is completely compatible with the standard Athena set and can even replace the Athena libraries on your system, giving all programs that use Athena widgets a modern look. Xaw3D also provides a few widgets not found in the Athena set, such as a layout widget with a TeX-like interface for specifying the position of child widgets.

Qt is an excellent package for GUI development in C++ that sports an ingenious mechanism for connecting user interaction with program code, a very fast drawing engine, and a comprehensive but easy-to-use API. Qt is considered by many to be the successor to Motif as the de facto GUI programming standard, because it is the foundation of the KDE desktop (see Section 11.3, "The K Desktop Environment," in Chapter 11, "Customizing Your X Environment"), which has attracted a lot of interest.

Qt is a commercial product, but you can use it for free if you write free software for Unix (and hence Linux) systems with it. There is also a (commercial) Windows version of Qt, which makes it possible to develop for Linux and Windows at the same time and create an application for the other platform by simply recompiling. Imagine being able to develop on your favorite operating system, Linux, while still being able to target the larger Windows market! One of the authors, Kalle, uses Qt to write both free software (the KDE desktop just mentioned) and commercial software (often cross-platform products developed for Linux and Windows). Qt is being very actively developed; for more information, see Programming with Qt by Kalle Dalheimer.

For those who do not like to program in C++, GTK might be a good choice. GTK programs usually offer response times just as good as those of Qt programs, but the toolkit itself is not as complete; documentation, in particular, is lacking. For C-based projects, though, GTK is a good alternative if you do not need to be able to recompile your code on Windows.

Many programmers are finding that building a user interface, even with a complete set of widgets and routines in C, requires much overhead and can be quite difficult. This is a question of flexibility versus ease of programming: the easier the interface is to build, the less control the programmer has over it. Many programmers find that prebuilt widgets are adequate for their needs, though, so the loss in flexibility is not a problem.

One of the problems with interface generation and X programming is that it is difficult to generalize the most widely used elements of a user interface into a simple programming model. For example, many programs use features such as buttons, dialog boxes, pull-down menus, and so forth, but almost every program uses these widgets in a different context. In simplifying the creation of a graphical interface, generators tend to make assumptions about what you'll want. For example, it is simple enough to specify that a button, when pressed, should execute a certain procedure within your program, but what if you want the button to execute some specialized behavior the programming interface does not allow for? For example, what if you wanted the button to have a different effect when pressed with mouse button 2 instead of mouse button 1? If the interface-building system does not allow for this degree of generality, it is not of much use to programmers who need a powerful, customized interface.

The Tcl/Tk programming interface described in the previous chapter is growing in popularity, partly because it is so simple to use and provides a good amount of flexibility. Because Tcl and Tk routines can be called from interpreted "scripts" as well as internally from a C program, it is not difficult to tie the interface features provided by this language and toolkit to functionality in the program. Using Tcl and Tk is on the whole less demanding than learning to program Xlib and Xt (along with the myriad of widget sets) directly. It should be noted, though, that the larger a project gets, the more likely it is that you will want to use a language like C++ that is more suited towards large-scale development. For several reasons, larger projects tend to become very unwieldy with Tcl: the use of an interpreted language slows the execution of the program, Tcl/Tk design is hard to scale up to large projects, and important reliability features like compile- and link-time type checking are missing. The scaling problem is improved by the use of namespaces (a way to keep names in different parts of the program from clashing) and an object-oriented extension called [incr Tcl].

TclMotif, a version of Tcl bound with the popular Motif widget set, is also available for Linux. The Motif widgets are widely acclaimed to be easy to program and pleasant to use. The advantage of TclMotif is that the binary is freely distributable although Motif itself is a commercial product. Therefore, you do not have to own Motif to use TclMotif. TclMotif will in effect let you write programs that use Motif widgets and routines through the Tcl interface. A statically linked binary is available on a number of Linux FTP sites and from other sources. If you want to recompile TclMotif itself, for some reason, you need to own Motif in order to do so.

Wafe is another version of Tcl/Tk that includes the Athena widgets and miscellaneous other tools that make the programming model easier to use. If you are accustomed to programming Xt with the Athena widgets, but you want to move to Tcl and Tk, Wafe is a good place to start.

Tcl and Tk allow you to generate an X-based interface complete with windows, buttons, menus, scrollbars, and the like, around your existing program. The interface may be accessed from a Tcl script (as described in Section 13.5.2, "Writing Tk Applications," in Chapter 13, "Programming Languages") or from within a C program.

Another interface-building tool much like Tcl and Tk is xtpanel. xtpanel is meant primarily to generate an X interface "wrapper" around an existing text-based program. xtpanel allows you to set up a window with various panes, text editing regions, buttons, scrollbars, and so on, and bind the actions of these widgets to features in the program. For example, one could use xtpanel to produce an X-based interface for the gdb debugger, similar to xxgdb. You could define a "step" button which, when pressed, sends the step command to the regular gdb interface. A text-editing pane could be defined to interact with gdb in the regular way. Of course, doing something more complex, like setting up a source-view pane, would be difficult using something as general as xtpanel.

If you like the Tk toolkit, but do not like the programming language Tcl, you will be delighted to hear that you can use Tk with other languages as well; it has become the GUI toolkit of choice for the scripting languages Python and Perl, too.

If you require a nice text-based interface for a program, there are several options. The GNU readline library is a set of routines that provides advanced command-line editing, prompting, command history, and other features used by many programs. As an example, both bash and gdb use the readline library to read user input. readline provides the Emacs- and vi-like command-line editing features found in bash and similar programs. (The use of command-line editing within bash is described in the section "Typing Shortcuts" in Chapter 4, "Basic Unix Commands and Concepts".)

Another option is to write a set of Emacs interface routines for your program. An example of this is the gdb Emacs interface, which sets up multiple windows, special key sequences, and so on, within Emacs. The interface is discussed in the earlier section "Using Emacs with gdb." (No changes were required to gdb code in order to implement this: look at the Emacs library file gdb.el for hints on how this was accomplished.) Emacs allows you to start up a subprogram within a text buffer and provides many routines for parsing and processing text within that buffer. For example, within the Emacs gdb interface, the gdb source listing output is captured by Emacs and turned into a command that displays the current line of code in another window. Routines written in Emacs LISP process the gdb output and take certain actions based on it.

The advantage to using Emacs to interact with text-based programs is that Emacs is a powerful and customizable user interface within itself. The user can easily redefine keys and commands to fit her own needs; you don't need to provide these customization features yourself. As long as the text interface of the program is straightforward enough to interact with Emacs, customization is not difficult to accomplish. In addition, many users prefer to do virtually everything within Emacs--from reading electronic mail and news, to compiling and debugging programs. Giving your program an Emacs frontend allows it to be used more easily by people with this mindset. It also allows your program to interact with other programs running under Emacs--for example, text can easily be cut and pasted between different Emacs text buffers. You can even write entire programs using Emacs LISP, if you wish.

14.2.7. Revision Control Tools--RCS

Revision Control System (RCS) has been ported to Linux. This is a set of programs that allow you to maintain a "library" of files that records a history of revisions, allows source-file locking (in case several people are working on the same project), and automatically keeps track of source-file version numbers. RCS is generally used with program source-code files, but is general enough to be applicable to any type of file where multiple revisions must be maintained.

Why bother with revision control? Many large projects require some kind of revision control in order to keep track of many tiny complex changes to the system. For example, attempting to maintain a program with a thousand source files and a team of several dozen programmers would be nearly impossible without using something like RCS. With RCS, you can ensure that only one person may modify a given source file at any one time, and all changes are checked in along with a log message detailing the change.

RCS is based on the concept of an RCS file, a file which acts as a "library" where source files are "checked in" and "checked out." Let's say that you have a source file importrtf.c that you want to maintain with RCS. The RCS filename would be importrtf.c,v by default. The RCS file contains a history of revisions to the file, allowing you to extract any previous checked-in version of the file. Each revision is tagged with a log message that you provide.

When you check in a file with RCS, revisions are added to the RCS file, and the original file is deleted by default. In order to access the original file, you must check it out from the RCS file. When you're editing a file, you generally don't want someone else to be able to edit it at the same time. Therefore, RCS places a lock on the file when you check it out for editing. A locked file may only be modified by the user who checks it out (this is accomplished through file permissions). Once you're done making changes to the source, you check it back in, which allows anyone working on the project to check it back out again for further work. Checking out a file as unlocked does not subject it to these restrictions; generally, files are checked out as locked only when they are to be edited but are checked out as unlocked just for reading (for example, to use the source file in a program build).

RCS automatically keeps track of all previous revisions in the RCS file and assigns incremental version numbers to each new revision that you check in. You can also specify a version number of your own when checking in a file with RCS; this allows you to start a new "revision branch" so that multiple projects can stem from different revisions of the same file. This is a good way to share code between projects but also to assure that changes made to one branch won't be reflected in others.

Here's an example. Take the source file importrtf.c, which contains our friendly program:

#include <stdio.h>

int main(void) {
  printf("Hello, world!\n");
  return 0;
}
The first step is to check it into RCS with the ci command:
papaya$ ci importrtf.c 
importrtf.c,v  <--  importrtf.c 
enter description, terminated with single '.' or end of file: 
NOTE: This is NOT the log message! 
>> Hello world source code 
>> . 
initial revision: 1.1 
done 
papaya$
The RCS file importrtf.c,v is created, and importrtf.c is removed.

In order to work on the source file again, use the co command to check it out. For example:

papaya$ co -l importrtf.c 
importrtf.c,v  -->  importrtf.c 
revision 1.1 (locked) 
done 
papaya$
will check out importrtf.c (from importrtf.c,v) and lock it. Locking the file allows you to edit it and to check it back in. If you only need to check the file out in order to read it (for example, to issue a make), you can leave the -l switch off the co command to check it out unlocked. You can't check in a file unless it is locked first (or it has never been checked in before, as in the example).

Now, you can make some changes to the source and check it back in when done. In many cases, you'll want to always have the file checked out and use ci to merely record your most recent revisions in the RCS file and bump the version number. For this, you can use the -l switch with ci, as so:

papaya$ ci -l importrtf.c 
importrtf.c,v  <--  importrtf.c 
new revision: 1.2; previous revision: 1.1 
enter log message, terminated with single '.' or end of file: 
>> Changed printf call 
>> . 
done 
papaya$
This automatically checks out the file, locked, after checking it in. This is a useful way to keep track of revisions even if you're the only one working on a project.

If you use RCS often, you may not like all of those unsightly importrtf.c,v RCS files cluttering up your directory. If you create the subdirectory RCS within your project directory, ci and co will place the RCS files there, out of the way of the rest of the source.
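If you want to try this, the setup is a single command in the project directory:

```shell
# Create the RCS subdirectory; ci and co will then store and look for
# the ,v files there automatically.
mkdir -p RCS
```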

In addition, RCS keeps track of all previous revisions of your file. For instance, if you make a change to your program that causes it to break in some way and want to revert to the previous version to "undo" your changes and retrace your steps, you can specify a particular version number to check out with co. For example:

papaya$ co -l1.1 importrtf.c 
importrtf.c,v  -->  importrtf.c 
revision 1.1 (locked) 
writable importrtf.c exists; remove it? [ny](n): y  
done 
papaya$
checks out version 1.1 of the file importrtf.c. You can use the program rlog to print the revision history of a particular file; this displays your revision log entries (entered with ci) along with other information such as the date, the user who checked in the revision, and so forth.

RCS automatically updates embedded "keyword strings" in your source file at checkout time. For example, if you have the string:

/* $Header$ */
in the source file, co will replace it with an informative line about the revision date, version number, and so forth, as in:
/* $Header: /work/linux/hitch/programming/tools/RCS/rcs.tex 1.2 1994/12/04 15:19:31 mdw Exp mdw $ */

Other keywords exist as well, such as $Author$, $Date$, and $Log$ (the latter keeps a complete record of the log entries for each revision embedded in the source file).

Many programmers place a static string within each source file to identify the version of the program after it has been compiled. For example, within each source file in your program, you can place the line:

static char rcsid[] = "@(#)$Header$";

co replaces the keyword $Header$ with a string of the form given here. This static string survives in the executable, and the what command displays these strings in a given binary. For example, after compiling importrtf.c into the executable importrtf, we can use the command:

papaya$ what importrtf 
importrtf: 
        $Header: /work/linux/hitch/programming/tools/RCS/rcs.tex 
                      1.2 1994/12/04 15:19:31 mdw Exp mdw $ 
papaya$
what picks out strings beginning with the characters @(#) in a file and displays them. If you have a program that has been compiled from many source files and libraries, and you don't know how up to date each of the components is, you can use what to display a version string for each source file used to compile the binary.
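If what is not installed on your system, you can approximate it with grep. In this sketch, the file and the ID string in it are fabricated stand-ins for a real compiled binary:

```shell
# Fabricate a stand-in for a compiled binary that contains an expanded
# RCS ID string at the end of a line (the leading bytes are filler):
printf 'xxxx@(#)$Header: importrtf.c 1.2 1994/12/04 15:19:31 mdw Exp $\n' \
    > importrtf-demo
# Rough emulation of what(1): print each @(#) marker and the text after it.
grep -ao '@(#).*' importrtf-demo
```

The -a flag forces grep to treat a binary file as text, and -o prints only the matching portion rather than the whole line.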

RCS has several other programs in its suite, including rcs, used for maintaining RCS files. Among other things, rcs can give other users permission to check out sources from an RCS file. See the manual pages for ci, co, and rcs for more information.

14.2.8. Revision Control Tools--CVS

CVS, the Concurrent Version System, is more complex than RCS and thus perhaps a little bit oversized for one-man projects. But whenever more than one or two programmers are working on a project or the source code is distributed over several directories, CVS is the better choice. CVS uses the RCS file format for saving changes, but employs a management structure of its own.

By default, CVS works with full directory trees. That is, each CVS command you issue affects the current directory and all the subdirectories it contains, including their subdirectories and so on. This recursive traversal can be switched off with a command-line option, or you can specify a single file for the command to operate on.

CVS has formalized the sandbox concept that is used in many software development shops. In this concept, there is a so-called repository containing the "official" sources that are known to compile and work (at least partly). No developer is ever allowed to edit files in this repository directly. Instead, each developer checks out a local directory tree, the so-called sandbox. Here, she can edit the sources to her heart's delight, make changes, add or remove files, and do all sorts of things that developers usually do (no, not playing Quake or eating marshmallows). When the developer has made sure that her changes compile and work, she transmits them to the repository again and thus makes them available for the other developers.

When you as a developer have checked out a local directory tree, all the files are writable. You can make any necessary changes to the files in your personal workspace. When you have finished local testing and feel sure enough of your work to share the changes with the rest of the programming team, you write any changed files back into the central repository by issuing a CVS commit command. CVS then checks whether another developer has checked in changes since you checked out your directory tree. If this is the case, CVS does not let you check your changes in, but asks you first to take the changes of the other developers over to your local tree. During this update operation, CVS uses a sophisticated algorithm to reconcile ("merge") your changes with those of the other developers. There are cases in which this is not automatically possible. In this case, CVS informs you that there have been conflicts and asks you to resolve those. The file in question is marked up with special characters so that you can see where the conflict has occurred and decide which version should be used. Note that CVS makes sure that conflicts can only occur in local developers' trees. There is always a consistent version in the repository.

14.2.8.1. Setting up a CVS repository

If you are working in a larger project, it is likely that someone else has already set up all the necessary machinery to use CVS. But if you are your project's administrator or you just want to tinker around with CVS on your local machine, you will have to set up a repository yourself.

First, set your environment variable CVSROOT to a directory where you want your CVS repository to be. CVS can keep as many projects as you like in a repository and makes sure they do not interfere with each other. Thus, you just have to pick a directory once to store all projects maintained by CVS, and you won't need to change it when you switch projects. Instead of using the variable CVSROOT, you can always use the command-line switch -d with all CVS commands, but since this is cumbersome to type all the time, we will assume that you have set CVSROOT.
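As a sketch (the path chosen here is arbitrary), the one-time setup might look like this:

```shell
# Pick a permanent home for the repository and record it in the
# environment (the path is only an example):
mkdir -p "$HOME/cvsroot"
export CVSROOT="$HOME/cvsroot"
```

Putting the export line in your shell startup file saves you from retyping it, and from having to pass -d to every command.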

Once the directory exists for a repository, you can create the repository itself with the following command (assuming that CVS is installed on your machine):

tigger$ cvs init

There are several different ways to create a project tree in the CVS repository. If you already have a directory tree, but it is not yet managed by RCS, you can simply import it into the repository by calling:

tigger$ cvs import directory manufacturer tag
where directory is the name of the top-level directory of the project, manufacturer is the name of the author of the code (you can use whatever you like here), and tag is a so-called release tag that can be chosen at will. For example:

tigger$ cvs import dataimport acmeinc initial
... lots of output ...

If you want to start a completely new project, you can simply create the directory tree with mkdir calls and then import this empty tree as shown in the previous example.

If you want to import a project that is already managed by RCS, things get a little bit more difficult, because you cannot use cvs import. In this case, you have to create the needed directories directly in the repository and then copy all RCS files (all files that end in ,v) into those directories. Do not use RCS subdirectories here!

Every repository contains a file named CVSROOT/modules that contains names of projects in the repository. It is a good idea to edit the modules file of the repository to add the new module. You can check out, edit, and check in this file like every other file. Thus, in order to add your module to the list, do the following (we will cover the various commands soon):

tigger$ cvs checkout CVSROOT/modules
tigger$ cd CVSROOT
tigger$ emacs modules
... or any other editor of your choice, see below for what to enter ...
tigger$ cvs commit modules
tigger$ cd ..
tigger$ cvs release -d CVSROOT

If you are not doing anything fancy, the format of the modules file is very easy: each line starts with the name of a module, followed by a space or tab and the path within the repository. There are many more things you can do with the modules file; these are described in the CVS documentation, especially in the Info pages and at http://www.loria.fr/~molli/cvs-index.html.
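For instance (the module names and paths here are invented for illustration), a minimal modules file might look like this:

```
dataimport      clients/acmeinc/dataimport
exportrtf       clients/acmeinc/exportrtf
```

With these entries in place, developers can check out either project by the short name on the left instead of its full repository path.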

14.2.8.2. Working with CVS

In the following section, we will assume that either you or your system administrator has set up a module called dataimport. You can now check out a local tree of this module with the following command:

tigger$ cvs checkout dataimport

If there is no module defined for the project you want to work on, you need to know the path within the repository. For example, you might need something like the following:

tigger$ cvs checkout clients/acmeinc/dataimport

Whichever version of the checkout command you use, CVS will create a directory called dataimport under your current working directory and check out all files and directories from the repository that belong to this module. All files are writable, and you can start editing them right away.

After you have made some changes, you can write back the changed files into the repository with one single command:

tigger$ cvs commit

Of course, you can also check in single files:

tigger$ cvs commit importrtf.c

But whatever you do, CVS will ask you--as RCS does--for a comment to include with your changes. But CVS goes a step beyond RCS in convenience. Instead of the rudimentary prompt from RCS, you get a full-screen editor to work in. You can choose this editor by setting the environment variable CVSEDITOR; if this is not set, CVS looks in EDITOR, and if this is not defined either, CVS invokes vi. If you check in a whole project, CVS will use the comment you entered for each directory in which there have been changes but will start a new editor every time to ask you whether you might want to change each file.
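Setting up the commit editor is a one-liner; emacs here is only an example, and any editor will do:

```shell
# Select the editor CVS launches for commit messages:
export CVSEDITOR=emacs
# If CVSEDITOR is unset, CVS consults EDITOR and finally falls back to vi.
```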

Note that it is not necessary to have CVSROOT set correctly when checking files in, because when checking the tree out, CVS created a directory called CVS in each work directory. This directory contains all the information that CVS needs for its work, including where to find the repository.

While you have been working on your files, it may well be that a co-worker has checked in some of the files that you are currently working on. In this case, CVS will not let you check in your files but asks you to first update your local tree. Do this with the command:

tigger$ cvs update
M importrtf.c
A exportrtf.c
? importrtf
U importword.c

(You can specify a single file here as well.) You should carefully examine the output of this command: CVS outputs the names of all the files it handles, each preceded by a single key letter. This letter tells you what has happened during the update operation. The most important letters are shown in Table 14-1.

Table 14-1. Key Letters for Files Under CVS

U    The file has been brought up to date from the repository. The U is shown if the file was added to the repository in the meantime or was changed there, and you have not made any changes to it yourself.

P    Same as U, except that the server transmitted only a patch containing the changes instead of the whole file.

A    You have added this file locally with cvs add; it will be placed in the repository at the next commit.

M    You have changed this file locally. If somebody else has checked in a newer version in the meantime, all the changes have been merged successfully.

C    You have changed this file locally, and somebody else has checked in a newer version. During the merge attempt, conflicts have arisen.

?    CVS has no information about this file; that is, this file is not under CVS's control.

The C is the most important of these letters. CVS was not able to merge all changes and needs your help. Load those files into your editor and look for the string <<<<<<<. The name of the file follows this marker on the same line; then comes your version, ending with a line containing =======. After that comes the version of the code from the repository, ending with a line containing >>>>>>> and the revision number. You now have to find out--probably by communicating with your co-worker--which version is better or whether it is possible to merge the two versions by hand. Change the file accordingly and remove the CVS markers <<<<<<<, =======, and >>>>>>>. Save the file and once again commit it.
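As a concrete illustration, the following sketch fabricates the kind of conflict hunk cvs update leaves behind (the file name, code, and revision number are made up) and then locates the marker lines:

```shell
# Fabricate a conflicted file of the sort "cvs update" can produce:
cat > conflict-demo.c <<'EOF'
<<<<<<< importrtf.c
  printf("Hello, Mother Earth!\n");   /* your local version */
=======
  printf("Hello, World!\n");          /* version from the repository */
>>>>>>> 1.4
EOF
# Print every line that still carries a conflict marker:
grep -nE '^(<<<<<<<|=======|>>>>>>>)' conflict-demo.c
```

A search like the final grep is also a handy last check before committing, to make sure no markers were left behind after resolving a conflict.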

If you decide that you want to stop working on a project for a time, you should check whether you have really committed all changes. To do this, change to the directory above the root directory of your project and issue the command:

tigger$ cvs release dataimport

CVS then checks whether you have written back all changes into the repository and warns you if necessary. A useful option is -d, which deletes the local tree if all changes have been committed.

14.2.8.3. CVS over the Internet

CVS is also very useful where distributed development teams[55] are concerned, because it provides several ways to access a repository on another machine.

[55]The use of CVS has burgeoned along with the number of free software projects which are developed over the Internet by people from different continents.

If you can log into the machine holding the repository with rsh, you can use remote CVS to access the repository. To check out a module, do the following:

cvs -d :ext:user@domain.com:/path/to/repository checkout dataimport

If you cannot or do not want to use rsh for security reasons, you can also use the secure shell ssh. You can tell CVS that you want to use ssh by setting the environment variable CVS_RSH to ssh.
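Switching to ssh is again just an environment variable; the user, host, and repository path in the comment are placeholders:

```shell
# Route CVS's remote rsh-style connections over ssh instead:
export CVS_RSH=ssh
# A later remote checkout such as
#   cvs -d :ext:user@domain.com:/path/to/repository checkout dataimport
# will then run over the secure channel.
```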

Authentication and access to the repository can also be done via a client/server protocol. Remote access requires a CVS server running on the machine with the repository; see the CVS documentation for how to do this. If the server is set up, you can login to it with:

cvs -d :pserver:user@domain.com:/path/to/repository login
CVS password:

As shown, the CVS server will ask you for your CVS password, which has been assigned to you by the administrator of the CVS server. This login procedure is necessary only once for every repository. When you check out a module, you need to specify the machine with the server, your username on that machine, and the remote path to the repository; as with local repositories, this information is saved in your local tree. Since the password is saved with minimal encryption in the file .cvspass in your home directory, there is a potential security risk here. The CVS documentation tells you more about this.

When you use CVS over the Internet and check out or update largish modules, you might also want to use the -z option, which takes an integer parameter (a compression level from 1 to 9, as with gzip) and transmits the data in compressed form.

14.2.9. Patching Files

Let's say that you're trying to maintain a program that is updated periodically, but the program contains many source files, and releasing a complete source distribution with every update is not feasible. The best way to incrementally update source files is with patch, a program by Larry Wall, author of Perl.

patch is a program that makes context-dependent changes in a file in order to update that file from one version to the next. This way, when your program changes, you simply release a patch file against the source, which the user applies with patch to get the newest version. For example, Linus Torvalds usually releases new Linux kernel versions in the form of patch files as well as complete source distributions.

A nice feature of patch is that it applies updates in context; that is, if you have made changes to the source yourself but still wish to get the changes in the patch file update, patch can usually figure out the right location in the original file to apply the changes to. This way, your versions of the original source files don't need to correspond exactly to those that the patch file was made against.

In order to make a patch file, the program diff is used, which produces "context diffs" between two files. For example, take our overused "Hello World" source code, given here:

/* hello.c version 1.0 by Norbert Ebersol */ 
#include <stdio.h>  
 
int main() { 
  printf("Hello, World!"); 
  exit(0); 
}

Let's say that you were to update this source, as in the following:

/* hello.c version 2.0 */ 
/* (c)1994 Norbert Ebersol */ 
#include <stdio.h>  
 
int main() { 
  printf("Hello, Mother Earth!\n"); 
  return 0; 
}

If you want to produce a patch file to update the original hello.c to the newest version, use diff with the -c option:

papaya$ diff -c hello.c.old hello.c > hello.patch
This produces the patch file hello.patch that describes how to convert the original hello.c (here, saved in the file hello.c.old) to the new version. You can distribute this patch file to anyone who has the original version of "Hello, World," and they can use patch to update it.

Using patch is quite simple; in most cases, you simply run it with the patch file as input:[56]

papaya$ patch < hello.patch 
Hmm...  Looks like a new-style context diff to me... 
The text leading up to this was: 
-------------------------- 
|*** hello.c.old        Sun Feb  6 15:30:52 1994 
|--- hello.c    Sun Feb  6 15:32:21 1994 
-------------------------- 
Patching file hello.c using Plan A... 
Hunk #1 succeeded at 1. 
done 
papaya$
patch warns you if it appears as though the patch has already been applied. If we tried to apply the patch file again, patch would ask us if we wanted to assume that -R was enabled--which reverses the patch. This is a good way to back out patches you didn't intend to apply. patch also saves the original version of each file that it updates in a backup file, usually named filename~ (the filename with a tilde appended).

[56]The output shown here is from the last version that Larry Wall released, Version 2.1. If you have a newer version of patch, you will need the --verbose flag to get the same output.

In many cases, you'll want to update not only a single source file but an entire directory tree of sources. patch allows many files to be updated from a single diff. Let's say you have two directory trees, hello.old and hello, which contain the sources for the old and new versions of a program, respectively. To make a patch file for the entire tree, use the -r switch with diff:

papaya$ diff -cr hello.old hello > hello.patch

Now, let's move to the system where the software needs to be updated. Assuming that the original source is contained in the directory hello, you can apply the patch with:

papaya$ patch -p0 < hello.patch

The -p0 switch tells patch to preserve the pathnames of files to be updated (so it knows to look in the hello directory for the source). If you have the source to be patched saved in a directory named differently from that given in the patch file, you may need to use the -p option. See the patch manual page for details about this.
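The whole cycle can be rehearsed in a scratch directory using nothing but diff and patch; the trees and file contents below are invented for the demonstration:

```shell
# Maintainer's side: old and new versions of a tiny source tree.
mkdir -p maint/hello.old maint/hello
printf 'int main() { return 1; }\n' > maint/hello.old/hello.c
printf 'int main() { return 0; }\n' > maint/hello/hello.c
# diff exits with status 1 when the trees differ, hence the "|| true".
(cd maint && diff -cr hello.old hello > ../hello.patch) || true

# User's side: only the old sources are present, in a directory named hello.
mkdir -p user/hello
printf 'int main() { return 1; }\n' > user/hello/hello.c
(cd user && patch -p0 < ../hello.patch)
```

Since hello.old does not exist on the user's side, patch falls back to the name in the diff header that does exist and updates hello/hello.c, which afterwards matches the maintainer's new version.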




Copyright © 2001 O'Reilly & Associates. All rights reserved.