Writing Apache Modules with Perl and C

By:	Lincoln Stein and Doug MacEachern
Published:	O'Reilly & Associates, Inc. - March 1999

Chapter 11 - C API Reference Guide, Part II / Launching Subprocesses
Introduction

The last topic we discuss is the API for launching subprocesses. While we don't like to encourage the creation of subprocesses because of the load they impose on a server, there are certain modules that need to do so. In fact, for certain modules, such as mod_cgi, launching subprocesses is their entire raison d'être.

Because Apache is a complex beast, calling fork() to spawn a new process within a server process is not something to be done lightly. There are a variety of issues to contend with, including, but not limited to, signal handlers, alarms, pending I/O, and listening sockets. For this reason, you should use Apache's published API to implement fork and exec, rather than trying to roll your own with the standard C functions.

In addition to discussing the subprocess API, this section covers a number of function calls that help in launching CGI scripts and setting up the environment for subprocesses.

void ap_add_cgi_vars (request_rec *r)
void ap_add_common_vars (request_rec *r)

(Declared in the header file util_script.h.) By convention, modules that need to launch subprocesses copy the contents of the current request record's subprocess_env table into the child process's environment first. This table starts out empty, but modules are free to add to it. For example, mod_env responds to the PassEnv, SetEnv, and UnsetEnv directives by setting or unsetting variables in an internal table. Then, during the request fixup phase, it copies these values into subprocess_env so that the variables are exposed to the environment by any content handler that launches a subprocess.
These two routines are called by mod_cgi to fill up the subprocess_env table with the standard CGI environment variables in preparation for launching a CGI script. You may want to use one or both yourself in order to initialize the environment to a standard state.
ap_add_cgi_vars() sets up the environment variables that are specifically called for by the CGI/1.1 protocol. These include GATEWAY_INTERFACE, QUERY_STRING, REQUEST_METHOD, PATH_INFO, and PATH_TRANSLATED, among others.
ap_add_common_vars() adds other common CGI environment variables to subprocess_env. These include the various HTTP_ variables that hold incoming HTTP headers from the request, such as HTTP_USER_AGENT and HTTP_REFERER, as well as such useful variables as PATH, SERVER_NAME, SERVER_PORT, SERVER_ROOT, and SCRIPT_FILENAME.

char **ap_create_environment (pool *p, table *t)

(Declared in the header file util_script.h.) Among the arguments you need when execing a program with the ap_call_exec() function is an environment array. This function takes the key/value pairs contained in an Apache table and turns them into a suitable array. Usually you'll want to use the subprocess_env table for this purpose in order to be compatible with mod_cgi and mod_env.

char **env = ap_create_environment(r->pool, r->subprocess_env);
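
To make the shape of that array concrete, here is a minimal standalone sketch in plain C of the layout an exec-style call expects: a NULL-terminated list of "KEY=value" strings. It is not Apache's implementation. The kv_pair type and the make_env_array() and free_env_array() functions are hypothetical names invented for illustration, and the sketch uses malloc() where Apache would allocate from a pool. The real ap_create_environment() may do additional work that is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one key/value entry of an Apache table. */
typedef struct {
    const char *key;
    const char *val;
} kv_pair;

/* Release an array built by make_env_array(). */
static void free_env_array(char **env)
{
    size_t i;

    if (env == NULL)
        return;
    for (i = 0; env[i] != NULL; i++)
        free(env[i]);
    free(env);
}

/* Build a NULL-terminated array of "KEY=value" strings.
 * Returns NULL if an allocation fails. */
static char **make_env_array(const kv_pair *pairs, size_t n)
{
    char **env;
    size_t i;

    assert(pairs != NULL || n == 0);
    env = malloc((n + 1) * sizeof *env);    /* one extra slot for NULL */
    if (env == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        /* room for key, '=', value, and the trailing '\0' */
        size_t len = strlen(pairs[i].key) + strlen(pairs[i].val) + 2;

        env[i] = malloc(len);
        if (env[i] == NULL) {
            free_env_array(env);            /* env[i] is NULL, so this stops here */
            return NULL;
        }
        snprintf(env[i], len, "%s=%s", pairs[i].key, pairs[i].val);
    }
    env[n] = NULL;                          /* terminator expected by exec */
    return env;
}
```

Given the two pairs REQUEST_METHOD/GET and QUERY_STRING/a+b, the sketch yields the strings "REQUEST_METHOD=GET" and "QUERY_STRING=a+b" followed by a NULL pointer.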

int ap_can_exec (const struct stat*)

(Declared in the header file httpd.h.) This utility routine checks whether a file is executable by the current process user and/or group ID. You pass it the pointer to a stat structure, often the finfo field of the current request record. It returns a true value if the file is executable, false otherwise:

if(!ap_can_exec(&r->finfo)) {
   /* . . . log nasty error message . . . */
   return HTTP_FORBIDDEN;
}

int ap_bspawn_child (pool *p, int (*)(void *, child_info *), void *data,
enum kill_conditions, BUFF **pipe_in, BUFF **pipe_out, BUFF **pipe_err)

(Declared in the header file buff.h.) The ap_bspawn_child() function is a mixture of the Unix fork() and popen() calls. It can be used to open up a pipe to a child process or just to fork off a child process to execute in the background.
This function has many arguments. The first argument, p, is a pool pointer. The current request's resource pool is the usual choice. The second argument is a function pointer with the following prototype:

int child_routine (void *data, child_info *pinfo);

After forking, Apache will immediately call child_routine() with a generic data pointer (copied from the third argument to ap_bspawn_child(), which we discuss next) and a child_info pointer, a data type needed for the Win32 port. For all intents and purposes, the child_info argument is an opaque pointer that you pass to ap_call_exec(). It has no other use at present. The child routine should return a nonzero value on success or a zero value on failure.
The third argument to ap_bspawn_child() is data, a generic void pointer. Whatever you use for this argument will be passed to the child routine, and it is a simple way to pass information from the parent process to the child process. Since the child process usually requires access to the current request, it is common to pass a pointer to the request_rec in this field.
The fourth argument is kill_conditions, an enumerated data type that affects what Apache does with the spawned child when the server is terminating or restarting. The possibilities, which are defined in alloc.h, are kill_never, to never send a signal to the child; kill_always, to send the child a SIGKILL signal; kill_after_timeout, to send the child a SIGTERM, wait 3 seconds, and then send a SIGKILL; just_wait, to wait forever for the child to complete; and kill_only_once, to send a SIGTERM and wait for the child to complete. The usual value is kill_after_timeout, which is the same scheme that Apache uses for the listening servers it spawns.
The last three arguments are pipe_in, pipe_out, and pipe_err. If they are non-NULL, ap_bspawn_child() fills them in with BUFF pointers attached to the standard input, output, and error of the spawned child process. By writing to pipe_in, the parent process will be able to send data to the standard input of the spawned process. By reading from pipe_out and pipe_err, you can retrieve data that the child has written to its standard output and error. Pass NULL for any or all of these arguments if you are not interested in talking to the child.
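
To illustrate the plumbing behind the pipe arguments, here is a standalone POSIX sketch that forks a child, runs a callback in it, and hands the parent a descriptor attached to the child's standard output. It is meant only as an aid to understanding. A real module should still go through ap_bspawn_child(), because this sketch handles none of the signal, alarm, socket, or cleanup issues mentioned at the start of this section. The names spawn_reading_stdout() and say_hello() are hypothetical, and a plain file descriptor stands in for Apache's BUFF pointer.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, run child_routine(data) in the child with
 * its stdout redirected into a pipe, and give the parent the read end.
 * Returns the child's pid, or -1 on failure. */
static pid_t spawn_reading_stdout(int (*child_routine)(void *), void *data,
                                  int *read_fd)
{
    int fds[2];
    pid_t pid;

    assert(child_routine != NULL && read_fd != NULL);
    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);    /* stdout now feeds the pipe */
        close(fds[1]);
        /* nonzero from the routine means success, as described above */
        _exit(child_routine(data) ? 0 : 1);
    }
    close(fds[1]);                      /* parent keeps only the read end */
    *read_fd = fds[0];
    return pid;
}

/* Example child routine: write the message passed through data. */
static int say_hello(void *data)
{
    const char *msg = data;
    size_t len = strlen(msg);

    return write(STDOUT_FILENO, msg, len) == (ssize_t)len;
}
```

The parent would call spawn_reading_stdout(say_hello, "hello\n", &fd), read the child's output from fd, close it, and then reap the child with waitpid().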

int ap_spawn_child (pool *p, int (*)(void *, child_info *), void *data, enum kill_conditions, FILE **pipe_in, FILE **pipe_out, FILE **pipe_err)

(Declared in the header file alloc.h.) This function works exactly like ap_bspawn_child() but uses more familiar FILE streams rather than BUFF streams for the I/O connection between the parent and the child. This function is rarely a good choice, however, because it is not compatible with the Win32 port, whereas ap_bspawn_child() is.

void ap_error_log2stderr (server_rec *s)

Once inside a spawned child, this function will rehook the standard error file descriptor back to the server's error log. You may want to do this after calling ap_bspawn_child() and before calling ap_call_exec() so that any error messages produced by the subprocess show up in the server error log:

ap_error_log2stderr(r->server);

void ap_cleanup_for_exec (void)

(Declared in the header file alloc.h.) You should call this function just before invoking ap_call_exec(). Its main duty is to run all the cleanup handlers for all the main resource pools and all subpools.

int ap_call_exec (request_rec *r, child_info *pinfo, char *argv0, char **env, int shellcmd)

(Declared in the header file util_script.h.) After calling ap_bspawn_child() or ap_spawn_child(), your program will most probably call ap_call_exec() to replace the current process with a new one. The name of the command to run is specified in the request record's filename field, and its command-line arguments, if any, are specified in args. If successful, the new command is run and the call never returns. If preceded by an ap_bspawn_child(), the new process's standard input, output, and error will be attached to the BUFF*s created by that call.
This function takes five arguments. The first, r, is the current request record. It is used to set up the argument list for the command. The second, pinfo, is the child_info pointer passed to the function specified by ap_bspawn_child().
argv0 is the command name that will appear as the first item in the launched command's argv[] array. Although this argument is usually the same as the path of the command to run, this is not a necessary condition. It is sometimes useful to lie to a command about its name, particularly when dealing with oddball programs that behave differently depending on how they're invoked.
The fourth argument, env, is a pointer to an environment array. This is typically the pointer returned by ap_create_environment(). The last argument, shellcmd, is a flag indicating whether Apache should pass any arguments to the command. If shellcmd is true, then Apache will not pass any arguments to the command (this is counterintuitive). If shellcmd is false, then Apache will use the value of r->args to set up the arguments passed to the command. The contents of r->args must be in the old-fashioned CGI argument form in which individual arguments are separated by the + symbol and other funny characters are escaped as %XX hex escape sequences. r->args may not contain the unescaped = or & symbols. If it does, Apache will interpret it as a new-style CGI query string and refuse to pass it to the command. We'll see a concrete example of setting up the arguments for an external command shortly.
There are a few other precautionary steps ap_call_exec() will take. If SUEXEC is enabled, the program will be run through the setuid wrapper. If any of the RLimitCPU, RLimitMEM, or RLimitNPROC directives are enabled, setrlimit will be called underneath to limit the given resource to the configured value.
Finally, for convenience, under OS/2 and Win32 systems ap_call_exec() will implement the "shebang" Unix shell-ism. That is, if the first line of the requested file contains the #! sequence, the remainder of the string is assumed to be the program interpreter which will execute the script.
On Unix platforms, successful calls to ap_call_exec() will not return because the current process has been terminated and replaced by the command. On failure, ap_call_exec() will return -1 and errno will be set. On Win32 platforms, successful calls to ap_call_exec() will return the process ID of the launched process and not terminate the current code. The upcoming example shows how to deal with this.
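
The old-fashioned argument form that r->args must follow can be illustrated with a short standalone sketch. It splits a string on +, decodes %XX escapes, and produces no arguments at all when it sees an unescaped = or &, following the rules given above. This is not Apache's code. The function names unescape_in_place() and split_cgi_args() are hypothetical, and the real server may differ in details such as how it treats malformed escapes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Decode %XX escapes in place. Returns 0 on success, -1 on a bad escape. */
static int unescape_in_place(char *s)
{
    char *out = s;
    char hex[3];

    while (*s != '\0') {
        if (*s == '%') {
            if (!isxdigit((unsigned char)s[1]) ||
                !isxdigit((unsigned char)s[2]))
                return -1;
            hex[0] = s[1];
            hex[1] = s[2];
            hex[2] = '\0';
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
    return 0;
}

/* Split an old-style query string (modified in place) into at most
 * max_args arguments. Returns the argument count; 0 if the string is
 * empty or looks like a new-style query string; -1 on a bad escape. */
static int split_cgi_args(char *args, char **argv, int max_args)
{
    int argc = 0;
    char *p = args;

    assert(args != NULL && argv != NULL);
    if (*args == '\0' || strpbrk(args, "=&") != NULL)
        return 0;                       /* nothing to pass to the command */
    while (p != NULL && argc < max_args) {
        char *plus = strchr(p, '+');

        if (plus != NULL)
            *plus++ = '\0';             /* cut the string at the separator */
        if (unescape_in_place(p) < 0)
            return -1;
        argv[argc++] = p;
        p = plus;
    }
    return argc;
}
```

Under these rules the string foo+bar%20baz becomes the two arguments "foo" and "bar baz", while a=1&b=2 yields no arguments.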

void ap_child_terminate (request_rec *r)

If for some reason you need to terminate the current child (perhaps because an attempt to exec a new program has failed), this function causes the child server process to terminate cleanly after the current request. It does this by setting the child's MaxRequestsPerChild configuration variable to 1 and clearing the keepalive flag so that the current connection is broken after the request is serviced.

ap_child_terminate(r);

int ap_scan_script_header_err_buff (request_rec *r, BUFF *fb, char *buffer)

This function is useful when launching CGI scripts. It will scan the BUFF* stream fb for HTTP headers. Typically the BUFF* is the pipe_out pointer returned from a previous call to ap_bspawn_child(). Provided that the launched script outputs a valid header format, the headers will be added to the request record's headers_out table.
The same special actions are taken on certain headers as were discussed in Chapter 9, Perl API Reference Guide, when we covered the Perl cgi_header_out() method (see "Server Response Methods" in "The Apache Request Object"). If the headers were properly formatted and parsed, the return value will be OK. Otherwise, HTTP_INTERNAL_SERVER_ERROR or some other error code will be returned. In addition, the function will log errors to the error log.
The buffer argument should be an empty character array allocated to MAX_STRING_LEN or longer. If an error occurs during processing, this buffer will be set to contain the portion of the incoming data that generated the error. This may be useful for logging.

char buffer[MAX_STRING_LEN];
if(ap_scan_script_header_err_buff(r, fb, buffer) != OK) {
  /* ... log nasty error message ... */
}

int ap_scan_script_header_err (request_rec *r, FILE *f, char *buffer)

This function does exactly the same as ap_scan_script_header_err_buff(), except that it reads from a FILE* stream rather than a BUFF* stream. You would use this with the pipe_out FILE* returned by ap_spawn_child().

int ap_scan_script_header_err_core (request_rec *r, char *buffer,
int (*getsfunc) (char *, int, void *), void *getsfunc_data)

The tongue-twisting ap_scan_script_header_err_core() function is the underlying routine which implements ap_scan_script_header_err() and ap_scan_script_header_err_buff(). The key component here is the function pointer, getsfunc(), which is called upon to return a line of data in the same way that the standard fgets() function does. For example, here's how ap_scan_script_header_err() works, using the standard fgets() function:

static int getsfunc_FILE(char *buf, int len, void *f)
{
  return fgets(buf, len, (FILE *) f) != NULL;
}

API_EXPORT(int) ap_scan_script_header_err(request_rec *r, FILE *f,
                                        char *buffer)
{
   return ap_scan_script_header_err_core(r, buffer, getsfunc_FILE, f);
}

Your module could replace getsfunc_FILE() with an implementation to read from a string or other resource.
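
As a minimal sketch of that idea, here is a line reader with the same signature as getsfunc_FILE() that pulls its input from an in-memory string instead of a FILE*. The string_src type and the getsfunc_STRING() name are hypothetical, chosen for illustration. The sketch mimics fgets() by copying at most len-1 characters and stopping after a newline.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cursor over a NUL-terminated, in-memory buffer. */
typedef struct {
    const char *pos;    /* next unread character */
} string_src;

/* Same shape as getsfunc_FILE(): store one line in buf and return 1,
 * or return 0 when the input is exhausted. */
static int getsfunc_STRING(char *buf, int len, void *data)
{
    string_src *src = data;
    const char *nl;
    size_t n;

    assert(buf != NULL && len > 1 && src != NULL);
    if (*src->pos == '\0')
        return 0;                               /* end of input */
    nl = strchr(src->pos, '\n');
    /* take everything up to and including the newline, if there is one */
    n = nl != NULL ? (size_t)(nl - src->pos) + 1 : strlen(src->pos);
    if (n > (size_t)len - 1)
        n = (size_t)len - 1;                    /* leave room for '\0' */
    memcpy(buf, src->pos, n);
    buf[n] = '\0';
    src->pos += n;
    return 1;
}
```

Going by the prototype shown above, a module would then pass getsfunc_STRING and the address of an initialized string_src as the last two arguments to ap_scan_script_header_err_core().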
