home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Book HomePHP CookbookSearch this book

Chapter 18. Files

18.1. Introduction

The input and output in a web application usually flow between browser, server, and database, but there are many circumstances in which files are involved too. Files are useful for retrieving remote web pages for local processing, storing data without a database, and saving information that other programs need access to. Plus, as PHP becomes a tool for more than just pumping out web pages, the file I/O functions are even more useful.

PHP's interface for file I/O is similar to C's, although less complicated. The fundamental unit of identifying a file to read from or write to is a file handle. This handle identifies your connection to a specific file, and you use it for operations on the file. This chapter focuses on opening and closing files and manipulating file handles in PHP, as well as what you can do with the file contents once you've opened a file. Chapter 19 deals with directories and file metadata such as permissions.

Opening /tmp/cookie-data and writing the contents of a specific cookie to the file looks like this:

$fh = fopen('/tmp/cookie-data','w')      or die("can't open file");
if (-1 == fwrite($fh,$_COOKIE['flavor'])) { die("can't write data"); }
fclose($fh)                              or die("can't close file");

The function fopen( ) returns a file handle if its attempt to open the file is successful. If it can't open the file (because of incorrect permissions, for example), it returns false. Section 18.2 and Section 18.4 cover ways to open files.

The function fwrite( ) writes the value of the flavor cookie to the file handle. It returns the number of bytes written. If it can't write the string (not enough disk space, for example), it returns -1.

Last, fclose( ) closes the file handle. This is done automatically at the end of a request, but it's a good idea to explicitly close all files you open anyway. It prevents problems using the code in a command-line context and frees up system resources. It also allows you to check the return code from fclose( ). Buffered data might not be actually written to disk until fclose( ) is called, so it's here that "disk full" errors are sometimes reported.

As with other processes, PHP must have the correct permissions to read from and write to a file. This is usually straightforward in a command-line context but can cause confusion when running scripts within a web server. Your web server (and consequently your PHP scripts) probably runs as a specific user dedicated to web serving (or perhaps as user nobody). For good security reasons, this user often has restricted permissions on what files it can access. If your script is having trouble with a file operation, make sure the web server's user or group — not yours — has permission to perform that file operation. Some web serving setups may run your script as you, though, in which case you need to make sure that your scripts can't accidentally read or write personal files that aren't part of your web site.

Because most file-handling functions just return false on error, you have to do some additional work to find more details about that error. When the track_errors configuration directive is on, each error message is put in the global variable $php_errormsg. Including this variable as part of your error output makes debugging easier:

$fh = fopen('/tmp/cookie-data','w')      or die("can't open: $php_errormsg");
if (-1 == fwrite($fh,$_COOKIE['flavor'])) { die("can't write: $php_errormsg") };
fclose($fh)                              or die("can't close: $php_errormsg");

If you don't have permission to write to the /tmp/cookie-data, the example dies with this error output:

can't open: fopen("/tmp/cookie-data", "w") - Permission denied

There are differences in how files are treated by Windows and by Unix. To ensure your file access code works appropriately on Unix and Windows, take care to handle line-delimiter characters and pathnames correctly.

A line delimiter on Windows is two characters: ASCII 13 (carriage return) followed by ASCII 10 ( linefeed or newline). On Unix, it's just ASCII 10. The typewriter-era names for these characters explain why you can get "stair-stepped" text when printing out a Unix-delimited file. Imagine these character names as commands to the platen in a typewriter or character-at-a-time printer. A carriage return sends the platen back to the beginning of the line it's on, and a line feed advances the paper by one line. A misconfigured printer encountering a Unix-delimited file dutifully follows instructions and does a linefeed at the end of each line. This advances to the next line but doesn't move the horizontal printing position back to the left margin. The next stair-stepped line of text begins (horizontally) where the previous line left off.

PHP functions that use a newline as a line-ending delimiter (for example, fgets( )) work on both Windows and Unix because a newline is the character at the end of the line on either platform.

To remove any line-delimiter characters, use the PHP function rtrim( ) :

$fh = fopen('/tmp/lines-of-data.txt','r') or die($php_errormsg);
while($s = fgets($fh,1024)) {
    $s = rtrim($s);
    // do something with $s ... 
}
fclose($fh)                               or die($php_errormsg);

This function removes any trailing whitespace in the line, including ASCII 13 and ASCII 10 (as well as tab and space). If there's whitespace at the end of a line that you want to preserve, but you still want to remove carriage returns and line feeds, use an appropriate regular expression:

$fh = fopen('/tmp/lines-of-data.txt','r') or die($php_errormsg);
while($s = fgets($fh,1024)) {
    $s = preg_replace('/\r?\n$/','',$s);
    // do something with $s ... 
}
fclose($fh)                               or die($php_errormsg);

Unix and Windows also differ on the character used to separate directories in pathnames. Unix uses a slash (/), and Windows uses a backslash (\). PHP makes sorting this out easy, however, because the Windows version of PHP also understands / as a directory separator. For example, this code successfully prints the contents of C:\Alligator\Crocodile Menu.txt:

$fh = fopen('c:/alligator/crocodile menu.txt','r') or die($php_errormsg);
while($s = fgets($fh,1024)) {
    print $s;
}
fclose($fh)                                        or die($php_errormsg);

This piece of code also takes advantage of the fact that Windows filenames aren't case-sensitive. However, Unix filenames are.

Sorting out linebreak confusion isn't only a problem in your code that reads and writes files but in your source code as well. If you have multiple people working on a project, make sure all developers configure their editors to use the same kind of linebreaks.

Once you've opened a file, PHP gives you many tools to process its data. In keeping with PHP's C-like I/O interface, the two basic functions to read data from a file are fread( ) , which reads a specified number of bytes, and fgets( ), which reads a line at a time (up to a specified number of bytes.) This code handles lines up to 256 bytes long:

$fh = fopen('orders.txt','r') or die($php_errormsg);
while (! feof($fh)) {
    $s = fgets($fh,256);
    process_order($s);
}
fclose($fh) or die($php_errormsg);

If orders.txt has a 300-byte line, fgets( ) returns only the first 256 bytes. The next fgets( ) returns the next 44 bytes and stops when it finds the newline. The next fgets( ) moves to the next line of the file. Examples in this chapter generally give fgets( ) a second argument of 1048576: 1 MB. This is longer than lines in most text files, but the presence of such an outlandish number should serve as a reminder to consider your maximum expected line length when using fgets().

Many operations on file contents, such as picking a line at random (see Section 18.11) are conceptually simpler (and require less code) if the entire file is read into a string or array. Section 18.6 provides a method for reading a file into a string, and the file( ) function puts each line of a file into an array. The tradeoff for simplicity, however, is memory consumption. This can be especially harmful when you are using PHP as a server module. Generally, when a process (such as a web server process with PHP embedded in it) allocates memory (as PHP does to read an entire file into a string or array), it can't return that memory to the operating system until it dies. This means that calling file( ) on a 1 MB file from PHP running as an Apache module increases the size of that Apache process by 1 MB until the process dies. Repeated a few times, this decreases server efficiency. There are certainly good reasons for processing an entire file at once, but be conscious of the memory-use implications when you do.

Section 18.21 through Section 18.24 deal with running other programs from within a PHP program. Some program-execution operators or functions offer ways to run a program and read its output all at once (backticks) or read its last line of output (system( )). PHP can use pipes to run a program, pass it input, or read its output. Because a pipe is read with standard I/O functions (fgets( ) and fread( )), you decide how you want the input and you can do other tasks between reading chunks of input. Similarly, writing to a pipe is done with fputs( ) and fwrite( ), so you can pass input to a program in arbitrary increments.

Pipes have the same permission issues as regular files. The PHP process must have execute permission on the program being opened as a pipe. If you have trouble opening a pipe, especially if PHP is running as a special web server user, make sure the user is allowed to execute the program you are opening a pipe to.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.