18.7. Counting Lines, Paragraphs, or Records in a File

18.7.2. Solution

To count lines, use fgets( ). Because it reads a line at a time, you can count the number of times it's called before reaching the end of a file:

    $lines = 0;

    if ($fh = fopen('orders.txt','r')) {
        while (! feof($fh)) {
            // compare against false so that a final line such as "0" still counts
            if (false !== fgets($fh,1048576)) {
                $lines++;
            }
        }
        fclose($fh);
    }
    print $lines;

To count paragraphs, increment the counter only when you read a blank line:

    $paragraphs = 0;

    if ($fh = fopen('great-american-novel.txt','r')) {
        while (! feof($fh)) {
            $s = fgets($fh,1048576);
            // a blank line is a bare Unix or Windows linebreak
            if (("\n" == $s) || ("\r\n" == $s)) {
                $paragraphs++;
            }
        }
        fclose($fh);
    }
    print $paragraphs;

To count records, increment the counter only when the line read contains just the record separator and trailing whitespace:

    $records = 0;
    $record_separator = '--end--';

    if ($fh = fopen('great-american-novel.txt','r')) {
        while (! feof($fh)) {
            // strip the linebreak and any other trailing whitespace
            $s = rtrim(fgets($fh,1048576));
            if ($s == $record_separator) {
                $records++;
            }
        }
        fclose($fh);
    }
    print $records;

18.7.3. Discussion

In the line counter, $lines is incremented only if fgets( ) returns a line rather than false. As fgets( ) moves through the file, it returns each line it retrieves. Once the last line has been read and nothing remains, the next call returns false, so $lines doesn't get incorrectly incremented. Because EOF has been reached on the file, feof( ) returns true, and the while loop ends.

The paragraph counter works fine on simple text but may produce unexpected results when presented with a long string of blank lines or a file without two consecutive linebreaks. These problems can be remedied with functions based on preg_split( ). If the file is small and can be read into memory, use the pc_split_paragraphs( ) function shown in Example 18-1. This function returns an array containing each paragraph in the file.

Example 18-1.
pc_split_paragraphs( )

    function pc_split_paragraphs($file,$rs="\r?\n") {
        $text = join('',file($file));
        $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text,-1,
                              PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
        return $matches;
    }

The contents of the file are broken on two or more consecutive newlines and returned in the $matches array. The default record-separation regular expression, \r?\n, matches both Windows and Unix linebreaks. If the file is too big to read into memory at once, use the pc_split_paragraphs_largefile( ) function shown in Example 18-2, which reads the file in 4K chunks.

Example 18-2. pc_split_paragraphs_largefile( )

    function pc_split_paragraphs_largefile($file,$rs="\r?\n") {
        global $php_errormsg;

        $unmatched_text = '';
        $paragraphs = array();

        $fh = fopen($file,'r') or die($php_errormsg);

        while(! feof($fh)) {
            $s = fread($fh,4096) or die($php_errormsg);
            $text_to_split = $unmatched_text . $s;

            $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text_to_split,-1,
                                  PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);

            // if the last chunk doesn't end with two record separators, save it
            // to prepend to the next section that gets read
            $last_match = $matches[count($matches)-1];
            if (! preg_match("/$rs$rs\$/",$last_match)) {
                $unmatched_text = $last_match;
                array_pop($matches);
            } else {
                $unmatched_text = '';
            }

            $paragraphs = array_merge($paragraphs,$matches);
        }

        // after reading all sections, if there is a final chunk that doesn't
        // end with the record separator, count it as a paragraph
        if ($unmatched_text) {
            $paragraphs[] = $unmatched_text;
        }
        return $paragraphs;
    }

This function uses the same regular expression as pc_split_paragraphs( ) to split the file into paragraphs. When it finds a paragraph end in a chunk read from the file, it saves the rest of the text in the chunk in $unmatched_text and prepends it to the next chunk read. This includes the unmatched text as the beginning of the next paragraph in the file.

18.7.4.
See Also

Documentation on fgets( ) at http://www.php.net/fgets, on feof( ) at http://www.php.net/feof, and on preg_split( ) at http://www.php.net/preg-split.

Copyright © 2003 O'Reilly & Associates. All rights reserved.
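The Discussion notes that the blank-line paragraph counter can give unexpected results on a long string of blank lines or on a file with no two consecutive linebreaks. When only a count is needed and the file is too large to split in memory, one possible alternative is to stream the file and count transitions from blank to non-blank lines. The following is a minimal sketch of that idea, not part of the original recipe: the function name count_paragraphs_streaming( ) is hypothetical, and treating whitespace-only lines as blank is an assumption.

```php
<?php
// Hypothetical helper, not from the original recipe.
// Assumption: a paragraph is a run of one or more non-blank lines, and a
// line containing only whitespace counts as blank.
function count_paragraphs_streaming($file) {
    $fh = fopen($file, 'r');
    if ($fh === false) {
        return false;               // could not open the file
    }

    $paragraphs = 0;
    $in_paragraph = false;

    // fgets( ) returns false once nothing is left to read
    while (($line = fgets($fh)) !== false) {
        if (trim($line) === '') {
            // blank line: whatever paragraph we were in has ended
            $in_paragraph = false;
        } elseif (! $in_paragraph) {
            // first non-blank line after a gap starts a new paragraph
            $in_paragraph = true;
            $paragraphs++;
        }
    }

    fclose($fh);
    return $paragraphs;
}
```

Under these assumptions, repeated blank lines are counted as a single gap, and a file with text but no blank lines at all counts as one paragraph. Calling count_paragraphs_streaming('great-american-novel.txt') would return the count, or false if the file cannot be opened.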