1.15. Parsing Comma-Separated Data
Problem
You have a data file containing comma-separated values that you need to read in, but these data fields may have quoted commas or escaped quotes in them. Most spreadsheets and database programs use comma-separated values as a common interchange format.
Solution
Use the procedure in Mastering Regular Expressions.
sub parse_csv {
    my $text = shift;      # record containing comma-separated values
    my @new  = ();
    push(@new, $+) while $text =~ m{
        # the first part groups the phrase inside the quotes.
        # see explanation of this pattern in MRE
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?
           |  ([^,]+),?
           | ,
       }gx;
    push(@new, undef) if substr($text, -1,1) eq ',';
    return @new;      # list of values that were comma-separated
}
Or use the standard Text::ParseWords module.
use Text::ParseWords;
sub parse_csv {
    return quotewords(",", 0, $_[0]);
}
Discussion
Comma-separated input is a deceptively complex format. It sounds simple, but it involves a fairly elaborate escaping system because the fields themselves can contain commas. This complicates the pattern-matching solution and rules out a simple split /,/.
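To see why splitting on commas alone is not enough, here is a minimal sketch. The record in it is an illustrative value chosen for this example, not data from the recipe. The naive split cuts the quoted field in two:

```perl
use strict;
use warnings;

# Illustrative record: the second field contains a quoted comma.
my $record = 'XYZZY,"Wall, Larry",5';

# Naive approach: split on every comma, ignoring the quotes.
my @naive = split /,/, $record;

# The quoted field is broken into two pieces, and the quote
# characters are left attached to them.
print scalar(@naive), " fields\n";
print "[$_]\n" for @naive;
```

The record has three logical fields, but the split yields four pieces.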
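Fortunately, Text::ParseWords hides the complexity from you. Pass its quotewords function two arguments and the CSV string: the first argument is the separator (a comma in this case), and the second is a true or false value controlling whether the fields are returned with their quotes still around them.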
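As a small sketch of that second (keep) argument, the following calls quotewords twice on the same illustrative record, once with a false value and once with a true one. The record and the variable names are assumptions made for this example:

```perl
use strict;
use warnings;
use Text::ParseWords;

# Illustrative record with one quoted field containing a comma.
my $record = 'XYZZY,"Wall, Larry",5';

# False keep value: the surrounding quotes are stripped from each field.
my @plain = quotewords(',', 0, $record);

# True keep value: the quotes are returned as part of the field.
my @kept  = quotewords(',', 1, $record);

print "plain: [$_]\n" for @plain;
print "kept:  [$_]\n" for @kept;
```

Either way the quoted comma stays inside its field. Only the treatment of the quote characters differs.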
If you want to represent quotation marks inside a field delimited by quotation marks, escape them with backslashes, as in "a \"glug\" bit".
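A brief sketch of backslash-escaped quotes, using Text::ParseWords with a false keep value. The record is a shortened, illustrative variant of the test data in this recipe:

```perl
use strict;
use warnings;
use Text::ParseWords;

# q<> keeps the backslashes literal, so the record really contains \" pairs.
my $record = q<"a \"glug\" bit,",5>;

# With a false keep value, quotewords strips the outer quotes and
# unescapes the backslashed quotes inside the field.
my @fields = quotewords(',', 0, $record);

print "[$_]\n" for @fields;
```

The escaped quotes and the embedded comma both remain part of the first field.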
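Here's how you'd use the parse_csv subroutine. The q<> operator is just another way of quoting, so we don't have to backslash everything.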
$line = q<XYZZY,"","O'Reilly, Inc","Wall, Larry","a \"glug\" bit,",5,"Error, Core Dumped">;
@fields = parse_csv($line);
for ($i = 0; $i < @fields; $i++) {
    print "$i : $fields[$i]\n";
}
See Also
The explanation of regular expression syntax in perlre(1) and Chapter 2 of Programming Perl; the documentation for the standard Text::ParseWords module (also in Chapter 7 of Programming Perl); the section "An Introductory Example: Parsing CSV Text" in Chapter 7 of Mastering Regular Expressions
Copyright © 2002 O'Reilly & Associates. All rights reserved.