1.15. Parsing Comma-Separated Data

Problem

You have a data file containing comma-separated values that you need to read in, but these data fields may have quoted commas or escaped quotes in them. Most spreadsheets and database programs use comma-separated values as a common interchange format.

Solution

Use the procedure in Mastering Regular Expressions.

    sub parse_csv {
        my $text = shift;      # record containing comma-separated values
        my @new  = ();
        push(@new, $+) while $text =~ m{
            # the first part groups the phrase inside the quotes.
            # see explanation of this pattern in MRE
            "([^\"\\]*(?:\\.[^\"\\]*)*)",?
               |  ([^,]+),?
               | ,
           }gx;
        push(@new, undef) if substr($text, -1,1) eq ',';
        return @new;           # list of values that were comma-separated
    }

Or use the standard Text::ParseWords module.

    use Text::ParseWords;

    sub parse_csv {
        return quotewords(",",0, $_[0]);
    }

Discussion
Comma-separated input is a deceptively complex format. It sounds simple, but involves a fairly complex escaping system because the fields themselves can contain commas. This makes the pattern matching solution complex and rules out a simple split /,/.
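A minimal sketch of why splitting on every comma falls short. The sample record is hypothetical and chosen only so that one field contains a quoted comma:

```perl
use strict;
use warnings;

# Hypothetical sample record: the second field contains a quoted comma.
my $record = q{XYZZY,"Wall, Larry",5};

# Naive approach: split on every comma, whether or not it is inside quotes.
my @naive = split /,/, $record;

# The quoted field is cut in two, and the quote characters stay attached.
print scalar(@naive), " pieces\n";
print "[$_]\n" for @naive;
```

The record has three logical fields, but the split yields four pieces because it knows nothing about the quoting.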
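Fortunately, Text::ParseWords hides the complexity from you. Pass its quotewords function two arguments and the CSV string. The first argument is the separator (a comma, in this case) and the second is a true or false value controlling whether the fields are returned with their quotes still around them.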
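As an illustration of the second argument to quotewords, the following sketch parses the same record twice, once with a false value and once with a true value. The sample record is hypothetical:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical sample record with a comma inside a quoted field.
my $record = q{XYZZY,"Wall, Larry",5};

# False second argument: the surrounding quotes are removed.
my @stripped = quotewords(',', 0, $record);

# True second argument: the surrounding quotes are kept.
my @kept = quotewords(',', 1, $record);

print "stripped: [$_]\n" for @stripped;
print "kept:     [$_]\n" for @kept;
```

Both calls return three fields; only the treatment of the quote characters differs.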
If you want to represent quotation marks inside a field delimited by quotation marks, escape them with backslashes, as in "a \"glug\" bit".
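A short sketch of backslash-escaped quotes passing through quotewords. The record is a hypothetical one, and the second argument is false so that both the surrounding quotes and the escaping backslashes are removed:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical record; q{} keeps the backslashes and quotes literal.
my $record = q{1,"a \"glug\" bit",2};

# With a false second argument, quotewords strips the outer quotes
# and unescapes the backslashed quotes inside the field.
my @fields = quotewords(',', 0, $record);

print "[$_]\n" for @fields;
```

Note that this differs from the regular expression version of parse_csv, which returns the quoted field with its backslashes still in place.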
Here's how you'd use the parse_csv subroutine. The q<> quoting operator keeps the embedded quotes and backslashes literal, so nothing needs extra escaping.

    $line = q<XYZZY,"","O'Reilly, Inc","Wall, Larry","a \"glug\" bit,",5,"Error, Core Dumped">;
    @fields = parse_csv($line);
    for ($i = 0; $i < @fields; $i++) {
        print "$i : $fields[$i]\n";
    }

See Also

The explanation of regular expression syntax in perlre(1) and Chapter 2 of Programming Perl; the documentation for the standard Text::ParseWords module (also in Chapter 7 of Programming Perl); the section "An Introductory Example: Parsing CSV Text" in Chapter 7 of Mastering Regular Expressions.

Copyright © 2001 O'Reilly & Associates. All rights reserved.