I already managed to split the CSV file using this regex: /,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/
But I ended up with an array of stri
preg_split('/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/', $line,-1,PREG_SPLIT_DELIM_CAPTURE);
This has problems with " inside strings, such as "Toys"R"Us".
So you should use this instead:
preg_split('/'.$seperator.'(?=(?:[^\"])*(?![^\"]))/', $line,-1, PREG_SPLIT_DELIM_CAPTURE);
For those of you who want to use a regex instead of fgetcsv, here is a complete example of how to create an HTML table from a CSV file using a regex.
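$data = file_get_contents('test.csv');
$pieces = explode("\n", $data);
// Initialise $html before appending to it, to avoid an undefined-variable warning.
$html = "<table border='1'>\n";
// array_filter() drops empty lines, such as the one after a trailing newline.
foreach (array_filter($pieces) as $line) {
    $html .= "<tr>\n";
    // Split on commas that are followed by an even number of quotes.
    $keywords = preg_split('/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($keywords as $col) {
        // Strip the enclosing quotes, then escape the value for HTML output.
        $html .= "<td>".htmlspecialchars(trim($col, '"'))."</td>\n";
    }
    $html .= "</tr>\n";
}
$html .= "</table>\n";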
I agree with the others who said you should use the fgetcsv function instead of regexes. A regex may work okay on well-formed CSV data, but if the CSV is malformed or corrupt, the regex will silently fail, probably returning bogus results in the process.
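To illustrate the silent failure, here is a small sketch that runs the split regex from this thread on a line with a missing closing quote. The helper name split_csv_line and the sample lines are made up for the example.

```php
<?php
// Hypothetical helper wrapping the split regex discussed in this thread.
function split_csv_line(string $line): array
{
    // Split on commas that are followed by an even number of quotes.
    return preg_split('/,(?=(?:[^"]*"[^"]*")*(?![^"]*"))/', $line);
}

// Well-formed line: the comma inside the quoted field is left alone.
$good = split_csv_line('a,"b,c",d');

// Malformed line (closing quote missing): no error or warning is raised,
// but the pieces no longer correspond to the intended fields.
$bad = split_csv_line('a,"b,c,d');
```

The well-formed line splits into a, "b,c" and d, while the malformed one comes back as a,"b then c then d, with nothing to signal that something went wrong.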
However, the question was specifically about stripping unwanted quotation marks after the initial split. The one solution proposed so far is too naive: it only deals with the escaped quotes inside a field, not the actual delimiters. (I know the OP didn't ask about those, but they do need to be removed, so why not do them at the same time as the others?) Here's my solution:
$csv_field = preg_replace('/"(.|$)/', '\1', $csv_field);
This regex matches a quotation mark followed by any character or by the end of the string, and replaces the matched character(s) with the second character, or with the empty string if it was the $ that matched. According to the spec, CSV fields can contain line separators; that doesn't seem to happen much, but you can add the 's' modifier to the regex if you need to.
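As a rough sketch of how that replacement behaves on a single field, the snippet below wraps it in a helper. The name unquote_csv_field and the sample values are assumptions for illustration, and the 's' modifier is included so that fields containing line breaks are covered too.

```php
<?php
// Hypothetical helper: strip the enclosing quotes from one already-split
// field and collapse doubled quotes ("") into a single quote.
function unquote_csv_field(string $csv_field): string
{
    // The 's' modifier lets '.' match a newline as well, for multi-line fields.
    return preg_replace('/"(.|$)/s', '\1', $csv_field);
}

$name  = unquote_csv_field('"Toys""R""Us"'); // quoted field with escaped quotes
$plain = unquote_csv_field('plain');         // unquoted field is left untouched
```

Here $name ends up as Toys"R"Us: the opening quote is replaced by the T that follows it, each doubled quote collapses to one, and the closing quote is dropped by the $ branch.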
Here's my quick attempt at it, although it will only work on word boundaries.
preg_replace('/([\W]){2}\b/', '\1', $csv)
Why do you bother splitting the file with a regex when there's an fgetcsv function that does all the hard work for you?
You can pass in the separator and the enclosure (quote) character, and it will handle quoted fields for you.
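A minimal sketch of that approach might look like the following. The helper name parse_csv_text is made up, and an in-memory stream stands in for a real file so the example is self-contained; with an actual file you would pass the handle from fopen('test.csv', 'r') to fgetcsv instead.

```php
<?php
// Hypothetical helper: parse CSV text into rows with fgetcsv.
function parse_csv_text(string $text, string $separator = ',', string $enclosure = '"'): array
{
    // An in-memory stream stands in for a real file such as test.csv.
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $text);
    rewind($handle);

    $rows = [];
    // Length 0 means no line-length limit; '\\' is the default escape character.
    while (($row = fgetcsv($handle, 0, $separator, $enclosure, '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv reports a blank line as an array holding a single null
        }
        $rows[] = $row;
    }
    fclose($handle);

    return $rows;
}

$rows = parse_csv_text("a,\"b,c\",d\n1,\"say \"\"hi\"\"\",3\n");
```

The fields come back already unquoted, with the embedded comma and the doubled quotes handled, so no separate clean-up pass is needed.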
There is a function for reading CSV files: fgetcsv