Data gets garbled when writing to csv with fputcsv() / fgetcsv()

Submitted by 我的未来我决定 on 2020-01-17 07:02:37

Question


There seems to be an encoding issue or bug in PHP with fputcsv() and fgetcsv().

The following PHP code:

$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];

print "\nBEFORE:\n";
var_export($row_before);
print "\n";

$fh = fopen($file = 'php://temp', 'rb+');

fputcsv($fh, $row_before);

rewind($fh);

$row_after = fgetcsv($fh);

print "\nAFTER:\n";
var_export($row_after);
print "\n\n";

fclose($fh);

Gives me this output:

BEFORE:
array (
  0 => 'A',
  1 => '["a","\\\\","b"]',
  2 => 'B',
)

AFTER:
array (
  0 => 'A',
  1 => '["a","\\\\',
  2 => 'b""]"',
  3 => 'B',
)

So clearly, the data is damaged on the way. Originally there were just 3 cells in the row, afterwards there are 4 cells in the row. The middle cell is split thanks to the backslash that is also used as an escape character.
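To see where the split comes from, it can help to look at the raw line that fputcsv() writes before fgetcsv() parses it. The following is a minimal sketch using the same row; the delimiter, enclosure and escape character are passed explicitly with their traditional default values, and the variable name $raw is only illustrative.

```php
<?php
// Same row as above: the middle cell is the JSON string ["a","\\","b"].
$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];

$fh = fopen('php://temp', 'rb+');

// Pass the traditional defaults explicitly: delimiter ",", enclosure '"', escape "\".
fputcsv($fh, $row_before, ',', '"', '\\');

// Read back the raw bytes instead of parsing them.
rewind($fh);
$raw = stream_get_contents($fh);
fclose($fh);

// The quote that follows the two backslashes is written once, not doubled,
// so a reader takes it as the end of the enclosed cell.
var_export($raw);
print "\n";
```

Comparing this raw line with the parsed result makes it visible that the damage happens on the writing side.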

See also https://3v4l.org/nc1oE, or here with explicit values for delimiter, enclosure and escape_char: https://3v4l.org/Svt7m

Is there any way I can sanitize / escape my data before writing to CSV, to guarantee that the data read from the file will be exactly the same?

Is CSV a fully reversible format?

EDIT: The goal would be a mechanism to properly write and read ANY data as csv, so that after one round trip the data is still the same.

EDIT: I realize that I do not really understand the $escape_char parameter. See also "fgetcsv/fputcsv $escape parameter fundamentally broken". Maybe an answer to that would also bring us closer to a solution.


Answer 1:


The culprit is that fputcsv() uses an escape character, which is a non-standard extension to CSV (well, as far as RFC 4180 can be regarded as a standard). Ideally this escape character would be disabled, but passing an empty string as $escape to fputcsv() doesn't work in PHP versions before 7.4. Passing a NUL character should normally give the desired result; however, see https://3v4l.org/MlluN.
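As of PHP 7.4.0, the escape parameter of fputcsv() and fgetcsv() also accepts an empty string, which disables the proprietary escape mechanism. A minimal sketch of the round trip from the question under that assumption (PHP 7.4 or later; it will not work on older versions):

```php
<?php
// Assumption: PHP 7.4 or later, where '' as $escape disables escaping.
$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];

$fh = fopen('php://temp', 'rb+');

// Write with the escape mechanism disabled, so every enclosure is doubled.
fputcsv($fh, $row_before, ',', '"', '');

rewind($fh);

// Read with the same settings; the backslashes are treated as plain data.
$row_after = fgetcsv($fh, 0, ',', '"', '');

fclose($fh);

// Compare the row before and after the round trip.
var_dump($row_before === $row_after);
```

The important part is that both the writer and the reader receive the same empty $escape value; mixing an empty escape on one side with the default on the other would reintroduce the mismatch.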




Answer 2:


Using your code with explicit delimiter, enclosure and escape values, changing the following line makes it work:

$enclosure = "'";

I think the parser treats the \ as escaping the quote that follows it.




Answer 3:


In PHP, \\ is the escape sequence for a backslash (see the PHP manual on escape sequences), so to keep it as a string you need to wrap it in an additional pair of single quotes (' ').

So your input array should be:

$row_before = ['A', json_encode(['a', "'\\'", 'b']), 'B'];



Answer 4:


This is not a PHP bug. json_encode() uses the same delimiter (,), enclosure (") and escape character (\) as the defaults for both fputcsv() and fgetcsv(). You can choose a different enclosure or escape character, and a different delimiter if necessary.

As already answered, in this case it works if you specify (') as the enclosure instead:

$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];

print "\nBEFORE:\n";
var_export($row_before);
print "\n";

$fh = fopen($file = 'php://temp', 'rb+');

fputcsv($fh, $row_before, ',', "'");

rewind($fh);

$row_after = fgetcsv($fh, 0, ',', "'");

print "\nAFTER:\n";
var_export($row_after);
print "\n\n";

fclose($fh);



Answer 5:


Contrary to what others are saying, I claim that this is a PHP bug. I am going to report it, and update this answer.

EDIT: Now reported here, https://bugs.php.net/bug.php?id=74713

Discussed in this answer:

  • Does changing the delimiter help? -> Not really.
  • Could fputcsv() be fixed? -> Yes.

Does changing the delimiter help?

It can be shown that this is reproducible with any combination of delimiter, enclosure and escape character.

https://3v4l.org/a29kR

$delimiter = 'X';
$enclosure = 'Y';
$escape_char = "Z";

$row_before = [
  'A',
  "[{$enclosure}a{$enclosure}{$delimiter}{$enclosure}{$escape_char}{$escape_char}{$enclosure}{$delimiter}{$enclosure}b{$enclosure}]",
  'B',
];

print "\nBEFORE:\n";
var_export($row_before);
print "\n";

$fh = fopen($file = 'php://temp', 'rb+');

fputcsv($fh, $row_before, $delimiter, $enclosure, $escape_char);

rewind($fh);

$row_plain = fread($fh, 1000);

print "\nPLAIN:\n";
var_export($row_plain);
print "\n";

rewind($fh);

$row_after = fgetcsv($fh, 500, $delimiter, $enclosure, $escape_char);

print "\nAFTER:\n";
var_export($row_after);
print "\n\n";

fclose($fh);

Output:

BEFORE:
array (
  0 => 'A',
  1 => '[YaYXYZZYXYbY]',
  2 => 'B',
)

PLAIN:
'AXY[YYaYYXYYZZYXYYbYY]YXB
'

AFTER:
array (
  0 => 'A',
  1 => '[YaYXYZZ',
  2 => 'bYY]Y',
  3 => 'B',
)

Could fputcsv() be fixed?

For this, let's go back to a more common and readable delimiter and enclosure, with a distinct escape character.

$delimiter = ',';
$enclosure = '"';
$escape_char = "@";

Here the result is:

BEFORE:
array (
  0 => 'A',
  1 => '["a","@@","b"]',
  2 => 'B',
)

PLAIN:
'A,"[""a"",""@@",""b""]",B
'

AFTER:
array (
  0 => 'A',
  1 => '["a","@@',
  2 => 'b""]"',
  3 => 'B',
)

We see that the '"@@"' part is exported as '""@@"', while it SHOULD have been exported as '""@@""'.

In fact, doing this manually with fwrite() instead of fputcsv() does fix the problem: https://3v4l.org/4U1CQ
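The manual approach can be sketched roughly as follows. The helper csv_line() is hypothetical (it is not part of PHP and not necessarily what the linked snippet does): it encloses every cell and doubles every enclosure character, regardless of any preceding escape character. The sketch is only checked against the example row above; since fgetcsv() still applies the escape character when reading, other inputs, such as a cell ending in the escape character, may still be misread.

```php
<?php
// Hypothetical helper (not part of PHP): builds one CSV line by enclosing
// every cell and doubling each enclosure character inside it.
function csv_line(array $fields, string $delimiter = ',', string $enclosure = '"'): string
{
    $cells = [];
    foreach ($fields as $field) {
        $doubled = str_replace($enclosure, $enclosure . $enclosure, (string) $field);
        $cells[] = $enclosure . $doubled . $enclosure;
    }
    return implode($delimiter, $cells) . "\n";
}

$row_before = ['A', '["a","@@","b"]', 'B'];

$fh = fopen('php://temp', 'rb+');

// Write the line with fwrite() instead of fputcsv().
fwrite($fh, csv_line($row_before));

rewind($fh);

// Read it back with the same delimiter, enclosure and escape character as above.
$row_after = fgetcsv($fh, 0, ',', '"', '@');

fclose($fh);

// Compare the row before and after the round trip.
var_dump($row_before === $row_after);
```

Here the '"@@"' part is written as '""@@""', which is the form described above as the expected output.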



Source: https://stackoverflow.com/questions/44427926/data-gets-garbled-when-writing-to-csv-with-fputcsv-fgetcsv
