fgetcsv() ignores special characters when they are at the beginning of a line!

Submitted by 时光毁灭记忆、已成空白 on 2019-11-28 11:04:38

Have you already checked the manual page on fgetcsv()? Nothing there addresses this specific problem directly, but a number of the user contributions may be worth looking through if nothing comes up here.

There's this, for example:

Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.

Also, since the problem always occurs at the beginning of a line, could this really be a hidden line break problem? There's this:

Note: If PHP is not properly recognizing the line endings when reading files either on or created by a Macintosh computer, enabling the auto_detect_line_endings run-time configuration option may help resolve the problem.

You may also want to try saving the file with different line endings.
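If re-saving the file is not practical, the line endings can also be normalized in code before parsing. The following is only a sketch: normalize_line_endings() and parse_csv_text() are hypothetical helper names, not part of PHP, and splitting on newlines assumes that no quoted field contains an embedded line break.

```php
<?php
// Hypothetical helper: convert CRLF (Windows) and bare CR (old Mac)
// line endings to LF so every line is terminated the same way.
function normalize_line_endings(string $data): string
{
    return preg_replace('/\r\n?/', "\n", $data);
}

// Hypothetical helper: parse CSV text row by row after normalizing.
// Assumption: no quoted field spans more than one line.
function parse_csv_text(string $data): array
{
    $rows = [];
    foreach (explode("\n", normalize_line_endings($data)) as $line) {
        if ($line === '') {
            continue; // skip blank lines, including the one after a trailing newline
        }
        // Delimiter, enclosure and escape are passed explicitly.
        $rows[] = str_getcsv($line, ',', '"', '\\');
    }
    return $rows;
}

// Mixed CR, CRLF and LF endings in one input string.
$rows = parse_csv_text("a,b\rc,d\r\ne,f\n");
```

If the first field of each row still loses characters after this, the line endings were probably not the cause and the locale is the next thing to check.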

Brock Batsell

Are you setting your locale correctly before calling fgetcsv()?

setlocale(LC_ALL, 'fr_FR.UTF-8');

Otherwise, fgetcsv() is not multi-byte safe.

Make sure that you set it to something that appears in your list of available locales. On Linux (certainly on Debian) you can list them by running

locale -a

You should get something like...

C
en_US.utf8
POSIX

For UTF-8 support, pick a locale whose name ends in utf8. If your input uses a different encoding, you'll need to use the appropriate locale for it, but make sure your OS supports it first.

If you set the locale to a locale which isn't available on your system it won't help you.
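One way to notice an unavailable locale is to check the return value of setlocale(), which is false when none of the requested locales could be set. A minimal sketch, assuming the candidate names below (they are examples and may not exist on your system); pick_utf8_locale() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: try several locale names in order and
// return the one that took effect, or false if none is installed.
function pick_utf8_locale(array $candidates)
{
    // setlocale() accepts an array and tries each entry until one succeeds.
    return setlocale(LC_ALL, $candidates);
}

// Example candidates only; compare against the output of `locale -a`.
$chosen = pick_utf8_locale(['fr_FR.UTF-8', 'en_US.UTF-8', 'C.UTF-8']);

if ($chosen === false) {
    echo "No UTF-8 locale from the list is available on this system\n";
} else {
    echo "Locale in effect: $chosen\n";
}
```

Calling this before fgetcsv() at least makes a silent failure visible instead of leaving the previous locale in place.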

This behaviour has a bug report filed for it, but apparently it isn't a bug.

We saw the same result with LANG set to C, and worked around it by ensuring that such values were wrapped in quotation marks. For example, the line

a,"a",é,"é",óú,"óú",ó&ú,"ó&ú"

generates the following array when passed through fgetcsv():

array (
  0 => 'a',
  1 => 'a',
  2 => '',
  3 => 'é',
  4 => '',
  5 => 'óú',
  6 => '&ú',
  7 => 'ó&ú',
)

Of course, you'll have to escape any quotation marks in the value by doubling them, but that's much less hassle than repairing the missing characters.
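If you control how the file is written, the quoting can be applied to every field up front. This is a rough sketch of that workaround, not a full CSV writer; quote_csv_field() and quote_csv_line() are hypothetical helper names:

```php
<?php
// Hypothetical helper: wrap one value in double quotes and
// double any quotation marks it already contains.
function quote_csv_field(string $value): string
{
    return '"' . str_replace('"', '""', $value) . '"';
}

// Hypothetical helper: quote every field and join them with commas.
function quote_csv_line(array $fields): string
{
    return implode(',', array_map('quote_csv_field', $fields));
}

$line = quote_csv_line(['a', 'say "hi"', 'x,y']);
echo $line, "\n";

// Parsing the line back should return the original values.
$parsed = str_getcsv($line, ',', '"', '\\');
```

Quoting every field unconditionally makes the file a little larger, but it avoids having to decide which values are at risk.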

Oddly, this happens with both UTF-8 and cp1252 encodings for the input file.
