I have a simple script that accepts a CSV file and reads every row into an array. I then cycle through each column of the first row (in my case it holds the questions of a s
This behaviour has a bug report filed for it, but apparently it isn't a bug.
We saw the same result with LANG set to C, and worked around it by ensuring that such values were wrapped in quotation marks. For example, the line
a,"a",é,"é",óú,"óú",ó&ú,"ó&ú"
generates the following array when passed through fgetcsv():
array (
  0 => 'a',
  1 => 'a',
  2 => '',
  3 => 'é',
  4 => '',
  5 => 'óú',
  6 => '&ú',
  7 => 'ó&ú',
)
Of course, you'll have to escape any quotation marks in the value by doubling them, but that's much less hassle than repairing the missing characters.
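If you control the code that writes the CSV, the quoting can be applied mechanically. The following is a minimal sketch, not something from the bug report: quote_csv_field() and quote_csv_line() are hypothetical helper names, and the sample values are made up for illustration.

```php
<?php
// Hypothetical helpers (not part of PHP): force-quote every CSV field.

function quote_csv_field(string $value): string
{
    // Double any embedded quotation marks, then wrap the whole value.
    return '"' . str_replace('"', '""', $value) . '"';
}

function quote_csv_line(array $fields): string
{
    // Quote each field and join with commas (no trailing newline).
    return implode(',', array_map('quote_csv_field', $fields));
}

echo quote_csv_line(['a', 'é', 'say "hi"']), "\n";
// prints: "a","é","say ""hi"""
```

fputcsv() is not used here because it only adds quotation marks when it decides a field needs them, which is not enough for this workaround.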
Oddly, this happens with both UTF-8 and cp1252 encodings for the input file.
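To check whether your own setup is affected, you can feed the sample line through fgetcsv() yourself. This is only a reproduction sketch: parse_csv_line() is a hypothetical helper, the line is read from an in-memory stream instead of a real file, and what gets printed depends on your PHP version and locale, so no expected output is given.

```php
<?php
// Reproduction sketch; the result varies with PHP version and locale.

function parse_csv_line(string $line): array
{
    // Write the line to an in-memory stream so fgetcsv() can read it back.
    $handle = fopen('php://memory', 'w+');
    fwrite($handle, $line . "\n");
    rewind($handle);

    // Separator, enclosure and escape are passed explicitly.
    $row = fgetcsv($handle, 0, ',', '"', '\\');
    fclose($handle);

    return $row === false ? [] : $row;
}

// Mirror the LANG=C situation described above.
setlocale(LC_ALL, 'C');

var_export(parse_csv_line('a,"a",é,"é",óú,"óú",ó&ú,"ó&ú"'));
echo "\n";
```

Compare the unquoted fields (indexes 2, 4 and 6) with their quoted neighbours; if they match, your environment does not show the problem.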
Have you already checked the manual page for fgetcsv? Offhand I can't see anything there about this specific problem, but there are a number of user contributions that may be worth looking through if nothing comes up here.
There's this, for example:
Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
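One reading of that note is that the file's encoding and the locale should agree. If the file is in a one-byte encoding and you want to parse under a UTF-8 locale, you could convert the bytes first. A minimal sketch, assuming the input really is cp1252 and that the iconv extension is available; cp1252_to_utf8() is a hypothetical helper name. Given that the problem was also seen with UTF-8 input, this may not be a cure on its own.

```php
<?php
// Assumption: the input bytes are Windows-1252 (cp1252); adjust if not.

function cp1252_to_utf8(string $bytes): string
{
    // iconv() returns false if the input contains bytes it cannot convert.
    $converted = iconv('CP1252', 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException('Input is not valid cp1252');
    }
    return $converted;
}

// "café,naïve" written as cp1252 bytes.
$line = "caf\xE9,na\xEFve";
echo cp1252_to_utf8($line), "\n";
```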
Also, seeing as it's always in the beginning of the line, could it be that this is really a hidden line break problem? There's this:
Note: If PHP is not properly recognizing the line endings when reading files either on or created by a Macintosh computer, enabling the auto_detect_line_endings run-time configuration option may help resolve the problem.
You may also want to try saving the file with different line endings.
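If re-saving the file by hand is not practical, the line endings can be normalised in code before the data is parsed. A minimal sketch, assuming the whole file fits in memory; normalize_line_endings() is a hypothetical helper name and the sample data is made up.

```php
<?php
// Hypothetical helper: convert CRLF and bare CR line endings to LF.

function normalize_line_endings(string $text): string
{
    // Replace CRLF first so it is not turned into two newlines.
    return str_replace(["\r\n", "\r"], "\n", $text);
}

// Old Macintosh style data: lines end in a bare carriage return.
$raw = "q1,q2\r\"yes\",\"no\"\r";
$clean = normalize_line_endings($raw);

// Each line can now be split off and parsed individually.
foreach (explode("\n", rtrim($clean, "\n")) as $line) {
    echo $line, "\n";
}
```

Note that this also rewrites line breaks inside quoted fields, which may or may not matter for your data.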
Are you setting your locale correctly before calling fgetcsv()?
setlocale(LC_ALL, 'fr_FR.UTF-8');
Otherwise, fgetcsv() is not multi-byte safe.
Make sure that you set it to something that appears in your list of available locales. On Linux (certainly on Debian) you can list them by running
locale -a
You should get something like...
C
en_US.utf8
POSIX
For UTF-8 support, pick a locale whose name ends in utf8. If your input is encoded with something else, you'll need to use the appropriate locale, but make sure your OS supports it first.
If you ask for a locale that isn't available on your system, setlocale() returns false and leaves the current locale unchanged, so it won't help you.
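Because of that, it is worth checking what setlocale() returns. Here is a minimal sketch: pick_locale() is a hypothetical helper, and the candidate names are only examples that may not exist on your machine, so substitute entries from your own locale -a output.

```php
<?php
// Hypothetical helper: try several locale names, report which one took effect.

function pick_locale(array $candidates)
{
    // setlocale() accepts an array and tries each entry in order; it
    // returns the name that was set, or false if none could be set.
    return setlocale(LC_ALL, $candidates);
}

// Example candidates only; use names from your own `locale -a` output.
$chosen = pick_locale(['fr_FR.UTF-8', 'en_US.UTF-8', 'C.UTF-8']);

if ($chosen === false) {
    echo "None of the requested locales is available; check locale -a\n";
} else {
    echo "Using locale: $chosen\n";
}
```

Passing a list means the script still gets a UTF-8 locale on machines where the first choice is not installed.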