fgetcsv() ignores special characters when they are at the beginning of a line!

误落风尘 2020-12-10 14:14

I have a simple script that accepts a CSV file and reads every row into an array. I then cycle through each column of the first row (in my case it holds the questions of a s

4 answers
  • 2020-12-10 14:42

    This behaviour has a bug report filed for it, but apparently it isn't a bug.

  • 2020-12-10 14:47

    We saw the same result with LANG set to C, and worked around it by ensuring that such values were wrapped in quotation marks. For example, the line

    a,"a",é,"é",óú,"óú",ó&ú,"ó&ú"
    

    generates the following array when passed through fgetcsv():

    array (
      0 => 'a',
      1 => 'a',
      2 => '',
      3 => 'é',
      4 => '',
      5 => 'óú',
      6 => '&ú',
      7 => 'ó&ú',
    )
    

    Of course, you'll have to escape any quotation marks in the value by doubling them, but that's much less hassle than repairing the missing characters.

    Oddly, this happens with both UTF-8 and cp1252 encodings for the input file.
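
    The quoting workaround can be sketched roughly like this. It assumes you control how the CSV is produced; quoteCsvField() and quoteCsvLine() are hypothetical helper names used for illustration, not PHP built-ins.

    ```php
    <?php
    // Hypothetical helper: wrap one value in quotation marks,
    // doubling any quotation marks it already contains.
    function quoteCsvField(string $value): string
    {
        return '"' . str_replace('"', '""', $value) . '"';
    }

    // Hypothetical helper: build one CSV line with every field quoted.
    function quoteCsvLine(array $fields): string
    {
        return implode(',', array_map('quoteCsvField', $fields));
    }

    $line = quoteCsvLine(['a', 'é', 'ó&ú', 'say "hi"']);
    echo $line, PHP_EOL; // "a","é","ó&ú","say ""hi"""
    ```

    Quoting every field is more than strictly needed, but it avoids having to decide which values start with a problematic character.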

  • 2020-12-10 14:48

    Have you already checked the manual page on fgetcsv? Nothing there addresses this specific problem directly, but a number of the user contributions may be worth looking through if nothing comes up here.

    There's this, for example:

    Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.

    Also, seeing as it always happens at the beginning of the line, could this really be a hidden line break problem? There's this:

    Note: If PHP is not properly recognizing the line endings when reading files either on or created by a Macintosh computer, enabling the auto_detect_line_endings run-time configuration option may help resolve the problem.

    You may also want to try saving the file with different line endings.
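
    If re-saving the file isn't an option, one way to rule out line endings is to normalize them in PHP before parsing. This is only a sketch under that assumption: parseCsvWithNormalizedEol() is a hypothetical name, it loads the whole input into memory, and it would also rewrite a bare carriage return inside a quoted field. Note too that auto_detect_line_endings is deprecated as of PHP 8.1.

    ```php
    <?php
    // Hypothetical helper: convert CRLF and bare CR line endings to LF,
    // then parse the result with fgetcsv() via a temporary stream.
    function parseCsvWithNormalizedEol(string $contents): array
    {
        $normalized = str_replace(["\r\n", "\r"], "\n", $contents);

        $stream = fopen('php://temp', 'r+');
        fwrite($stream, $normalized);
        rewind($stream);

        $rows = [];
        // Length 0 means "no line length limit".
        while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() reports a blank line as [null]
            }
            $rows[] = $row;
        }
        fclose($stream);

        return $rows;
    }

    // Old Macintosh-style input: rows separated by bare carriage returns.
    $rows = parseCsvWithNormalizedEol("q1,q2\ra1,a2\r");
    echo count($rows), PHP_EOL; // 2
    ```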

  • 2020-12-10 14:49

    Are you setting your locale correctly before calling fgetcsv()?

    setlocale(LC_ALL, 'fr_FR.UTF-8');
    

    Otherwise, fgetcsv() is not multi-byte safe.

    Make sure that you set it to something that appears in your list of available locales. On Linux (certainly on Debian) you can list them by running

    locale -a
    

    You should get something like...

    C
    en_US.utf8
    POSIX
    

    For UTF-8 support, pick a locale whose name ends in utf8. If your input uses a different encoding you'll need the matching locale, but make sure your OS supports it first.

    If you set the locale to a locale which isn't available on your system it won't help you.
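
    Since setlocale() returns false when none of the requested locales is available, you can check its result instead of assuming the call worked. A minimal sketch: setUtf8Locale() is a hypothetical helper name, and the candidate list is just an example that you should adjust to whatever locale -a prints on your system.

    ```php
    <?php
    // Hypothetical helper: try each candidate in order and return the
    // locale that was actually set, or null if none is available.
    function setUtf8Locale(array $candidates): ?string
    {
        // setlocale() accepts an array and uses the first locale that works.
        $chosen = setlocale(LC_ALL, $candidates);

        return $chosen === false ? null : $chosen;
    }

    $locale = setUtf8Locale(['fr_FR.UTF-8', 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8']);

    if ($locale === null) {
        echo "No UTF-8 locale available; check the output of locale -a", PHP_EOL;
    } else {
        echo "Using locale: ", $locale, PHP_EOL;
    }
    ```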
