UTF-8 problems while reading CSV file with fgetcsv

[愿得一人] 2020-12-02 22:45

I am trying to read a CSV file and echo its content, but the characters are displayed incorrectly.

Mäx Müstermänn -> Mäx Müstermänn

Encoding of the CSV file is UT

6 answers
  •  旧时难觅i
    2020-12-02 23:07

    In my case the source file has windows-1250 encoding and iconv prints tons of notices about illegal characters in input string...

    So this solution helped me a lot:

    /**
     * getting CSV array with UTF-8 encoding
     *
     * @param   resource    &$handle
     * @param   integer     $length
     * @param   string      $separator
     *
     * @return  array|false
     */
    private function fgetcsvUTF8(&$handle, $length, $separator = ';')
    {
        if (($buffer = fgets($handle, $length)) !== false)
        {
            $buffer = $this->autoUTF($buffer);
            return str_getcsv($buffer, $separator);
        }
        return false;
    }
    
    /**
     * automatic conversion of windows-1250 and iso-8859-2 strings into utf-8
     *
     * @param   string  $s
     *
     * @return  string
     */
    private function autoUTF($s)
    {
        // detect UTF-8
        if (preg_match('#[\x80-\x{1FF}\x{2000}-\x{3FFF}]#u', $s))
            return $s;
    
        // detect WINDOWS-1250
        if (preg_match('#[\x7F-\x9F\xBC]#', $s))
            return iconv('WINDOWS-1250', 'UTF-8', $s);
    
        // assume ISO-8859-2
        return iconv('ISO-8859-2', 'UTF-8', $s);
    }
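
    A minimal standalone sketch of how the two methods above might be used outside a class. The function names auto_utf() and fgetcsv_utf8(), the default length of 4096, the temp file and the sample data are all hypothetical; the rtrim() call and the explicit enclosure/escape arguments to str_getcsv() are additions that are not in the code above.

    ```php
    <?php
    // Hypothetical standalone versions of autoUTF() and fgetcsvUTF8().

    function auto_utf(string $s): string
    {
        // Valid UTF-8 with non-ASCII characters: return unchanged.
        // With the /u modifier preg_match() returns false on invalid UTF-8,
        // so legacy single-byte input falls through to the checks below.
        if (preg_match('#[\x80-\x{1FF}\x{2000}-\x{3FFF}]#u', $s)) {
            return $s;
        }

        // Bytes in this range are treated as a hint for WINDOWS-1250.
        if (preg_match('#[\x7F-\x9F\xBC]#', $s)) {
            return iconv('WINDOWS-1250', 'UTF-8', $s);
        }

        // Otherwise assume ISO-8859-2 (plain ASCII passes through unchanged).
        return iconv('ISO-8859-2', 'UTF-8', $s);
    }

    function fgetcsv_utf8($handle, int $length = 4096, string $separator = ';')
    {
        $buffer = fgets($handle, $length);
        if ($buffer === false) {
            return false; // end of file or read error
        }

        // Convert first, then drop the line ending that fgets() keeps.
        $line = rtrim(auto_utf($buffer), "\r\n");

        return str_getcsv($line, $separator, '"', '\\');
    }

    // Made-up sample: the first line is ISO-8859-2 encoded, the second is ASCII.
    $path = tempnam(sys_get_temp_dir(), 'csv');
    file_put_contents($path, "M\xE4x M\xFCsterm\xE4nn;K\xF6ln\nplain;ascii\n");

    $rows = [];
    $handle = fopen($path, 'rb');
    while (($row = fgetcsv_utf8($handle)) !== false) {
        $rows[] = $row;
    }
    fclose($handle);
    unlink($path);

    print_r($rows);
    ```

    Note that reading line by line with fgets() means a quoted field containing a line break would be split across two rows, so this approach only suits files where every record sits on a single line.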
    

    In response to @manvel's answer: use str_getcsv instead of explode, because of cases like this:

    some;nice;value;"and;here;comes;combinated;value";and;some;others
    

    explode ignores the quotes and splits the string into these parts:

    some
    nice
    value
    "and
    here
    comes
    combinated
    value"
    and
    some
    others
    

    but str_getcsv keeps the quoted field intact and splits the string into these parts:

    some
    nice
    value
    and;here;comes;combinated;value
    and
    some
    others
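
    The difference can be checked with a short script. This is only a sketch using the sample line from above; the variable names are arbitrary, and the enclosure and escape arguments to str_getcsv() are spelled out explicitly rather than left at their defaults.

    ```php
    <?php
    $line = 'some;nice;value;"and;here;comes;combinated;value";and;some;others';

    // explode() knows nothing about quoting and cuts at every semicolon.
    $naive = explode(';', $line);

    // str_getcsv() treats the double-quoted part as a single field.
    $parsed = str_getcsv($line, ';', '"', '\\');

    echo count($naive), "\n";   // 11
    echo count($parsed), "\n";  // 7
    echo $parsed[3], "\n";      // and;here;comes;combinated;value
    ```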
    
