PHP - Best approach to detect CSV delimiter

守給你的承諾、 submitted on 2019-12-04 04:50:46

Fixed version.

In your code, if a value contains another candidate delimiter, you'll get a wrong result (example: val; string, with comma;val2;val3). The result is also wrong if the file has only one row (count of rows < count of delimiters).

Here is a fixed variant:

private function detectDelimiter($fh)
{
    $delimiters = ["\t", ";", "|", ","];
    $delimiter = $delimiters[0]; // fall back to the first candidate
    $maxFields = 0;
    foreach ($delimiters as $d) {
        // Parse the first row with each candidate delimiter.
        $row = fgetcsv($fh, 4096, $d);
        rewind($fh);
        // fgetcsv() returns false on failure, so check before counting.
        if (is_array($row) && count($row) > $maxFields) {
            $delimiter = $d;
            $maxFields = count($row);
        }
    }

    return $delimiter;
}
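
To illustrate why parsing one row with each candidate copes with the example row above, here is a minimal sketch that writes that row to an in-memory stream and records the field count per delimiter. The `php://memory` stream and the `$counts` variable are only for illustration; the enclosure and escape arguments are passed explicitly to `fgetcsv()` as an assumption about how the row should be parsed.

```php
<?php
// Write the example row to an in-memory stream instead of a real file.
$fh = fopen('php://memory', 'w+');
fwrite($fh, "val; string, with comma;val2;val3\n");

$counts = [];
foreach (["\t", ";", "|", ","] as $d) {
    rewind($fh);
    // Parse the same first row with each candidate delimiter.
    $row = fgetcsv($fh, 4096, $d, '"', '\\');
    $counts[$d] = is_array($row) ? count($row) : 0;
}
fclose($fh);

// The semicolon yields 4 fields, the comma only 2, tab and pipe 1 each.
```

Because the semicolon produces the most fields for this row, it is the candidate the function above would return.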

In general, you cannot reliably detect the delimiter of a text file. If you have additional hints about the data, you need to build them into your detection to be sure.
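
One possible hint is a known column count. The sketch below is a hypothetical helper (the name `delimitersMatchingColumnCount` and its parameters are not from any library) that keeps only the candidates which split a line into exactly the expected number of fields. It assumes the default double-quote enclosure.

```php
<?php
/**
 * Hypothetical helper: return the candidate delimiters that split $line
 * into exactly $expectedColumns fields.
 */
function delimitersMatchingColumnCount(
    string $line,
    int $expectedColumns,
    array $candidates = [',', ';', "\t", '|']
): array {
    $matches = [];
    foreach ($candidates as $d) {
        // Enclosure and escape are passed explicitly; both are assumptions.
        $fields = str_getcsv($line, $d, '"', '\\');
        if (count($fields) === $expectedColumns) {
            $matches[] = $d;
        }
    }

    return $matches;
}

$line = "It's not ready, yet.; We have to wait for peter, paul, and mary.; They will know what to do";
$matches = delimitersMatchingColumnCount($line, 3);
// Only the semicolon splits this line into 3 fields.
```

If the helper returns more than one candidate, or none, the hint was not enough and the caller still has to decide.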

One particular problem with the suggested approach is that it will count the number of elements in different lines of the file. Suppose you had a file like this:

a;b;c;d
a   b;  c   d
this|that;here|there
It's not ready, yet.; We have to wait for peter, paul, and mary.; They will know what to do

Although this file seems to be separated by semicolons, your approach will return a comma.
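
To make the ambiguity concrete, here is a rough sketch that reports, for each line of the sample, which candidate occurs most often. The helper name `mostFrequentDelimiterPerLine` is hypothetical, and the sketch assumes the gaps in the second sample line were originally tabs.

```php
<?php
/**
 * Hypothetical helper: for each line, return the candidate delimiter that
 * occurs most often (ties go to the earlier candidate).
 */
function mostFrequentDelimiterPerLine(array $lines, array $candidates): array
{
    $winners = [];
    foreach ($lines as $line) {
        $best = null;
        $bestCount = 0;
        foreach ($candidates as $d) {
            $n = substr_count($line, $d);
            if ($n > $bestCount) {
                $best = $d;
                $bestCount = $n;
            }
        }
        $winners[] = $best; // null when no candidate occurs in the line
    }

    return $winners;
}

$lines = [
    "a;b;c;d",
    "a\tb;\tc\td", // assumption: the gaps in the sample were tabs
    "this|that;here|there",
    "It's not ready, yet.; We have to wait for peter, paul, and mary.; They will know what to do",
];
$winners = mostFrequentDelimiterPerLine($lines, [",", ";", "\t", "|"]);
// Each line favours a different delimiter: ";", tab, "|", ","
```

Every line points to a different winner, so any detection that looks at single lines in isolation depends on which line it happens to read.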

None of these answers fit my use case, so I made a slight modification.

/**
 * @param string $filePath
 * @param int $checkLines
 * @return string
 */
public function getCsvDelimiter(string $filePath, int $checkLines = 3): string
{
    // "\t" needs double quotes; '\t' would be a literal backslash followed by "t".
    $delimiters = [',', ';', "\t"];
    $default = ',';

    $fileObject = new \SplFileObject($filePath);
    $results = [];
    $counter = 0;
    // Inspect at most $checkLines lines.
    while ($fileObject->valid() && $counter < $checkLines) {
        $line = $fileObject->fgets();
        foreach ($delimiters as $delimiter) {
            $totalFields = count(explode($delimiter, $line));
            if ($totalFields > 1) {
                // Accumulate the field count for each delimiter across lines.
                $results[$delimiter] = ($results[$delimiter] ?? 0) + $totalFields;
            }
        }
        $counter++;
    }
    if (!empty($results)) {
        // Pick the delimiter with the highest total; ties go to the one recorded first.
        $results = array_keys($results, max($results));

        return $results[0];
    }

    return $default;
}
