PHP - Best approach to detect CSV delimiter

女生的网名这么多〃 提交于 2019-12-06 00:39:01

问题


I have seen multiple threads about what the best solution to auto detect the delimiter for an incoming CSV. Most of them are functions of length between 20 - 30 lines, multiple loops pre-determined list of delimiters, reading the first 5 lines and matching counts e.t.c e.t.c

Here is 1 example

I have just implemented this procedure, with a few modifications. Works brilliantly.

THEN I found the following code:

private function DetectDelimiter($fh)
{
    $data_1 = null;
    $data_2 = null;
    $delimiter = self::$delim_list['comma'];
    foreach(self::$delim_list as $key=>$value)
    {
        $data_1 = fgetcsv($fh, 4096, $value);
        $delimiter = sizeof($data_1) > sizeof($data_2) ? $key : $delimiter;
        $data_2 = $data_1;
    }

    $this->SetDelimiter($delimiter);
    return $delimiter;
}

This to me looks like it's achieving the SAME results, where $delim_list is an array of delimiters as follows:

static protected $delim_list = array('tab'=>"\t", 
                                     'semicolon'=>";", 
                                     'pipe'=>"|", 
                                     'comma'=>",");

Can anyone shed any light as to why I shouldn't do it this simpler way, and why everywhere I look the more convoluted solution seems to be the accepted answer?

Thanks!


回答1:


Fixed version.

In your code, if a string has more than 1 delimiter you'll get a wrong result (example: val; string, with comma;val2;val3). Also if a file has 1 row (count of rows < count of delimiters).

Here is a fixed variant:

private function detectDelimiter($fh)
{
    $delimiters = ["\t", ";", "|", ","];
    $data_1 = null; $data_2 = null;
    $delimiter = $delimiters[0];
    foreach($delimiters as $d) {
        $data_1 = fgetcsv($fh, 4096, $d);
        if(sizeof($data_1) > sizeof($data_2)) {
            $delimiter = $d;
            $data_2 = $data_1;
        }
        rewind($fh);
    }

    return $delimiter;
}



回答2:


In general, you cannot detect the delimiter for a text file. If there are additional hints, you need to implement them in your detection to be sure.

One particular problem with the suggested approach is that it will count the number of elements in different lines of the file. Suppose you had a file like this:

a;b;c;d
a   b;  c   d
this|that;here|there
It's not ready, yet.; We have to wait for peter, paul, and mary.; They will know what to do

Although this seems to be separated by a semicolon, your approach will return comma.




回答3:


None of this answers my use case. So I made a slight modification.

/**
    * @param string $filePath
    * @param int $checkLines
    * @return string
    */
   public function getCsvDelimiter(string $filePath, int $checkLines = 3): string
   {
      $delimeters =[',', ';', '\t'];

      $default =',';

       $fileObject = new \SplFileObject($filePath);
       $results = [];
       $counter = 0;
       while ($fileObject->valid() && $counter <= $checkLines) {
           $line = $fileObject->fgets();
           foreach ($delimiters as $delimiter) {
               $fields = explode($delimiter, $line);
               $totalFields = count($fields);
               if ($totalFields > 1) {
                   if (!empty($results[$delimiter])) {
                       $results[$delimiter] += $totalFields;
                   } else {
                       $results[$delimiter] = $totalFields;
                   }
               }
           }
           $counter++;
       }
       if (!empty($results)) {
           $results = array_keys($results, max($results));

           return $results[0];
       }
return $default;
}



来源:https://stackoverflow.com/questions/26717462/php-best-approach-to-detect-csv-delimiter

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!