how to find out if csv file fields are tab delimited or comma delimited. I need php validation for this. Can anyone plz help. Thanks in advance.
I used @Jay Bhatt's solution for finding out a csv file's delimiter, but it didn't work for me, so I applied a few fixes and comments for the process to be more understandable.
See my version of @Jay Bhatt's function:
function decide_csv_delimiter($file, $checkLines = 10) {
// use php's built in file parser class for validating the csv or txt file
$file = new SplFileObject($file);
// array of predefined delimiters. Add any more delimiters if you wish
$delimiters = array(',', '\t', ';', '|', ':');
// store all the occurences of each delimiter in an associative array
$number_of_delimiter_occurences = array();
$results = array();
$i = 0; // using 'i' for counting the number of actual row parsed
while ($file->valid() && $i <= $checkLines) {
$line = $file->fgets();
foreach ($delimiters as $idx => $delimiter){
$regExp = '/['.$delimiter.']/';
$fields = preg_split($regExp, $line);
// construct the array with all the keys as the delimiters
// and the values as the number of delimiter occurences
$number_of_delimiter_occurences[$delimiter] = count($fields);
}
$i++;
}
// get key of the largest value from the array (comapring only the array values)
// in our case, the array keys are the delimiters
$results = array_keys($number_of_delimiter_occurences, max($number_of_delimiter_occurences));
// in case the delimiter happens to be a 'tab' character ('\t'), return it in double quotes
// otherwise when using as delimiter it will give an error,
// because it is not recognised as a special character for 'tab' key,
// it shows up like a simple string composed of '\' and 't' characters, which is not accepted when parsing csv files
return $results[0] == '\t' ? "\t" : $results[0];
}
I personally use this function for helping automatically parse a file with PHPExcel, and it works beautifully and fast.
I recommend parsing at least 10 lines, for the results to be more accurate. I personally use it with 100 lines, and it is working fast, no delays or lags. The more lines you parse, the more accurate the result gets.
NOTE: This is just a modifed version of @Jay Bhatt's solution to the question. All credits goes to @Jay Bhatt.