CsvHelper: How to detect the delimiter of a given CSV file

Submitted by 冷眼眸甩不掉的悲伤 on 2019-12-10 13:58:44

Question


I am using CsvHelper to read and write data to CSV files. Now I want to detect the delimiter used by a given CSV file. How can I get it?

My code:

     var parser = new CsvParser(txtReader);
     delimiter = parser.Configuration.Delimiter;

I always get "," as the delimiter, but the delimiter actually used in the CSV file is "\t".


Answer 1:


CSV stands for Comma-Separated Values. I don't think you can reliably detect whether a different character is used as the separator. If there is a header row, you might be able to rely on it to work out the separator.

You should know the separator that is used. You should be able to see it when opening the file. If the source of the files gives you a different separator each time and is not reliable, then I'm sorry. ;)

If you just want to parse using a different delimiter, then you can set csv.Configuration.Delimiter. http://joshclose.github.io/CsvHelper/#configuration-delimiter
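
For reference, here is a minimal sketch of setting the delimiter explicitly. It assumes a more recent CsvHelper release than the one this answer was written against, where (as far as I know) the configuration object is passed to the CsvReader constructor instead of being modified afterwards. The file name "data.tsv" is a placeholder.

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Assumption: a recent CsvHelper version that accepts a CsvConfiguration
// in the CsvReader constructor. "data.tsv" is a hypothetical input file.
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "\t", // parse tab-separated values instead of commas
};

using (var reader = new StreamReader("data.tsv"))
using (var csv = new CsvReader(reader, config))
{
    // Read() advances to the next row and returns false at end of file.
    while (csv.Read())
    {
        // GetField(0) returns the first column of the current row as a string.
        Console.WriteLine(csv.GetField(0));
    }
}
```

On the older versions discussed in this thread, the equivalent is the csv.Configuration.Delimiter assignment mentioned above.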




Answer 2:


I found this piece of code on this site:

// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
public static char Detect(TextReader reader, int rowCount, IList<char> separators)
{
    IList<int> separatorsCount = new int[separators.Count];

    int character;

    int row = 0;

    bool quoted = false;
    bool firstChar = true;

    while (row < rowCount)
    {
        character = reader.Read();

        switch (character)
        {
            case '"':
                if (quoted)
                {
                    if (reader.Peek() != '"')
                        // Closing quote: the next character is not another quote.
                        quoted = false;
                    else
                        // Escaped quote (""): consume the second quote and stay quoted.
                        reader.Read();
                }
                else
                {
                    // Treat the value as quoted only if the quote is its first character.
                    if (firstChar)
                        quoted = true;
                }
                break;
            case '\n':
                if (!quoted)
                {
                    ++row;
                    firstChar = true;
                    continue;
                }
                break;
            case -1:
                row = rowCount;
                break;
            default:
                if (!quoted)
                {
                    int index = separators.IndexOf((char)character);
                    if (index != -1)
                    {
                        ++separatorsCount[index];
                        firstChar = true;
                        continue;
                    }
                }
                break;
        }

        if (firstChar)
            firstChar = false;
    }

    int maxCount = separatorsCount.Max();

    return maxCount == 0 ? '\0' : separators[separatorsCount.IndexOf(maxCount)];
}

Here, separators is the list of candidate separators that the file might use, and rowCount is the number of rows to inspect.

Hope that helps :)




Answer 3:


Since I had to deal with the possibility that, depending on the user's localization settings, a CSV file saved from MS Excel could contain a different delimiter, I ended up with the following approach:

public static string DetectDelimiter(StreamReader reader)
{
    // Assume one of the following delimiters.
    var possibleDelimiters = new List<string> { ",", ";", "\t", "|" };

    // ReadLine returns null for an empty stream.
    var headerLine = reader.ReadLine();

    // Reset the reader to its initial position so the caller can reuse it.
    // E.g. CsvHelper won't find the header line if it has already been consumed here.
    // Note: this requires the underlying stream to be seekable.
    reader.BaseStream.Position = 0;
    reader.DiscardBufferedData();

    if (headerLine != null)
    {
        foreach (var possibleDelimiter in possibleDelimiters)
        {
            // The first candidate found in the header wins.
            if (headerLine.Contains(possibleDelimiter))
            {
                return possibleDelimiter;
            }
        }
    }

    // Fall back to the comma when nothing matches.
    return possibleDelimiters[0];
}

I also needed to reset the reader's read position, since it was the same instance I passed to the CsvReader constructor.

The usage was then as follows:

using (var textReader = new StreamReader(memoryStream))
{
    var delimiter = DetectDelimiter(textReader);

    using (var csv = new CsvReader(textReader))
    {
        csv.Configuration.Delimiter = delimiter;

        // ... rest of the csv reader process

    }
}
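
One caveat with the approach above: it returns the first candidate that appears anywhere in the header, so a tab-delimited header containing a stray comma would be detected as comma-delimited. A possible variation, sketched here under my own assumptions, is to count each candidate in the header line and pick the most frequent one. The names DelimiterSniffer and DetectByFrequency are hypothetical and not part of CsvHelper. The sketch ignores quoting and consumes the header line, so the caller still has to reset or reopen the reader afterwards.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of CsvHelper.
public static class DelimiterSniffer
{
    // Returns the candidate that occurs most often in the first line.
    // Ties (including "no candidate found") resolve to the earliest candidate.
    // Limitation: delimiters inside quoted fields are counted as well.
    public static string DetectByFrequency(TextReader reader, params string[] candidates)
    {
        if (candidates.Length == 0)
            throw new ArgumentException("At least one candidate is required.", nameof(candidates));

        // ReadLine returns null at end of input; treat that as an empty header.
        string header = reader.ReadLine() ?? string.Empty;

        // OrderByDescending is a stable sort, so equal counts keep candidate order.
        return candidates
            .Select(c => new { Delimiter = c, Count = CountOccurrences(header, c) })
            .OrderByDescending(x => x.Count)
            .First()
            .Delimiter;
    }

    // Counts non-overlapping occurrences of value in text.
    private static int CountOccurrences(string text, string value)
    {
        if (string.IsNullOrEmpty(value)) return 0;

        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(value, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += value.Length;
        }
        return count;
    }
}
```

It could be called as DelimiterSniffer.DetectByFrequency(textReader, ",", ";", "\t", "|") in place of DetectDelimiter, followed by the same position reset as in the original method.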


Source: https://stackoverflow.com/questions/33341307/csvhelper-how-to-detect-the-delimiter-from-the-given-csv-file
