Parsing through a csv file in Qt

忘掉有多难 2020-12-09 04:42

Is anyone familiar with how to parse a CSV file and put its contents into a string list? Right now I am taking the entire CSV file and putting it into the string list. I am try

6 answers
  •  予麋鹿 (original poster)
    2020-12-09 05:32

    One might prefer to do it this way:

    QStringList MainWindow::parseCSV(const QString &string)
    {
        enum State {Normal, Quote} state = Normal;
        QStringList fields;
        QString value;
    
        for (int i = 0; i < string.size(); i++)
        {
            const QChar current = string.at(i);
    
            // Normal state
            if (state == Normal)
            {
                // Comma
                if (current == ',')
                {
                    // Save field
                    fields.append(value.trimmed());
                    value.clear();
                }
    
                // Double-quote
                else if (current == '"')
                {
                    state = Quote;
                    value += current;
                }
    
                // Other character
                else
                    value += current;
            }
    
            // In-quote state
            else if (state == Quote)
            {
                // Another double-quote
                if (current == '"')
                {
                    // A doubled double-quote stands for one literal quote
                    if (i+1 < string.size() && string.at(i+1) == '"')
                    {
                        value += '"';
    
                        // Skip the second quote character of the pair
                        i++;
                    }
                    else
                    {
                        // Closing quote: keep it for the final stripping step
                        state = Normal;
                        value += '"';
                    }
                }
    
                // Other character
                else
                    value += current;
            }
        }
    
        if (!value.isEmpty())
            fields.append(value.trimmed());
    
        // Quotes are left in until here; so when fields are trimmed, only whitespace outside of
        // quotes is removed.  The quotes are removed here.
        for (int i=0; i<fields.size(); ++i)
            if (fields[i].length()>=1 && fields[i].left(1)=='"')
            {
                fields[i]=fields[i].mid(1);
                if (fields[i].length()>=1 && fields[i].right(1)=='"')
                    fields[i]=fields[i].left(fields[i].length()-1);
            }
    
        return fields;
    }
    
    • Powerful: correctly handles quoted fields containing commas, doubled double-quotes (which signify a literal double-quote character), and whitespace
    • Flexible: doesn't fail if the closing quote on the last field is forgotten, and handles more complicated CSV files; lets you process one line at a time without having to read the whole file into memory first
    • Simple: just drop this state machine into your code, right-click the function name in Qt Creator and choose Refactor | Add private declaration, and you're good to go.
    • Performant: accurately processes CSV lines faster than doing RegEx look-aheads on each character
    • Convenient: requires no external library
    • Easy to read: the code is intuitive, in case you need to modify it.

    Edit: I've finally got around to making this trim spaces before and after the fields. Neither whitespace nor commas inside quotes are touched; otherwise, all whitespace is trimmed from the start and end of a field. After puzzling about this for a while, I hit on the idea of leaving the quotes around the field while parsing, so that every field can simply be trimmed: only whitespace before and after the quotes (or the unquoted text) is removed. A final step was then added to strip the quotes from fields that start and end with quotes.
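
    The trim-then-strip idea can be shown on a single field in isolation. The following is a minimal sketch in plain standard C++ rather than Qt, and trimThenUnquote is a hypothetical helper name; it mirrors what the routine does with QString::trimmed() followed by the final quote-removal loop.

```cpp
#include <string>

// Hypothetical helper: `raw` is one field with its quotes still attached.
// Step 1 trims whitespace outside the quotes; step 2 removes the quotes.
std::string trimThenUnquote(const std::string& raw)
{
    const char* whitespace = " \t\r\n\v\f";

    // Step 1: trim. Whitespace inside the quotes is protected by the quotes.
    const std::string::size_type first = raw.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return std::string();            // empty or all-whitespace field
    const std::string::size_type last = raw.find_last_not_of(whitespace);
    std::string field = raw.substr(first, last - first + 1);

    // Step 2: strip a leading quote and, only if one was found,
    // a matching trailing quote.
    if (!field.empty() && field.front() == '"') {
        field.erase(0, 1);
        if (!field.empty() && field.back() == '"')
            field.pop_back();
    }
    return field;
}
```

    Because the quotes are still present during the trim, a field written as `  " two "  ` keeps the spaces around "two", while an unquoted `three  ` loses its trailing spaces.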

    Here is a more or less challenging test case:

    QStringList sl=
    {
        "\"one\"",
        "  \" two \"\"\"  , \" and a half  ",
        "three  ",
        "\t  four"
    };
    
    for (int i=0; i < sl.size(); ++i)
        qDebug() << parseCSV(sl[i]);
    

    This corresponds to the file

    "one"
      " two """  , " and a half  
    three  
    <TAB>  four
    

    where <TAB> represents the tab character, and each line is fed into parseCSV() in turn. DON'T write .csv files like this!

    Its output is (where qDebug() is representing quotes in the string with \" and putting things in quotes and parens):

    ("one")
    (" two \"", " and a half")
    ("three")
    ("four")
    

    You can observe that the quote and the extra spaces were preserved inside the quote for item "two". In the malformed case for "and a half", the space before the quote, and those after the last word, were removed; but the others were not. Missing terminal spaces in this routine could be an indication of a missing terminal quote. Quotes in a field that don't start or end it are just treated as part of a string. A quote isn't removed from the end of a field if one doesn't start it. To detect an error here, just check for a field that starts with a quote, but doesn't end with one; and/or one that contains quotes but doesn't start and end with one, in the final loop.
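
    The error check suggested above could look roughly like this. It is a sketch in plain standard C++ with hypothetical names (FieldStatus, checkField), and it assumes it is applied to a field after trimming but before the quotes are stripped, which is the state of each field at the top of the final loop.

```cpp
#include <string>

enum class FieldStatus { Ok, MissingClosingQuote, StrayQuote };

// Hypothetical check for one trimmed field that still carries its quotes.
FieldStatus checkField(const std::string& trimmed)
{
    // No quotes at all: nothing to validate.
    if (trimmed.find('"') == std::string::npos)
        return FieldStatus::Ok;

    const bool startsWithQuote = trimmed.front() == '"';
    // A lone '"' counts as an opening quote without a closing one.
    const bool endsWithQuote = trimmed.size() >= 2 && trimmed.back() == '"';

    if (startsWithQuote && !endsWithQuote)
        return FieldStatus::MissingClosingQuote;
    if (!startsWithQuote)
        return FieldStatus::StrayQuote;  // quote appears, but not as the first character
    return FieldStatus::Ok;
}
```

    This is a heuristic only: an unterminated field whose last character happens to be an escaped quote still starts and ends with a quote and passes the check. Testing whether the state machine is still in the Quote state when the loop ends would catch that case as well.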

    That's more than was needed for your test case, I know, but it is a solid general answer to the question nonetheless, perhaps useful for others who have found it.

    Adapted from: https://github.com/hnaohiro/qt-csv/blob/master/csv.cpp
