How to read a CSV line with "?

刺人心 2020-12-18 03:19

A trivial CSV line can be split with the string split function. But some lines contain quoted fields with commas inside them, e.g.

"good,morning", 100, 300, "1998,5,3"


        
4 answers
  • 2020-12-18 03:40

    There's a csv module in Python, which handles this.
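    As a minimal sketch of that suggestion, the line from the question can be fed to csv.reader directly. The variable names here are illustrative; skipinitialspace=True is used because the sample line has a space after each comma, which would otherwise stop the reader from recognising the opening quote of the last field.

```python
import csv

line = '"good,morning", 100, 300, "1998,5,3"'

# csv.reader accepts any iterable of strings, so a one-element list works.
# skipinitialspace=True drops the space that follows each delimiter.
row = next(csv.reader([line], skipinitialspace=True))
print(row)  # ['good,morning', '100', '300', '1998,5,3']
```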

    Edit: This task falls into the "build a lexer" category. The standard way to do such tasks is to build a state machine (or use a lexer library/framework that will do it for you).

    The state machine for this task would probably only need two states:

    • Initial state: every character except comma and newline is read as part of the current field (leading and trailing spaces excepted); a comma is the field separator and a newline is the record separator. An opening quote switches to the
    • read-quoted-field state: every character other than a quote (including comma and newline) is part of the field; a quote followed by another quote is a single escaped quote; a quote not followed by a quote ends the quoted field and returns to the initial state.
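    The two states described above could be sketched roughly as follows. This is one possible reading of the description, not a reference implementation: the function name parse_csv and the was_quoted flag are made up for illustration, and malformed input (such as text directly after a closing quote) is handled leniently rather than rejected.

```python
def parse_csv(text):
    """Parse CSV text into a list of records with a two-state machine."""
    INITIAL, QUOTED = 0, 1
    state = INITIAL
    records, record, field = [], [], []
    was_quoted = False  # quoted fields keep their spaces; unquoted ones are stripped

    def end_field():
        nonlocal field, was_quoted
        value = "".join(field)
        record.append(value if was_quoted else value.strip())
        field, was_quoted = [], False

    def end_record():
        nonlocal record
        end_field()
        records.append(record)
        record = []

    i = 0
    while i < len(text):
        ch = text[i]
        if state == INITIAL:
            if ch == ",":
                end_field()                      # field separator
            elif ch == "\n":
                end_record()                     # record separator
            elif ch == '"' and not "".join(field).strip():
                # Opening quote: drop any leading spaces and switch state.
                field, was_quoted = [], True
                state = QUOTED
            elif was_quoted and ch.isspace():
                pass                             # spaces after a closing quote
            else:
                field.append(ch)
        else:  # QUOTED
            if ch != '"':
                field.append(ch)                 # commas and newlines included
            elif i + 1 < len(text) and text[i + 1] == '"':
                field.append('"')                # doubled quote = literal quote
                i += 1
            else:
                state = INITIAL                  # closing quote
        i += 1

    # Flush the last record when the text does not end with a newline.
    if field or record or was_quoted:
        end_record()
    return records
```

    Calling parse_csv on the line from the question yields a single record with four fields, and the escaped-quote case "Field1"",""Field2" comes back as one field containing Field1","Field2.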

    By the way, your concatenating solution will break on "Field1","Field2" or "Field1"",""Field2".

  • 2020-12-18 03:44

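    The generic implementation would be something like this (a rough sketch; escaped quotes inside a quoted field are not handled):

    SEPARATOR = ","

    def csvline2fields(line):
        """Split one CSV line into fields, honoring quoted fields."""
        fields = []
        line = line.strip()
        while line:
            if line[0] in ("'", '"'):
                # Find the closing quote, searching past the opening one
                end = line.find(line[0], 1)
                if end == -1:
                    raise ValueError("unterminated quoted field")
                fields.append(line[1:end])
                # Find the separator that follows the closing quote
                sep = line.find(SEPARATOR, end + 1)
            else:
                # Unquoted field: it runs up to the next separator
                sep = line.find(SEPARATOR)
                fields.append((line if sep == -1 else line[:sep]).strip())
            if sep == -1:
                break
            line = line[sep + 1:].strip()
        return fields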
    
  • 2020-12-18 04:02

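    From the documentation of Python's csv module (updated here for Python 3):

    Reading a normal CSV file:

    import csv
    # newline="" lets the csv module handle line endings itself
    with open("some.csv", newline="") as f:
        for row in csv.reader(f):
            print(row)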
    

    Reading a file with an alternate format:

    import csv
    # Colon-delimited file with no quoting at all
    with open("passwd", newline="") as f:
        reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
        for row in reader:
            print(row)
    

    There are some nice usage examples on LinuxJournal.com.

    If you're interested in the details, read "split string at commas respecting quotes when string not in csv format", which shows some nice regexes related to this problem, or simply read the csv module source.
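    For a flavour of the regex approach, here is a small sketch. It uses a commonly seen lookahead pattern and is not necessarily what the linked post proposes; the names COMMA_OUTSIDE_QUOTES and split_respecting_quotes are made up for illustration. It assumes quotes are balanced and that the record fits on one line.

```python
import re

# A comma is a separator only if an even number of double quotes follows it,
# i.e. the comma is not inside a quoted field.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_respecting_quotes(line):
    fields = []
    for raw in COMMA_OUTSIDE_QUOTES.split(line):
        raw = raw.strip()
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            # Remove the surrounding quotes and unescape doubled quotes.
            raw = raw[1:-1].replace('""', '"')
        fields.append(raw)
    return fields

print(split_respecting_quotes('"good,morning", 100, 300, "1998,5,3"'))
```

    The lookahead rescans the rest of the line for every comma, so this is fine for short lines but the csv module is the better choice for real files.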

  • 2020-12-18 04:06

    Chapter 4 of The Practice of Programming gives both C and C++ implementations of a CSV parser.
