Using csv module to read ascii delimited text?

Submitted by 戏子无情 on 2019-12-18 09:03:35

Question


You may or may not be aware of ASCII delimited text, which has the nice advantage of using non-keyboard characters for separating fields and lines.

Writing this out is pretty easy:

import csv

with open('ascii_delim.adt', 'w') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

And, sure enough, you get things dumped out properly. However, on reading, lineterminator does nothing, and if I try to do:

open('ascii_delim.adt', newline=chr(30))

It throws a ValueError: illegal newline value.

So how can I read in my ASCII delimited file? Am I relegated to doing line.split(chr(30))?


Answer 1:


You can do it by effectively translating the end-of-line characters in the file into the newline characters csv.reader is hardcoded to recognize:

import csv

with open('ascii_delim.adt', 'w') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

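def readlines(f, newline='\n'):
    # Yield lines from f, replacing the custom terminator with a real newline
    while True:
        line = []
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                if line:  # emit a final line that lacks a terminator
                    yield ''.join(line)
                return
            elif ch == newline:  # end of line?
                line.append('\n')
                break
            line.append(ch)
        yield ''.join(line)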

# Text mode with newline='' so no newline translation happens on read
with open('ascii_delim.adt', 'r', newline='') as f:
    reader = csv.reader(readlines(f, newline=chr(30)), delimiter=chr(31))
    for row in reader:
        print(row)

Output:

['Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue']
['Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!']
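
Reading one character at a time can be slow on large files. As a rough sketch of the same idea, the generator below reads in chunks and hands complete records to csv.reader. The name iter_records and the chunk_size parameter are hypothetical, and the sketch assumes the record separator never appears inside a field:

```python
import csv
import io

def iter_records(f, sep=chr(30), chunk_size=4096):
    """Yield sep-terminated records from text file f, reading in chunks."""
    buf = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # end of file
            break
        buf += chunk
        # Everything before the last separator is a run of complete records;
        # the remainder stays in the buffer until more data arrives.
        *records, buf = buf.split(sep)
        yield from records
    if buf:  # final record without a trailing separator
        yield buf

# In-memory stand-in for a file opened with open(path, newline='')
sample = io.StringIO('a\x1fb\x1ec\x1fd\x1e')
rows = list(csv.reader(iter_records(sample, chunk_size=3), delimiter=chr(31)))
print(rows)
```

csv.reader accepts any iterable of strings, so the records do not need a trailing newline before being passed in.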



Answer 2:


The documentation says:

The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.

So the csv module cannot read CSV files that use custom line terminators.
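
A small experiment illustrates the documented behavior. This is only a sketch using an in-memory string in place of a real file; the sample data is made up:

```python
import csv
import io

# Two records in ASCII-delimited form: chr(31) between fields,
# chr(30) after each record.
data = 'a\x1fb\x1e' 'c\x1fd\x1e'

# lineterminator is accepted as an argument but is not used when reading
reader = csv.reader(io.StringIO(data), delimiter=chr(31), lineterminator=chr(30))
rows = list(reader)

print(len(rows))  # number of rows the reader produced
print(rows)
```

The whole input comes back as a single row, with the chr(30) characters left inside the field values rather than ending a record.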




Answer 3:


Hey, I was struggling with a similar problem all day. I wrote a function, heavily inspired by @martineau's answer, that should solve it for you. It is slower, but it can parse files delimited by any kind of string. Hope it helps!

def custom_CSV_reader(csv_file, row_delimiter, col_delimiter):
    # Delimiters may be strings of any length, but must not be empty
    result = []  # completed rows
    row = []     # fields of the row being built
    field = ''   # characters of the field being built

    # newline='' disables newline translation so delimiters are read verbatim
    with open(csv_file, 'r', newline='') as f:
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                break
            field += ch

            if field.endswith(row_delimiter):  # end of row?
                row.append(field[:-len(row_delimiter)])
                result.append(row)
                row = []
                field = ''
            elif field.endswith(col_delimiter):  # end of column?
                row.append(field[:-len(col_delimiter)])
                field = ''

    # keep a final row that is not followed by a row delimiter
    if field or row:
        row.append(field)
        result.append(row)
    return result



Answer 4:


Per the docs for open:

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'.

so open won't handle your file. Per the csv docs:

Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator.

so that won't do it either. I also looked into whether str.splitlines was configurable, but it uses a defined set of boundaries.

Am I relegated to doing line.split(chr(30))?

Looks that way, sorry!
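
For a file that fits in memory, the split approach might look like the sketch below. The function name parse_adt is hypothetical, and the sketch assumes no field contains chr(30). Splitting produces the records, and csv.reader still handles the field delimiter and any quoting:

```python
import csv

def parse_adt(text, field_sep=chr(31), record_sep=chr(30)):
    """Split text into records by hand, then let csv.reader split the fields."""
    # A trailing record separator leaves an empty string at the end; drop it.
    records = [r for r in text.split(record_sep) if r]
    return list(csv.reader(records, delimiter=field_sep))

# Same content the question writes to ascii_delim.adt; for a real file,
# read it with open('ascii_delim.adt', newline='') and pass f.read().
text = (
    'Sir Lancelot of Camelot\x1fTo seek the Holy Grail\x1fblue\x1e'
    'Sir Galahad of Camelot\x1fI seek the Grail\x1fblue... no yellow!\x1e'
)
for row in parse_adt(text):
    print(row)
```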



Source: https://stackoverflow.com/questions/30224364/using-csv-module-to-read-ascii-delimited-text
