You may or may not be aware of ASCII delimited text, which has the nice advantage of using non-keyboard characters for separating fields and lines.
Writing this out
The documentation says:
The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.
So the csv module cannot read CSV files that use custom line terminators.
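To make the limitation concrete, here is a minimal sketch (Python 3 assumed; the sample data and the names `US`, `RS`, and `data` are made up for illustration). The reader accepts the `lineterminator` argument but does not act on it, so the record separators end up inside the fields:

```python
import csv
import io

US, RS = chr(31), chr(30)  # ASCII unit separator and record separator

# Two records of two fields each, terminated by RS (illustrative data)
data = "a" + US + "b" + RS + "c" + US + "d" + RS

# lineterminator is accepted but ignored by the reader. The data contains
# no '\r' or '\n', so the whole string is parsed as a single row.
rows = list(csv.reader(io.StringIO(data), delimiter=US, lineterminator=RS))
print(rows)  # [['a', 'b\x1ec', 'd\x1e']]
```

The fields are split correctly on the unit separator, but both records come back as one row.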
Hey I was struggling with a similar problem all day. I wrote a function heavily inspired by @martineau that should solve it for you. My function is slower but can parse files delimited by any kind of string. Hope it helps!
def custom_CSV_reader(csv_file, row_delimiter, col_delimiter):
    """Parse a file whose rows and columns are delimited by arbitrary strings."""
    result = []
    row = []
    field = ''
    # newline='' turns off newline translation so delimiters are read as written
    with open(csv_file, 'r', newline='') as f:
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                break
            field += ch
            if field.endswith(col_delimiter):  # end of a column
                row.append(field[:-len(col_delimiter)])
                field = ''
            elif field.endswith(row_delimiter):  # end of a row
                row.append(field[:-len(row_delimiter)])
                result.append(row)
                row = []
                field = ''
    if field or row:  # last row had no trailing row delimiter
        row.append(field)
        result.append(row)
    return result
Per the docs for open:
newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'.
so open won't handle your file. Per the csv docs:
Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator.
so that won't do it either. I also looked into whether str.splitlines was configurable, but it uses a defined set of boundaries.
Am I relegated to doing line.split(chr(30))?
Looks that way, sorry!
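For what it's worth, the manual split approach can be sketched roughly like this. The function name `parse_ascii_delimited` and the sample data are hypothetical, and it assumes the whole file fits in memory and that no quoting or escaping is in use (fields never contain the separators themselves):

```python
US, RS = chr(31), chr(30)  # ASCII unit separator and record separator

def parse_ascii_delimited(text, field_sep=US, record_sep=RS):
    """Split text into records, then each record into fields (no quoting support)."""
    records = text.split(record_sep)
    # A trailing record separator leaves one empty string at the end; drop it.
    if records and records[-1] == "":
        records.pop()
    return [record.split(field_sep) for record in records]

# Illustrative data only
sample = "Sir Lancelot" + US + "blue" + RS + "Sir Galahad" + US + "yellow" + RS
rows = parse_ascii_delimited(sample)
print(rows)  # [['Sir Lancelot', 'blue'], ['Sir Galahad', 'yellow']]
```

If the file was opened in text mode, pass newline='' to open so that any '\r' or '\n' inside fields is left untouched.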
You can do it by effectively translating the end-of-line characters in the file into the newline characters csv.reader is hardcoded to recognize:
import csv

# newline='' stops open() from translating line endings on write
with open('ascii_delim.adt', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

def readlines(f, newline='\n'):
    # Yield lines from f, swapping each custom newline character for '\n'
    while True:
        line = []
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                if line:  # last line had no terminator
                    yield ''.join(line)
                return
            elif ch == newline:  # end of line?
                line.append('\n')
                break
            line.append(ch)
        yield ''.join(line)

with open('ascii_delim.adt', 'r', newline='') as f:
    reader = csv.reader(readlines(f, newline=chr(30)), delimiter=chr(31))
    for row in reader:
        print(row)
Output:
['Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue']
['Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!']
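Reading one character at a time gets slow on large files. A possible variation, sketched here under some assumptions, reads the file in blocks and splits each block on the record separator. The name `iter_records` and the `chunk_size` parameter are my own, not part of the csv module, and the sketch assumes fields contain neither raw newlines nor the record separator:

```python
import csv
import io

def iter_records(f, record_sep=chr(30), chunk_size=4096):
    """Yield records from text file f, each ending in '\n' instead of record_sep."""
    buffer = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # end of file
            break
        buffer += chunk
        # Everything before the last separator is complete; keep the rest.
        *complete, buffer = buffer.split(record_sep)
        for record in complete:
            yield record + "\n"
    if buffer:  # final record had no trailing separator
        yield buffer + "\n"

# Tiny in-memory example; chunk_size=3 forces records to span several reads
sample = io.StringIO("a\x1fb\x1ec\x1fd\x1e")
rows = list(csv.reader(iter_records(sample, chunk_size=3), delimiter=chr(31)))
print(rows)  # [['a', 'b'], ['c', 'd']]
```

With a real file, open it with newline='' and pass the file object in place of the StringIO.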