I'd like to distinguish None and empty strings when going back and forth between Python data structures and their csv representation using Python's csv module.
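To make the problem concrete, here is a minimal sketch of the round trip with the plain csv module, assuming Python 3 and an in-memory buffer in place of a real file. The variable names are only illustrative. Both None and the empty string are written as an empty field, so they come back as the same value.

```python
import csv
import io

rows = [["NULL/None value", None], ["empty string", ""]]

# Write with the stock writer: None is emitted as an empty field.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
raw_text = buffer.getvalue()

# Read the same text back with the stock reader.
round_tripped = list(csv.reader(io.StringIO(raw_text)))

print(repr(raw_text))   # 'NULL/None value,\r\nempty string,\r\n'
print(round_tripped)    # [['NULL/None value', ''], ['empty string', '']]
```

After the round trip the None has become '', which is exactly the information loss described above.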
As others have pointed out, you can't really do this via csv.Dialect or parameters to csv.writer and/or csv.reader. However, as I said in one comment, you can implement it by effectively subclassing the latter two (which you apparently can't really do directly because they're built-in). What the "subclasses" do on writing is simply intercept None values and change them into a unique string, then reverse the process when reading them back in. Here's a fully worked-out example:
import csv
import io

# Something unlikely to ever appear as a regular value in your csv files
# (the DEL character used here is an arbitrary choice).
NULL = '\x7f'

class MyCsvWriter(object):
    """Wraps a csv.writer and writes None values as the NULL marker."""
    def __init__(self, *args, **kwrds):
        self.csv_writer = csv.writer(*args, **kwrds)

    def __getattr__(self, name):
        # Delegate everything not defined here to the real writer.
        return getattr(self.csv_writer, name)

    def writerow(self, row):
        self.csv_writer.writerow([item if item is not None else NULL
                                  for item in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

class MyCsvReader(object):
    """Wraps a csv.reader and turns the NULL marker back into None."""
    def __init__(self, *args, **kwrds):
        self.csv_reader = csv.reader(*args, **kwrds)

    def __getattr__(self, name):
        # Delegate everything not defined here to the real reader.
        return getattr(self.csv_reader, name)

    def __iter__(self):
        rows = iter(self.csv_reader)
        for row in rows:
            yield [item if item != NULL else None for item in row]

data = [['NULL/None value', None],
        ['empty string', '']]

f = io.StringIO()
MyCsvWriter(f).writerows(data)  # instead of csv.writer(f).writerows(data)

f = io.StringIO(f.getvalue())
data2 = [e for e in MyCsvReader(f)]  # instead of [e for e in csv.reader(f)]

print("input :", data)
print("output:", data2)

Output:

input : [['NULL/None value', None], ['empty string', '']]
output: [['NULL/None value', None], ['empty string', '']]
It's a tad verbose and probably slows the reading and writing of csv files a bit (since the built-in reader and writer are implemented in C), but that may make little difference since the process is likely bound by low-level I/O anyway.
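If the overhead matters for your data, one way to get a rough feel for it is a quick timeit comparison. The sketch below assumes Python 3 and writes to an in-memory buffer, so it isolates the cost of the extra Python-level loop and says nothing about real disk I/O. NoneAwareWriter is a hypothetical, cut-down stand-in for the wrapper idea, and the sentinel, row contents and repeat counts are arbitrary assumptions.

```python
import csv
import io
import timeit

NULL = "\x7f"  # assumed sentinel; any string that never occurs in the data works

class NoneAwareWriter:
    """Hypothetical minimal wrapper: replaces None with NULL before writing."""
    def __init__(self, *args, **kwargs):
        self._writer = csv.writer(*args, **kwargs)

    def writerows(self, rows):
        for row in rows:
            self._writer.writerow(
                [NULL if item is None else item for item in row])

# Arbitrary sample data: 10,000 identical rows containing one None.
rows = [["some text", None, "", 42]] * 10_000

def write_plain():
    csv.writer(io.StringIO()).writerows(rows)

def write_wrapped():
    NoneAwareWriter(io.StringIO()).writerows(rows)

# Take the best of a few repeats to reduce noise from other processes.
plain_seconds = min(timeit.repeat(write_plain, number=5, repeat=3))
wrapped_seconds = min(timeit.repeat(write_wrapped, number=5, repeat=3))

print(f"plain csv.writer : {plain_seconds:.4f} s")
print(f"wrapped writer   : {wrapped_seconds:.4f} s")
```

The numbers depend heavily on the machine and on the shape of the rows, so treat them as a relative comparison only. Timing the same two functions against a real file would show how much of the difference survives once actual I/O is involved.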