I\'m uploading a csv/tsv file from a form in GAE, and I try to parse the file with python csv module.
Like describe here, uploaded files in GAE are strings.
So I
How about:
file = self.request.get('catalog')
file = '\n'.join(file.splitlines())
catalog = csv.reader(StringIO.StringIO(file),dialect=csv.excel_tab)
or as pointed out in the comments, csv.reader()
supports input from a list, so:
file = self.request.get('catalog')
catalog = csv.reader(file.splitlines(),dialect=csv.excel_tab)
or if in the future request.get
supports read modes:
file = self.request.get('catalog', 'rU')
catalog = csv.reader(StringIO.StringIO(file),dialect=csv.excel_tab)
The solution described here should work. By defining an iterator class as follows, which loads the blob 1MB at a time, splits the lines using .splitlines() and then feeds lines to the CSV reader one at a time, the newlines can be handled without having to load the whole file into memory.
class BlobIterator:
"""Because the python csv module doesn't like strange newline chars and
the google blob reader cannot be told to open in universal mode, then
we need to read blocks of the blob and 'fix' the newlines as we go"""
def __init__(self, blob_reader):
self.blob_reader = blob_reader
self.last_line = ""
self.line_num = 0
self.lines = []
self.buffer = None
def __iter__(self):
return self
def next(self):
if not self.buffer or len(self.lines) == self.line_num + 1:
self.buffer = self.blob_reader.read(1048576) # 1MB buffer
self.lines = self.buffer.splitlines()
self.line_num = 0
# Handle special case where our block just happens to end on a new line
if self.buffer[-1:] == "\n" or self.buffer[-1:] == "\r":
self.lines.append("")
if not self.buffer:
raise StopIteration
if self.line_num == 0 and len(self.last_line) > 0:
result = self.last_line + self.lines[self.line_num] + "\n"
else:
result = self.lines[self.line_num] + "\n"
self.last_line = self.lines[self.line_num + 1]
self.line_num += 1
return result
Then call this like so:
blob_reader = blobstore.BlobReader(blob_key)
blob_iterator = BlobIterator(blob_reader)
reader = csv.reader(blob_iterator)