Upload and parse a CSV file with “universal newline” in Python on Google App Engine

再見小時候 2020-12-16 17:48

I'm uploading a CSV/TSV file from a form in GAE, and I'm trying to parse the file with Python's csv module.

As described here, uploaded files in GAE are strings.
So I

2 answers
  • 2020-12-16 18:03

    How about:

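    import csv
    import StringIO  # Python 2 module, as on the GAE Python 2 runtime

    file = self.request.get('catalog')
    file = '\n'.join(file.splitlines())  # normalize \r, \r\n and \n to \n
    catalog = csv.reader(StringIO.StringIO(file), dialect=csv.excel_tab)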
    

    or, as pointed out in the comments, csv.reader() accepts any iterable of lines, including a list, so:

    file = self.request.get('catalog')
    catalog = csv.reader(file.splitlines(), dialect=csv.excel_tab)
    

    or, if request.get ever supports read modes (hypothetical: at present the second positional argument is the default value, not a mode):

    file = self.request.get('catalog', 'rU')
    catalog = csv.reader(StringIO.StringIO(file), dialect=csv.excel_tab)
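
To see why splitting the uploaded string is enough, here is a minimal sketch written for Python 3 so it can be run outside App Engine. The helper name `parse_tsv` and the sample data are made up for illustration; only `csv.reader`, `csv.excel_tab` and `str.splitlines()` come from the answer above.

```python
import csv


def parse_tsv(text):
    """Parse tab-separated text whose rows may end in \\r, \\n or \\r\\n."""
    # str.splitlines() treats "\r", "\n" and "\r\n" each as one line break,
    # so csv.reader receives clean lines with no stray carriage returns.
    return list(csv.reader(text.splitlines(), dialect=csv.excel_tab))


# Made-up sample mixing all three line-ending styles.
sample = "sku\tname\rA1\tWidget\r\nB2\tGadget\n"
rows = parse_tsv(sample)
print(rows)
```

In a request handler, the string returned by self.request.get('catalog') would take the place of `sample`.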
    
  • 2020-12-16 18:05

    The solution described here should work. It defines an iterator class that reads the blob 1 MB at a time, splits each block with .splitlines(), and feeds the lines to the CSV reader one at a time. That way the newlines are normalized without loading the whole file into memory.

    class BlobIterator:
        """Because the python csv module doesn't like strange newline chars and
        the google blob reader cannot be told to open in universal mode, then
        we need to read blocks of the blob and 'fix' the newlines as we go"""

        def __init__(self, blob_reader):
            self.blob_reader = blob_reader
            self.last_line = ""  # unterminated tail carried over from the previous block
            self.line_num = 0    # index of the next entry in self.lines to return
            self.lines = []      # complete lines taken from the current block
            self.eof = False

        def __iter__(self):
            return self

        def next(self):
            # Refill until a complete line is available or the blob is exhausted
            while self.line_num >= len(self.lines) and not self.eof:
                block = self.blob_reader.read(1048576)  # 1MB buffer
                self.eof = not block
                pieces = (self.last_line + block).splitlines(True)
                self.last_line = ""
                # Hold back a tail without "\n": it may be a partial line, or the
                # "\r" half of a "\r\n" pair that was split across two blocks
                if pieces and not self.eof and not pieces[-1].endswith("\n"):
                    self.last_line = pieces.pop()
                self.lines = [piece.rstrip("\r\n") for piece in pieces]
                self.line_num = 0

            if self.line_num >= len(self.lines):
                raise StopIteration

            result = self.lines[self.line_num] + "\n"
            self.line_num += 1
            return result

        __next__ = next  # alias so the class also iterates under Python 3
    

    Then call this like so:

    import csv
    from google.appengine.ext import blobstore

    blob_reader = blobstore.BlobReader(blob_key)  # blob_key: key of the uploaded blob
    blob_iterator = BlobIterator(blob_reader)
    reader = csv.reader(blob_iterator)
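
The block-wise idea can also be written as a generator. What follows is a minimal sketch under stated assumptions, not the answer's own code: `iter_lines` is a hypothetical helper name, io.StringIO stands in for blobstore.BlobReader (assuming the reader's read(size) returns text and an empty string at end of data), and it targets Python 3 so it can be tried outside App Engine.

```python
import csv
import io


def iter_lines(reader, block_size=1024 * 1024):
    """Yield "\\n"-terminated lines from any object with a read(size) method."""
    tail = ""  # unterminated text carried over from the previous block
    while True:
        block = reader.read(block_size)
        if not block:
            break
        pieces = (tail + block).splitlines(True)  # keep the line endings
        # Hold back a last piece without "\n": it is either a partial line
        # or the "\r" half of a "\r\n" pair split across two blocks.
        tail = "" if pieces[-1].endswith("\n") else pieces.pop()
        for piece in pieces:
            yield piece.rstrip("\r\n") + "\n"
    # Flush whatever is left once the reader is exhausted.
    for piece in tail.splitlines():
        yield piece + "\n"


# Stand-in for a blob: mixed line endings, no trailing newline.
# A block size of 4 forces the "\r\n" pair to straddle two reads.
fake_blob = io.StringIO("a,b\r\nc,d\re,f", newline="")
rows = list(csv.reader(iter_lines(fake_blob, block_size=4)))
print(rows)
```

The tiny block size is only there to exercise the boundary handling; with a real blob a size around 1 MB, as in the answer, would be the natural choice.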
    