Processing a large (>32 MB) XML file on App Engine

Submitted by 匆匆过客 on 2019-12-06 11:07:08

What about storing the file in Google Cloud Storage and reading it incrementally? You can access it line by line (in Python, anyway), so it won't consume all your resources.

https://developers.google.com/appengine/docs/python/googlecloudstorageclient/

https://developers.google.com/storage/

The GCS client library lets your application read files from and write files to buckets in Google Cloud Storage (GCS). This library supports reading and writing large amounts of data to GCS, with internal error handling and retries, so you don't have to write your own code to do this. Moreover, it provides read buffering with prefetch so your app can be more efficient.

The GCS client library provides the following functionality:

- An open method that returns a file-like buffer on which you can invoke standard Python file operations for reading and writing.
- A listbucket method for listing the contents of a GCS bucket.
- A stat method for obtaining metadata about a specific file.
- A delete method for deleting files from GCS.

I've processed some very large CSV files in exactly this way: read as much as I need, process it, then read some more.
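The "read, process, read some more" pattern can be sketched as a generator that pulls rows from any file-like buffer in small batches, so the whole file never sits in memory. The function name and batch size below are illustrative, not part of the GCS library; since gcs.open() returns a file-like object, the same code would work on it, but here an in-memory buffer stands in for a GCS file:

```python
import csv
import io

def process_rows(fileobj, batch_size=100):
    """Yield rows from a CSV file-like object in small batches.

    Works on any file-like buffer (including one returned by
    gcs.open()), so memory use stays bounded by batch_size.
    """
    reader = csv.reader(fileobj)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch

# Simulate a GCS file with an in-memory buffer for demonstration.
buf = io.StringIO("a,1\nb,2\nc,3\n")
batches = list(process_rows(buf, batch_size=2))
```

Each yielded batch can be processed and discarded before the next read, which is what keeps resource usage flat regardless of file size.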

import os
import cloudstorage as gcs

def read_file(self, filename):
    self.response.write('Truncated file content:\n')

    gcs_file = gcs.open(filename)
    # Read just the first line of the file...
    self.response.write(gcs_file.readline())
    # ...then seek to the last 1024 bytes and read those.
    gcs_file.seek(-1024, os.SEEK_END)
    self.response.write(gcs_file.read())
    gcs_file.close()

Incremental reading with standard Python file operations!
