Reading really big blobs without downloading them in Google Cloud (streaming?)

Submitted by 亡梦爱人 on 2019-12-10 23:17:40

Question


Please help!

[+] What I have: A lot of blobs in every bucket. Blobs can vary in size from less than a kilobyte to many gigabytes.

[+] What I'm trying to do: I need to be able to either stream the data in those blobs (through a buffer of, say, 1024 bytes) or read them in chunks of a certain size in Python (a rough sketch of what I mean follows this question). The point is that I don't think I can just do a bucket.get_blob(), because if the blob were a terabyte I wouldn't be able to fit it in physical memory.

[+] What I'm really trying to do: Parse the information inside the blobs to identify keywords

[+] What I've read: A lot of documentation on how to write to Google Cloud in chunks and then use compose to stitch the pieces together (not helpful at all)

A lot of documentation on Java's prefetch functions (this needs to be in Python)

The Google Cloud APIs

If anyone could point me in the right direction I would be really grateful! Thanks
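For context, one way such a chunked read might look with the google-cloud-storage Python client is sketched below. This is only a sketch under assumptions: it relies on the start/end range parameters of Blob.download_as_bytes() (older library versions expose the same range parameters on download_as_string), and the bucket, blob, and keyword names are placeholders, not part of the original question.

from google.cloud import storage

def scan_blob_in_chunks(bucket_name, blob_name, keyword, chunk_size=1024 * 1024):
    # keyword must be bytes, e.g. b'error'
    client = storage.Client()
    blob = client.bucket(bucket_name).get_blob(blob_name)  # fetches metadata only, not the content
    for start in range(0, blob.size, chunk_size):
        end = min(start + chunk_size, blob.size) - 1       # end offset is inclusive
        chunk = blob.download_as_bytes(start=start, end=end)
        if keyword in chunk:                               # note: misses keywords that straddle chunks
            return True
    return False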


Answer 1:


A way I have found of doing this is to create a file-like object in Python and then pass it to the Google Cloud API call .download_to_file().

This in essence streams the data. The Python code looks something like this:

import os

def getStream(blob):
    # open()'s third argument is buffering, not flags, so set O_NONBLOCK via os.open
    fd = os.open('myStream', os.O_WRONLY | os.O_CREAT | os.O_NONBLOCK)
    with os.fdopen(fd, 'wb') as stream:
        blob.download_to_file(stream)

The os.O_NONBLOCK flag is there so I can read from the file while I'm still writing to it. I still haven't tested this with really big files, so if anyone knows a better implementation or sees a potential failure with this, please comment. Thanks!
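For later readers: newer releases of the google-cloud-storage client expose Blob.open(), which returns a file-like reader over the blob, so the intermediate file can be skipped entirely. This is a different technique from the answer above and a sketch only; the chunk size and keyword check are illustrative, and it assumes a library version that provides Blob.open().

def scanBlobStream(blob, keyword, chunk_size=1024):
    # keyword must be bytes; only ~chunk_size bytes are held in memory at a time
    with blob.open('rb') as reader:
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:
                return False
            if keyword in chunk:  # note: misses keywords that straddle a chunk boundary
                return True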



Source: https://stackoverflow.com/questions/50380237/reading-really-big-blobs-without-downloading-them-in-google-cloud-streaming
