How to transfer a file to Azure Blob Storage in chunks without writing to a file, using Python

Question


I need to transfer files from google cloud storage to azure blob storage.

Google gives a code snippet to download files to byte variable like so:

# Get the payload data. `client` is a Cloud Storage JSON API service object.
import io
from googleapiclient.http import MediaIoBaseDownload

req = client.objects().get_media(
        bucket=bucket_name,
        object=object_name,
        generation=generation)    # optional
# The BytesIO object may be replaced with any io.IOBase instance.
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=1024*1024)
done = False
while not done:
    status, done = downloader.next_chunk()
    if status:
        print('Download %d%%.' % int(status.progress() * 100))
print('Download Complete!')
print(fh.getvalue())

I was able to modify this to store to a file by changing the fh object type like so:

fh = open(object_name, 'wb')

Then I can upload to Azure Blob Storage using blob_service.put_block_blob_from_path.
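For reference, that path-based upload is a one-liner with the legacy SDK (a minimal sketch; the account credentials and container_name are placeholders, and object_name is reused as both the blob name and the local path):

from azure.storage.blob import BlobService

blob_service = BlobService(account_name='myaccount', account_key='mykey')
# Upload the local file written by the downloader above as a block blob.
blob_service.put_block_blob_from_path(container_name, object_name, object_name)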

I want to avoid writing to a local file on the machine doing the transfer.

I gather Google's snippet loads the data into the io.BytesIO() object one chunk at a time. I reckon I should use that to write to blob storage one chunk at a time as well.
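As far as I can tell, the Blob service supports exactly this pattern through put_block and put_block_list, roughly like so (a sketch, not tested; container_name, blob_name, chunk_bytes, block_id, and block_ids are placeholders):

# Upload each chunk as one block. Block IDs must be base64-encoded
# strings of the same length for every block in the blob.
blob_service.put_block(container_name, blob_name, chunk_bytes, block_id)

# ...then, once every chunk has been uploaded, commit the blocks in order:
blob_service.put_block_list(container_name, blob_name, block_ids)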

I experimented with reading the whole thing into memory and then uploading with put_block_blob_from_bytes, but I got a memory error (the file is probably too big, at ~600MB).

Any suggestions?


Answer 1:


According to the source code of blobservice.py for Azure Storage and of BlobReader for Google Cloud Storage, you can try the Azure function blob_service.put_block_blob_from_file to write the stream directly, since the GCS class BlobReader exposes a file-like read method; please see below.

So, referring to the code from https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_BlobReader, you can try something like the following.

from google.appengine.ext import blobstore
from azure.storage.blob import BlobService

blob_key = ...
blob_reader = blobstore.BlobReader(blob_key)  # file-like object with read()

blob_service = BlobService(account_name, account_key)
container_name = ...
blob_name = ...
# put_block_blob_from_file accepts any file-like object, so the reader
# can be handed to it directly.
blob_service.put_block_blob_from_file(container_name, blob_name, blob_reader)
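Note that blobstore.BlobReader is only available inside App Engine. Outside App Engine, a small file-like wrapper around the question's MediaIoBaseDownload could play the same role. Here is a hypothetical, untested sketch (GcsDownloadStream is a made-up name, and depending on the SDK version put_block_blob_from_file may also need its count argument):

import io
from googleapiclient.http import MediaIoBaseDownload

class GcsDownloadStream(object):
    '''File-like wrapper that pulls chunks from a MediaIoBaseDownload on demand.'''

    def __init__(self, downloader, buf):
        self._downloader = downloader  # googleapiclient MediaIoBaseDownload
        self._buf = buf                # the io.BytesIO the downloader writes into
        self._done = False

    def read(self, size=-1):
        # Download chunks until `size` bytes are buffered (or EOF is reached).
        while not self._done and (size < 0 or self._buf.tell() < size):
            _, self._done = self._downloader.next_chunk()
        data = self._buf.getvalue()
        data, rest = (data, b'') if size < 0 else (data[:size], data[size:])
        # Keep only the unread tail so memory stays bounded.
        self._buf.seek(0)
        self._buf.truncate()
        self._buf.write(rest)
        return data

fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=1024*1024)
blob_service.put_block_blob_from_file(container_name, blob_name,
                                      GcsDownloadStream(downloader, fh))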



Answer 2:


After looking through the SDK source code, something like this could work:

from azure.storage.blob import _chunking
from azure.storage.blob import BlobService

# See _BlobChunkUploader
class PartialChunkUploader(_chunking._BlockBlobChunkUploader):
    def __init__(self, blob_service, container_name, blob_name, progress_callback = None):
        super(PartialChunkUploader, self).__init__(blob_service, container_name, blob_name, -1, -1, None, False, 5, 1.0, progress_callback, None)

    def process_chunk(self, chunk_offset, chunk_data):
        '''chunk_offset is the integer offset. chunk_data is an array of bytes.'''
        return self._upload_chunk_with_retries(chunk_offset, chunk_data)

blob_service = BlobService(account_name='myaccount', account_key='mykey')

uploader = PartialChunkUploader(blob_service, "container", "foo")
# while (...):
#     uploader.process_chunk(...)
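To tie this back to the question's download loop, usage could look something like this (untested; per the SDK source, process_chunk should return the block ID it generates from the offset, so the blocks can be committed at the end with put_block_list):

import io
from googleapiclient.http import MediaIoBaseDownload

req = client.objects().get_media(bucket=bucket_name, object=object_name)
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=1024*1024)

offset = 0
block_ids = []
done = False
while not done:
    status, done = downloader.next_chunk()
    chunk = fh.getvalue()
    block_ids.append(uploader.process_chunk(offset, chunk))
    offset += len(chunk)
    # Drop the consumed chunk so only about one chunk is held in memory.
    fh.seek(0)
    fh.truncate()

# Commit the uploaded blocks as the final blob.
blob_service.put_block_list("container", "foo", block_ids)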


Source: https://stackoverflow.com/questions/35264428/how-to-transfer-file-to-azure-blob-storage-in-chunks-without-writing-to-file-usi
