numpy.load from io.BytesIO stream

Submitted by 我的梦境 on 2020-01-06 07:58:25

Question


I have numpy arrays saved in Azure Blob Storage, and I'm loading them to a stream like this:

stream = io.BytesIO()
store.get_blob_to_stream(container, 'cat.npy', stream)

I know from stream.getvalue() that the stream contains the metadata to reconstruct the array. This is the first 150 bytes:

b"\x93NUMPY\x01\x00v\x00{'descr': '|u1', 'fortran_order': False, 'shape': (720, 1280, 3), }                                                  \n\xc1\xb0\x94\xc2\xb1\x95\xc3\xb2\x96\xc4\xb3\x97\xc5\xb4\x98\xc6\xb5\x99\xc7\xb6\x9a\xc7"

Is it possible to load the bytes stream with numpy.load or by some other simple method?

I could instead save the array to disk and load it from disk, but I'd like to avoid that for several reasons...

EDIT: just to emphasize, the output would need to be a numpy array with the shape and dtype specified in the 128 first bytes of the stream.
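For reference, those leading bytes follow the standard .npy v1.0 layout: a 6-byte magic string b'\x93NUMPY', one byte each for the major and minor format version, a little-endian uint16 header length (the b'v\x00' above, i.e. 118), then the header dict padded with spaces. A minimal sketch, recreating an equivalent stream in memory with np.save:

```python
import io
import numpy as np

# recreate a stream like the one in the question
buf = io.BytesIO()
np.save(buf, np.zeros((720, 1280, 3), dtype=np.uint8))
raw = buf.getvalue()

assert raw[:6] == b'\x93NUMPY'                 # magic string
major, minor = raw[6], raw[7]                  # format version, 1.0 here
hlen = int.from_bytes(raw[8:10], 'little')     # header length: 118, i.e. b'v\x00'
header = raw[10:10 + hlen].decode('latin-1')   # the dict shown above, space-padded
print(major, minor, hlen, header.strip())
```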


Answer 1:


I tried several ways to achieve what you need.

Here is my sample code.

from azure.storage.blob.baseblobservice import BaseBlobService
import numpy as np

account_name = '<your account name>'
account_key = '<your account key>'
container_name = '<your container name>'
blob_name = '<your blob name>'

blob_service = BaseBlobService(
    account_name=account_name,
    account_key=account_key
)

Sample 1. Generate a blob URL with a SAS token and fetch the content via requests

from azure.storage.blob import BlobPermissions
from datetime import datetime, timedelta
import requests

sas_token = blob_service.generate_blob_shared_access_signature(
    container_name,
    blob_name,
    permission=BlobPermissions.READ,
    expiry=datetime.utcnow() + timedelta(hours=1)
)
print(sas_token)
url_with_sas = blob_service.make_blob_url(container_name, blob_name, sas_token=sas_token)
print(url_with_sas)

r = requests.get(url_with_sas)
# note: with no dtype argument, frombuffer assumes float64 and the
# 128-byte .npy header is included in the buffer
dat = np.frombuffer(r.content)
print('from requests', dat)
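Note that np.frombuffer with no dtype treats every byte, header included, as float64. If the goal is an array with the shape and dtype recorded in the header, the downloaded bytes can instead be wrapped in a BytesIO and handed to np.load. A self-contained sketch, with np.save output standing in for r.content:

```python
import io
import numpy as np

# in-memory stand-in for the bytes returned by requests as r.content
buf = io.BytesIO()
np.save(buf, np.full((720, 1280, 3), 7, dtype=np.uint8))
content = buf.getvalue()

dat = np.load(io.BytesIO(content))  # np.load parses the .npy header itself
assert dat.shape == (720, 1280, 3) and dat.dtype == np.uint8
```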

Sample 2. Download the blob content into memory via BytesIO

import io
stream = io.BytesIO()
blob_service.get_blob_to_stream(container_name, blob_name, stream)
# note: as in Sample 1, this reads the raw bytes (header included) as float64
dat = np.frombuffer(stream.getbuffer())
print('from BytesIO', dat)
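Since the stream already holds a complete .npy file, rewinding it and passing it straight to np.load should also work here (get_blob_to_stream leaves the position at the end of the stream, hence the seek). A self-contained sketch with np.save filling the stream in place of the blob download:

```python
import io
import numpy as np

stream = io.BytesIO()
np.save(stream, np.arange(24, dtype=np.uint8).reshape(2, 3, 4))  # stand-in for get_blob_to_stream

stream.seek(0)          # rewind: the write left the position at the end
dat = np.load(stream)   # the header supplies shape (2, 3, 4) and dtype uint8
assert dat.shape == (2, 3, 4) and dat.dtype == np.uint8
```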

Sample 3. Use numpy.fromfile with DataSource to open a blob URL with a SAS token; this actually downloads the blob file into the local filesystem.

ds = np.DataSource()
# ds = np.DataSource(None)  # use with temporary file
# ds = np.DataSource(path) # use with path like `data/`
f = ds.open(url_with_sas)
# note: like frombuffer, fromfile assumes float64 and does not skip the .npy header
dat = np.fromfile(f)
print('from DataSource', dat)
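If fromfile is to honour the stored dtype, the header length (a little-endian uint16 at bytes 8-9 of the file) can be read first and used as an offset past the header. A self-contained sketch with a local temp file standing in for the DataSource download:

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'cat.npy')
arr = np.arange(12, dtype=np.uint8).reshape(3, 4)
np.save(path, arr)

with open(path, 'rb') as fh:
    fh.seek(8)
    hlen = int.from_bytes(fh.read(2), 'little')  # header-length field of the .npy format

raw = np.fromfile(path, dtype=np.uint8, offset=10 + hlen)  # skip magic + version + header
assert (raw.reshape(3, 4) == arr).all()
```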

I think Samples 1 & 2 are better for you.




Answer 2:


This is a bit of a hacky way I came up with, which basically just gets the metadata from the first 128 bytes:

import re
import ast

import numpy as np


def load_npy_from_stream(stream_):
    """Experimental, may not work!

    :param stream_: io.BytesIO() object obtained by e.g. calling BlockBlobService().get_blob_to_stream() containing
        the binary stream of a standard format .npy file.
    :return: numpy.ndarray
    """
    stream_.seek(0)
    prefix_ = stream_.read(128)  # first 128 bytes seem to be the metadata
    dict_string = re.search(r'\{(.*?)\}', prefix_[1:].decode())[0]
    metadata_dict = ast.literal_eval(dict_string)  # safer than eval for a literal dict

    array = np.frombuffer(stream_.read(), dtype=metadata_dict['descr']).reshape(metadata_dict['shape'])

    return array

Could fail in numerous ways, but I'm posting it here if anyone wants to give it a shot. I'll be running tests with this and will get back as I know more.
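One such test is a round trip against numpy's own writer. A self-contained sketch (the function is repeated here so the sketch runs on its own; using ast.literal_eval and a raw regex string are my assumptions, not part of the original code):

```python
import io
import re
import ast
import numpy as np

def load_npy_from_stream(stream_):
    """Parse the 128-byte .npy header by hand, then frombuffer the rest."""
    stream_.seek(0)
    prefix_ = stream_.read(128)
    dict_string = re.search(r'\{(.*?)\}', prefix_[1:].decode())[0]
    metadata_dict = ast.literal_eval(dict_string)
    return np.frombuffer(stream_.read(), dtype=metadata_dict['descr']).reshape(metadata_dict['shape'])

stream = io.BytesIO()
original = np.arange(24, dtype=np.uint8).reshape(2, 3, 4)
np.save(stream, original)

restored = load_npy_from_stream(stream)
assert (restored == original).all() and restored.dtype == np.uint8
```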



Source: https://stackoverflow.com/questions/55610891/numpy-load-from-io-bytesio-stream
