Issues Reading Azure Blob CSV Into Python Pandas DF

Posted by 假装没事ソ on 2021-01-05 06:38:15

Question


I'm trying to access a CSV stored in Azure Blob storage and read it into a pandas DataFrame in my Python script, but I'm running into issues with the imports and with actually reading the CSV. I can at least confirm from my script that the file exists. The script looks like this:

import os, uuid, sys
from io import StringIO
import pandas as pd
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core._match_conditions import MatchConditions
from azure.storage.filedatalake._models import ContentSettings
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, BlobService

try:  
    global service_client
    storage_account_name = 'ACCOUNT_NAME'
    storage_account_key = 'ACCOUNT_KEY'
    storage_connection_string = 'ACCOUNT_STRING'
    storage_container_name = 'CONTAINER_NAME'
    csv_path = '<PATH_TO>/FILE.csv'

    service_client = DataLakeServiceClient(account_url="{}://{}.dfs.core.windows.net".format(
        "https", storage_account_name), credential=storage_account_key)

    file_system_client = service_client.get_file_system_client(file_system=storage_container_name)

    print('GET PATH(S)')
    paths = file_system_client.get_paths(path=csv_path)
    for path in paths:
        print(path.name + '\n')

    blob_service = BlobService(account_name=storage_account_name, account_key=storage_account_key)
    blobstring = blob_service.get_blob_to_text(storage_container_name,csv_path)
    df = pd.read_csv(StringIO(blobstring))

except Exception as e:
    print(e)

finally:
    print('DONE')

The issue is that I can't correctly read the CSV into my pandas DataFrame. I also can't use BlobService at all: every time I run the script, I get the error:

ImportError: cannot import name 'BlobService' from 'azure.storage.blob'

My pip freeze for azure looks like this:

azure-common==1.1.25
azure-core==1.5.0
azure-storage-blob==12.3.1
azure-storage-common==2.1.0
azure-storage-file-datalake==12.0.1

What is it that I'm doing wrong here?


Answer 1:


According to the code you provided, you are using the class BlobService to download the file from Azure Blob storage. That class belongs to the older SDK azure.storage 0.20.0, but you have installed azure-storage-blob (version 12.3.1 in your pip freeze), which does not provide it, so the import fails. Since azure-storage-blob is already installed, you can use its BlobClient class to download the blob.

For example:

from io import StringIO

import pandas as pd
from azure.storage.blob import BlobClient

# Download the CSV file from Azure Blob storage
sas_url = "your blob sas url"
blob_client = BlobClient.from_blob_url(sas_url)
downloaded_blob = blob_client.download_blob()

# Read the CSV text into a DataFrame
df = pd.read_csv(StringIO(downloaded_blob.content_as_text()))
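The example above needs a SAS URL, while the question only has an account name and key. As a minimal sketch, assuming the account key is accepted as the credential and that the blob lives on the standard `.blob.core.windows.net` endpoint (not the `.dfs.` Data Lake endpoint used in the question), something like the following may work with azure-storage-blob 12.x. The helper names `blob_account_url` and `read_blob_csv` are hypothetical and are not part of the SDK.

```python
from io import BytesIO


def blob_account_url(account_name: str) -> str:
    """Build the Blob service endpoint URL for a storage account (hypothetical helper)."""
    return "https://{}.blob.core.windows.net".format(account_name)


def read_blob_csv(account_name, account_key, container_name, blob_path):
    """Download one CSV blob and parse it into a pandas DataFrame (hypothetical helper)."""
    # Imported lazily so blob_account_url() can be used without these packages installed.
    import pandas as pd
    from azure.storage.blob import BlobServiceClient

    # Assumption: the account key string is passed directly as the credential.
    service = BlobServiceClient(
        account_url=blob_account_url(account_name),
        credential=account_key,
    )
    blob_client = service.get_blob_client(container=container_name, blob=blob_path)

    # download_blob() returns a StorageStreamDownloader; readall() returns the raw bytes.
    data = blob_client.download_blob().readall()
    return pd.read_csv(BytesIO(data))
```

With the placeholders from the question, the call would be `df = read_blob_csv('ACCOUNT_NAME', 'ACCOUNT_KEY', 'CONTAINER_NAME', '<PATH_TO>/FILE.csv')`. Passing bytes through `BytesIO` lets pandas handle the decoding, so no encoding has to be picked by hand.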




Source: https://stackoverflow.com/questions/61935564/issues-reading-azure-blob-csv-into-python-pandas-df
