How to connect AMLS to ADLS Gen 2?

Submitted by 戏子无情 on 2020-12-11 02:33:07

Question


I would like to register a dataset from ADLS Gen2 in my Azure Machine Learning workspace (azureml-core==1.12.0). The Python SDK documentation for .register_azure_data_lake_gen2() does not mark the service principal information as required, so I registered ADLS Gen2 as a datastore without it. The following code ran successfully:

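import os
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()  # assumes a config.json for the workspace is available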

adlsgen2_datastore_name = os.environ['adlsgen2_datastore_name']
account_name=os.environ['account_name'] # ADLS Gen2 account name
file_system=os.environ['filesystem']

adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name, 
    filesystem=file_system
)

However, when I try to create a dataset from that datastore with

from azureml.core import Dataset
adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
data = Dataset.Tabular.from_delimited_files((adls_ds, 'folder/data.csv'))

I get the following error:

Cannot load any data from the specified path. Make sure the path is accessible and contains data. ScriptExecutionException was caused by StreamAccessException. StreamAccessException was caused by AuthenticationException. 'AdlsGen2-ReadHeaders' for '[REDACTED]' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID <CLIENT_REQUEST_ID>, request ID <REQUEST_ID>. Error message: [REDACTED] | session_id=<SESSION_ID>

Do I need to enable the service principal to get this to work? In the ML Studio UI, the service principal fields appear to be required even just to register the datastore.

Another issue I noticed is that AMLS tries to access the dataset at https://adls_gen2_account_name.dfs.core.windows.net/container/folder/data.csv (the dfs host), whereas the actual URI in ADLS Gen2 is https://adls_gen2_account_name.blob.core.windows.net/container/folder/data.csv (the blob host).
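The two URIs above name the same account, container and path; only the endpoint host differs. A storage account with the hierarchical namespace enabled (which ADLS Gen2 requires) serves the same data through both the Blob endpoint (blob.core.windows.net) and the Data Lake Storage endpoint (dfs.core.windows.net), so the dfs address is not by itself a wrong path, and the 'Forbidden' status in the error points at permissions instead. A minimal sketch that derives both forms from the same parts; the helper name and the account name are hypothetical:

```python
from typing import Dict


def adls_gen2_urls(account: str, container: str, path: str) -> Dict[str, str]:
    """Build the Blob and Data Lake (dfs) endpoint URLs for the same object."""
    path = path.lstrip("/")  # avoid a double slash after the container name
    return {
        "blob": f"https://{account}.blob.core.windows.net/{container}/{path}",
        "dfs": f"https://{account}.dfs.core.windows.net/{container}/{path}",
    }


# "myaccount" is a hypothetical storage account name
for endpoint, url in adls_gen2_urls("myaccount", "container", "folder/data.csv").items():
    print(f"{endpoint}: {url}")
```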


Answer 1:


According to this documentation, you need to enable the service principal.
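The service principal is identified by three values (tenant ID, client ID and client secret), which register_azure_data_lake_gen2() accepts as tenant_id, client_id and client_secret. Because the first registration in the question succeeded without them, a check that fails early when any value is missing can be handy. A minimal sketch, assuming the credentials live in environment variables; the helper and the variable names are hypothetical:

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping: register_azure_data_lake_gen2() parameter -> environment variable
SP_ENV_VARS = {
    "tenant_id": "tenant_id",
    "client_id": "client_id",
    "client_secret": "client_secret",
}


def service_principal_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect service principal credentials, raising if any setting is missing or empty."""
    env = os.environ if env is None else env
    missing = [var for var in SP_ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"missing service principal settings: {', '.join(missing)}")
    return {param: env[var] for param, var in SP_ENV_VARS.items()}
```

The returned dict could then be unpacked into the registration call, e.g. Datastore.register_azure_data_lake_gen2(workspace=ws, datastore_name=..., account_name=..., filesystem=..., **service_principal_kwargs()).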

1. Register an application (service principal) in Azure Active Directory and grant it the Storage Blob Data Reader role on the storage account.

2. Register the datastore with the service principal's credentials, then read the file:

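import os
from azureml.core import Datastore, Dataset

# ws, adlsgen2_datastore_name, account_name and file_system are defined as in the question.
# Service principal credentials; these environment variable names are illustrative.
tenant_id = os.environ['tenant_id']
client_id = os.environ['client_id']
client_secret = os.environ['client_secret']

adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system,
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret
)

adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
dataset = Dataset.Tabular.from_delimited_files((adls_ds, 'sample.csv'))
print(dataset.to_pandas_dataframe())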
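For step 1, an Azure role assignment is made at a scope: the whole storage account or, more narrowly, one container (file system). The scope is that resource's ID, for example the value passed to `az role assignment create --assignee <client-id> --role "Storage Blob Data Reader" --scope <scope>`. A minimal sketch that builds either ID; the helper name and the subscription, resource group and account values are placeholders:

```python
from typing import Optional


def storage_role_scope(subscription_id: str, resource_group: str,
                       account: str, container: Optional[str] = None) -> str:
    """Return the resource ID of a storage account, or of one container in it."""
    scope = (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
             f"/providers/Microsoft.Storage/storageAccounts/{account}")
    if container:
        # Narrow the assignment to a single container (ADLS Gen2 file system)
        scope += f"/blobServices/default/containers/{container}"
    return scope


# Placeholder values for illustration only
print(storage_role_scope("<subscription-id>", "my-rg", "myaccount", "container"))
```

Scoping to the container grants the service principal read access to that file system only, rather than to every container in the account.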




Source: https://stackoverflow.com/questions/63891547/how-to-connect-amls-to-adls-gen-2
