Azure function binding for Azure data lake (python)

Posted by 你说的曾经没有我的故事 on 2021-01-29 09:19:26

Question


I have a requirement to connect to my Azure Data Lake Storage Gen2 (ADLS) account from Azure Functions, read a file, process it using Python (PySpark), and write the result back to Azure Data Lake. So both my input and output bindings would point to ADLS. Is there an ADLS binding available for Azure Functions in Python? Could somebody give any suggestions on this?

Thanks, Anten D


Answer 1:


Update:

1. To read the data, we can use the blob input binding.

2. To write the data, we cannot use the blob output binding, because the Blob Storage service and the Data Lake service are based on different objects. Azure Functions does not support an ADLS output binding, so the write logic has to go in the body of the function.

This is the documentation listing the bindings that Azure Functions supports:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings?tabs=csharp#supported-bindings
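The function in this answer receives a `req` and an `inputblob` parameter, which implies an HTTP trigger plus a blob input binding declared in `function.json`. The answer does not show that file, so the following is only a sketch of what it might look like, generated from Python so the layout is easy to inspect. The blob path `test/input.txt` and the app-setting name `AzureWebJobsStorage` are assumptions, not values from the original answer.

```python
import json

# Hypothetical values: the container/blob path and the app-setting name
# are placeholders chosen for illustration.
BLOB_PATH = "test/input.txt"
CONNECTION_SETTING = "AzureWebJobsStorage"


def build_function_json(blob_path: str, connection_setting: str) -> dict:
    """Return a function.json layout with an HTTP trigger and a blob input binding."""
    return {
        "scriptFile": "__init__.py",
        "bindings": [
            {
                # HTTP trigger; "name" must match the `req` parameter of main().
                "authLevel": "function",
                "type": "httpTrigger",
                "direction": "in",
                "name": "req",
                "methods": ["get", "post"],
            },
            {
                # Blob input binding; "name" must match the `inputblob` parameter.
                "type": "blob",
                "direction": "in",
                "name": "inputblob",
                "path": blob_path,
                "connection": connection_setting,
            },
            {
                # HTTP response, returned from main().
                "type": "http",
                "direction": "out",
                "name": "$return",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_function_json(BLOB_PATH, CONNECTION_SETTING), indent=2))
```

Note that there is deliberately no output binding for the file being written: that part is handled with the Data Lake SDK inside the function body.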

Below is a simple code example:

import logging

import azure.functions as func
from azure.storage.filedatalake import DataLakeServiceClient

def main(req: func.HttpRequest, inputblob: func.InputStream) -> func.HttpResponse:
    # Placeholder connection string; replace the account name and key with your own.
    connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
    datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)
    myfilesystem = "test"
    myfile       = "FileName.txt"
    file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
    # create_file returns a client for a new, empty file, so writing starts at offset 0.
    file_client = file_system_client.create_file(myfile)
    # Read the blob delivered by the input binding as raw bytes.
    data = inputblob.read()
    logging.info("length of data is %d", len(data))
    filesize_previous = 0
    logging.info("length of current file is %d", filesize_previous)
    # offset and length are byte counts, so measure the bytes, not the decoded string.
    file_client.append_data(data, offset=filesize_previous, length=len(data))
    # flush_data commits everything appended up to the given byte position.
    file_client.flush_data(filesize_previous + len(data))
    return func.HttpResponse(
            "This is a test." + data.decode("utf-8"),
            status_code=200
    )
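As a possible variation on the write step, the Data Lake SDK's file client also offers `upload_data`, which creates the file, appends the content, and flushes it in one call. The sketch below assumes the connection string is kept in an app setting rather than in the source; the setting name `ADLS_CONNECTION_STRING` and both helper names are hypothetical and do not come from the original answer.

```python
import os


def write_to_adls(file_system_client, path: str, data: bytes) -> int:
    """Write `data` to `path`, replacing any existing file; return the byte count."""
    file_client = file_system_client.get_file_client(path)
    # upload_data handles create, append, and flush in a single call.
    file_client.upload_data(data, overwrite=True)
    return len(data)


def get_file_system_client(file_system: str,
                           setting_name: str = "ADLS_CONNECTION_STRING"):
    """Build a file system client from a connection string stored in an app setting.

    `setting_name` is a hypothetical app-setting name used for illustration.
    """
    # Imported here so write_to_adls can be exercised without the SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    connect_str = os.environ[setting_name]
    service_client = DataLakeServiceClient.from_connection_string(connect_str)
    return service_client.get_file_system_client(file_system)
```

Inside `main`, this might be used as `write_to_adls(get_file_system_client("test"), "FileName.txt", inputblob.read())`. Passing the client in as an argument keeps the write step easy to test with a stand-in object.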

Original Answer:

I think the docs below will help you:

How to read:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-input?tabs=csharp

How to write:

https://docs.microsoft.com/en-us/python/api/azure-storage-file-datalake/azure.storage.filedatalake.datalakeserviceclient?view=azure-python

By the way, don't use the blob output binding. Reading can be done with a binding, but writing cannot. The Blob Storage service and the Data Lake service are based on different objects: using the blob input binding to read files is completely fine, but do not use the blob output binding to write files, because it does not create an object based on the Data Lake service.

Let me know whether the docs above help; if not, I will add a simple Python example.



Source: https://stackoverflow.com/questions/64527808/azure-function-binding-for-azure-data-lake-python
