How to integrate a WebJob within an Azure Data Factory Pipeline

微笑、不失礼 提交于 2019-12-01 01:51:18

Dec 2018 Update :

If you are thinking of doing this using azure function, azure data factory NOW provides you with an azure function step! the underlying principle is the same as you will have to expose the azure function with a HTTP trigger. however this provides better security since you can specify your data factory instance access to the azure function using ACL

Reference : https://azure.microsoft.com/en-us/blog/azure-functions-now-supported-as-a-step-in-azure-data-factory-pipelines/


Orginal Answer

  • From the comments posted I believe you dont want to use custom activities route.
  • You could try using a copy task for this, even though probably this is not the intended purpose.
  • there is a httpConnector available for copying data from a web source.

https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-http-connector

  • the copy task triggers an http endpoint,
  • you can specify a variety of authentication mechanisms from Basic to OAuth2.
  • below I am using the end point to trigger the azure function process, the output is saved in datalake folder for logging (you can use other things obviously, like in your case it would be blob storage.)

Basic linked Service

{
  "name": "linkedservice-httpEndpoint",
  "properties": {
    "type": "Http",
    "typeProperties": {
      "url": "https://azurefunction.api.com/",
      "authenticationType": "Anonymous"
    }
  }
}

Basic Input Dataset

{
  "name": "Http-Request",
  "properties": {
    "type": "Http",
    "linkedServiceName": "linkedservice-httpEndpoint",
    "availability": {
      "frequency": "Minute",
      "interval": 30
    },
    "typeProperties": {
      "relativeUrl": "/api/status",
      "requestMethod": "Get",
      "format": {
        "type": "TextFormat",
        "columnDelimiter": ","
      }
    },
    "structure": [
      {
        "name": "Status",
        "type": "String"
      }
    ],
    "published": false,
    "external": true,
    "policy": {}
  }
}

Output

{
    "name": "Http-Response",
    "properties": {
        "structure": [
            ...
        ],
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "linkedservice-dataLake",
        "typeProperties": {
          ...
        },
        "availability": {
            ...
        },
        "external": false,
        "policy": {}
    }
}

Activity

{
        "type": "Copy",
        "name": "Trigger Azure Function or WebJob with Http Trigger",
        "scheduler": {
          "frequency": "Day",
          "interval": 1
        },
        "typeProperties": {
          "source": {
            "type": "HttpSource",
            "recursive": false
          },
          "sink": {
            "type": "AzureDataLakeStoreSink",
            "copyBehavior": "MergeFiles",
            "writeBatchSize": 0,
            "writeBatchTimeout": "00:00:00"
          }
        },
        "inputs": [
          {
            "name": "Http-Request"
          }
        ],
        "outputs": [
          {
            "name": "Http-Response"
          }
        ],
        "policy": {
          ...
        }        
      }
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!