Method to put alerts on a long-running Azure Data Factory pipeline

Submitted by 笑着哭i on 2020-06-23 08:31:26

Question


I have some Data Factory pipelines that can sometimes run beyond 2 hours when copying data from Blob storage into SQL. The duration varies, but I'd like to be notified/alerted whenever a pipeline run exceeds 2 hours.

What are possible ways of doing this?

What I have tried so far:

  • Explored the ADF metrics on which I can put an alert rule, but none of them seems to cover an active run's duration.
  • I was hoping to get the pipeline's duration value, as shown on the Monitor tab in adf.azure.com, and use it to drive some sort of alert.
  • I was also thinking that if I could get the pipeline start time, I could calculate the total run time from the current time and put an alert on top of that.


Answer 1:


We do something like this to track running pipelines and manage execution concurrency. I find Logic Apps and Azure Functions great tools for creating these kinds of solutions. Here is a rough outline of how we handle this:

  1. A set of Azure Functions (AF) that leverage the Microsoft.Azure.Management.DataFactory SDK. The relevant code is at the bottom of this post.
  2. A log of pipeline executions in a SQL Server table. The table includes the PipelineId and Status, among other information. You would need to INSERT into this table whenever you start a pipeline run. We use a separate Logic App that calls an AF to execute the pipeline using the "RunPipelineAsync" method in the code below, captures the new PipelineId (RunId), and sends it to a Stored Procedure to log the PipelineId.
  3. A Logic App running on a recurrence trigger (every 3 minutes) that a) calls a Stored Procedure that polls the table (#2 above) and returns all pipelines with Status = "InProgress"; b) loops over the returned list and, for each entry, calls an AF (#1 above) that checks the current status of the pipeline using the "GetPipelineInfoAsync" method in the code below; and c) calls another Stored Procedure to update the status in the table (a rough sketch of this logging pattern follows the list).
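
As a minimal sketch of the SQL logging side in steps 2 and 3 (not from the original post — the PipelineRunLogger class, the dbo.usp_LogPipelineRun / dbo.usp_UpdatePipelineRunStatus stored procedure names, and their parameters are all hypothetical), the calls could look something like this:

using System.Data;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

namespace AzureUtilities.DataFactory
{
    public class PipelineRunLogger
    {
        private readonly string _connectionString;

        public PipelineRunLogger(string connectionString)
        {
            _connectionString = connectionString;
        }

        // Step 2: insert a row when a pipeline run is started.
        // "dbo.usp_LogPipelineRun" is a hypothetical stored procedure.
        public async Task LogRunAsync(string runId, string pipelineName)
        {
            using (var conn = new SqlConnection(_connectionString))
            using (var cmd = new SqlCommand("dbo.usp_LogPipelineRun", conn) { CommandType = CommandType.StoredProcedure })
            {
                cmd.Parameters.AddWithValue("@PipelineId", runId);
                cmd.Parameters.AddWithValue("@PipelineName", pipelineName);
                cmd.Parameters.AddWithValue("@Status", "InProgress");
                await conn.OpenAsync();
                await cmd.ExecuteNonQueryAsync();
            }
        }

        // Step 3c: update the row after the recurrence Logic App re-checks the run.
        // "dbo.usp_UpdatePipelineRunStatus" is also hypothetical.
        public async Task UpdateStatusAsync(string runId, string status)
        {
            using (var conn = new SqlConnection(_connectionString))
            using (var cmd = new SqlCommand("dbo.usp_UpdatePipelineRunStatus", conn) { CommandType = CommandType.StoredProcedure })
            {
                cmd.Parameters.AddWithValue("@PipelineId", runId);
                cmd.Parameters.AddWithValue("@Status", status);
                await conn.OpenAsync();
                await cmd.ExecuteNonQueryAsync();
            }
        }
    }
}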

You could do something similar to this and use the "DurationInMs" to generate appropriate actions based on Status = "InProgress" and total running time > {desired alert threshold}; a sketch of that check appears after the class below.

Here is the DataFactoryHelper class I use:

using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest;
using Microsoft.Azure.Management.ResourceManager;
using Microsoft.Azure.Management.DataFactory;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace AzureUtilities.DataFactory
{
    public class DataFactoryHelper
    {
        private ClientCredential Credentials { get; set; }
        private string KeyVaultUrl { get; set; }
        private string TenantId { get; set; }
        private string SubscriptionId { get; set; }

        // Lazily created management client, authenticated with the service principal via ADAL.
        private DataFactoryManagementClient _client = null;
        private DataFactoryManagementClient Client
        {
            get {
                if (_client == null)
                {
                    var context = new AuthenticationContext("https://login.windows.net/" + TenantId);
                    AuthenticationResult result = context.AcquireTokenAsync("https://management.azure.com/", Credentials).Result;
                    ServiceClientCredentials cred = new TokenCredentials(result.AccessToken);
                    _client = new DataFactoryManagementClient(cred) { SubscriptionId = SubscriptionId };
                }

                return _client;
            }
        }

        public DataFactoryHelper(string servicePrincipalId, string servicePrincipalKey, string tenantId, string subscriptionId)
        {
            Credentials = new ClientCredential(servicePrincipalId, servicePrincipalKey);
            TenantId = tenantId;
            SubscriptionId = subscriptionId;
        }

        // Starts a pipeline run and returns the new run ID.
        public async Task<string> RunPipelineAsync(string resourceGroupName,
                                                   string dataFactoryName,
                                                   string pipelineName,
                                                   Dictionary<string, object> parameters = null,
                                                   Dictionary<string, List<string>> customHeaders = null)
        {
            var runResponse = await Client.Pipelines.CreateRunWithHttpMessagesAsync(resourceGroupName, dataFactoryName, pipelineName, parameters: parameters, customHeaders: customHeaders);
            return runResponse.Body.RunId;
        }

        // Returns the current status, timing, and duration details for a pipeline run.
        public async Task<object> GetPipelineInfoAsync(string resourceGroup, string dataFactory, string runId)
        {
            var info = await Client.PipelineRuns.GetAsync(resourceGroup, dataFactory, runId);
            return new
            {
                RunId = info.RunId,
                PipelineName = info.PipelineName,
                InvokedBy = info.InvokedBy.Name,
                LastUpdated = info.LastUpdated,
                RunStart = info.RunStart,
                RunEnd = info.RunEnd,
                DurationInMs = info.DurationInMs,
                Status = info.Status,
                Message = info.Message
            };
        }
    }
}
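
To tie this back to the threshold idea above, here is a minimal sketch (not part of the original answer) of how a caller in the same project could flag a long-running run. The LongRunningPipelineCheck name, the fallback from DurationInMs to RunStart (DurationInMs is generally only populated once a run finishes), and the two-hour example below are assumptions:

using System;
using System.Threading.Tasks;
using AzureUtilities.DataFactory;

public static class LongRunningPipelineCheck
{
    // Returns true when the run is still in progress and has exceeded the given threshold.
    // The result of GetPipelineInfoAsync is an anonymous type, so it is read via dynamic,
    // which only works when this code lives in the same assembly as DataFactoryHelper.
    public static async Task<bool> IsOverThresholdAsync(DataFactoryHelper helper,
                                                        string resourceGroup,
                                                        string dataFactory,
                                                        string runId,
                                                        TimeSpan threshold)
    {
        dynamic info = await helper.GetPipelineInfoAsync(resourceGroup, dataFactory, runId);
        if (info.Status != "InProgress")
            return false;

        // DurationInMs is usually null while the run is still in flight,
        // so fall back to elapsed time since RunStart.
        TimeSpan elapsed = info.DurationInMs != null
            ? TimeSpan.FromMilliseconds(Convert.ToDouble(info.DurationInMs))
            : DateTime.UtcNow - (DateTime)info.RunStart;

        return elapsed > threshold;
    }
}

The recurrence Logic App from step 3 could call something like this with TimeSpan.FromHours(2) and trigger a notification action whenever it returns true.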



Answer 2:


One workaround would be to log a timestamp in your SQL database as the first step in your pipeline and then keep track of the load by monitoring the sessions in your database engine.



Source: https://stackoverflow.com/questions/59085000/method-to-put-alerts-on-long-running-azure-data-factory-pipeline
