azure-data-factory

Delete nested date folders older than 5 days (getdate() - 5)

Submitted by ε祈祈猫儿з on 2020-01-06 06:56:39
Question: Folders in the Data Lake follow a nested date structure (year/month/day), e.g. 2019/09/29, 2019/09/30, 2019/10/01, …, 2019/10/20. I have written a Data Factory pipeline using the ForEach, GetMetadata, IfCondition, and Delete activities:

{
    "name": "IterateEachADLSItem",
    "type": "ForEach",
    "dependsOn": [
        {
            "activity": "F_SAP",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "userProperties": [],
    "typeProperties": {
        "items": {
            "value": "@activity('F_SAP').output.value",
            "type": "Expression"
        },
        "isSequential": false,
        "activities": [
            {
                "name":
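For the age check itself, a minimal sketch, assuming the ForEach items expose each folder path via item().name in zero-padded yyyy/MM/dd form, and assuming a dataset named DateFolderDataset parameterized on folder path (both names are hypothetical, not from the question): an If Condition compares the folder name against a cutoff built from utcnow(), and a Delete activity removes folders older than five days:

{
    "name": "IfFolderOlderThan5Days",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "value": "@less(item().name, formatDateTime(adddays(utcnow(), -5), 'yyyy/MM/dd'))",
            "type": "Expression"
        },
        "ifTrueActivities": [
            {
                "name": "DeleteDateFolder",
                "type": "Delete",
                "typeProperties": {
                    "dataset": {
                        "referenceName": "DateFolderDataset",
                        "type": "DatasetReference",
                        "parameters": { "folderPath": "@item().name" }
                    },
                    "recursive": true,
                    "enableLogging": false
                }
            }
        ]
    }
}

The plain string comparison with less() works here because zero-padded yyyy/MM/dd paths sort lexicographically in date order.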

Azure Data Factory v2 portal is slow

Submitted by 此生再无相见时 on 2020-01-06 06:06:51
Question: This is a generic question about development processes when using Azure Data Factory v2. I am currently using the UI portal to set up and configure pipelines, datasets, activities, triggers, etc., but I am finding the lag time very long. Is it the same for other users? What is the typical workflow for someone not using the portal? (There seem to be no NuGet packages for v2, and PowerShell appears to be the only alternative to the UI portal.) Answer 1: The publish performance is low when you have large

How to create Azure on demand HD insight Spark cluster using Data Factory

Submitted by 孤街浪徒 on 2020-01-05 05:45:06
Question: I am trying to use Azure Data Factory to create an on-demand HDInsight Spark cluster with HDI version 3.5. The data factory refuses to create it, returning the error message: HdiVersion: '3.5' is not supported. If there is currently no way to create an on-demand HDInsight Spark cluster, what is the other sensible option? It seems very strange to me that Microsoft hasn't added an on-demand HDInsight Spark cluster to Azure Data Factory. Answer 1: Here is a full solution, which uses ADF to
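For reference, a minimal sketch of an on-demand Spark linked service in ADF v2, assuming a Spark-capable HDI version such as 3.6 is accepted; the angle-bracket placeholders and the AzureStorageLinkedService name are illustrative, not taken from the question:

{
    "name": "OnDemandSparkLinkedService",
    "properties": {
        "type": "HDInsightOnDemand",
        "typeProperties": {
            "clusterType": "spark",
            "clusterSize": 4,
            "timeToLive": "00:15:00",
            "version": "3.6",
            "hostSubscriptionId": "<subscription id>",
            "clusterResourceGroup": "<resource group>",
            "tenant": "<tenant id>",
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": { "type": "SecureString", "value": "<key>" },
            "linkedServiceName": {
                "referenceName": "AzureStorageLinkedService",
                "type": "LinkedServiceReference"
            }
        }
    }
}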

How to control data failures in Azure Data Factory Pipelines?

Submitted by 橙三吉。 on 2020-01-05 04:24:09
Question: I receive an error from time to time due to data in my source dataset that is incompatible with my target dataset. I would like to control the action the pipeline takes based on the error type, perhaps redirecting or dropping those particular rows while completing everything else. Is that possible? Furthermore, is there a simple way to get hold of the actual failing row(s) from Data Factory without accessing and searching the source dataset itself? Copy activity encountered a
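A minimal sketch of the Copy activity's fault-tolerance settings, assuming ADF v2 and SQL source/sink; the BlobLogStore linked service and the log path are hypothetical. Incompatible rows are skipped rather than failing the run, and redirected to blob storage so the failing rows can be inspected without searching the source:

{
    "name": "CopyWithFaultTolerance",
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "SqlSource" },
        "sink": { "type": "SqlSink" },
        "enableSkipIncompatibleRow": true,
        "redirectIncompatibleRowSettings": {
            "linkedServiceName": {
                "referenceName": "BlobLogStore",
                "type": "LinkedServiceReference"
            },
            "path": "copylogs/incompatiblerows"
        }
    }
}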

How to output a variable to a file?

Submitted by 此生再无相见时 on 2020-01-05 03:43:07
Question: I've got a Get Metadata activity that goes to an SFTP server and lists the files. Is it possible to output this list to a file without using a function? Answer 1: Is there a particular reason you have to log the output to a file? If you just call a Get Metadata task in Azure Data Factory, it will be logged as part of the pipeline run's default logging anyway. You can then access those logs if required. Alternatively, a common pattern I use with the Get Metadata task is a For Each loop and then host
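A minimal sketch of that Get Metadata + ForEach pattern; the GetFileList activity name and the SftpFolderDataset dataset are hypothetical. Get Metadata requests childItems, and the ForEach iterates over the activity output:

{
    "name": "GetFileList",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": { "referenceName": "SftpFolderDataset", "type": "DatasetReference" },
        "fieldList": [ "childItems" ]
    }
},
{
    "name": "ForEachFile",
    "type": "ForEach",
    "dependsOn": [
        { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('GetFileList').output.childItems",
            "type": "Expression"
        },
        "activities": [ ]
    }
}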

How can I get the last day of a month in dynamic content in ADF2?

Submitted by 拥有回忆 on 2020-01-04 06:39:08
Question: I want to get the last day of the month based on the utcnow() timestamp. Instead of "dd" in the expression below, the last day of the month (28, 30, or 31) should be filled in automatically: @{formatDateTime(adddays(utcnow(),-2), 'yyyy-MM-ddT23:59:59.999')} Given that it is currently August, I expect the following result from the expression: "2019-08-31T23:59:59.999" Answer 1: I would recommend the simplest way to do this is to store the dates and their respective end-of-month dates in a table or file (eg
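A purely expression-based sketch is also possible, assuming ADF's expression language supports startOfMonth() and addToTime() as in the Logic Apps workflow definition language it mirrors: take the first day of the current month, add one month, and step back one day:

@{formatDateTime(adddays(addToTime(startOfMonth(utcnow()), 1, 'Month'), -1), 'yyyy-MM-ddT23:59:59.999')}

For a utcnow() in August 2019 this would evaluate to "2019-08-31T23:59:59.999", matching the expected result.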

Change connection string Linked Service in Azure Data Factory v2

Submitted by |▌冷眼眸甩不掉的悲伤 on 2020-01-03 19:17:24
Question: I am using Azure Data Factory v2 to integrate data from multiple on-premises MySQL databases. Is it possible to define just one MySQL linked service and then modify the connection string (server name, credentials, integration runtime) at runtime? My plan is to use a Lookup activity to read a list of connection strings and then use a ForEach activity to iterate over that list, pulling data from each database with a Copy activity. Is it possible to do such things, preferably using the Azure data
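A minimal sketch of the parameterized-linked-service approach; ParameterizedMySqlLinkedService and SelfHostedIR are hypothetical names, and this assumes linked-service parameterization works for the MySQL connector (officially only a subset of connectors supported it at the time; editing the JSON directly is the usual workaround). The server and database names become linked-service parameters that each dataset, and ultimately the ForEach, supplies at runtime; note that the integration runtime reference itself generally cannot be parameterized this way:

{
    "name": "ParameterizedMySqlLinkedService",
    "properties": {
        "type": "MySql",
        "parameters": {
            "serverName": { "type": "String" },
            "databaseName": { "type": "String" }
        },
        "typeProperties": {
            "connectionString": "Server=@{linkedService().serverName};Database=@{linkedService().databaseName};Uid=<user>;"
        },
        "connectVia": {
            "referenceName": "SelfHostedIR",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Credentials would normally come from Azure Key Vault rather than being embedded in the connection string.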

Azure Data Flow takes minutes to trigger the next pipeline

Submitted by 人盡茶涼 on 2020-01-03 03:09:07
Question: Azure Data Factory transfers the data into the database in about 10 milliseconds, but it then waits a few minutes before triggering the next pipeline, and that adds up to 40 minutes overall. Every pipeline takes less than 20 ms to transfer its data, yet somehow each one waits a few minutes before the next is triggered. I have used debug mode, and I have also triggered the ADF pipeline from a Logic App without debug mode. Is there any way I can optimize this? We want to move from SSIS to Data Flow, but we have a timing problem; 40 minutes is so much in
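Assuming the gap is Data Flow cluster spin-up (each Data Flow activity may wait for Spark compute to start before the few-millisecond copy even begins), a minimal sketch is a custom Azure Integration Runtime whose timeToLive keeps the Data Flow cluster warm between consecutive runs; DataFlowAzureIR is a hypothetical name:

{
    "name": "DataFlowAzureIR",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "AutoResolve",
                "dataFlowProperties": {
                    "computeType": "General",
                    "coreCount": 8,
                    "timeToLive": 15
                }
            }
        }
    }
}

timeToLive is in minutes; sequential Data Flow activities that reference this IR can then reuse the warm cluster instead of paying the multi-minute startup each time.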

How to create a HDInsightOnDemand LinkedService with a script action in Data Factory?

Submitted by 纵饮孤独 on 2020-01-03 02:47:07
Question: We are creating a Data Factory for running a PySpark job that uses an HDInsight on-demand cluster. The problem is that the job needs additional Python dependencies, such as numpy, that are not installed. We believe the way to do this is to configure a Script Action for the HDInsightOnDemandLinkedService, but we cannot find this option in Data Factory or Linked Services. Is there an alternative for automating the installation of the dependencies on the HDInsightOnDemand cluster? Answer 1:
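For reference, ADF v2's on-demand HDInsight linked service does expose a scriptActions array. A minimal sketch, with the script name, URI, and storage account as illustrative placeholders and the other required typeProperties (cluster size, service principal, etc.) omitted:

{
    "name": "OnDemandHDILinkedService",
    "properties": {
        "type": "HDInsightOnDemand",
        "typeProperties": {
            "clusterType": "spark",
            "scriptActions": [
                {
                    "name": "InstallPythonDeps",
                    "uri": "https://<storage account>.blob.core.windows.net/scripts/install-deps.sh",
                    "roles": "workernode",
                    "parameters": ""
                }
            ]
        }
    }
}

The roles value follows HDInsight's node role names; a PySpark job would likely need the same action applied to the head node as well so the driver sees the packages.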