azure-data-factory-2

Data Factory - append fields to JSON sink

Submitted by 被刻印的时光 ゝ on 2019-12-11 15:18:04
Question: I am using the copy activity to copy/transform a JSON source dataset into a JSON sink dataset. I need to append a few audit fields to the output, such as a transform date using the @utcnow expression function. How can this be accomplished?

Answer 1: It looks like the Databricks activity handles this functionality pretty well:

    df_new = df.select("table.field1","table.field2","table.field3").withColumn("TransferDate", current_timestamp())

Source: https://stackoverflow.com/questions/51883988/data-factory-append
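A minimal PySpark sketch of the Databricks approach from the answer, assuming it runs in a Databricks notebook where spark is predefined; the input/output paths and the extra audit column are illustrative, not from the original thread:

    # Minimal sketch, assuming a Databricks notebook (spark is predefined);
    # paths and the extra "TransformedBy" audit column are illustrative.
    from pyspark.sql.functions import current_timestamp, lit

    df = spark.read.json("/mnt/source/input.json")             # hypothetical source path

    df_new = (
        df.select("table.field1", "table.field2", "table.field3")
          .withColumn("TransferDate", current_timestamp())     # audit: load timestamp
          .withColumn("TransformedBy", lit("adf-databricks"))  # audit: constant tag
    )

    df_new.write.mode("overwrite").json("/mnt/sink/output")    # hypothetical sink path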

Azure Data Factory Source Dataset value from Parameter

Submitted by 对着背影说爱祢 on 2019-12-11 13:54:55
Question: I have a dataset in Azure Data Factory backed by a CSV file. I added an additional column to the dataset and want to pass its value from a dataset parameter, but the value never gets copied to the column:

    "type": "AzureBlob",
    "structure": [
        { "name": "MyField", "type": "String" }
    ]

I have defined a parameter as well:

    "parameters": {
        "MyParameter": {
            "type": "String",
            "defaultValue": "ABC"
        }
    }

How can I copy the parameter value to the column? I tried the following, but it doesn't work:

    "type": "AzureBlob",
    "structure": [

Web activity throws over-limit error when calling REST API

Submitted by 柔情痞子 on 2019-12-11 13:33:45
Question: My ADF pipeline has a Lookup activity which uses a SQL query to get data from a table and passes it to a Web activity, which posts the JSON to an API (an Azure App Service). When the query returns 1000 rows it works fine, but when I try over 5000 rows the Web activity returns this error:

    "errorCode": "2001",
    "message": "The length of execution ouput is over limit (around 1M currently). ",
    "failureType": "UserError",

When I post the 5000 rows to the API using Postman it works fine. Any idea what this
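The question is cut off above and no answer is included; the error points at Data Factory's cap on an activity's output size rather than anything in the API itself. Purely as an illustration of one possible workaround (not from the original thread), a hedged Python sketch that fetches the rows directly and posts them to the API in smaller batches; the connection string, query, and endpoint URL are placeholders:

    # Hedged workaround sketch: query SQL directly and post the rows in batches
    # small enough to stay under payload limits. All names are placeholders.
    import json
    import pyodbc
    import requests

    conn = pyodbc.connect("Driver={ODBC Driver 17 for SQL Server};"
                          "Server=tcp:myserver.database.windows.net;Database=mydb;"
                          "Uid=myuser;Pwd=mypassword")
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM dbo.SourceTable")
    columns = [c[0] for c in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor.fetchall()]

    BATCH = 1000
    for i in range(0, len(rows), BATCH):
        resp = requests.post(
            "https://myapp.azurewebsites.net/api/ingest",      # placeholder endpoint
            data=json.dumps(rows[i:i + BATCH], default=str),   # default=str handles dates
            headers={"Content-Type": "application/json"},
            timeout=60,
        )
        resp.raise_for_status()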

Azure Data Factory: How to trigger a pipeline after another pipeline completed successfully

Submitted by 随声附和 on 2019-12-11 04:24:56
Question: In Azure Data Factory, how do I trigger a pipeline after other pipelines have completed successfully? In detail: I want to trigger an SSIS package after other pipelines have completed successfully. I already know I can save my SSIS package as a pipeline and run it using a trigger like the other pipelines. But how do I make sure the SSIS package pipeline starts only after the other pipelines are finished? Is there a feature for this in Azure, or do I need some kind of workaround? Thanks in
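No answer is included in the excerpt above. Within Data Factory itself, a parent pipeline with Execute Pipeline activities is the usual way to sequence runs. For illustration only, a hedged sketch of chaining the runs from outside the factory with the azure-mgmt-datafactory Python SDK: start the first pipeline, poll its run status, and only start the SSIS package pipeline once it has succeeded. Subscription, resource group, factory, and pipeline names are placeholders:

    # Hedged illustration (not from the original thread): chain two pipelines by
    # polling the first run's status. All names below are placeholders.
    import time

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    rg, factory = "my-resource-group", "my-data-factory"

    first = adf.pipelines.create_run(rg, factory, "LoadStagingPipeline")

    while True:
        run = adf.pipeline_runs.get(rg, factory, first.run_id)
        if run.status in ("Succeeded", "Failed", "Cancelled"):
            break
        time.sleep(30)

    if run.status == "Succeeded":
        adf.pipelines.create_run(rg, factory, "RunSsisPackagePipeline")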

Azure Data Factory copy activity: Evaluate column in sink table with @pipeline().TriggerTime

Submitted by 江枫思渺然 on 2019-12-11 03:15:39
Question: With Data Factory V2 I'm trying to implement a data copy from one Azure SQL database to another. I have mapped all the columns of the source table to the sink table, but the sink table has an empty column that I would like to fill with the pipeline run time. Does anyone know how to fill this column in the sink table when it is not present in the source table? Below is the code of my copy pipeline:

    {
        "name": "FLD_Item_base",
        "properties": {
            "activities": [
                {
                    "name": "Copy

Pre-copy script in Data Factory or on-the-fly data processing

Submitted by 一曲冷凌霜 on 2019-12-11 01:47:36
Question: I am copying data from a source (an API) into an Azure SQL DB. But in one of the columns I am getting JSON objects. Is there any way I can use dynamic parameters (either through a pre-copy script or something else) in the pipeline to take only the value of a particular tag from those JSON objects, so that I have only that value in the column? The only constraint is that I can't change the sink; it has to be Azure SQL DB. The JSON objects I am getting:

    [{"self":"https://xxxxxxxx.jira.com/rest/api/2
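The sample payload above is truncated and no answer is included. Purely to illustrate the kind of extraction being asked for, a hedged Python sketch that keeps only one tag's value per object; the payload and the choice of the "self" key are placeholders modelled on the truncated sample, and any other tag would work the same way. (On the sink side, Azure SQL's built-in JSON_VALUE function can do the same extraction in T-SQL.)

    # Hedged illustration (not from the original thread): keep one tag's value
    # per JSON object. The payload below is a placeholder, not the real API data.
    import json

    raw = '[{"self": "https://example.atlassian.net/rest/api/2/issue/XX-1"}]'
    values = [item.get("self") for item in json.loads(raw)]
    print(values)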

Enumerate blob names in Azure Data Factory v2

Submitted by 蹲街弑〆低调 on 2019-12-10 11:34:38
Question: I need to enumerate all the blob names that sit in an Azure Blob container and dump the list to a file in another blob storage. The part that I cannot master is the enumeration. Thanks.

Answer 1: The Get Metadata activity is what you want: https://docs.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity Please use childItems to get all the files, and then use a ForEach to iterate over the childItems. Inside the ForEach activity, you may want to check whether each item is a file. You could
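As an alternative outside the pipeline (for example, in an Azure Function that ADF calls), a hedged sketch of the same enumeration with the azure-storage-blob Python SDK; the connection strings, container names, and output file name are placeholders:

    # Hedged alternative sketch (not from the answer above): list blob names and
    # write the list to a blob in another storage account.
    from azure.storage.blob import BlobServiceClient

    source = BlobServiceClient.from_connection_string("<source-connection-string>")
    target = BlobServiceClient.from_connection_string("<target-connection-string>")

    names = [b.name for b in source.get_container_client("input").list_blobs()]

    target.get_container_client("output").upload_blob(
        name="blob-inventory.txt",          # placeholder output file name
        data="\n".join(names),
        overwrite=True,
    )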

Copy Data From Azure Blob Storage to AWS S3

Submitted by 别等时光非礼了梦想. on 2019-12-10 09:56:10
Question: I am new to Azure Data Factory and have an interesting requirement. I need to move files from Azure Blob Storage to Amazon S3, ideally using Azure Data Factory. However, S3 isn't supported as a sink: https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-overview I also understand from a variety of comments I've read on here that you cannot directly copy from Blob Storage to S3 - you would need to download the file locally and then upload it to S3. Does anyone know of any examples,
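A hedged sketch of the "download then re-upload" route described above, using the azure-storage-blob and boto3 SDKs; the connection string, container, and bucket names are placeholders, and large files would need streaming or multipart uploads rather than reading everything into memory:

    # Hedged sketch (not an ADF feature): read each blob from Azure and write it
    # to S3. Names and credentials are placeholders; readall() suits small files only.
    import boto3
    from azure.storage.blob import BlobServiceClient

    blob_service = BlobServiceClient.from_connection_string("<azure-connection-string>")
    container = blob_service.get_container_client("source-container")
    s3 = boto3.client("s3")  # AWS credentials via environment/IAM

    for blob in container.list_blobs():
        data = container.download_blob(blob.name).readall()
        s3.put_object(Bucket="my-target-bucket", Key=blob.name, Body=data)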

Release pipeline conflict with integration runtime

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-08 08:39:40
Question: This question relates to how to propagate a data factory through CI (in VSTS) when a self-hosted Integration Runtime is defined in the Data Factory. I have 3 environments set up - Dev / UAT / Prod, each with their own data factory. Dev hosts the master collaboration branch. I am using VSTS to retrieve the artifacts from the adf_publish branch and deploy the template to UAT (Prod will be done later). I followed much of what is in this guide here. When deploying to a blank UAT with a

Azure Data Flow taking minutes to trigger next pipeline

Submitted by 痞子三分冷 on 2019-12-07 11:39:17
Azure Data Factory is transferring data into the DB in about 10 milliseconds, but the issue I am having is that it waits a few minutes to trigger the next pipeline, and that adds up to 40 minutes even though all pipelines take less than 20 ms to transfer the data. Somehow it still waits a few minutes to trigger the next one. I used debug mode, and also triggered the ADF from a Logic App without debug mode. Is there any way I can optimize this? We want to move from SSIS to Data Flow, but we have a timing issue - 40 minutes is far too much. In the next step we have millions of records, so it took 7 seconds to transfer the data to the database but it waited