azure-data-factory-2

Calling a PowerShell script from an Azure Batch custom activity using PowerShell and an application environment variable

怎甘沉沦 submitted on 2021-02-20 03:01:12
Question: I've been slowly working out how to call a PowerShell script to transform IIS logs using LogParser 2.2. I've settled on using an Azure Data Factory Batch Service custom activity to run the PowerShell script. I've been able to address many of the file-path issues that arise when running PowerShell from within an Azure Batch custom activity, but I can't figure this one out. Currently I'm just trying to print, via Write-Host, the environment variable AZ_BATCH_APP_PACKAGE
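
As a minimal sketch of what such a task script might do, the snippet below lists the application-package environment variables Azure Batch exposes on the node and then reads one directly; the package name MYLOGPARSERAPP and version 1.0 are hypothetical placeholders, since the real variable name depends on the uploaded application package and the pool OS.

```powershell
# Enumerate every Batch-provided environment variable for application packages.
Get-ChildItem Env: |
    Where-Object { $_.Name -like 'AZ_BATCH_APP_PACKAGE*' } |
    ForEach-Object { Write-Host "$($_.Name) = $($_.Value)" }

# Hypothetical direct lookup: the variable name contains '#', so it needs the ${env:...} syntax.
Write-Host ${env:AZ_BATCH_APP_PACKAGE_MYLOGPARSERAPP#1.0}
```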

Copy and Extracting Zipped XML files from HTTP Link Source to Azure Blob Storage using Azure Data Factory

徘徊边缘 submitted on 2021-02-19 08:48:05
Question: I am trying to set up an Azure Data Factory copy data pipeline. The source is an open HTTP link (URL: https://clinicaltrials.gov/AllPublicXML.zip), so the source is a zipped folder containing many XML files. I want to unzip the archive and save the extracted XML files to Azure Blob Storage using Azure Data Factory. I was trying to follow the configuration described in "How to decompress a zip file in Azure Data Factory v2", but I am getting the following error:
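
For reference, a rough sketch of a source dataset that asks the copy activity to decompress the zip in flight; this assumes a Binary dataset over an HTTP linked service (the name HttpClinicalTrials is a placeholder) and ADF's ZipDeflate compression type, so it illustrates the general shape rather than the exact fix for the error above.

```json
{
  "name": "ZippedXmlSource",
  "properties": {
    "type": "Binary",
    "linkedServiceName": {
      "referenceName": "HttpClinicalTrials",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": { "type": "HttpServerLocation", "relativeUrl": "AllPublicXML.zip" },
      "compression": { "type": "ZipDeflate" }
    }
  }
}
```

A Binary dataset pointing at the target Blob container would then serve as the sink, with the copy activity performing the extraction.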

DIU does not increase beyond 4 on copy activity

拥有回忆 submitted on 2021-02-11 16:57:27
Question: I am trying to copy data from GCP (BigQuery) to an Azure Storage Gen2 Parquet file with the configuration below. I increased the DIU setting from 4 to 16, but at runtime the DIU never goes beyond 4. Can you please help with how to increase the DIU to make my process faster? Using preserve hierarchy; data size is 12 million rows, about 3 GB; throughput is 2.5 mbps. Answer 1: To increase the DIU for a copy activity, just click on the activity, and under the Settings tab you can find the Data Integration Unit selector. Source: https:/
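
In the pipeline JSON, the same setting corresponds to the dataIntegrationUnits property on the copy activity; a trimmed sketch with placeholder activity and dataset names is shown below. Note that the requested value is only an upper bound: for some source/sink combinations the service uses fewer DIUs than requested, which may be what the question is observing.

```json
{
  "name": "CopyBigQueryToAdlsParquet",
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "GoogleBigQuerySource" },
    "sink": { "type": "ParquetSink" },
    "dataIntegrationUnits": 16
  },
  "inputs": [ { "referenceName": "BigQueryDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "AdlsGen2ParquetDataset", "type": "DatasetReference" } ]
}
```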

You are not allowed to make changes or publish from 'Data Factory' mode as your factory has GIT enabled

*爱你&永不变心* submitted on 2021-02-11 16:50:44
Question: I am facing an issue. I have a data factory with 20 pipelines, plus datasets and linked services. I enabled Git integration against a project named xyz and created an adf-publish branch in it, and I worked in the adf-publish branch for almost a week. After a week my client told me they had created a new Azure DevOps project named xyz1. My changes are in the adf-publish branch under the xyz project. My question is: how can I save my changes back to ADF? When I tried, I got the error: You are not allowed to make

How to filter timestamp column in Data Flow of Azure Data Factory

China☆狼群 submitted on 2021-02-11 14:51:23
Question: I have a timestamp column, and I have written the following expression to filter on it: contact_date >= toTimestamp('2020-01-01') && contact_date <= toTimestamp('2020-12-31'). It doesn't complain about the syntax, but when the flow runs it doesn't filter on the dates specified; put simply, the logic doesn't work. Any idea? Date column in the dataset: Answer 1: Please don't use the toTimestamp() function here. I tested it and you will get null output. I use a Filter transformation to filter the data. Please use toString() and change
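
Following that suggestion, a hedged sketch of the Filter condition compares string representations instead; the column name contact_date comes from the question, but the 'yyyy-MM-dd' format string is an assumption about how the source dates are actually stored.

```
toString(contact_date, 'yyyy-MM-dd') >= '2020-01-01' &&
toString(contact_date, 'yyyy-MM-dd') <= '2020-12-31'
```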

Split a JSON string column or flatten transformation in a data flow (ADF)

那年仲夏 submitted on 2021-02-11 14:36:31
Question: I copy the following CSV file into a data flow in ADF. The column Data looks like JSON, but it is treated as a string. I want to flatten the Data column into individual rows. I tried the Flatten transformation, but it did not work because the Data column is not JSON. How do I deal with this? I also tried a split expression, and that did not work either. Thank you. Answer 1: Just from your screenshot, we can see that the data in Data is not in JSON format; it looks more like an array, and the 'array' has 9 elements. We must
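
In the spirit of that (cut-off) answer, one hedged sketch is to turn the string into a real array with a Derived Column and then unroll it with a Flatten transformation; the ',' delimiter and the DataArray column name are assumptions, since the actual contents of the column aren't visible here.

```
// Derived Column transformation: build an array column from the string value
DataArray = split(Data, ',')

// Flatten transformation: unroll by DataArray so each element becomes its own row
```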

Azure Data Factory: set a limit on the number of files copied using the Copy activity

落花浮王杯 submitted on 2021-02-11 14:01:10
Question: I have a copy activity in my pipeline that copies files from Azure Data Lake Gen2. The source location may have thousands of files that need to be copied, but we need to set a limit on the number of files copied. Is there any option available in ADF to achieve this, barring a custom activity? E.g.: I have 2000 files available in the Data Lake, but when running the pipeline I should be able to pass a parameter to copy only 500 files. Regards, Sandeep. Answer 1: I think you can
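
Since the answer is cut off, here is one hedged sketch of a common pattern that fits the question: list the files with a Get Metadata activity (childItems), then iterate over only the first N of them with a ForEach whose items expression uses take(); the activity name GetSourceFiles, the fileLimit pipeline parameter, and the overall approach are assumptions rather than the quoted answer.

```json
{
  "name": "ForEachLimitedFiles",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@take(activity('GetSourceFiles').output.childItems, pipeline().parameters.fileLimit)",
      "type": "Expression"
    }
  }
}
```

Inside the loop, a Copy activity parameterised with item().name would copy each file individually; this trades the single bulk copy for per-file copies, which is slower but gives exact control over the count.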