azure-data-lake

How to choose between Azure data lake analytics and Azure Databricks

耗尽温柔 submitted on 2019-12-30 01:44:05

Question: Azure Data Lake Analytics and Azure Databricks can both be used for batch processing. Could anyone please help me understand when to choose one over the other?

Answer 1: In my humble opinion, a lot of it comes down to existing skillsets. If you have a team experienced in Spark, Java, Python, R, or Scala, then Databricks is a natural fit. If, on the other hand, you have a team with existing SQL and C# skills, then the learning curve for U-SQL will be less steep. That aside, there are other…

Azure Data Lake Analytics IOutputter E_RUNTIME_USER_ROWTOOBIG

浪尽此生 submitted on 2019-12-25 09:24:53

Question: I'm trying to write the results of my custom IOutputter to an intermediate file on the local disk. After that, I want to copy the database file (~20 MB) to the ADL output store. Sadly, the script terminates with:

An unhandled exception of type 'Microsoft.Cosmos.ScopeStudio.BusinessObjects.Debugger.ScopeDebugException' occurred in Microsoft.Cosmos.ScopeStudio.BusinessObjects.Debugger.dll

Additional information: {"diagnosticCode":195887112,"severity":"Error","component":"RUNTIME","source":"User",…
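
Below is a minimal sketch of one direction a fix could take, assuming the E_RUNTIME_USER_ROWTOOBIG error comes from U-SQL's 4 MB row-size limit being hit when the ~20 MB file is passed through as a single row: split the file into chunks upstream and let a custom outputter concatenate them. The class, attribute use, and column name are illustrative assumptions, not the asker's actual code.

    using Microsoft.Analytics.Interfaces;

    // Each input row carries one byte[] chunk (kept under the 4 MB row limit
    // upstream); the outputter concatenates the chunks into a single file.
    [SqlUserDefinedOutputter(AtomicFileProcessing = true)]
    public class ChunkedFileOutputter : IOutputter
    {
        public override void Output(IRow row, IUnstructuredWriter output)
        {
            var chunk = row.Get<byte[]>("chunk");   // column name is an assumption
            output.BaseStream.Write(chunk, 0, chunk.Length);
        }

        public override void Close()
        {
        }
    }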

Azure CDN - Origin hostname not shown when Origin Type selected is Storage

佐手、 submitted on 2019-12-25 08:34:24

Question: The Azure Data Lake Storage hostname is not shown in the dropdown menu when I try to configure Azure CDN with the origin type set to Storage.

Answer 1: The origin types for an Azure CDN profile are Storage, Web App, Cloud Service, and Custom Origin. There is no Azure Data Lake option at present, and Azure Data Lake Store is still in preview. If you have a feature request, please submit it in the Azure feedback forum.

Source: https://stackoverflow.com/questions/39761466/azure-cdn-origin-hostname-not-shown-when-origin-type

Writing output of String manipulation to Azure Data lake Store Item

落花浮王杯 submitted on 2019-12-25 01:36:09

Question: When I write the output of a string manipulation of Get-AzureRmDataLakeStoreItemContent to a variable and try to pass that variable to New-AzureRmDataLakeStoreItem, I get the error "New-AzureRmDataLakeStoreItem : Invalid content passed in. Only byte[] and string content is supported." I verified that the output of the Get- command is an Object, but I don't understand why I can't pass it. I am not sure whether I need some further transformation, such as a Hashtable, to store into…
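
A minimal sketch of the usual fix, assuming the manipulation produced an array of lines rather than a single string: New-AzureRmDataLakeStoreItem only accepts string or byte[] for -Value, so join the array into one [string] first. Account names, paths, and the manipulation itself are placeholders.

    # Read the source file and apply some example manipulation (placeholder logic).
    $content = Get-AzureRmDataLakeStoreItemContent -Account "myadls" -Path "/in/source.txt"
    $lines   = $content -split "`n" | ForEach-Object { $_.Trim() }

    # Joining the array into a single string satisfies the cmdlet's type check.
    $payload = [string]($lines -join "`n")
    New-AzureRmDataLakeStoreItem -Account "myadls" -Path "/out/result.txt" -Value $payload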

U-SQL get file paths from pattern

孤者浪人 submitted on 2019-12-25 00:45:32

Question: I need to get a list of files and then filter this set:

    DECLARE @input_file string = @"\data\{*}\{*}\{*}.avro";
    @filenames = SELECT filename FROM @input_file;
    @filtered = SELECT filename FROM @filenames WHERE {condition}

Something like this, if it's possible…

Answer 1: The way to do that is to define virtual columns in your fileset. You can then extract and manipulate these virtual columns as if they were data fields extracted from your file. Example:

    DECLARE @input_file string = "/data/{_partition1}/{…
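
Since the answer's example is cut off, here is a minimal sketch of the virtual-column pattern it describes; the path segments, column names, and extractor are assumptions (Avro in practice needs a custom extractor, e.g. from Microsoft.Analytics.Samples.Formats).

    DECLARE @input_file string = "/data/{year}/{month}/{filename}.avro";

    // The virtual columns declared in the fileset pattern become extractable
    // columns and can be filtered like ordinary data fields.
    @rows =
        EXTRACT payload string,
                year string,
                month string,
                filename string
        FROM @input_file
        USING Extractors.Text();   // placeholder; use an Avro extractor in practice

    @filtered =
        SELECT DISTINCT filename
        FROM @rows
        WHERE year == "2019";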

How to copy files and folder from one ADLS to another one on different subscription?

只愿长相守 submitted on 2019-12-24 19:29:12

Question: I need to be able to copy files and folders from one Data Lake to another Data Lake in a different subscription. I have both the auth token and the secret key. I've tried different solutions, including https://medium.com/azure-data-lake/connecting-your-own-hadoop-or-spark-to-azure-data-lake-store-93d426d6a5f4, which involves Hadoop but didn't work across two different subscriptions, because core-site.xml only accepts one subscription. AdlCopy didn't work either, nor did Data Factory…
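
One workable direction, sketched below with the azure-datalake-store Python SDK: the data plane doesn't care about subscriptions, so the real requirement is a service principal granted permissions on both accounts. All names and paths are placeholders, and large files would want chunked copying rather than a single read().

    from azure.datalake.store import core, lib

    # One service principal granted access to BOTH stores; use two separate
    # tokens if the accounts live in different AAD tenants.
    token = lib.auth(tenant_id="<tenant>", client_id="<app-id>", client_secret="<secret>")

    src = core.AzureDLFileSystem(token, store_name="source-adls")
    dst = core.AzureDLFileSystem(token, store_name="dest-adls")

    # walk() lists every file under the path; parent folders are created implicitly.
    for path in src.walk("/data"):
        with src.open(path, "rb") as fin, dst.open(path, "wb") as fout:
            fout.write(fin.read())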

Azure Data Lake Store File read using SSIS Script Component

纵饮孤独 submitted on 2019-12-24 11:13:02

Question: I would appreciate your suggestions. My requirement is to read a JSON file from ADLS using SSIS and load it into a SQL table.

Implementation: I have implemented the code to read the JSON file content in a .NET console app, where it works fine. I copied the same code into an SSIS script component, but there it throws a "The type initializer for 'Microsoft.Azure.DataLake.Store.AdlsClient' threw an exception" exception in AdlsClient.CreateClient.

    using Microsoft.Rest;
    using Microsoft.Rest.Azure.Authentication;
    using …
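
A minimal sketch of a common workaround, under the assumption that the type initializer fails because the SSIS-hosted script cannot locate the NuGet dependencies of Microsoft.Azure.DataLake.Store (e.g. Newtonsoft.Json) at runtime: hook AppDomain.AssemblyResolve and load them from a known folder. The folder path is a placeholder.

    static ScriptMain()
    {
        // SSIS hosts the script in its own AppDomain, so NuGet dependencies are
        // not found automatically; resolve them from a folder we control.
        AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
        {
            string name = new System.Reflection.AssemblyName(args.Name).Name + ".dll";
            string path = System.IO.Path.Combine(@"C:\SsisAssemblies", name);
            return System.IO.File.Exists(path)
                ? System.Reflection.Assembly.LoadFrom(path)
                : null;
        };
    }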

Write the results of the Google Api to a data lake with Databricks

a 夏天 submitted on 2019-12-24 10:55:14

Question: I am getting back user usage data from the Google Admin Report User Usage API via the Python SDK on Databricks. The data size is around 100,000 records per day, which I fetch nightly via a batch process. The API returns a maximum page size of 1000, so I call it roughly 100 times to get the data I need for the day. This is working fine. My ultimate aim is to store the data in its raw format in a data lake (Azure Gen2, but irrelevant to this question). Later on, I will transform the data using Databricks…
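
A minimal sketch of the paging loop being described, based on the public Admin SDK Reports API; the service object, date format, and surrounding function are assumptions.

    def fetch_user_usage(service, date):
        """Collect all pages of user usage for one day (max 1000 rows per page)."""
        rows, token = [], None
        while True:
            resp = service.userUsageReport().get(
                userKey="all", date=date, maxResults=1000, pageToken=token
            ).execute()
            rows.extend(resp.get("usageReports", []))
            token = resp.get("nextPageToken")
            if not token:
                return rows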

U-SQL Error in Naming the Column

≯℡__Kan透↙ submitted on 2019-12-24 10:47:53

Question: I have a JSON document where the order of fields is not fixed, i.e. I can have [A, B, C] or [B, C, A]. A, B, and C are all JSON objects of the form {Name: x, Value: y}. So when I use U-SQL to extract the JSON (I don't know their order) and put it into a CSV (for which I will need column names):

    @output = SELECT A["Value"] ?? "0" AS CAST ### (("System_" + A["Name"]) AS STRING),
              B["Value"] ?? "0" AS "System_" + B["Name"],
              System_da…

So I am trying to use the "Name" field in the JSON as the column name, but am…
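
For context, a minimal sketch of what does compile: U-SQL column aliases must be static identifiers known at compile time, so the value of A["Name"] cannot become a column name within the same script; dynamic naming needs a generated script or a post-processing step. The rowset and alias names below are assumptions.

    // Static aliases compile; a data value such as A["Name"] cannot be the AS target.
    @output =
        SELECT A["Value"] ?? "0" AS System_A,
               B["Value"] ?? "0" AS System_B
        FROM @parsed;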

how we get Data Factory logging information

你说的曾经没有我的故事 submitted on 2019-12-24 08:29:01

Question: How do we get Data Factory logging information? Has Microsoft documented this anywhere? I need complete information when I run a pipeline, i.e. start time, end time, pipeline job ID, number of records inserted, deleted, and updated, errors, etc.

Answer 1: ADF doesn't currently write to the Azure Activity Logs, meaning you can't access details using Azure Monitor. Currently, the best way I find to get this information is using PowerShell. For example:

    Get-AzureRmDataFactoryActivityWindow `
        -DataFactoryName $ADFName…
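
A minimal sketch continuing the answer's approach (ADF v1 cmdlets); the resource group variable and the selected properties are assumptions about the activity-window objects returned.

    Get-AzureRmDataFactoryActivityWindow `
        -DataFactoryName $ADFName `
        -ResourceGroupName $ResourceGroupName |
        Select-Object PipelineName, ActivityName, WindowStart, WindowEnd, WindowState, RunStart, RunEnd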