azure-data-lake

Azure Data Lake Store - existing connection was forcibly closed by the remote host

前提是你 posted on 2019-12-11 06:59:39
Question: I use the DataLakeStoreFileSystemManagementClient class for reading files from Data Lake Store. We open a stream for the file with code like the following, then read it byte by byte and process it; this is a specific case where we cannot use U-SQL for the data processing.

    m_adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(…);
    return m_adlsFileSystemClient.FileSystem.OpenAsync(m_connection.AccountName, path);

Reading and processing the file may take up to 60 minutes. The problem
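One way to survive a dropped connection on reads this long is to pull larger chunks and reopen the stream at the last good offset after a failure. A minimal sketch, assuming the OpenAsync overload in Microsoft.Azure.Management.DataLake.Store that accepts an offset; the 4 MB chunk size and the processChunk callback are illustrative, not from the original post:

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using Microsoft.Azure.Management.DataLake.Store;

    public static async Task ProcessFileWithResumeAsync(
        DataLakeStoreFileSystemManagementClient client,
        string accountName, string path, Func<byte[], int, Task> processChunk)
    {
        long offset = 0;
        var buffer = new byte[4 * 1024 * 1024]; // 4 MB chunks instead of byte-by-byte reads

        while (true)
        {
            try
            {
                // Reopen from the last good offset so a dropped connection loses no work.
                using (Stream stream = await client.FileSystem.OpenAsync(accountName, path, offset: offset))
                {
                    int read;
                    while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                    {
                        await processChunk(buffer, read);
                        offset += read;
                    }
                    return; // reached end of file
                }
            }
            catch (IOException)
            {
                // Assumption: "forcibly closed by the remote host" surfaces as (or wraps)
                // an IOException; fall through and reopen at the saved offset.
            }
        }
    }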

Concurrent read/write to ADLA

戏子无情 posted on 2019-12-11 05:44:43
Question: Q1: We are thinking of parallelizing reads and writes to ADLA tables and are wondering about the implications of such a design. I think reads are fine, but what is the best practice for concurrent writes to the same ADLA table? Q2: Suppose we have U-SQL scripts with multiple rowsets and multiple OUTPUT/INSERT statements into the same or different ADLA tables. What is the transaction-scope story in U-SQL? If one OUTPUT/INSERT statement fails, will it cause all previous inserts to roll back or not? How to handle

How to improve the performance when copying data from cosmosdb?

北慕城南 posted on 2019-12-11 05:28:31
Question: I am trying to copy data from Cosmos DB to Data Lake Store with Data Factory. However, the performance is poor, about 100 KB/s, while the data volume is 100+ GB and keeps increasing. At that rate the copy would take 10+ days, which is not acceptable. The Microsoft document https://docs.microsoft.com/en-us/azure/data-factory/data-factory-copy-activity-performance says the maximum speed from Cosmos DB to Data Lake Store is 1 MB/s. Even at that rate, the performance is still too slow for us. The cosmos migration tool
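If the copy activity cannot go faster, one workaround (not from the original post) is a custom copier that drains Cosmos DB with parallel cross-partition queries via the DocumentDB SDK and appends the results to ADLS itself. A sketch of the read side only, assuming the Microsoft.Azure.DocumentDB package; the endpoint, key, and database/collection names are placeholders:

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents.Client;
    using Microsoft.Azure.Documents.Linq;

    public static async Task ReadAllDocumentsAsync()
    {
        var client = new DocumentClient(new Uri("https://<account>.documents.azure.com:443/"), "<key>");

        var query = client.CreateDocumentQuery<dynamic>(
                UriFactory.CreateDocumentCollectionUri("<db>", "<collection>"),
                "SELECT * FROM c",
                new FeedOptions
                {
                    EnableCrossPartitionQuery = true,
                    MaxDegreeOfParallelism = 8, // drain several physical partitions concurrently
                    MaxItemCount = 1000         // larger pages, fewer round trips
                })
            .AsDocumentQuery();

        while (query.HasMoreResults)
        {
            foreach (var doc in await query.ExecuteNextAsync())
            {
                // Buffer documents here and append them to a Data Lake Store file in batches.
            }
        }
    }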

Azure Data Lake Storage and Data Factory - Temporary GUID folders and files

China☆狼群 posted on 2019-12-11 04:25:33
Question: I am using Azure Data Lake Store (ADLS) as the target of an Azure Data Factory (ADF) pipeline that reads from Blob Storage and writes into ADLS. During execution I notice that a folder is created in the output ADLS that does not exist in the source data. The folder has a GUID for a name and contains many files, also named with GUIDs. The folder is temporary and disappears after around 30 seconds. Is this part of the ADLS metadata indexing? Is it something used by ADF during processing? Although it

Service to Service Authentication in Azure Data Lake Store using .NET SDK

谁说我不能喝 posted on 2019-12-11 04:23:24
Question: I am trying to follow service-to-service authentication with a client secret and with a certificate (trying both), but neither is working for me. I am trying to create a directory on Data Lake Store, but the given code does not work. It does not even raise an error, so I cannot diagnose the failure. I have followed all the steps mentioned on that page but am not able to create a directory on Data Lake Store. I have also tried the file-upload code, but that does not work either. End user Authentication
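For comparison, a minimal client-secret sketch (tenant ID, client ID, secret, and account name are placeholders), assuming the Microsoft.Azure.Management.DataLake.Store and Microsoft.Rest.ClientRuntime.Azure.Authentication packages. LoginSilentAsync throws on bad credentials, so if nothing happens at all, the exception is likely being swallowed by an unawaited Task:

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Management.DataLake.Store;
    using Microsoft.Rest.Azure.Authentication;

    public static async Task CreateDirectoryAsync()
    {
        // Service-to-service (client credential) authentication with a client secret.
        var creds = await ApplicationTokenProvider.LoginSilentAsync(
            "<tenant-id>", "<client-id>", "<client-secret>");

        using (var client = new DataLakeStoreFileSystemManagementClient(creds))
        {
            // Creates the directory (and any missing parents) on the store.
            await client.FileSystem.MkdirsAsync("<adls-account-name>", "/newdir");
            Console.WriteLine("Mkdirs completed.");
        }
    }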

How to parse big string U-SQL Regex

亡梦爱人 posted on 2019-12-11 03:36:30
Question: I have big CSVs that contain long strings, and I want to parse them in U-SQL.

    @t1 =
        SELECT Regex.Match("ID=881cf2f5f474579a:T=1489536183:S=ALNI_MZsMMpA4voGE4kQMYxooceW2AOr0Q",
                           "ID=(?<ID>\\w+):T=(?<T>\\w+):S=(?<S>[\\w\\d_]*)") AS p
        FROM (VALUES(1)) AS fe(n);

    @t2 =
        SELECT p.Groups["ID"].Value AS gads_id,
               p.Groups["T"].Value AS gads_t,
               p.Groups["S"].Value AS gads_s
        FROM @t1;

    OUTPUT @t2 TO "/inhabit/test.csv" USING Outputters.Csv();

Severity Code Description Project File Line Suppression State

How to read encrypted and gzipped blob data from u-sql

三世轮回 posted on 2019-12-11 01:47:59
Question: I would like to read, from U-SQL, a blob file that was first compressed (gz) and then encrypted. The encryption was done with the Azure SDK when the file was uploaded to Blob storage (a BlobEncryptionPolicy passed to CloudBlockBlob.UploadFromStreamAsync). The blob file has a .gz extension, so U-SQL tries to decompress it but fails because the file is encrypted. Is it possible to make my U-SQL script handle the decryption automatically, the same way the Azure SDK does (for instance via CloudBlockBlob.BeginDownloadToStream)? If
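Outside of U-SQL, the decrypt-then-decompress pair looks roughly like the sketch below, assuming the WindowsAzure.Storage client-side encryption API and the same BlobEncryptionPolicy (key or key resolver) used at upload. One option is to pre-process files this way and land plain gzip or decompressed output where U-SQL can read it:

    using System.IO;
    using System.IO.Compression;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage.Blob;

    public static async Task<Stream> DownloadDecryptedAsync(CloudBlockBlob blob, BlobEncryptionPolicy policy)
    {
        // The storage client decrypts transparently when the request options carry the policy.
        var options = new BlobRequestOptions { EncryptionPolicy = policy };
        var decrypted = new MemoryStream();
        await blob.DownloadToStreamAsync(decrypted, accessCondition: null, options: options, operationContext: null);
        decrypted.Position = 0;

        // Then undo the gzip compression that was applied before upload.
        return new GZipStream(decrypted, CompressionMode.Decompress);
    }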

Writing custom extractor in USQL to skip rows with encoding problems

可紊 posted on 2019-12-11 01:23:33
Question: I have a large set of data that spans a couple hundred files. Apparently it has a few encoding issues (it is mostly UTF-8, but some characters just are not valid). According to https://msdn.microsoft.com/en-us/library/azure/mt764098.aspx, if there is an encoding error a runtime error occurs regardless of setting the silent flag to true (whose aim is just to skip erroring rows). As a result, I need to write a custom extractor. I've written one that largely does a
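For reference, a skeleton of such an extractor: it swaps the default throwing decoder for a replacement fallback, so invalid byte sequences become U+FFFD instead of failing the job, and simply skips rows that no longer parse. A sketch only; the two-column schema and tab delimiter are assumptions:

    using System.Collections.Generic;
    using System.IO;
    using System.Text;
    using Microsoft.Analytics.Interfaces;

    [SqlUserDefinedExtractor(AtomicFileProcessing = false)]
    public class LenientExtractor : IExtractor
    {
        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
        {
            // Replace invalid byte sequences with U+FFFD instead of throwing.
            var lenientUtf8 = Encoding.GetEncoding(
                "utf-8", EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback);

            using (var reader = new StreamReader(input.BaseStream, lenientUtf8))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    var cols = line.Split('\t');
                    if (cols.Length != 2) continue; // skip rows that no longer parse cleanly

                    output.Set<string>(0, cols[0]);
                    output.Set<string>(1, cols[1]);
                    yield return output.AsReadOnly();
                }
            }
        }
    }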

U-SQL Job Failing in Data Factory

僤鯓⒐⒋嵵緔 posted on 2019-12-11 01:18:20
Question: I keep getting the following error from Data Factory whenever I run a U-SQL job: "Job submission failed, the user 'adla account name' does not have permissions to a subfolder in the /system/ path needed by Data Lake Analytics. Please run "Add User Wizard" from the Data Lake Analytics Azure Portal or use Azure PowerShell to grant access for the user to the /system/ and its children on the Data Lake Store." And I am not using any firewall, as suggested in this post: Run U-SQL Script from C# code with
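If granting the permission from .NET rather than the portal wizard, the ADLS SDK can modify ACL entries path by path. A hedged sketch, assuming the ModifyAclEntriesAsync extension in Microsoft.Azure.Management.DataLake.Store and a placeholder object ID; note it applies to one path at a time, so existing children must be walked separately (the Add User Wizard and PowerShell handle that recursion for you):

    using System.Threading.Tasks;
    using Microsoft.Azure.Management.DataLake.Store;

    public static async Task GrantSystemFolderAccessAsync(
        DataLakeStoreFileSystemManagementClient client, string adlsAccount, string userObjectId)
    {
        // rwx for the user on /system, plus a default entry so new children inherit it.
        string aclSpec = $"user:{userObjectId}:rwx,default:user:{userObjectId}:rwx";
        await client.FileSystem.ModifyAclEntriesAsync(adlsAccount, "/system", aclSpec);
    }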

Optimizer internal error while loading data from U-SQL table

天大地大妈咪最大 posted on 2019-12-10 23:15:13
Question: Is there a way to get around this error? "CQO: Internal Error - Optimizer internal error. Assert: a_drgcidChild->CLength() == UlSafeCLength(popMS->Pdrgcid()) in rlstreamset.cpp:499" We hit this issue while loading data from a partitioned U-SQL table:

    @myData = SELECT * FROM dbo.MyTable;

Answer 1: If you encounter any system error message (or something that says Internal Error), please open a support ticket with us and/or send me your job link (if it happens on the cluster) or a self-contained