u-sql

Execute U-SQL script in ADL storage from Data Factory in Azure

℡╲_俬逩灬. submitted on 2019-12-11 08:31:33
Question: I have a U-SQL script stored on my ADL store and I am trying to execute it. The script file is quite big, about 250 MB. So far I have a Data Factory, I have created a Linked Service, and I am trying to create a Data Lake Analytics U-SQL Activity. The code for my U-SQL Activity looks like this: { "name": "RunUSQLScript1", "properties": { "description": "Runs the USQL Script", "activities": [ { "name": "DataLakeAnalyticsUSqlActivityTemplate", "type": "DataLakeAnalyticsU-SQL", "linkedServiceName":
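A 250 MB script cannot reasonably be inlined in the pipeline JSON; the activity can instead reference the script file sitting in the store via scriptPath. A minimal sketch of such an activity, assuming placeholder linked-service names ("ADLAComputeLinkedService" for the Data Lake Analytics account, "ADLStoreLinkedService" for the store holding the script):

{
    "name": "RunUSQLScript1",
    "type": "DataLakeAnalyticsU-SQL",
    "linkedServiceName": "ADLAComputeLinkedService",
    "typeProperties": {
        "scriptPath": "scripts/MyBigScript.usql",
        "scriptLinkedService": "ADLStoreLinkedService",
        "degreeOfParallelism": 3,
        "priority": 100
    }
}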

How to unit test U-SQL scripts?

烈酒焚心 submitted on 2019-12-11 07:35:23
Question: I currently have a U-SQL project with a set of different scripts, and I am trying to create unit tests for them. I can run the scripts locally using the Azure Data Lake tools with a set of test data and generate the expected outputs. The scripts are pure U-SQL data manipulation/transformation, so since there are no methods I am not sure what the correct approach to testing this is. If anyone has any experience or ideas on how it should be done, or any documentation, please feel free to help. Thank you
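One workable pattern is to parameterize the script's input and output paths with DECLARE EXTERNAL, so a test harness can point the unchanged script at test data and then diff the produced files against known-good outputs. A minimal sketch (paths and schema are placeholders):

// Defaults used in production; a local test run overrides these from outside.
DECLARE EXTERNAL @inputPath string = "/data/input.csv";
DECLARE EXTERNAL @outputPath string = "/data/output.csv";

@rows =
    EXTRACT id int, name string
    FROM @inputPath
    USING Extractors.Csv();

@result =
    SELECT id, name.ToUpper() AS name
    FROM @rows;

OUTPUT @result TO @outputPath USING Outputters.Csv();

A test then runs the script locally against "/test/input.csv" and compares the emitted file to "/test/expected.csv".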

Concurrent read/write to ADLA

戏子无情 submitted on 2019-12-11 05:44:43
Question: Q1: We are thinking of parallelizing reads/writes to ADLA tables and are wondering what the implications of such a design are. I think reads are fine, but what is the best practice for concurrent writes to the same ADLA table? Q2: Suppose we have U-SQL scripts with multiple rowsets and multiple outputs/inserts into the same or different ADLA tables. What is the transaction-scope story in U-SQL? If any output/insert statement fails, will it cause all previous inserts to roll back or not? How to handle
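On Q2, U-SQL has no explicit transaction statements; a job's outputs and inserts are committed together only when the whole job succeeds, so a failed job does not leave partial writes behind. On Q1, a common way to keep concurrent writers from colliding is to give each job its own partition of a partitioned table. A sketch of the pattern (table, columns, and the partition value are made up):

CREATE TABLE IF NOT EXISTS dbo.Events
(
    EventDate DateTime,
    UserId string,
    Payload string,
    INDEX cIdx CLUSTERED (UserId)
    PARTITIONED BY (EventDate)
    DISTRIBUTED BY HASH (UserId)
);

// Each writer job targets only its own partition value.
DECLARE @slice DateTime = new DateTime(2019, 12, 1);

ALTER TABLE dbo.Events ADD IF NOT EXISTS PARTITION (@slice);

@newRows =
    EXTRACT UserId string, Payload string
    FROM "/staging/events_2019-12-01.csv"
    USING Extractors.Csv();

INSERT INTO dbo.Events (UserId, Payload)
PARTITION (@slice)
SELECT UserId, Payload
FROM @newRows;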

How to improve performance when copying data from Cosmos DB?

北慕城南 submitted on 2019-12-11 05:28:31
Question: I am now trying to copy data from Cosmos DB to Data Lake Store with Data Factory. However, the performance is poor, about 100 KB/s, and the data volume is 100+ GB and keeps increasing. It would take 10+ days to finish, which is not acceptable. The Microsoft document https://docs.microsoft.com/en-us/azure/data-factory/data-factory-copy-activity-performance mentions that the max speed from Cosmos DB to Data Lake Store is 1 MB/s. Even at that rate, the performance is still bad for us. The Cosmos migration tool
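The tuning knobs the referenced performance guide exposes on the Copy Activity are parallelCopies and cloudDataMovementUnits, so raising those is the usual first step. A sketch of the relevant activity fragment (the values are illustrative, not a tuned recommendation):

{
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "DocumentDbCollectionSource" },
        "sink": { "type": "AzureDataLakeStoreSink" },
        "parallelCopies": 8,
        "cloudDataMovementUnits": 4
    }
}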

U-SQL: use a rowset variable for decision making

一世执手 submitted on 2019-12-11 04:27:55
Question: I want to use a rowset variable as a scalar variable: @cnt = SELECT COUNT(*) FROM @tab1; IF (@cnt > 0) THEN @cnt1 = SELECT * FROM @tab2; END; Is it possible? ====================================== I want to gate a complex block of U-SQL code on some condition, let's say based on some control table. In my original code I wrote 10-15 U-SQL statements and I want to wrap them in the IF statement. I don't want to do a cross join because it again starts trying to join the table. If I use cross join,
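U-SQL's IF statement can only branch on expressions known at compile time (scalar variables and parameters), not on values computed from a rowset, so a count derived from @tab1 cannot drive it directly. A sketch of the scalar-parameter workaround, where the caller (for example a Data Factory pipeline that has already inspected the control table) passes the flag in; paths and schema are made up:

// Supplied from outside at job submission; rowset contents cannot be used here.
DECLARE EXTERNAL @runComplexBlock bool = false;

IF @runComplexBlock == true THEN
    @rows =
        EXTRACT id int, val string
        FROM "/data/tab2.csv"
        USING Extractors.Csv();

    OUTPUT @rows TO "/out/result.csv" USING Outputters.Csv();
END;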

U-SQL script to search for a string, then group by that string and get the count of distinct files

牧云@^-^@ submitted on 2019-12-11 04:16:13
Question: I am quite new to U-SQL, trying to solve this: str1=\global\europe\Moscow\12345\File1.txt str2=\global.bee.com\europe\Moscow\12345\File1.txt str3=\global\europe\amsterdam\54321\File1.Rvt str4=\global.bee.com\europe\amsterdam\12345\File1.Rvt Case 1: how do I get just "\europe\Moscow\12345\File1.txt" from the string variables str1 & str2? I want to take ("\europe\Moscow\12345\File1.txt") from str1 and str2, then group by "\global\europe\Moscow\12345" and take the count of distinct files from the
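Because U-SQL allows C# string methods inside a SELECT, one way is to cut each path at the "\europe" segment, split the folder from the file name, and then GROUP BY the folder with COUNT(DISTINCT ...). A sketch over the four sample strings (output path is a placeholder):

@paths =
    SELECT * FROM (VALUES
        (@"\global\europe\Moscow\12345\File1.txt"),
        (@"\global.bee.com\europe\Moscow\12345\File1.txt"),
        (@"\global\europe\amsterdam\54321\File1.Rvt"),
        (@"\global.bee.com\europe\amsterdam\12345\File1.Rvt")
    ) AS t(FullPath);

// Drop the server prefix, then separate the folder from the file name.
@split =
    SELECT FullPath.Substring(FullPath.IndexOf(@"\europe")) AS RelPath
    FROM @paths;

@folders =
    SELECT RelPath.Substring(0, RelPath.LastIndexOf(@"\")) AS Folder,
           RelPath.Substring(RelPath.LastIndexOf(@"\") + 1) AS FileName
    FROM @split;

@counts =
    SELECT Folder, COUNT(DISTINCT FileName) AS DistinctFiles
    FROM @folders
    GROUP BY Folder;

OUTPUT @counts TO "/output/folder_counts.csv" USING Outputters.Csv();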

How to parse a big string with U-SQL Regex

亡梦爱人 submitted on 2019-12-11 03:36:30
Question: I have got big CSVs that contain big strings. I want to parse them in U-SQL. @t1 = SELECT Regex.Match("ID=881cf2f5f474579a:T=1489536183:S=ALNI_MZsMMpA4voGE4kQMYxooceW2AOr0Q", "ID=(?<ID>\\w+):T=(?<T>\\w+):S=(?<S>[\\w\\d_]*)") AS p FROM (VALUES(1)) AS fe(n); @t2 = SELECT p.Groups["ID"].Value AS gads_id, p.Groups["T"].Value AS gads_t, p.Groups["S"].Value AS gads_s FROM @t1; OUTPUT @t TO "/inhabit/test.csv" USING Outputters.Csv(); Severity Code Description Project File Line Suppression State
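Two things stand out in the script: the Match object p is carried as a column between rowsets (U-SQL rowset columns are limited to its built-in scalar types, not arbitrary .NET objects), and the OUTPUT statement references @t, which is never defined. A sketch that keeps each Match inside a single expression, so only strings flow between rowsets, and outputs @t2:

@t1 =
    SELECT "ID=881cf2f5f474579a:T=1489536183:S=ALNI_MZsMMpA4voGE4kQMYxooceW2AOr0Q" AS raw
    FROM (VALUES(1)) AS fe(n);

@t2 =
    SELECT Regex.Match(raw, "ID=(?<ID>\\w+)").Groups["ID"].Value AS gads_id,
           Regex.Match(raw, ":T=(?<T>\\w+)").Groups["T"].Value AS gads_t,
           Regex.Match(raw, ":S=(?<S>[\\w\\d_]*)").Groups["S"].Value AS gads_s
    FROM @t1;

OUTPUT @t2 TO "/inhabit/test.csv" USING Outputters.Csv();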

auto_increment in U-SQL

孤街醉人 submitted on 2019-12-11 02:12:42
Question: I am trying to form a new table that contains the unique user_ids from an existing one. Is it possible to add an auto_increment primary key in U-SQL like we can in MySQL? Answer 1: To elaborate on David's answer: Unlike MySQL, ADLA/U-SQL executes in a scale-out, shared-nothing architecture, so there is no easy way to manage auto-incrementing numbers. However, there are some tricks that you can use: You can use the ROW_NUMBER() function to generate a number per row. You could add that to the
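A sketch of the ROW_NUMBER() trick over a deduplicated user list (input path and schema are made up):

@existing =
    EXTRACT user_id string, event string
    FROM "/data/events.csv"
    USING Extractors.Csv();

@users =
    SELECT DISTINCT user_id
    FROM @existing;

// ROW_NUMBER() produces a per-row sequence that stands in for an auto-increment key.
@withKeys =
    SELECT ROW_NUMBER() OVER (ORDER BY user_id) AS new_id,
           user_id
    FROM @users;

OUTPUT @withKeys TO "/data/users_with_ids.csv" USING Outputters.Csv();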

How to read encrypted and gzipped blob data from U-SQL

三世轮回 submitted on 2019-12-11 01:47:59
Question: I would like to read a file from a blob that is first compressed (gz) and then encrypted. The encryption is done using the Azure SDK when the file is uploaded to the blob (a BlobEncryptionPolicy is passed to the CloudBlockBlob.UploadFromStreamAsync method). The blob file has a .gz extension, so U-SQL tries to decompress it but fails because the file is encrypted. Is it possible to set up my U-SQL script to handle the decryption automatically, the same as the Azure SDK does (for instance in CloudBlockBlob.BeginDownloadToStream)? If
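The built-in extractors have no hook for the storage client's client-side encryption, so one workaround is a small staging step outside U-SQL: download with the same BlobEncryptionPolicy (which decrypts), decompress, and write a plain copy that U-SQL can then read. A sketch with the WindowsAzure.Storage SDK (the blob references and the policy are placeholders the caller supplies; the content is buffered in memory for simplicity):

using System.IO;
using System.IO.Compression;
using Microsoft.WindowsAzure.Storage.Blob;

static class BlobStaging
{
    // Downloads an encrypted .gz blob, decrypts and gunzips it, and uploads a plain copy.
    public static void StageForUsql(CloudBlockBlob encryptedBlob, CloudBlockBlob plainBlob,
                                    BlobEncryptionPolicy policy)
    {
        var options = new BlobRequestOptions { EncryptionPolicy = policy };
        using (var decrypted = new MemoryStream())
        {
            encryptedBlob.DownloadToStream(decrypted, options: options); // SDK decrypts here
            decrypted.Position = 0;
            using (var gunzip = new GZipStream(decrypted, CompressionMode.Decompress))
            using (var plain = new MemoryStream())
            {
                gunzip.CopyTo(plain);
                plain.Position = 0;
                plainBlob.UploadFromStream(plain);
            }
        }
    }
}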

Writing a custom extractor in U-SQL to skip rows with encoding problems

可紊 submitted on 2019-12-11 01:23:33
Question: I have a large set of data that spans a couple hundred files. Apparently it has a few encoding issues (it's mostly UTF-8, but some characters just aren't valid). According to https://msdn.microsoft.com/en-us/library/azure/mt764098.aspx, if there is an encoding error, a runtime error will occur regardless of setting the silent flag to true (whose aim is just to skip erroring rows). As a result, I need to write a custom extractor. I've written one that largely does a
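A sketch of one way to build such an extractor: decode with the default (lossy) UTF-8, which substitutes U+FFFD for invalid byte sequences, and drop any line containing that replacement character instead of failing the job. The schema handling is simplified to all-string columns:

using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.Analytics.Interfaces;

[SqlUserDefinedExtractor(AtomicFileProcessing = false)]
public class LossyUtf8Extractor : IExtractor
{
    private readonly char _delimiter;

    public LossyUtf8Extractor(char delimiter = '\t')
    {
        _delimiter = delimiter;
    }

    public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
    {
        // The default UTF8Encoding replaces invalid byte sequences with U+FFFD rather than throwing.
        using (var reader = new StreamReader(input.BaseStream, new UTF8Encoding(false)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Skip rows that had undecodable bytes or the wrong number of columns.
                if (line.IndexOf('\uFFFD') >= 0) continue;
                var cols = line.Split(_delimiter);
                if (cols.Length != output.Schema.Count) continue;

                for (int i = 0; i < cols.Length; i++)
                {
                    output.Set<string>(i, cols[i]);
                }
                yield return output.AsReadOnly();
            }
        }
    }
}

With the class in the script's code-behind, it would be invoked as: @rows = EXTRACT a string, b string FROM "/data/input.csv" USING new LossyUtf8Extractor(',');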