u-sql

Partition By & Clustered & Distributed By in U-SQL - need to know their meaning and when to use them

白昼怎懂夜的黑 submitted on 2019-12-24 03:16:10
Question: I can see that when creating a table in U-SQL we can use the PARTITIONED BY, CLUSTERED, and DISTRIBUTED BY clauses. As I understand it, a partition stores data with the same key (the one we partition on) together or close together (perhaps in the same structured stream behind the scenes), so queries that use that key in joins and filters run faster. Clustering, I guess, stores the data of those columns together or close together inside each partition. And distribution is a scheme such as HASH or ROUND ROBIN.
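For reference, the three clauses appear together in a CREATE TABLE statement. A minimal sketch (table and column names are hypothetical; exact clause ordering may vary by U-SQL version): PARTITIONED BY creates coarse-grained, individually addressable partitions per key value; DISTRIBUTED BY spreads the rows of each partition across distribution buckets for parallelism; the clustered index orders rows inside each distribution for seek-style access.

```sql
CREATE TABLE dbo.Events
(
    EventDate DateTime,
    UserId int,
    Payload string,
    // Rows inside each distribution are sorted by UserId.
    INDEX idx_events CLUSTERED (UserId ASC)
    // One addressable partition per distinct EventDate value.
    PARTITIONED BY (EventDate)
    // Rows within a partition are hash-spread across buckets by UserId.
    DISTRIBUTED BY HASH (UserId)
);
```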

Unit testing for U-SQL applier and scripts

我的梦境 submitted on 2019-12-23 15:02:09
Question: I have a custom U-SQL applier which extends the IApplier class.

[SqlUserDefinedApplier]
public class CsvApplier : IApplier
{
    public CsvApplier()
    {
        //totalcount = count;
    }

    public override IEnumerable<IRow> Apply(IRow input, IUpdatableRow output)
    {
        //....custom logic
        //yield return or yield break
    }
}

This applier is then used from a U-SQL script as:

@log = SELECT t.ultimateID, t.siteID, . . . t.eTime, t.hours FROM @logWithCount CROSS APPLY new BSWBigData.USQLApplier.CsvApplier() AS t(ultimateID
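One low-friction approach (not the only one) is to keep the IApplier subclass a thin wrapper and move the row-transformation logic into a plain C# method that takes and returns ordinary .NET types; that method can then be unit tested with any standard framework (xUnit, NUnit, MSTest) without having to construct IRow/IUpdatableRow instances. A hedged sketch, with hypothetical names:

```csharp
using System.Collections.Generic;

// Hypothetical refactoring: the CSV-splitting logic lives in a static,
// framework-free class that ordinary unit tests can exercise directly;
// the IApplier.Apply override just adapts IRow to and from these types.
public static class CsvLogic
{
    public static IEnumerable<string[]> SplitRows(string rawLine)
    {
        // ...same parsing the applier performs, minus the IRow plumbing...
        yield return rawLine.Split(',');
    }
}
```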

Can I use Regular Expressions in U-SQL?

♀尐吖头ヾ submitted on 2019-12-23 05:13:11
Question: Is it possible to write regular expression comparisons in U-SQL? For example, rather than multiple "LIKE" statements to search for the names of various food items, I want to perform a comparison of multiple items using a single regular expression.

Answer 1: You can create a new Regex object inline and then use the IsMatch() method. The example below returns "Y" if the Offer_Desc column contains the word "bacon", "croissant", or "panini".

@output = SELECT , CSHARP(new Regex("\\b(BACON|CROISSANT|PANINI
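Since U-SQL expressions are C# expressions, Regex can also be used directly without the legacy CSHARP() wrapper. A minimal sketch, assuming a hypothetical rowset @offers with an Offer_Desc column:

```sql
@result =
    SELECT Offer_Desc,
           // C# conditional expression: "Y" when any of the food words match.
           System.Text.RegularExpressions.Regex.IsMatch(
               Offer_Desc,
               "\\b(BACON|CROISSANT|PANINI)\\b",
               System.Text.RegularExpressions.RegexOptions.IgnoreCase)
               ? "Y" : "N" AS HasFoodItem
    FROM @offers;
```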

How can we have a dynamic output file name in U-SQL in Azure Data Lake, based on the timestamp when the job is executed?

放肆的年华 submitted on 2019-12-22 11:18:00
Question: How can we have a dynamic output file name in U-SQL in Azure Data Lake, based on the timestamp when the job is executed? Thanks for the help. My code is as below:

OUTPUT @telDataResult TO @"wasb://blobcontainer@blobstorage.blob.core.windows.net/**yyyymmdd**_TelDataOutput.Csv" USING Outputters.Csv();

Answer 1: This feature is currently in development but not available yet. Feel free to add your vote to the feature request: https://feedback.azure.com/forums/327234-data-lake/suggestions/10550388-support-dynamic-output
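Until such a feature is available, a common workaround is to parameterize the output path and have whatever submits the job (Data Factory, PowerShell, the SDK) substitute the timestamp at submission time. U-SQL's DECLARE EXTERNAL allows the caller to override the default value. A sketch, with a hypothetical path:

```sql
// The orchestrator can override @outputPath when it submits the job;
// the literal below is only a default for local runs.
DECLARE EXTERNAL @outputPath string = "/output/19000101_TelDataOutput.csv";

OUTPUT @telDataResult
TO @outputPath
USING Outputters.Csv();
```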

Reasons to use Azure Data Lake Analytics vs Traditional ETL approach

♀尐吖头ヾ submitted on 2019-12-22 04:58:11
Question: I'm considering using the Data Lake technologies I have been studying for the last few weeks, compared with the traditional SSIS ETL scenarios I have been working with for so many years. I think of Data Lake as something closely tied to big data, but where is the line between using Data Lake technologies and SSIS? Is there any advantage to using Data Lake technologies with 25 MB ~ 100 MB ~ 300 MB files? Parallelism? Flexibility? Extensibility in the future? Is there any performance gain when the

U-SQL Query to create a Table from JSON Data

ⅰ亾dé卋堺 submitted on 2019-12-18 09:29:05
Question: I have JSON of the form [{}, {}, {}], i.e. there can be multiple rows, and each row has a number of property-value pairs which remain fixed for each row.

@json = EXTRACT MainId string, Details string FROM @INPUT_FILE USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

This gives me the JSON as a string. I don't know how to get things like row[3].property4, i.e. a given property's value for a given row. Complicating things, the properties are all themselves arranged as {Name: "XXX",
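With the sample JSON assemblies, the usual pattern is to extract one rowset row per JSON object, then unpack each object's property string with JsonFunctions.JsonTuple and index into the resulting name-to-value map. A sketch, assuming the sample-format assemblies are registered and a hypothetical property4 key:

```sql
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// One rowset row per object in the top-level JSON array.
@json =
    EXTRACT MainId string, Details string
    FROM @INPUT_FILE
    USING new JsonExtractor();

// JsonTuple turns the Details JSON string into a (name -> value) map,
// so "row N, property4" is just the map lookup on row N.
@rows =
    SELECT MainId,
           JsonFunctions.JsonTuple(Details)["property4"] AS Property4
    FROM @json;
```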

Convert Rowset variables to scalar value

蓝咒 submitted on 2019-12-18 06:57:03
Question: Is it possible to convert rowset variables to a scalar value, e.g.:

@maxKnownId = SELECT MAX(Id) AS maxID FROM @PrevDayLog;
DECLARE @max int = @maxKnownId;

Answer 1: There is no implicit conversion of a single-cell rowset to a scalar value in U-SQL (yet). What are you interested in using the value for? Most of the time you can write your U-SQL expression in a way that you do not need the scalar variable. E.g., if you want to use the value in a condition in another query, you could just use the
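The usual rewrite joins the single-row aggregate rowset into the query that needs the value, instead of materializing a scalar. A sketch, with hypothetical rowset names:

```sql
@maxKnownId =
    SELECT MAX(Id) AS maxID
    FROM @PrevDayLog;

// @maxKnownId has exactly one row, so the cross join simply attaches
// maxID to every row of @log, where it can be used in the filter.
@newRows =
    SELECT l.*
    FROM @log AS l
         CROSS JOIN @maxKnownId AS m
    WHERE l.Id > m.maxID;
```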

What is the maximum allowed size for String in U-SQL?

依然范特西╮ submitted on 2019-12-18 06:26:49
Question: While processing a CSV file, I am getting an error about maximum string size: "String size exceeds the maximum allowed size".

Answer 1: Currently the maximum allowed size for a string in U-SQL is 128 KB. If you need to handle larger sizes than that for now, use the byte[] type instead when reading from the CSV file. Later, as the rowsets are processed in the body of some C# code, you can transform the byte[] into a string and do whatever string operations you need in the C# code
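A sketch of that workaround, with hypothetical file path and column names: extract the oversized column as byte[], then decode it inside a C# expression, keeping only a string that fits the 128 KB cell limit in the rowset:

```sql
@raw =
    EXTRACT Id int,
            BigText byte[]          // oversized column read as raw bytes
    FROM "/input/data.csv"
    USING Extractors.Csv();

// Decode inside the C# expression; only the short prefix lands in a
// string-typed rowset cell, so the 128 KB limit is not hit.
@shortened =
    SELECT Id,
           System.Text.Encoding.UTF8.GetString(BigText).Substring(0, 100) AS Preview
    FROM @raw;
```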

Error while running U-SQL Activity in Pipeline in Azure Data Factory

*爱你&永不变心* submitted on 2019-12-18 05:12:30
Question: I am getting the following error while running a U-SQL activity in a pipeline in ADF:

Error in Activity: {"errorId":"E_CSC_USER_SYNTAXERROR","severity":"Error","component":"CSC", "source":"USER","message":"syntax error. Final statement did not end with a semicolon","details":"at token 'txt', line 3\r\nnear the ###:\r\n**************\r\nDECLARE @in string = \"/demo/SearchLog.txt\";\nDECLARE @out string = \"/scripts/Result.txt\";\nSearchLogProcessing.txt ### \n", "description":"Invalid syntax
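Note what the details string shows: the file name SearchLogProcessing.txt appears inside the submitted script text itself, which is what the compiler trips over at token 'txt'. That usually means the activity is submitting the name rather than the file's contents, i.e. a misconfigured script reference. A hedged sketch of the relevant part of an ADF U-SQL activity definition (property names per the DataLakeAnalyticsU-SQL activity type; linked-service name and paths are hypothetical):

```json
{
  "type": "DataLakeAnalyticsU-SQL",
  "typeProperties": {
    "scriptPath": "scripts\\SearchLogProcessing.txt",
    "scriptLinkedService": "StorageLinkedService",
    "degreeOfParallelism": 3,
    "parameters": {
      "in": "/demo/SearchLog.txt",
      "out": "/scripts/Result.txt"
    }
  }
}
```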

Query JSON nested objects using U-SQL

时光怂恿深爱的人放手 submitted on 2019-12-13 09:38:18
Question: I am trying to get Country and Category from the input below. I am able to get Country but not Category.

Example input:

[{ "context": { "location": { "clientip": "0.0.0.0", "continent": "Asia", "country": "Singapore" }, "custom": { "dimensions": [{ "Category": "Noah Version" }] } } }]

My query:

@json = EXTRACT [location] string, [device] string, [custom.dimensions] string FROM @InputFile USING new JsonExtractor("context"); @CreateJSONTuple = SELECT JsonFunctions.JsonTuple([location]) AS LocationData,
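Because custom.dimensions is an array of objects, an extra unwrapping step is needed: unpack the custom object, then unpack the array element, then read Category. A sketch following the names in the question (assumes the sample JSON assemblies are referenced; the exact key used for array elements can vary by sample-assembly version, so treat the "0" index below as an assumption):

```sql
@json =
    EXTRACT [location] string,
            [custom] string
    FROM @InputFile
    USING new JsonExtractor("context");

@tuples =
    SELECT JsonFunctions.JsonTuple([location]) AS LocationData,
           JsonFunctions.JsonTuple([custom]) AS CustomData
    FROM @json;

@result =
    SELECT LocationData["country"] AS Country,
           // dimensions is a JSON array: take its first element,
           // unpack that object, then read its Category property.
           JsonFunctions.JsonTuple(
               JsonFunctions.JsonTuple(CustomData["dimensions"])["0"]
           )["Category"] AS Category
    FROM @tuples;
```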