u-sql

Partition By & Clustered & Distributed By in U-SQL - need to know their meaning and when to use them

白昼怎懂夜的黑 submitted on 2019-12-24 03:16:10
Question: I can see that when creating a table in U-SQL we can use the PARTITIONED BY, CLUSTERED, and DISTRIBUTED BY clauses. As I understand it, a partition stores data with the same key (the one we partition on) together or close together (perhaps in the same structured stream behind the scenes), so queries that use that key in joins and filters run faster. Clustering, I guess, stores the data of those columns together or close together inside each partition. And distribution is a scheme such as HASH or ROUND ROBIN.
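For reference, the three clauses appear together in a CREATE TABLE statement. A minimal sketch (table and column names are hypothetical; exact clause ordering may vary by U-SQL version): PARTITIONED BY creates coarse-grained, individually addressable partitions per key value; DISTRIBUTED BY spreads the rows of each partition across distribution buckets for parallelism; the clustered index orders rows inside each distribution for seek-style access.

```sql
CREATE TABLE dbo.Events
(
    EventDate DateTime,
    UserId int,
    Payload string,
    // Rows inside each distribution are sorted by UserId.
    INDEX idx_events CLUSTERED (UserId ASC)
    // One addressable partition per distinct EventDate value.
    PARTITIONED BY (EventDate)
    // Rows within a partition are hash-spread across buckets by UserId.
    DISTRIBUTED BY HASH (UserId)
);
```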

Unit testing for U-SQL applier and scripts

我的梦境 submitted on 2019-12-23 15:02:09
Question: I have a custom U-SQL applier which extends the IApplier class.

[SqlUserDefinedApplier]
public class CsvApplier : IApplier
{
    public CsvApplier()
    {
        //totalcount = count;
    }

    public override IEnumerable<IRow> Apply(IRow input, IUpdatableRow output)
    {
        //....custom logic
        //yield return or yield break
    }
}

This applier is then used from a U-SQL script as:

@log = SELECT t.ultimateID, t.siteID, . . . t.eTime, t.hours FROM @logWithCount CROSS APPLY new BSWBigData.USQLApplier.CsvApplier() AS t(ultimateID
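One low-friction approach (not the only one) is to keep the IApplier subclass a thin wrapper and move the row-transformation logic into a plain C# method that takes and returns ordinary .NET types; that method can then be unit tested with any standard framework (xUnit, NUnit, MSTest) without having to construct IRow/IUpdatableRow instances. A hedged sketch, with hypothetical names:

```csharp
using System.Collections.Generic;

// Hypothetical refactoring: the CSV-splitting logic lives in a static,
// framework-free class that ordinary unit tests can exercise directly;
// the IApplier.Apply override just adapts IRow to and from these types.
public static class CsvLogic
{
    public static IEnumerable<string[]> SplitRows(string rawLine)
    {
        // ...same parsing the applier performs, minus the IRow plumbing...
        yield return rawLine.Split(',');
    }
}
```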

Can I use Regular Expressions in U-SQL?

♀尐吖头ヾ submitted on 2019-12-23 05:13:11
Question: Is it possible to write regular expression comparisons in U-SQL? For example, rather than multiple "LIKE" statements to search for the names of various food items, I want to perform a comparison of multiple items using a single regular expression.

Answer 1: You can create a new Regex object inline and then use the IsMatch() method. The example below returns "Y" if the Offer_Desc column contains the word "bacon", "croissant", or "panini".

@output = SELECT , CSHARP(new Regex("\\b(BACON|CROISSANT|PANINI
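Since U-SQL expressions are C# expressions, Regex can also be used directly without the legacy CSHARP() wrapper. A minimal sketch, assuming a hypothetical rowset @offers with an Offer_Desc column:

```sql
@result =
    SELECT Offer_Desc,
           // C# conditional expression: "Y" when any of the food words match.
           System.Text.RegularExpressions.Regex.IsMatch(
               Offer_Desc,
               "\\b(BACON|CROISSANT|PANINI)\\b",
               System.Text.RegularExpressions.RegexOptions.IgnoreCase)
               ? "Y" : "N" AS HasFoodItem
    FROM @offers;
```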

How can we have a dynamic output file name in U-SQL in Azure Data Lake, based on the timestamp when the job is executed?

放肆的年华 submitted on 2019-12-22 11:18:00
Question: How can we have a dynamic output file name in U-SQL in Azure Data Lake, based on the timestamp when the job is executed? Thanks for the help. My code is as below:

OUTPUT @telDataResult TO @"wasb://blobcontainer@blobstorage.blob.core.windows.net/**yyyymmdd**_TelDataOutput.Csv" USING Outputters.Csv();

Answer 1: This feature is currently in development but not available yet. Feel free to add your vote to the feature request: https://feedback.azure.com/forums/327234-data-lake/suggestions/10550388-support-dynamic-output
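Until such a feature is available, a common workaround is to parameterize the output path and have whatever submits the job (Data Factory, PowerShell, the SDK) substitute the timestamp at submission time. U-SQL's DECLARE EXTERNAL allows the caller to override the default value. A sketch, with a hypothetical path:

```sql
// The orchestrator can override @outputPath when it submits the job;
// the literal below is only a default for local runs.
DECLARE EXTERNAL @outputPath string = "/output/19000101_TelDataOutput.csv";

OUTPUT @telDataResult
TO @outputPath
USING Outputters.Csv();
```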

Reasons to use Azure Data Lake Analytics vs Traditional ETL approach

♀尐吖头ヾ submitted on 2019-12-22 04:58:11
Question: I'm considering using the Data Lake technologies I have been studying for the last few weeks, compared with the traditional SSIS ETL scenarios I have been working with for so many years. I think of Data Lake as something closely tied to big data, but where is the line between using Data Lake technologies and SSIS? Is there any advantage to using Data Lake technologies with 25 MB ~ 100 MB ~ 300 MB files? Parallelism? Flexibility? Extensibility in the future? Is there any performance gain when the

U-SQL Query to create a Table from JSON Data

ⅰ亾dé卋堺 submitted on 2019-12-18 09:29:05
Question: I have JSON of the form [{}, {}, {}], i.e. there can be multiple rows, and each row has a number of property-value pairs which remain fixed for each row.

@json = EXTRACT MainId string, Details string FROM @INPUT_FILE USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

This gives me the JSON as a string. I don't know how to get things like row[3].property4, i.e. a given property's value for a given row. Complicating things, the properties are all themselves arranged as {Name: "XXX",
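With the sample JSON assemblies, the usual pattern is to extract one rowset row per JSON object, then unpack each object's property string with JsonFunctions.JsonTuple and index into the resulting name-to-value map. A sketch, assuming the sample-format assemblies are registered and a hypothetical property4 key:

```sql
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// One rowset row per object in the top-level JSON array.
@json =
    EXTRACT MainId string, Details string
    FROM @INPUT_FILE
    USING new JsonExtractor();

// JsonTuple turns the Details JSON string into a (name -> value) map,
// so "row N, property4" is just the map lookup on row N.
@rows =
    SELECT MainId,
           JsonFunctions.JsonTuple(Details)["property4"] AS Property4
    FROM @json;
```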

Convert Rowset variables to scalar value

蓝咒 submitted on 2019-12-18 06:57:03
Question: Is it possible to convert rowset variables to a scalar value, e.g.:

@maxKnownId = SELECT MAX(Id) AS maxID FROM @PrevDayLog;
DECLARE @max int = @maxKnownId;

Answer 1: There is no implicit conversion of a single-cell rowset to a scalar value in U-SQL (yet). What are you interested in using the value for? Most of the time you can write your U-SQL expression in a way that you do not need the scalar variable. E.g., if you want to use the value in a condition in another query, you could just use the
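The usual rewrite joins the single-row aggregate rowset into the query that needs the value, instead of materializing a scalar. A sketch, with hypothetical rowset names:

```sql
@maxKnownId =
    SELECT MAX(Id) AS maxID
    FROM @PrevDayLog;

// @maxKnownId has exactly one row, so the cross join simply attaches
// maxID to every row of @log, where it can be used in the filter.
@newRows =
    SELECT l.*
    FROM @log AS l
         CROSS JOIN @maxKnownId AS m
    WHERE l.Id > m.maxID;
```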

What is the maximum allowed size for String in U-SQL?

依然范特西╮ submitted on 2019-12-18 06:26:49
Question: While processing a CSV file, I am getting an error about maximum string size: "String size exceeds the maximum allowed size".

Answer 1: Currently the maximum allowed size for a string in U-SQL is 128 KB. If you need to handle larger sizes than that for now, use the byte[] type instead when reading from the CSV file. Later, as the rowsets are processed in the body of some C# code, you can transform the byte[] into a string and do whatever string operations you need in the C# code
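A sketch of that workaround, with hypothetical file path and column names: extract the oversized column as byte[], then decode it inside a C# expression, keeping only a string that fits the 128 KB cell limit in the rowset:

```sql
@raw =
    EXTRACT Id int,
            BigText byte[]          // oversized column read as raw bytes
    FROM "/input/data.csv"
    USING Extractors.Csv();

// Decode inside the C# expression; only the short prefix lands in a
// string-typed rowset cell, so the 128 KB limit is not hit.
@shortened =
    SELECT Id,
           System.Text.Encoding.UTF8.GetString(BigText).Substring(0, 100) AS Preview
    FROM @raw;
```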

Error while running U-SQL Activity in Pipeline in Azure Data Factory

*爱你&永不变心* submitted on 2019-12-18 05:12:30
Question: I am getting the following error while running a U-SQL activity in a pipeline in ADF:

Error in Activity: {"errorId":"E_CSC_USER_SYNTAXERROR","severity":"Error","component":"CSC", "source":"USER","message":"syntax error. Final statement did not end with a semicolon","details":"at token 'txt', line 3\r\nnear the ###:\r\n**************\r\nDECLARE @in string = \"/demo/SearchLog.txt\";\nDECLARE @out string = \"/scripts/Result.txt\";\nSearchLogProcessing.txt ### \n", "description":"Invalid syntax
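Note what the details string shows: the file name SearchLogProcessing.txt appears inside the submitted script text itself, which is what the compiler trips over at token 'txt'. That usually means the activity is submitting the name rather than the file's contents, i.e. a misconfigured script reference. A hedged sketch of the relevant part of an ADF U-SQL activity definition (property names per the DataLakeAnalyticsU-SQL activity type; linked-service name and paths are hypothetical):

```json
{
  "type": "DataLakeAnalyticsU-SQL",
  "typeProperties": {
    "scriptPath": "scripts\\SearchLogProcessing.txt",
    "scriptLinkedService": "StorageLinkedService",
    "degreeOfParallelism": 3,
    "parameters": {
      "in": "/demo/SearchLog.txt",
      "out": "/scripts/Result.txt"
    }
  }
}
```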

Query JSON nested objects using U-SQL

时光怂恿深爱的人放手 submitted on 2019-12-13 09:38:18
Question: I am trying to get Country and Category from the input below. I am able to get Country but not Category.

Example input:

[{ "context": { "location": { "clientip": "0.0.0.0", "continent": "Asia", "country": "Singapore" }, "custom": { "dimensions": [{ "Category": "Noah Version" }] } } }]

My query:

@json = EXTRACT [location] string, [device] string, [custom.dimensions] string FROM @InputFile USING new JsonExtractor("context"); @CreateJSONTuple = SELECT JsonFunctions.JsonTuple([location]) AS LocationData,
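Because custom.dimensions is an array of objects, an extra unwrapping step is needed: unpack the custom object, then unpack the array element, then read Category. A sketch following the names in the question (assumes the sample JSON assemblies are referenced; the exact key used for array elements can vary by sample-assembly version, so treat the "0" index below as an assumption):

```sql
@json =
    EXTRACT [location] string,
            [custom] string
    FROM @InputFile
    USING new JsonExtractor("context");

@tuples =
    SELECT JsonFunctions.JsonTuple([location]) AS LocationData,
           JsonFunctions.JsonTuple([custom]) AS CustomData
    FROM @json;

@result =
    SELECT LocationData["country"] AS Country,
           // dimensions is a JSON array: take its first element,
           // unpack that object, then read its Category property.
           JsonFunctions.JsonTuple(
               JsonFunctions.JsonTuple(CustomData["dimensions"])["0"]
           )["Category"] AS Category
    FROM @tuples;
```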