azure-data-lake

USQL Query to create a Table from Json Data

Posted on 2019-12-18 09:29:05
Question: I have JSON of the form [{}, {}, {}], i.e. there can be multiple rows, and each row has a number of property-value pairs which remain fixed for each row.

    @json = EXTRACT MainId string, Details string
            FROM @INPUT_FILE
            USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

This gives me the JSON as a string. I don't know how to get at things like row[3].property4, i.e. a property's value for a given row. Complicating things, the properties are all themselves arranged as {Name: "XXX",
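A minimal sketch of how per-row property access usually works with the sample JSON library, assuming the assemblies are registered and the input is a top-level array of objects (the file paths and column names are illustrative, not from the question):

    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

    USING Microsoft.Analytics.Samples.Formats.Json;

    // The sample JsonExtractor turns each element of the top-level array
    // into one row and maps EXTRACT columns to same-named properties,
    // so "property4" becomes a typed column rather than a string to parse.
    @rows =
        EXTRACT MainId string,
                property4 string
        FROM "/input/data.json"
        USING new JsonExtractor();

    OUTPUT @rows TO "/output/rows.csv" USING Outputters.Csv();

Selecting a particular row (row[3]) is then an ordinary filter over @rows rather than string manipulation.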

What is the maximum allowed size for String in U-SQL?

Posted on 2019-12-18 06:26:49
Question: While processing a CSV file, I am getting an error about maximum string size: "String size exceeds the maximum allowed size".

Answer 1: Currently the maximum allowed size for a string in U-SQL is 128 KB. If you need to handle larger sizes than that for now, use the byte[] type instead when reading from the CSV file. Later, as the rowsets are processed in the body of some C# code in the script, you can transform the byte[] into a string and do whatever string operations you need in the C# code.
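A minimal sketch of that byte[] workaround (the file paths and column names are illustrative):

    USING System.Text;

    // Read the oversized column as raw bytes; byte[] is not subject
    // to the 128 KB string limit at extraction time.
    @raw =
        EXTRACT Id int,
                BigColumn byte[]
        FROM "/input/wide.csv"
        USING Extractors.Csv();

    // Convert to string inside a C# expression and operate on it there,
    // keeping whatever is written back under the string size limit.
    @processed =
        SELECT Id,
               Encoding.UTF8.GetString(BigColumn).Length AS CharCount
        FROM @raw;

    OUTPUT @processed TO "/output/summary.csv" USING Outputters.Csv();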

azure.datalake.store.AdlFileSystem not found in Spark

Posted on 2019-12-14 04:14:58
Question: I am trying to use Spark SQL to query a CSV file placed in Data Lake Store. When I query, I am getting "java.lang.ClassNotFoundException: Class com.microsoft.azure.datalake.store.AdlFileSystem not found". How can I use Spark SQL to query a file placed in Data Lake Store? Please help me with a sample.

Example CSV:

    Id  Name  Designation
    1   aaa   bbb
    2   ccc   ddd
    3   eee   fff

Thanks in advance, Sowandharya

Answer 1: Presently HDInsight Spark clusters are not available with Azure Data Lake Storage. Once we have
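For reference, once the ADLS connector is on the Spark classpath (the hadoop-azure-datalake and azure-data-lake-store-sdk jars), reading such a file typically looks like the sketch below. The adl:// URI and the service-principal properties come from the Hadoop ADLS Gen1 connector, and all account/tenant values are placeholders:

    from pyspark.sql import SparkSession

    # Credentials are passed down to the Hadoop layer via the spark.hadoop. prefix.
    spark = (SparkSession.builder
             .appName("adls-csv")
             .config("spark.hadoop.fs.adl.oauth2.access.token.provider.type", "ClientCredential")
             .config("spark.hadoop.fs.adl.oauth2.client.id", "<application-id>")
             .config("spark.hadoop.fs.adl.oauth2.credential", "<client-secret>")
             .config("spark.hadoop.fs.adl.oauth2.refresh.url",
                     "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
             .getOrCreate())

    df = (spark.read.option("header", "true")
          .csv("adl://<account>.azuredatalakestore.net/data/people.csv"))
    df.createOrReplaceTempView("people")
    spark.sql("SELECT Name, Designation FROM people WHERE Id = 1").show()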

When do binaryFiles load into memory when mapPartitions is used?

Posted on 2019-12-13 15:59:41
Question: I am using PySpark to apply a trained deep learning model to images and am concerned with how memory usage will scale with my current approach. Because the trained model takes a while to load, I process large batches of images on each worker with code similar to the following:

    def run_eval(file_generator):
        trained_model = load_model()
        results = []
        for file in file_generator:
            # "file" is a tuple: [0] is its filename, [1] is the byte data
            results.append(trained_model.eval(file[1]))
        return results
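For context, a driver-side sketch of how such a function is typically attached with mapPartitions (the input path and model helpers are assumptions, not from the question). The function receives a lazy iterator over (path, bytes) pairs, so how much is resident at once depends largely on what run_eval itself retains, which is exactly the concern being asked about:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("image-eval").getOrCreate()
    sc = spark.sparkContext

    # binaryFiles yields (filename, content) pairs; mapPartitions calls
    # run_eval once per partition, amortizing the model-load cost.
    images = sc.binaryFiles("/data/images/*.png")
    predictions = images.mapPartitions(run_eval)
    predictions.saveAsTextFile("/output/predictions")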

Query JSON nested objects using U-SQL

Posted on 2019-12-13 09:38:18
Question: I am trying to get Country and Category from the input below. I am able to get Country but not Category. Example input:

    [{
        "context": {
            "location": {
                "clientip": "0.0.0.0",
                "continent": "Asia",
                "country": "Singapore"
            },
            "custom": {
                "dimensions": [{ "Category": "Noah Version" }]
            }
        }
    }]

My query:

    @json =
        EXTRACT [location] string,
                [device] string,
                [custom.dimensions] string
        FROM @InputFile
        USING new JsonExtractor("context");

    @CreateJSONTuple =
        SELECT JsonFunctions.JsonTuple([location]) AS LocationData,
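Since dimensions is an array of objects, reaching Category usually takes a second hop through an array element. A sketch of one possible approach, assuming the sample library's JsonTuple exposes array elements under their index keys (that indexing behavior is an assumption worth verifying against the sample source):

    // First hop: break the dimensions array into an element map.
    @dims =
        SELECT JsonFunctions.JsonTuple([custom.dimensions]) AS DimData
        FROM @json;

    // Second hop: parse the first array element and read its Category.
    @category =
        SELECT JsonFunctions.JsonTuple(DimData["0"])["Category"] AS Category
        FROM @dims;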

How to combine different schemas

Posted on 2019-12-13 08:04:43
Question: I'm using a custom outputter to generate XML from my "flat" data, like so:

    SELECT * ..
    OUTPUT @all_data TO "/patient/{ID}.tsv"
    USING new Microsoft.Analytics.Samples.Formats.Xml.XmlOutputter("Patient");

This generates individual files that look like this:

    <Patient>
      <ID>5283293478</ID>
      <ANESTHESIA_START>09/06/2019 11:52:00</ANESTHESIA_START>
      <ANESHTHESIA_END>09/06/2019 14:40:00</ANESHTHESIA_END>
      <SURGERY_START_TIME>9/6/2019 11:52:00 AM</SURGERY_START_TIME>
      <SURGERY_END_TIME>9/6/2019 2:34:00 PM<

Dynamic FROM in U-SQL statement

Posted on 2019-12-13 00:28:29
Question: I am trying to generate a dynamic FROM clause in U-SQL so that we can extract data from different files based on a previous query's outcome. Something like this:

    @filesToExtract = SELECT whatevergeneratesthepaths FROM @foo;
    <-- this query generates a rowset with all the files we want to extract, like: [/path/file1.csv, /path/file2.csv]

    SELECT * FROM @filesToExtract;
    <-- here we want to extract the data from file1 and file2

I'm afraid that this kind of dynamic query is not supported yet
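Worth noting: U-SQL resolves all input paths at compile time, so a rowset cannot drive a FROM clause. The usual workaround is a file set pattern, where one EXTRACT matches many files and a virtual column captures the varying part of each path. A minimal sketch (paths and columns are illustrative):

    // {name} is a virtual column filled in from the matched file path.
    @data =
        EXTRACT col1 string,
                col2 string,
                name string
        FROM "/path/{name}.csv"
        USING Extractors.Csv();

    // Filtering on the virtual column narrows which files are actually read.
    @subset =
        SELECT col1, col2
        FROM @data
        WHERE name == "file1" OR name == "file2";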

Create File From Azure Data Lake Store .NET SDK

Posted on 2019-12-12 20:50:24
Question: I can't find any reference in the documentation about this. My question is simple: how do I create a file in Data Lake Store from the .NET SDK (for example, create test.csv at the path /Test/test.csv)? Is there any way to do this, or moreover to create a file from byte or string content (some other upload parameter class whose first argument is not a path to a source file, but the content I want to send to Data Lake Store)?

Answer 1: Here is the reference article that explains how to create a file:
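For illustration, a minimal sketch using the Microsoft.Azure.DataLake.Store client, which writes string/byte content straight to a new file with no local source file; the tenant/client/account values are placeholders, and the exact API surface may vary by SDK version:

    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.Azure.DataLake.Store;
    using Microsoft.Rest.Azure.Authentication;

    class Program
    {
        static async Task Main()
        {
            // Authenticate with a service principal.
            var creds = await ApplicationTokenProvider.LoginSilentAsync(
                "<tenant-id>", "<client-id>", "<client-secret>");
            AdlsClient client = AdlsClient.CreateClient(
                "<account>.azuredatalakestore.net", creds);

            // CreateFile returns a writable stream, so the content comes
            // from memory rather than from a source-file path.
            using (var stream = client.CreateFile("/Test/test.csv", IfExists.Overwrite))
            {
                byte[] payload = Encoding.UTF8.GetBytes("Id,Name\n1,aaa\n");
                stream.Write(payload, 0, payload.Length);
            }
        }
    }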

How to run PowerShell from Azure Data Factory

Posted on 2019-12-12 18:47:59
Question: I have a PowerShell script which splits a complex CSV file into smaller CSV files of 1000 records each. Here is the code:

    $i=0; Get-Content C:\Users\dell\Desktop\Powershell\Input\bigsizeFile.csv -ReadCount 1000 |
        %{ $i++; $_ | Out-File C:\Users\dell\Desktop\Powershell\Output\file$i.csv }

Now I want to use this script in Azure PowerShell, and I want to run it from Azure Data Factory. Can someone please help with this?

Answer 1: You could execute your PowerShell command by using a Custom activity in
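For reference, a Custom activity runs a command on an Azure Batch pool, with the script staged in a linked storage account. A minimal pipeline sketch might look like the following; every name here is a placeholder, and the two linked services are assumed to exist already:

    {
      "name": "SplitCsvPipeline",
      "properties": {
        "activities": [
          {
            "name": "RunSplitScript",
            "type": "Custom",
            "linkedServiceName": {
              "referenceName": "AzureBatchLinkedService",
              "type": "LinkedServiceReference"
            },
            "typeProperties": {
              "command": "powershell.exe -ExecutionPolicy Bypass -File split.ps1",
              "resourceLinkedService": {
                "referenceName": "StagingStorageLinkedService",
                "type": "LinkedServiceReference"
              },
              "folderPath": "scripts/split"
            }
          }
        ]
      }
    }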

Can't we upload documents/images using U-SQL custom code?

Posted on 2019-12-12 06:37:43
Question: Situation: We have created a database, say "CLSTrackOMeter", and a table, say "Customer_Information", in Azure Data Lake Analytics. Customer_Information stores the path of an image in a staging folder (for now I've hard-coded the source image path in the class library).

Agenda: Use that value from CustInfo to upload data to the Azure Data Lake Store "Customer_Image" folder.

Tried solution: Created a U-SQL class library, using the .NET SDK to upload files (able to execute this class library in a console application), and