azure-data-lake

USQL Query to create a Table from Json Data

Posted on 2019-12-18 09:29:05
Question: I have JSON of the form [{}, {}, {}], i.e. there can be multiple rows, and each row has a number of property-value pairs which remain fixed for each row.

    @json = EXTRACT MainId string, Details string
            FROM @INPUT_FILE
            USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

This gives me the JSON as a string. I don't know how to get at things like row[3].property4, i.e. a property's value for a given row. Complicating things, the properties are all themselves arranged as {Name: "XXX",
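A minimal sketch of how per-row property access usually works with the sample JSON library, assuming the assemblies are registered and the input is a top-level array of objects (the file paths and column names are illustrative, not from the question):

    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

    USING Microsoft.Analytics.Samples.Formats.Json;

    // The sample JsonExtractor turns each element of the top-level array
    // into one row and maps EXTRACT columns to same-named properties,
    // so "property4" becomes a typed column rather than a string to parse.
    @rows =
        EXTRACT MainId string,
                property4 string
        FROM "/input/data.json"
        USING new JsonExtractor();

    OUTPUT @rows TO "/output/rows.csv" USING Outputters.Csv();

Selecting a particular row (row[3]) is then an ordinary filter over @rows rather than string manipulation.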

What is the maximum allowed size for String in U-SQL?

Posted on 2019-12-18 06:26:49
Question: While processing a CSV file, I am getting an error about maximum string size: "String size exceeds the maximum allowed size".

Answer 1: Currently the maximum allowed size for a string in U-SQL is 128 KB. If you need to handle larger sizes than that for now, use the byte[] type instead when reading from the CSV file. Later, as the rowsets are processed in the body of some C# code in the script, you can transform the byte[] into a string and do whatever string operations you need in the C# code.
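A minimal sketch of that byte[] workaround (the file paths and column names are illustrative):

    USING System.Text;

    // Read the oversized column as raw bytes; byte[] is not subject
    // to the 128 KB string limit at extraction time.
    @raw =
        EXTRACT Id int,
                BigColumn byte[]
        FROM "/input/wide.csv"
        USING Extractors.Csv();

    // Convert to string inside a C# expression and operate on it there,
    // keeping whatever is written back under the string size limit.
    @processed =
        SELECT Id,
               Encoding.UTF8.GetString(BigColumn).Length AS CharCount
        FROM @raw;

    OUTPUT @processed TO "/output/summary.csv" USING Outputters.Csv();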

azure.datalake.store.AdlFileSystem not found in Spark

Posted on 2019-12-14 04:14:58
Question: I am trying to use Spark SQL to query a CSV file placed in Data Lake Store. When I query, I am getting "java.lang.ClassNotFoundException: Class com.microsoft.azure.datalake.store.AdlFileSystem not found". How can I use Spark SQL to query a file placed in Data Lake Store? Please help me with a sample.

Example CSV:

    Id  Name  Designation
    1   aaa   bbb
    2   ccc   ddd
    3   eee   fff

Thanks in advance, Sowandharya

Answer 1: Presently HDInsight Spark clusters are not available with Azure Data Lake Storage. Once we have
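For reference, once the ADLS connector is on the Spark classpath (the hadoop-azure-datalake and azure-data-lake-store-sdk jars), reading such a file typically looks like the sketch below. The adl:// URI and the service-principal properties come from the Hadoop ADLS Gen1 connector, and all account/tenant values are placeholders:

    from pyspark.sql import SparkSession

    # Credentials are passed down to the Hadoop layer via the spark.hadoop. prefix.
    spark = (SparkSession.builder
             .appName("adls-csv")
             .config("spark.hadoop.fs.adl.oauth2.access.token.provider.type", "ClientCredential")
             .config("spark.hadoop.fs.adl.oauth2.client.id", "<application-id>")
             .config("spark.hadoop.fs.adl.oauth2.credential", "<client-secret>")
             .config("spark.hadoop.fs.adl.oauth2.refresh.url",
                     "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
             .getOrCreate())

    df = (spark.read.option("header", "true")
          .csv("adl://<account>.azuredatalakestore.net/data/people.csv"))
    df.createOrReplaceTempView("people")
    spark.sql("SELECT Name, Designation FROM people WHERE Id = 1").show()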

When do binaryFiles load into memory when mapPartitions is used?

Posted on 2019-12-13 15:59:41
Question: I am using PySpark to apply a trained deep learning model to images and am concerned with how memory usage will scale with my current approach. Because the trained model takes a while to load, I process large batches of images on each worker with code similar to the following:

    def run_eval(file_generator):
        trained_model = load_model()
        results = []
        for file in file_generator:
            # "file" is a tuple: [0] is its filename, [1] is the byte data
            results.append(trained_model.eval(file[1]))
        return results
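For context, a driver-side sketch of how such a function is typically attached with mapPartitions (the input path and model helpers are assumptions, not from the question). The function receives a lazy iterator over (path, bytes) pairs, so how much is resident at once depends largely on what run_eval itself retains, which is exactly the concern being asked about:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("image-eval").getOrCreate()
    sc = spark.sparkContext

    # binaryFiles yields (filename, content) pairs; mapPartitions calls
    # run_eval once per partition, amortizing the model-load cost.
    images = sc.binaryFiles("/data/images/*.png")
    predictions = images.mapPartitions(run_eval)
    predictions.saveAsTextFile("/output/predictions")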

Query JSON nested objects using U-SQL

Posted on 2019-12-13 09:38:18
Question: I am trying to get Country and Category from the input below. I am able to get Country but not Category. Example input:

    [{
        "context": {
            "location": {
                "clientip": "0.0.0.0",
                "continent": "Asia",
                "country": "Singapore"
            },
            "custom": {
                "dimensions": [{ "Category": "Noah Version" }]
            }
        }
    }]

My query:

    @json =
        EXTRACT [location] string,
                [device] string,
                [custom.dimensions] string
        FROM @InputFile
        USING new JsonExtractor("context");

    @CreateJSONTuple =
        SELECT JsonFunctions.JsonTuple([location]) AS LocationData,
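Since dimensions is an array of objects, reaching Category usually takes a second hop through an array element. A sketch of one possible approach, assuming the sample library's JsonTuple exposes array elements under their index keys (that indexing behavior is an assumption worth verifying against the sample source):

    // First hop: break the dimensions array into an element map.
    @dims =
        SELECT JsonFunctions.JsonTuple([custom.dimensions]) AS DimData
        FROM @json;

    // Second hop: parse the first array element and read its Category.
    @category =
        SELECT JsonFunctions.JsonTuple(DimData["0"])["Category"] AS Category
        FROM @dims;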

How to combine different schemas

Posted on 2019-12-13 08:04:43
Question: I'm using a custom outputter to generate XML from my "flat" data, like so:

    SELECT * ..
    OUTPUT @all_data TO "/patient/{ID}.tsv"
    USING new Microsoft.Analytics.Samples.Formats.Xml.XmlOutputter("Patient");

This generates individual files that look like this:

    <Patient>
      <ID>5283293478</ID>
      <ANESTHESIA_START>09/06/2019 11:52:00</ANESTHESIA_START>
      <ANESHTHESIA_END>09/06/2019 14:40:00</ANESHTHESIA_END>
      <SURGERY_START_TIME>9/6/2019 11:52:00 AM</SURGERY_START_TIME>
      <SURGERY_END_TIME>9/6/2019 2:34:00 PM<

Dynamic FROM in U-SQL statement

Posted on 2019-12-13 00:28:29
Question: I am trying to generate a dynamic FROM clause in U-SQL so that we can extract data from different files based on a previous query's outcome. Something like this:

    @filesToExtract = SELECT whatevergeneratesthepaths FROM @foo;
    <-- this query generates a rowset with all the files we want to extract, like: [/path/file1.csv, /path/file2.csv]

    SELECT * FROM @filesToExtract;
    <-- here we want to extract the data from file1 and file2

I'm afraid that this kind of dynamic query is not supported yet
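Worth noting: U-SQL resolves all input paths at compile time, so a rowset cannot drive a FROM clause. The usual workaround is a file set pattern, where one EXTRACT matches many files and a virtual column captures the varying part of each path. A minimal sketch (paths and columns are illustrative):

    // {name} is a virtual column filled in from the matched file path.
    @data =
        EXTRACT col1 string,
                col2 string,
                name string
        FROM "/path/{name}.csv"
        USING Extractors.Csv();

    // Filtering on the virtual column narrows which files are actually read.
    @subset =
        SELECT col1, col2
        FROM @data
        WHERE name == "file1" OR name == "file2";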

Create File From Azure Data Lake Store .NET SDK

Posted on 2019-12-12 20:50:24
Question: I can't find any reference in the documentation about this. My question is simple: how do I create a file in Data Lake Store from the .NET SDK (for example, create test.csv at the path /Test/test.csv)? Is there any way to do this, or moreover to create a file from byte or string content (some other upload parameter class whose first argument is not a path to a source file, but the content I want to send to Data Lake Store)?

Answer 1: Here is the reference article that explains how to create a file:
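For illustration, a minimal sketch using the Microsoft.Azure.DataLake.Store client, which writes string/byte content straight to a new file with no local source file; the tenant/client/account values are placeholders, and the exact API surface may vary by SDK version:

    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.Azure.DataLake.Store;
    using Microsoft.Rest.Azure.Authentication;

    class Program
    {
        static async Task Main()
        {
            // Authenticate with a service principal.
            var creds = await ApplicationTokenProvider.LoginSilentAsync(
                "<tenant-id>", "<client-id>", "<client-secret>");
            AdlsClient client = AdlsClient.CreateClient(
                "<account>.azuredatalakestore.net", creds);

            // CreateFile returns a writable stream, so the content comes
            // from memory rather than from a source-file path.
            using (var stream = client.CreateFile("/Test/test.csv", IfExists.Overwrite))
            {
                byte[] payload = Encoding.UTF8.GetBytes("Id,Name\n1,aaa\n");
                stream.Write(payload, 0, payload.Length);
            }
        }
    }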

How to run PowerShell from Azure Data Factory

Posted on 2019-12-12 18:47:59
Question: I have a PowerShell script which splits a complex CSV file into smaller CSV files of 1000 records each. Here is the code:

    $i=0; Get-Content C:\Users\dell\Desktop\Powershell\Input\bigsizeFile.csv -ReadCount 1000 |
        %{ $i++; $_ | Out-File C:\Users\dell\Desktop\Powershell\Output\file$i.csv }

Now I want to use this script in Azure PowerShell, and I want to run it from Azure Data Factory. Can someone please help with this?

Answer 1: You could execute your PowerShell command by using a Custom activity in
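For reference, a Custom activity runs a command on an Azure Batch pool, with the script staged in a linked storage account. A minimal pipeline sketch might look like the following; every name here is a placeholder, and the two linked services are assumed to exist already:

    {
      "name": "SplitCsvPipeline",
      "properties": {
        "activities": [
          {
            "name": "RunSplitScript",
            "type": "Custom",
            "linkedServiceName": {
              "referenceName": "AzureBatchLinkedService",
              "type": "LinkedServiceReference"
            },
            "typeProperties": {
              "command": "powershell.exe -ExecutionPolicy Bypass -File split.ps1",
              "resourceLinkedService": {
                "referenceName": "StagingStorageLinkedService",
                "type": "LinkedServiceReference"
              },
              "folderPath": "scripts/split"
            }
          }
        ]
      }
    }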

Can't we upload documents/images using U-SQL custom code?

Posted on 2019-12-12 06:37:43
Question: Situation: We have created a database, say "CLSTrackOMeter", and a table, say "Customer_Information", in Azure Data Lake Analytics. Customer_Information stores the path of an image in a staging folder (for now I've hard-coded the source image path in the class library).

Agenda: Use that value from CustInfo to upload data to the Azure Data Lake Store "Customer_Image" folder.

Tried solution: Created a U-SQL class library, using the .NET SDK to upload files (able to execute this class library in a console application), and