Data Lake Analytics U-SQL EXTRACT speed (Local vs Azure)

Submitted by 情到浓时终转凉″ on 2020-01-02 07:51:10

Question


I've been looking into using Azure Data Lake Analytics to manipulate some gzipped XML data stored in Azure Blob Storage, but I'm running into an interesting issue. When I use U-SQL locally to process 500 of these XML files, processing is extremely quick: roughly 40 seconds using 1 AU (which appears to be the local limit). However, when we run the same script in Azure using 5 AUs, processing takes 17+ minutes.

We eventually want to scale this up to ~20,000 files and more, but have reduced the set to try to measure the speed.

Each file contains a collection of 50 XML objects (with varying amounts of detail in child elements). The files are roughly 1 MB when gzipped and between 5 MB and 10 MB when not. 99% of the processing time is spent in the EXTRACT section of the U-SQL script.

Things tried:

- Unzipped the files before processing. This took roughly the same time as the zipped version, certainly nowhere near the 40 seconds I was seeing locally.
- Moved the data from Blob Storage to Azure Data Lake Storage. This took exactly the same length of time.
- Temporarily removed about half of the data from the files and re-ran. Surprisingly, this didn't take more than a minute off either.
- Added more AUs to reduce the processing time. This worked extremely well but isn't a long-term solution due to the costs that would be incurred.

It seems to me as if there is a major bottleneck when getting the data from Azure Blob Storage/Azure Data Lake. Am I missing something obvious?

P.S. Let me know if you need any more information.

Thanks,

Nick.


Answer 1:


See slide 31 of https://www.slideshare.net/MichaelRys/best-practices-and-performance-tuning-of-usql-in-azure-data-lake-sql-konferenz-2018. There is a preview option

SET @@FeaturePreviews="InputFileGrouping:on";

which groups small files together so that they are processed by a limited number of vertices.
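
To see why grouping matters, here is a rough back-of-the-envelope model in Python using the numbers from the question. It rests on two assumptions that are not confirmed in this thread: that without grouping each input file is handled by its own vertex, and that vertices run in sequential "waves" of at most one vertex per AU. The function names and the grouping factor of 50 files per vertex are hypothetical, chosen only for illustration; the service decides the real grouping.

```python
import math


def estimate_waves(num_files: int, allocation_units: int, files_per_vertex: int = 1) -> int:
    """Estimate how many sequential waves of vertices a job needs.

    Assumption: each vertex occupies one AU, so at most
    `allocation_units` vertices run at the same time.
    """
    vertices = math.ceil(num_files / files_per_vertex)
    return math.ceil(vertices / allocation_units)


def implied_seconds_per_vertex(total_seconds: float, num_files: int,
                               allocation_units: int, files_per_vertex: int = 1) -> float:
    """Average wall-clock time per wave implied by an observed total runtime."""
    return total_seconds / estimate_waves(num_files, allocation_units, files_per_vertex)


# Figures from the question: 500 files, 5 AUs, roughly 17 minutes in Azure.
waves_ungrouped = estimate_waves(500, 5)
per_vertex = implied_seconds_per_vertex(17 * 60, 500, 5)

# Hypothetical grouping of 50 files per vertex (illustrative value only).
waves_grouped = estimate_waves(500, 5, files_per_vertex=50)

print(f"waves without grouping: {waves_ungrouped}")
print(f"implied seconds per vertex: {per_vertex:.1f}")
print(f"waves with hypothetical grouping: {waves_grouped}")
```

Under these assumptions, most of the 17 minutes would be per-vertex overhead repeated across many waves rather than time spent reading data. That would be consistent with the observations that halving the file contents barely helped while adding AUs helped a lot.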



Source: https://stackoverflow.com/questions/50507490/data-lake-analytics-u-sql-extract-speed-local-vs-azure
