Azure Data Factory: how to merge all files of a folder into one file

Submitted by 被刻印的时光 ゝ on 2019-12-11 17:38:56

Question


I need to create one big file by merging multiple files scattered across several subfolders in Azure Blob Storage. A transformation is also needed: each source file contains a JSON array with a single element, so the final file will contain an array of all those JSON elements.
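
For example, given two hypothetical source files (the field names below are made up for illustration), one might contain [ { "id": 1 } ] and another [ { "id": 2 } ]; the merged file should then look like this:

    [
        { "id": 1 },
        { "id": 2 }
    ]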

The final purpose is to process that big file in a Hadoop MapReduce job.

The layout of the original files is similar to this:

folder
  - month-01
    - day-01
      - files...
  - month-02
    - day-02
      - files...

Answer 1:


I did a test based on your description; please follow my steps.

My simulated data:

test1.json resides in the folder: date/day1

test2.json resides in the folder: date/day2

Source dataset: set the file format to Array of Objects and the file path to the root path.
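
If you author the dataset as JSON rather than through the UI, a minimal sketch could look like the following (the dataset, linked service and folder names are placeholders; the Array of Objects setting corresponds to "filePattern": "arrayOfObjects"):

    {
        "name": "SourceJsonFiles",
        "properties": {
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLinkedService",
                "type": "LinkedServiceReference"
            },
            "type": "AzureBlob",
            "typeProperties": {
                "folderPath": "date",
                "format": {
                    "type": "JsonFormat",
                    "filePattern": "arrayOfObjects"
                }
            }
        }
    }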

Sink dataset: set the file format to Array of Objects and the file path to the file where you want to store the final data.
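
The sink dataset is similar; a sketch (again with placeholder names) differs only in pointing at a single output file:

    {
        "name": "MergedJsonFile",
        "properties": {
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLinkedService",
                "type": "LinkedServiceReference"
            },
            "type": "AzureBlob",
            "typeProperties": {
                "folderPath": "output",
                "fileName": "merged.json",
                "format": {
                    "type": "JsonFormat",
                    "filePattern": "arrayOfObjects"
                }
            }
        }
    }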

Create a Copy activity and set the copy behavior to Merge Files.
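
As a sketch, the Copy activity definition with merge behavior could look like this (the activity and dataset names match the placeholders above; "recursive": true makes the source pick up files in all subfolders):

    {
        "name": "MergeJsonFiles",
        "type": "Copy",
        "inputs": [
            { "referenceName": "SourceJsonFiles", "type": "DatasetReference" }
        ],
        "outputs": [
            { "referenceName": "MergedJsonFile", "type": "DatasetReference" }
        ],
        "typeProperties": {
            "source": {
                "type": "BlobSource",
                "recursive": true
            },
            "sink": {
                "type": "BlobSink",
                "copyBehavior": "MergeFiles"
            }
        }
    }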

Execution result:

The destination of my test is still Azure Blob Storage; you could refer to this link to learn how Hadoop supports Azure Blob Storage.



Source: https://stackoverflow.com/questions/56550727/azure-data-factory-how-to-merge-all-files-of-a-folder-into-one-file
