Question
I need to create a big file by merging multiple files scattered across several subfolders in Azure Blob Storage. A transformation is also needed: each file contains a JSON array with a single element, so the final file will contain one array holding all of those JSON elements.
The final purpose is to process that big file in a Hadoop MapReduce job.
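For illustration, suppose two of the source files look like this (the field names are hypothetical, just to show the shape of the data):

```json
[ { "id": 1, "value": "first record" } ]
```

```json
[ { "id": 2, "value": "second record" } ]
```

The merged file should then contain a single array of all the elements:

```json
[
  { "id": 1, "value": "first record" },
  { "id": 2, "value": "second record" }
]
```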
The layout of the original files is similar to this:
folder
- month-01
  - day-01
    - files...
- month-02
  - day-02
    - files...
Answer 1:
I did a test based on your description; please follow my steps.
My simulated data:
test1.json resides in the folder: date/day1
test2.json resides in the folder: date/day2
Source dataset: set the file format to Array of Objects and the file path to the root path.
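As a rough sketch, the source dataset definition could look like the following (legacy AzureBlob dataset JSON; the dataset name, linked service name, and folder path are placeholders for my test setup):

```json
{
    "name": "SourceDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "folderPath": "date",
            "format": {
                "type": "JsonFormat",
                "filePattern": "arrayOfObjects"
            }
        }
    }
}
```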
Sink dataset: set the file format to Array of Objects and the file path to the file where you want to store the final data.
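The sink dataset is similar, but points at the single output file (again, the names and paths below are placeholders):

```json
{
    "name": "SinkDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "folderPath": "output",
            "fileName": "merged.json",
            "format": {
                "type": "JsonFormat",
                "filePattern": "arrayOfObjects"
            }
        }
    }
}
```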
Create a Copy activity and set the Copy behavior to Merge Files.
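A sketch of the Copy activity definition, with recursive set to true so the files in all subfolders are picked up and copyBehavior set to MergeFiles (the activity name is arbitrary, and it references the two datasets sketched above):

```json
{
    "name": "MergeJsonFiles",
    "type": "Copy",
    "inputs": [ { "referenceName": "SourceDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "SinkDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": {
            "type": "BlobSource",
            "recursive": true
        },
        "sink": {
            "type": "BlobSink",
            "copyBehavior": "MergeFiles"
        }
    }
}
```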
Execution result:
The destination in my test is still Azure Blob Storage; you could refer to this link to learn how Hadoop supports Azure Blob Storage.
Source: https://stackoverflow.com/questions/56550727/azure-data-factory-how-to-merge-all-files-of-a-folder-into-one-file