Copy different types of files from Azure Data Lake Gen1 to Azure Data Lake Gen2 with attributes (like last updated)

余生颓废 submitted on 2021-01-28 06:24:51

Question


I need to migrate all my data from Azure Data Lake Gen1 to Data Lake Gen2. Our lake contains a mix of file types (.txt, .zip, .json and many others). We want to move them as-is to the Gen2 lake, and we also want every file to keep the last updated time it had in the Gen1 lake.

I was looking at using ADF for this use case. But for that we need to define a dataset, and to define a dataset we have to choose a data format (Avro, JSON, XML, binary, etc.). Because we have mixed data types, I tried the binary format. But with the binary format, every file at the destination has the content type "application/octet-stream". I am also not able to retain the file update time.


Answer 1:


As you said, when the files are copied to Data Lake Gen2, all of the file properties change, including the 'LAST MODIFIED' time.

As with an upload, the files are newly created in Gen2, and Azure assigns new properties to them. That is why we cannot keep the old properties from Gen1.

When the binary format is used for the dataset, the content type of every file is application/octet-stream, and we cannot change that either.
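Outside of the ADF copy activity, one possible follow-up is a post-copy pass that sets the content type through the Gen2 SDK. The sketch below is only an illustration under stated assumptions: `content_type_for`, `set_content_type`, and the `KNOWN_TYPES` map are hypothetical names, and `file_client` is assumed to be a `DataLakeFileClient` from the `azure-storage-file-datalake` package that points at a file already copied to Gen2.

```python
import mimetypes
from pathlib import PurePosixPath

# Explicit map for extensions mentioned in the thread (hypothetical choice);
# anything else falls back to the standard library's guess, then to binary.
KNOWN_TYPES = {
    ".txt": "text/plain",
    ".json": "application/json",
    ".zip": "application/zip",
    ".csv": "text/csv",
}
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(path: str) -> str:
    """Pick a content type from the file extension (case-insensitive)."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in KNOWN_TYPES:
        return KNOWN_TYPES[suffix]
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or DEFAULT_TYPE


def set_content_type(file_client, path: str) -> str:
    """Set the content type on an already-copied Gen2 file.

    Assumption: `file_client` is an
    azure.storage.filedatalake.DataLakeFileClient for `path`.
    """
    # Imported lazily so content_type_for works without the Azure SDK.
    from azure.storage.filedatalake import ContentSettings

    content_type = content_type_for(path)
    # Note: this call overwrites the file's other content settings as well.
    file_client.set_http_headers(
        content_settings=ContentSettings(content_type=content_type)
    )
    return content_type
```

The explicit map keeps the result independent of the operating system's MIME database; extend it for the other extensions in your lake.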

The property differences between Gen1 and Gen2 (I copied files from Gen1 to Gen2):

The content type stays the same unless we download the 'word.csv' file and re-upload it, in which case it changes to application/vnd.ms-excel:

HTH.




Answer 2:


Last Modified Time is system metadata that records when the file was modified in the filesystem/container, and it cannot be updated. A workaround is to add user metadata that captures the metadata from the source; the PowerShell, .NET, or Java SDK can be used to set these additional properties. Below, the workaround is implemented in PowerShell.
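As a rough Python sketch of the same idea (not the answerer's PowerShell script), the source timestamp can be stored as user metadata on the Gen2 file. The assumptions here are that the Gen1 timestamp is available as milliseconds since the Unix epoch (the WebHDFS-style `modificationTime` value), that `file_client` is a `DataLakeFileClient` from `azure-storage-file-datalake`, and that the key name `source_last_modified` and both function names are hypothetical.

```python
from datetime import datetime, timezone

METADATA_KEY = "source_last_modified"  # hypothetical key name


def metadata_from_epoch_ms(modification_time_ms: int) -> dict:
    """Convert a Gen1 modification time (epoch milliseconds) to user metadata."""
    moment = datetime.fromtimestamp(modification_time_ms / 1000, tz=timezone.utc)
    # Metadata values are strings, so store the time as ISO 8601 in UTC.
    return {METADATA_KEY: moment.isoformat()}


def apply_source_metadata(file_client, modification_time_ms: int) -> dict:
    """Attach the source timestamp to an already-copied Gen2 file.

    Assumption: `file_client` exposes set_metadata(dict), as
    azure.storage.filedatalake.DataLakeFileClient does.
    """
    metadata = metadata_from_epoch_ms(modification_time_ms)
    # set_metadata replaces any user metadata already on the file.
    file_client.set_metadata(metadata)
    return metadata
```

The system Last Modified Time on the Gen2 file still reflects the copy; consumers that need the original time would have to read the metadata key instead.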



Source: https://stackoverflow.com/questions/63981373/copy-different-type-of-file-from-gen1-azur-lake-to-azur-gen2-lake-with-attribute
