Azure batch task dependencies: copy files from previous

无人久伴 submitted on 2021-02-11 14:41:36

Question


I have an Azure Batch scenario with a chain of Tasks that run one after another. Dependencies are set correctly, so they run in the right order.

However, I need to copy all files from the previous Task's folder to the new Task's folder before execution. I do not know in advance how many files there will be or what they will be called, so I just want to copy everything. I could not find a way to accomplish this with the Batch client library (https://docs.microsoft.com/en-us/dotnet/api/overview/azure/batch?view=azure-dotnet).

As a workaround I tried adding a simple copy step to the .bat file that the Task's command line executes, but for some reason it copies only some of the files. One Task has a few hundred files to copy, and the copy stops partway through with no errors; the fraction copied varies by a few percent from run to run. This is my copy command: $"cmd /c xcopy /E /F /Y %AZ_BATCH_TASK_WORKING_DIR%\\..\\..\\{previousTaskId}\\wd %AZ_BATCH_TASK_WORKING_DIR%". Everything works correctly when the same command is run directly on the VM.

Tested hypotheses:

  • Copying overwrites the .bat file which executes the actual processing, which in turn breaks the copying. I've now ruled this out (each Task has a differently named .bat file).
  • Copying is for some reason done in parallel. I added timestamp echoes to the .bat files and there is no parallelism, so this can't be the reason. I also tried adding sleep 10 before the xcopy, but it didn't make any difference.
  • xcopy doesn't see all the files for some reason. I added a dir command to list the files, and it sees only the same files that xcopy copies.
  • User access issues. This doesn't make sense, as some files are copied successfully and there are no errors.

Any ideas? This sounds like a trivial scenario but I just couldn't figure out how to do this.


Answer 1:


What do you have configured as your retentionTime for your tasks?

I'm wondering if Batch is cleaning up the previous task (removing all the files) at the same time as your downstream task is trying to copy them.

An untested suggestion ...

... assuming you have tasks A & B that run in that order (enforced using Task Dependencies).

... configure outputFile on task A to copy all of the files generated by A into your storage account. Use wildcards so that all the files are copied into the same container.

... configure resourceFile on task B to copy all the files from your storage account into the task working directory.

This has the advantage of preserving your intermediate working files off the compute node - allowing you to pick up where you left off if/when something interrupts your workload.
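As a rough, equally untested sketch of that idea, the Python below only builds the two Task definitions as plain dictionaries shaped like Batch REST API request bodies; it does not call the service. The property names (outputFiles, filePattern, uploadOptions, resourceFiles, storageContainerUrl, blobPrefix, dependsOn) are taken from the REST API as I understand it. The Task ids, the .bat file names, the "taskA" prefix and the container URL are all hypothetical, and the `**\*` pattern for "everything under the working directory" is an assumption you should verify.

```python
def build_task_pair(container_url, prefix="taskA"):
    """Build REST-style definitions for Task A (upload) and Task B (download).

    container_url is assumed to be a blob container URL carrying a SAS token
    that allows write (for A's upload) and read/list (for B's download).
    """
    task_a = {
        "id": "A",                               # hypothetical Task id
        "commandLine": "cmd /c taskA.bat",       # hypothetical command
        "outputFiles": [
            {
                # Assumed wildcard: every file below A's working directory.
                "filePattern": "**\\*",
                "destination": {
                    "container": {"containerUrl": container_url, "path": prefix}
                },
                # Upload only if A exits successfully.
                "uploadOptions": {"uploadCondition": "tasksuccess"},
            }
        ],
    }
    task_b = {
        "id": "B",                               # hypothetical Task id
        "commandLine": "cmd /c taskB.bat",       # hypothetical command
        # B is scheduled only after A has completed successfully.
        "dependsOn": {"taskIds": [task_a["id"]]},
        "resourceFiles": [
            # Download every blob whose name starts with the prefix A used.
            {"storageContainerUrl": container_url, "blobPrefix": prefix}
        ],
    }
    return task_a, task_b
```

Note that container downloads keep the blob's virtual directory structure, so with this layout I would expect B to find the files under a taskA subfolder of its working directory rather than directly in it. Check that against a real run before relying on it.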




Answer 2:


It turned out that the problem was in the previous Task: it launched a process which started generating the files in the background and returned control immediately. The Batch engine therefore considered the Task finished and moved on to the next Task, which began by copying the files while the previous Task's process was still generating them.

My hypothesis about parallelism was therefore partially true, although it wasn't visible from the echoed timestamps (the first Task reported finishing before the second Task reported starting). The experiment with sleep would have revealed the problem, but I either used too short a delay or somehow misread the results.

Because I can't control how the first Task launches the process, I added a Windows batch script that polls tasklist until the process has exited, and that solved the problem.
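For illustration, here is a minimal sketch of the same polling idea in Python rather than the Windows batch script I actually used; it assumes Python is available on the compute node. The function names and the generator.exe image name are hypothetical. It relies on the Windows tasklist command with its /FI (filter), /FO CSV and /NH (no header) switches.

```python
import subprocess
import time


def image_is_running(image_name):
    """Return True if Windows `tasklist` lists a process with this image name."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=False,
    )
    # With no match, tasklist prints an informational message instead of a row,
    # so the image name does not appear in the output.
    return image_name.lower() in result.stdout.lower()


def wait_until_stopped(is_running, poll_seconds=5.0, timeout_seconds=3600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll is_running() until it returns False.

    Returns True once the process is gone, or False if the timeout expires
    while it is still running.
    """
    deadline = clock() + timeout_seconds
    while is_running():
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
    return True
```

The first Task would call something like wait_until_stopped(lambda: image_is_running("generator.exe")) as its last step, so Batch does not mark it complete until the background process has exited. Matching on image name alone is fragile if several Tasks on the same node start the same executable.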



Source: https://stackoverflow.com/questions/58444889/azure-batch-task-dependencies-copy-files-from-previous
