Is conda install a thread-safe operation?

*爱你&永不变心* submitted on 2020-05-15 04:50:05

Question


I would like to install packages into multiple conda environments. Doing this one after the other takes quite some time, so it would be nice if I could run all the conda install steps for each environment in parallel. Would this be possible or are there conflicts (relating to hard links and lock files, possibly) when trying to run conda in parallel?


Answer 1:


The short answer: No, it should not be run concurrently.

Most of how Conda handles transaction safety was established in v4.3. The release notes for v4.3.0 regarding changes to locks explicitly comment on running multiple processes:

[U]sers are cautioned that undefined behavior can result when conda is running in multiple process and operating on the same package caches and/or environments.

It sounds like you're talking about different environments, so the environment half of that warning shouldn't apply. The package cache, however, is shared between them: you need to ensure that the packages to be installed are already downloaded into the package cache, otherwise running in parallel is not safe.

Partial Parallel Strategy

There is a --download-only flag, which will only add the package to the package cache (i.e., the part that cannot be done concurrently). But the issue is that this would still need to be done on a per-env basis, since different envs could have different constraints (e.g., different Python versions) that require different builds of the package.

I think the best you could do at the CLI is

  1. Run conda install --download-only pkg sequentially on each env, then
  2. Run conda install pkg in parallel for the envs.
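
As a rough sketch, the two steps could be wrapped in shell functions like the ones below. The function names (prefetch_sequential, install_parallel) are made up for illustration and are not part of conda; the only conda features used are the -n, -y and --download-only options that appear elsewhere in this answer.

```shell
# Sketch only: two-phase install of one package into several envs.
# prefetch_sequential and install_parallel are hypothetical helper names.

# Step 1: fill the package cache, one env at a time (never in parallel).
prefetch_sequential() {
    pkg="$1"; shift
    for env in "$@"; do
        conda install -n "$env" -y --download-only "$pkg" || return 1
    done
}

# Step 2: run the real install for every env concurrently,
# then wait for each job and report failure if any of them failed.
install_parallel() {
    pkg="$1"; shift
    pids=""
    for env in "$@"; do
        conda install -n "$env" -y "$pkg" &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

It would be called as: prefetch_sequential pandas foo bar && install_parallel pandas foo bar. Waiting on each job separately means a failure in one env is not silently lost.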

This is not, however, an officially recommended approach, and changes in how Conda does transactions could lead to this not being safe. I'll also say that I very much doubt this will save you much time; in fact, it might take longer. This approach will involve every env having to solve and prepare transactions twice, and that is usually the most computationally intensive step. The part you end up parallelizing involves disk transactions, which are going to be I/O bound, so I kind of doubt any time will be saved.

Some Evidence For This Being Safe

While this doesn't positively prove its safety, we can explicitly examine the transactions to make sure that when we run Step 2 above, it will only involve LINK transactions.

To test this, I made two envs:

conda create -n foo -y python=3.6
conda create -n bar -y python=3.6

Then I check the output from

conda install -n foo -d --json pandas

which shows a list of both FETCH and LINK transactions. The former involve manipulating the package cache, whereas the latter only the env. If I then run

conda install -n foo --download-only pandas

and check again,

conda install -n foo -d --json pandas

I now see only LINK transactions. Notably, the same is now true for -n bar, which should reinforce the fact that Step 1 should be done sequentially. The good part is that it won't lead to redownloading the same package; the bad part, that it involves a solve happening in every env. In a more heterogeneous environment, we could expect there might be different FETCH operations in each env.
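
The manual check above could be scripted roughly as follows. This is a sketch that assumes the dry-run JSON lists pending downloads in an array keyed "FETCH", which matches the output described here but is not something I'd treat as a stable interface. The helper name plan_needs_fetch is hypothetical.

```shell
# Sketch only: reads the JSON printed by `conda install -d --json ...` on stdin.
# Exits 0 if a non-empty "FETCH" list is present, non-zero otherwise.
# Assumption: pending downloads appear as an array under the key "FETCH".
plan_needs_fetch() {
    # Strip whitespace so the pretty-printed JSON collapses onto one line,
    # then look for "FETCH":[ followed by anything other than a closing bracket.
    tr -d ' \t\n\r' | grep -q '"FETCH":\[[^]]'
}
```

For example: conda install -n bar -d --json pandas | plan_needs_fetch && echo "bar still needs downloads". A text match like this is crude; a real JSON parser would be more robust if one is available.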

Finally, I can run the parallel final install

conda install -n foo -y pandas & conda install -n bar -y pandas &

which is safe if we can assume that LINK transactions in different envs are safe.



Source: https://stackoverflow.com/questions/58210335/is-conda-install-a-thread-safe-operation
