Is conda install a thread-safe operation?

Question

I would like to install packages into multiple conda environments. Doing this one after the other takes quite some time, so it would be nice if I could run all the conda install steps for each environment in parallel. Would this be possible or are there conflicts (relating to hard links and lock files, possibly) when trying to run conda in parallel?

Answer 1:

The short answer: No, it should not be run concurrently.

Most of how Conda handles transaction safety was established in v4.3. The v4.3.0 release notes on changes to locks explicitly comment on running multiple processes:

[U]sers are cautioned that undefined behavior can result when conda is running in multiple process and operating on the same package caches and/or environments.

It sounds like you're talking about different environments, so that part shouldn't be an issue. However, the envs still share a package cache, so you need to ensure that the packages to be installed are already downloaded into the cache; otherwise it is not safe.

Partial Parallel Strategy

There is a --download-only flag, which will only add the package to the package cache (i.e., the part that cannot be done concurrently). But the issue is that this would still need to be done on a per-env basis, since different envs could have different constraints (e.g., different Python versions) that require different builds of the package.

I think the best you could do at the CLI is

1. Run conda install --download-only pkg sequentially on each env, then
2. Run conda install pkg in parallel for the envs.

This is, however, not an officially recommended approach, and changes in how Conda does transactions could make it unsafe. I'll also say that I very much doubt this will save you much time; in fact, it might take longer. This approach requires every env to solve and prepare its transaction twice, and solving is usually the most computationally intensive step. The part you end up parallelizing consists of disk transactions, which are I/O bound, so I kind of doubt any time will be saved.
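The two steps above can be sketched as a small shell function. This is only an illustration under the same caveats: install_in_envs is a hypothetical name, not something conda provides, and the sketch assumes that LINK transactions in different envs do not interfere with each other.

```shell
# Hypothetical helper (not part of conda): install one package into several envs.
# Usage: install_in_envs PKG ENV [ENV ...]
install_in_envs() {
    pkg="$1"
    shift

    # Step 1: fill the shared package cache, strictly one env at a time.
    for env in "$@"; do
        conda install -n "$env" -y --download-only "$pkg" || return 1
    done

    # Step 2: link the cached packages into every env concurrently.
    pids=""
    for env in "$@"; do
        conda install -n "$env" -y "$pkg" &
        pids="$pids $!"
    done

    # Wait for every background install; report failure if any of them failed.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example: install_in_envs pandas foo bar
```

Waiting on each background job individually lets a failed install in one env show up in the function's exit status instead of being lost.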

Some Evidence For This Being Safe

While this doesn't positively prove its safety, we can explicitly examine the transactions to make sure that when we run Step 2 above, it will only involve LINK transactions.

To test this, I made two envs:

conda create -n foo -y python=3.6
conda create -n bar -y python=3.6

Then I check the output from

conda install -n foo -d --json pandas

which shows a list of both FETCH and LINK transactions. The former involve manipulating the package cache, whereas the latter only the env. If I then run

conda install -n foo --download-only pandas

and check again,

conda install -n foo -d --json pandas

I now see only LINK transactions. Notably, the same is now true for -n bar, which reinforces the point that Step 1 should be done sequentially, since both envs depend on the same package cache. The good part is that it won't lead to redownloading the same package; the bad part is that it involves a solve in every env. In a more heterogeneous set of envs, we could expect different FETCH operations in each env.
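Rather than reading the JSON by eye, the check can be scripted. The sketch below is a hypothetical helper, count_fetch_actions, that counts pending FETCH entries in the dry-run output. It assumes the JSON has the shape {"actions": {"FETCH": [...], "LINK": [...]}} and that python3 is on the PATH; the exact layout may differ between Conda versions, so treat the key names as an assumption.

```shell
# Hypothetical helper: read conda's dry-run JSON on stdin and print the
# number of FETCH entries. Assumes {"actions": {"FETCH": [...], ...}};
# a missing "actions" or "FETCH" key is counted as zero pending downloads.
count_fetch_actions() {
    python3 -c '
import json, sys
data = json.load(sys.stdin)
actions = data.get("actions") or {}
print(len(actions.get("FETCH") or []))
'
}

# Example: conda install -n foo -d --json pandas | count_fetch_actions
```

A result of 0 for every env would indicate that the parallel step should involve LINK transactions only.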

Finally, I can run the parallel final install

conda install -n foo -y pandas & conda install -n bar -y pandas &

which is safe if we can assume that LINK transactions in different envs are safe.

Source: https://stackoverflow.com/questions/58210335/is-conda-install-a-thread-safe-operation

Tags

python

conda