Question
I am currently using the networkx function *all_simple_paths* to find all paths within a network G, for a given set of source and target nodes.
On larger/denser networks this process is incredibly intensive.
I would like to know whether multiprocessing could conceivably be used on this problem, and whether anybody has ideas on how that might be implemented, e.g. by creating a Pool.
import networkx as nx

G = nx.complete_graph(8)
sources = [1, 2]
targets = [5, 6, 7]

for target in targets:
    for source in sources:
        for path in nx.all_simple_paths(G, source=source, target=target, cutoff=None):
            print(path)
Many thanks in advance for any suggestions you may have!
Answer 1:
Here is a version which uses a collection of worker processes. Each worker gets (source, target) pairs from a Queue and collects the paths in a list. When all its pairs have been processed, each worker puts its list in an output Queue, and the results are collated by the main process.
import networkx as nx
import multiprocessing as mp
import random
import sys
import itertools as IT
import logging

logger = mp.log_to_stderr(logging.DEBUG)

def worker(inqueue, output):
    result = []
    count = 0
    for pair in iter(inqueue.get, sentinel):
        source, target = pair
        for path in nx.all_simple_paths(G, source=source, target=target,
                                        cutoff=None):
            result.append(path)
            count += 1
            if count % 10 == 0:
                logger.info('{c}'.format(c=count))
    output.put(result)

def test_workers():
    result = []
    inqueue = mp.Queue()
    for source, target in IT.product(sources, targets):
        inqueue.put((source, target))
    procs = [mp.Process(target=worker, args=(inqueue, output))
             for i in range(mp.cpu_count())]
    for proc in procs:
        proc.daemon = True
        proc.start()
    for proc in procs:
        inqueue.put(sentinel)
    for proc in procs:
        result.extend(output.get())
    for proc in procs:
        proc.join()
    return result

def test_single_worker():
    result = []
    count = 0
    for source, target in IT.product(sources, targets):
        for path in nx.all_simple_paths(G, source=source, target=target,
                                        cutoff=None):
            result.append(path)
            count += 1
            if count % 10 == 0:
                logger.info('{c}'.format(c=count))
    return result

sentinel = None
seed = 1
m = 1
N = 1340 // m
G = nx.gnm_random_graph(N, int(1.7 * N), seed)
random.seed(seed)
sources = [random.randrange(N) for i in range(340 // m)]
targets = [random.randrange(N) for i in range(1000 // m)]
output = mp.Queue()

if __name__ == '__main__':
    test_workers()
    # test_single_worker()
    # assert set(map(tuple, test_workers())) == set(map(tuple, test_single_worker()))
test_workers uses multiprocessing, test_single_worker uses a single process.
Running test.py with the assert at the bottom uncommented does not raise an AssertionError, so it looks like both functions return the same result (at least for the limited tests I've run).
Here are the timeit results:
% python -mtimeit -s'import test as t' 't.test_workers()'
10 loops, best of 3: 6.71 sec per loop
% python -mtimeit -s'import test as t' 't.test_single_worker()'
10 loops, best of 3: 12.2 sec per loop
So test_workers was able to achieve a 1.8x speedup over test_single_worker on a 2-core system in this case. Hopefully, the code will scale well for your real problem too. I'd be interested to know the result.
Some points of interest:
- Calling pool.apply_async on a short-lived function is very slow, because too much time is spent passing arguments in, and results out, through queues rather than using the CPUs to do useful computation.
- It is better to collect results in a list and put the full result in the output Queue, rather than putting results in output one at a time. Each object put in the Queue is pickled, and it is quicker to pickle one large list than many small lists.
- I think it is safer to print from only one process, so the print statements do not step on each other (resulting in mangled output).
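The pickling point can be checked directly. Here is a small sketch of my own (not part of the answer above) that times one big pickle against many small ones, standing in for a single batched Queue.put versus many per-item puts:

```python
import pickle
import timeit

# 10,000 small lists, standing in for individual paths.
items = [[i, i + 1, i + 2] for i in range(10000)]

# Pickling each small list separately, as one-at-a-time Queue.put calls would.
per_item = timeit.timeit(lambda: [pickle.dumps(x) for x in items], number=10)

# Pickling the whole batch once, as a single put of the full result list would.
batched = timeit.timeit(lambda: pickle.dumps(items), number=10)

print(batched < per_item)  # prints True
```

The absolute numbers will vary by machine, but batching reliably wins because it replaces thousands of pickle calls (each with per-object overhead) with one.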
Answer 2:
For the simplest case, your paths have no relation to each other beyond belonging to the same graph, so there should not be any locking issues.
You can use the multiprocessing module to start a new process for each target, using a Pool and its map method.
from multiprocessing import Pool

def create_graph_from_target(target):
    for source in sources:
        for path in nx.all_simple_paths(G, source=source, target=target, cutoff=None):
            print(path)

p = Pool(processes=4)
p.map(create_graph_from_target, targets)
p.close()
p.join()
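One caveat worth noting (my addition, not part of the answer): this relies on G and sources being module-level globals that worker processes inherit, which works when processes are forked (Linux) but not on platforms that spawn fresh interpreters (e.g. Windows). A sketch that handles both cases by shipping the globals through a Pool initializer, using helper names of my own (_init_worker, paths_for_target, collect_paths):

```python
import networkx as nx
from multiprocessing import Pool

# Module-level globals that each worker needs; they are filled in
# by the pool initializer so the sketch also works under spawn.
G = None
sources = None

def _init_worker(graph, source_list):
    global G, sources
    G = graph
    sources = source_list

def paths_for_target(target):
    # Return the paths instead of printing, so the parent process collects them.
    return [path
            for source in sources
            for path in nx.all_simple_paths(G, source=source, target=target)]

def collect_paths(graph, source_list, target_list, processes=4):
    with Pool(processes=processes, initializer=_init_worker,
              initargs=(graph, source_list)) as pool:
        return pool.map(paths_for_target, target_list)

if __name__ == '__main__':
    results = collect_paths(nx.complete_graph(8), [1, 2], [5, 6, 7])
    print([len(r) for r in results])
```

Returning the paths rather than printing them also sidesteps the mangled-output problem mentioned in the other answer.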
Source: https://stackoverflow.com/questions/13993674/using-multiprocessing-for-finding-network-paths