Generate all paths in an efficient way using networkx in python

本秂侑毒 submitted on 2020-03-25 19:46:28

Question


I am trying to generate all paths with at most 6 nodes from every origin to every destination in a fairly large network (20,000+ arcs). I am using networkx and Python 2.7. My code works well for small networks, but I need to run it on the whole network, and right now it does not finish even after a few days; 3-4 hours would be acceptable for my project. Is there a more efficient way to do this in Python? My code uses a recursive function (see below). I am thinking about keeping some of the paths in memory so that I don't recreate them for other paths, but I am not sure how to do that quickly.

Here is a sample that I created. Feel free to ignore the print functions, which I added for illustration purposes. Also, here is the sample input file: input (transit_sample1.csv in the code below).

import networkx as nx
import pandas as pd
import copy
import os

class ODPath(object):
    # None defaults avoid sharing one mutable list between instances
    def __init__(self,pathID='',transittime=0,path=None,vol=0,OD='',air=False,sortstrat=None,arctransit=None):
        self.pathID = pathID
        self.transittime = transittime
        self.path = path if path is not None else []
        self.vol = vol
        self.OD = OD
        self.air = air
        self.sortstrat = sortstrat if sortstrat is not None else [] # a list of sort strategies
        self.arctransit = arctransit if arctransit is not None else [] # keep the transit time of each arc as a list
    def setpath(self,newpath):
        self.path = newpath
    def setarctransitpath(self,newarctransit):
        self.arctransit = newarctransit
    def settranstime(self,newtranstime):
        self.transittime = newtranstime
    def setpathID(self,newID):
        self.pathID = newID
    def setvol(self,newvol):
        self.vol = newvol
    def setOD(self,newOD):
        self.OD = newOD
    def setAIR(self,newairTF):
        self.air = newairTF
    def setsortstrat(self,newsortstrat):
        self.sortstrat = newsortstrat

def find_allpaths(graph, start, end, pathx=ODPath(None,0,[],0,None,False)):
    path = copy.deepcopy(pathx) #to make sure the original has not been revised
    newpath = path.path +[start]    
    path.setpath(newpath)
    if len(path.path) >6:
        return []
    if start == end:
        return [path]
    if (start) not in graph:    #check if node:start exists in the graph
        return []
    paths = []
    for node in graph[start]:   #loop over all outbound nodes of starting point  
        if node not in path.path:    #makes sure there are no cycles
            newpaths = find_allpaths(graph,node,end,path)
            for newpath in newpaths:
                if len(newpath.path) < 7: #generate only the paths that are with at most 6 hops      
                    paths.append(newpath)
    return paths
def printallpaths(path_temp):
    for path in path_temp:   # explicit loop; map() would be lazy on Python 3
        printapath(path)
def printapath(path):
    print(path.path)

filename='transit_sample1.csv'
df_t= pd.read_csv(filename,delimiter=',')
df_t = df_t.reset_index()
# from_pandas_dataframe is the older NetworkX name; newer releases call it from_pandas_edgelist
G=nx.from_pandas_dataframe(df_t, 'node1', 'node2', ['Transit Time'],nx.DiGraph())
allpaths=find_allpaths(G,'A','S')  
printallpaths(allpaths)

I would really appreciate any help.


Answer 1:


I previously asked a similar question about optimizing an algorithm I had written using networkx. Essentially, you'll want to move away from a recursive function and towards a solution that uses memoization, as I did.

From here you can do further optimizations like using multiple cores, or picking the next node to traverse based on criteria such as degree.
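
As a minimal sketch of what memoization could look like for this problem (not the code from the earlier question mentioned above): it assumes Python 3 for functools.lru_cache and a NetworkX DiGraph, and bounded_paths_to and walks are hypothetical names. For a fixed destination, it caches every walk from a given node that fits a given node budget, so suffixes shared by many origins are built only once. Cached walks may revisit a node, so the non-simple ones are filtered out at the end.

```python
from functools import lru_cache
from itertools import combinations

import networkx as nx


def bounded_paths_to(graph, target, max_nodes=6):
    """Map each origin to its simple paths to `target` with at most `max_nodes` nodes.

    Hypothetical helper for illustration; not part of NetworkX.
    """

    @lru_cache(maxsize=None)
    def walks(node, budget):
        # Walks from `node` that stop on first reaching `target`,
        # using at most `budget` nodes in total.
        if node == target:
            return ((node,),)
        if budget <= 1:
            return ()
        return tuple(
            (node,) + tail
            for successor in graph.successors(node)
            for tail in walks(successor, budget - 1)
        )

    paths = {}
    for origin in graph:
        if origin == target:
            continue
        # A cached walk may repeat a node; keep only the simple ones.
        simple = [list(w) for w in walks(origin, max_nodes) if len(set(w)) == len(w)]
        if simple:
            paths[origin] = simple
    return paths


G = nx.DiGraph()
G.add_edges_from(combinations(["A", "B", "C", "S"], 2))
G.add_edge("B", "A")  # adds a cycle, to show that non-simple walks are dropped
paths_to_s = bounded_paths_to(G, "S")
```

Calling bounded_paths_to once per destination covers every origin-destination pair. The trade-off is that the cache stores walks rather than simple paths, so on graphs with many short cycles it may hold far more entries than are finally returned.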




Answer 2:


NetworkX already has this feature. Unless you have a special case, it is generally better to use an established library feature, as it will be tested and efficient.

Here's a simple example:

from itertools import combinations
from networkx.algorithms.simple_paths import all_simple_paths
import networkx as nx

G = nx.DiGraph()
node_names = ['A', 'B', 'C', 'S']
G.add_edges_from(combinations(node_names, 2))
print(G)

for path in all_simple_paths(G, 'A', 'S'):
    print(path)

(I skipped the csv file, as it's not important to this question, and I don't want this answer to fail if that dropbox goes away. It should work on all graphs.)
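
The example above enumerates paths of any length. For the limit in the question (at most 6 nodes), all_simple_paths accepts a cutoff argument. It counts edges, so 6 nodes corresponds to cutoff=5. Here is a small sketch on a made-up graph; paths_up_to and all_pairs_paths are hypothetical helper names, not NetworkX functions:

```python
from itertools import permutations

import networkx as nx


def paths_up_to(graph, source, target, max_nodes=6):
    """Lazily yield simple paths with at most `max_nodes` nodes (hypothetical helper)."""
    # cutoff is measured in edges, and a path with k nodes has k - 1 edges.
    return nx.all_simple_paths(graph, source, target, cutoff=max_nodes - 1)


def all_pairs_paths(graph, max_nodes=6):
    """Yield (source, target, path) for every ordered pair of distinct nodes."""
    for source, target in permutations(graph.nodes(), 2):
        for path in paths_up_to(graph, source, target, max_nodes):
            yield source, target, path


# Made-up graph: a chain A -> B -> ... -> G -> S plus three shortcuts into S.
chain = ["A", "B", "C", "D", "E", "F", "G", "S"]
G = nx.DiGraph()
G.add_edges_from(zip(chain, chain[1:]))
G.add_edges_from([("A", "S"), ("C", "S"), ("E", "S")])

short_paths = sorted(paths_up_to(G, "A", "S"), key=len)
for path in short_paths:
    print(path)
```

The full 8-node chain from A to S is a simple path too, but the cutoff excludes it. Because all_pairs_paths is itself a generator, paths can be written out or aggregated as they arrive instead of being collected into one list.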

Without going into the internal details, I notice that your implementation builds a list of all answers and then returns it, whereas networkx.algorithms.simple_paths.all_simple_paths returns a generator. A generator yields one path at a time, so the full result set never has to sit in memory at once. That is much friendlier on memory (allocation, caching, swapping) and usually performs better when the output is large. There's some discussion of the topic at https://stackoverflow.com/a/102632/1766544, or pretty much everywhere.
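
To illustrate the difference, here is a rough sketch on a made-up 5-node graph in which every ordered pair of nodes is an arc. It consumes the generator in two ways, taking only a few paths and counting all of them, and never builds the full list:

```python
from itertools import islice, permutations

import networkx as nx

# Made-up test graph: every ordered pair of the 5 nodes is an arc.
G = nx.DiGraph()
G.add_edges_from(permutations(range(5), 2))

# Take only the first three paths; the remaining ones are never generated.
first_three = list(islice(nx.all_simple_paths(G, 0, 4), 3))

# Count every path while holding only one of them in memory at a time.
total = sum(1 for _ in nx.all_simple_paths(G, 0, 4))
print(len(first_three), total)
```

Counting still visits every path, so a generator does not reduce the total work when all paths are needed. What it avoids is the memory cost of holding them all at once.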



Source: https://stackoverflow.com/questions/40478752/generate-all-paths-in-an-efficient-way-using-networkx-in-python
