Fastest way to perform complex search on pandas dataframe

广开言路 2020-12-13 21:10

I am trying to figure out the fastest way to perform search and sort on a pandas dataframe. Below are before and after dataframes of what I am trying to accomplish.

2 answers
  •  南笙 (original poster)
    2020-12-13 21:37

    This is a network (graph) problem, so we can use networkx. Note that a route can have more than two stops, so this also handles chains such as NY-DC-WA-NC.

    import networkx as nx

    # Build an undirected graph with one edge per flight leg.
    G = nx.from_pandas_edgelist(df, 'flightTo', 'flightFrom')

    # Each connected component is a set of airports linked by legs,
    # i.e. one complete route.
    l = list(nx.connected_components(G))

    # Give every airport in component number x the label x.
    L = [dict.fromkeys(y, x) for x, y in enumerate(l)]

    # Merge the per-component dicts into a single airport -> label map.
    d = {k: v for d in L for k, v in d.items()}

    # Aggregation per column: the columns alternate To, From, so the *To
    # columns take the 'first' row and the *From columns take the 'last'.
    grouppd = dict(zip(df.columns.tolist(), ['first', 'last'] * 3))

    # Group the rows by route label and aggregate to get one row per route.
    df.groupby(df.flightTo.map(d)).agg(grouppd)
    
    Out[22]: 
             flightTo flightFrom  toNum  fromNum  toCode  fromCode
    flightTo                                                      
    0             ABC        XYZ    123      893    8000      9999
    1             AAA        CCC    473      341    5555      5555
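
Since the question's input frame is not shown here, the following is a minimal end-to-end sketch of the same idea on made-up data. The rows in `df`, the helper name `collapse_routes`, and the `route` label are all assumptions for illustration; the data was chosen only so that the result has the same shape as the output above. It also assumes the legs of each route appear in travel order, because `first` and `last` depend on row order.

```python
import networkx as nx
import pandas as pd

# Hypothetical flight legs (not the asker's data): two routes,
# ABC <- DEF <- XYZ and AAA <- BBB <- CCC, one row per leg.
df = pd.DataFrame({
    "flightTo":   ["ABC", "DEF", "AAA", "BBB"],
    "flightFrom": ["DEF", "XYZ", "BBB", "CCC"],
    "toNum":      [123, 456, 473, 917],
    "fromNum":    [456, 893, 917, 341],
    "toCode":     [8000, 9999, 5555, 5555],
    "fromCode":   [9999, 9999, 5555, 5555],
})


def collapse_routes(legs, to_col="flightTo", from_col="flightFrom"):
    """Collapse connected flight legs into one row per route."""
    # One undirected edge per leg; airports become nodes.
    graph = nx.from_pandas_edgelist(legs, to_col, from_col)

    # Map every airport to the number of the component it belongs to.
    label = {
        airport: i
        for i, component in enumerate(nx.connected_components(graph))
        for airport in component
    }

    # 'last' for columns with "from" in the name, 'first' for the rest.
    how = {c: "last" if "from" in c.lower() else "first" for c in legs.columns}

    # Group rows by route label; rename the key so it does not clash
    # with the flightTo column being aggregated.
    key = legs[to_col].map(label).rename("route")
    return legs.groupby(key).agg(how).reset_index(drop=True)


result = collapse_routes(df)
print(result)
```

With this sample data the result has two rows, one running from XYZ to ABC and one from CCC to AAA. Choosing the aggregation by column name rather than by position is a design choice of this sketch; it avoids relying on the columns alternating To, From.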
    

    Installing networkx

    • Pip: pip install networkx
    • Anaconda: conda install -c anaconda networkx
