I am trying to figure out the fastest way to perform search and sort on a pandas dataframe. Below are before and after dataframes of what I am trying to accomplish.
This is network problem , so we using networkx , notice , here you can have more than two stops , which means you can have some case like NY-DC-WA-NC
import networkx as nx
G=nx.from_pandas_edgelist(df, 'flightTo', 'flightFrom')
# create the nx object from pandas dataframe
l=list(nx.connected_components(G))
# then we get the list of components which as tied to each other ,
# in a net work graph , they are linked
L=[dict.fromkeys(y,x) for x, y in enumerate(l)]
# then from the above we can create our map dict ,
# since every components connected to each other ,
# then we just need to pick of of them as key , then map with others
d={k: v for d in L for k, v in d.items()}
# create the dict for groupby , since we need _from as first item and _to as last item
grouppd=dict(zip(df.columns.tolist(),['first','last']*3))
df.groupby(df.flightTo.map(d)).agg(grouppd) # then using agg with dict yield your output
Out[22]:
flightTo flightFrom toNum fromNum toCode fromCode
flightTo
0 ABC XYZ 123 893 8000 9999
1 AAA CCC 473 341 5555 5555
Installation networkx
pip install networkxconda install -c anaconda networkx