Quick way to find all permutations of a pandas DataFrame that preserves a sort?

别说谁变了你拦得住时间么 提交于 2019-12-22 08:38:29

问题


I have a DataFrame, and I'd like find all the permutations of it that fulfill a simple ascending sort on one of the columns. (There are many ties.) For example, in the following DataFrame

df = pd.DataFrame({'name': ["Abe", "Bob", "Chris", "David", "Evan"], 
                   'age': [28, 20, 21, 22, 21]})

I'd be looking to sort by age and obtain the orders ["Bob", "Chris", "Evan", "David", "Abe"] and ["Bob", "Evan", "Chris", "David", "Abe"].

I'm new to python (and to pandas) and curious if there is a simple way to do this that I don't see.

Thanks!


回答1:


Since you're grouping by age, let's do that and return all the permutations for each group and then take the product (using itertools' product and permutation functions):

In [11]: age = df.groupby("age")

If we look at the permutations of a single group:

In [12]: age.get_group(21)
Out[12]:
   age   name
2   21  Chris
4   21   Evan

In [13]: list(permutations(age.get_group(21).index))
Out[13]: [(2, 4), (4, 2)]

In [14]: [df.loc[list(p)] for p in permutations(age.get_group(21).index)]
Out[14]:
[   age   name
 2   21  Chris
 4   21   Evan,    age   name
 4   21   Evan
 2   21  Chris]

We can do this on the entire DataFrame by returning just the index for each group (this assumes that the index is unique, if it's not reset_index prior to doing this... you may be able to do something slightly more lower level):

In [21]: [list(permutations(grp.index)) for (name, grp) in age]
Out[21]: [[(1,)], [(2, 4), (4, 2)], [(3,)], [(0,)]]

In [22]: list(product(*[(permutations(grp.index)) for (name, grp) in age]))
Out[22]: [((1,), (2, 4), (3,), (0,)), ((1,), (4, 2), (3,), (0,))]

We can glue these up with sum:

In [23]: [sum(tups, ()) for tups in product(*[(permutations(grp.index)) for (name, grp) in age])]
Out[23]: [(1, 2, 4, 3, 0), (1, 4, 2, 3, 0)]

If you make these a list you can apply loc (which gets you the desired result):

In [24]: [df.loc[list(sum(tups, ()))] for tups in product(*[list(permutations(grp.index)) for (name, grp) in age])]
Out[24]:
[   age   name
 1   20    Bob
 2   21  Chris
 4   21   Evan
 3   22  David
 0   28    Abe,    age   name
 1   20    Bob
 4   21   Evan
 2   21  Chris
 3   22  David
 0   28    Abe]

And the (list of) the name column:

In [25]: [list(df.loc[list(sum(tups, ())), "name"]) for tups in product(*[(permutations(grp.index)) for (name, grp) in age])]
Out[25]:
[['Bob', 'Chris', 'Evan', 'David', 'Abe'],
 ['Bob', 'Evan', 'Chris', 'David', 'Abe']]

Note: It may be faster to use a numpy permutation matrix and pd.tools.util.cartesian_product. I suspect it's much of a muchness and wouldn't explore this unless this was unusably slow (it's potentially going to be slow anyway because there could be many many permutations)...




回答2:


Halfway:

import pandas as pd
from itertools import permutations, product

df = pd.DataFrame({'name': ["Abe", "Bob", "Chris", "David",
                            "Evan","Ford","Giles","Ham"],
                   'age': [20, 20, 21, 22,
                           21, 21, 22, 22]})

dfg = df.groupby('age')
perms = {}
for k, v in dfg:
    perms[k] =  list(permutations(v.values))

print(perms)

and the product of the right view of the values of perms is Bob-your-21-year-old-uncle.




回答3:


Following cphlewis, I ended up with:

import pandas as pd
import itertools as it

df = pd.DataFrame({'name': ["Abe", "Bob", "Chris", "David",
                            "Evan", "Ford", "Giles", "Ham"],
                   'age': [20, 21, 20, 20,
                           24, 25, 25, 27]})

dfg = df.groupby('age')

permsAtEachAge = []
for age, people in dfg:
    permsAtEachAge.append(list(it.permutations(people.name.values)))

product = list(it.product(*permsAtEachAge))
overallPermutations = map(lambda x: list(it.chain(*x)), product)

Andy Hayden's complete solution looks to work perfectly, as well.




回答4:


You can use numpy to sort like

np.sort(data[age])[::-1]




回答5:


import pandas as pd

df.sort_values(by = 'age')


来源:https://stackoverflow.com/questions/29992105/quick-way-to-find-all-permutations-of-a-pandas-dataframe-that-preserves-a-sort

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!