Df groupby set comparison

Posted by 吃可爱长大的小学妹 on 2019-12-11 05:35:00

Question


I have a list of words that I want to test for anagrams. I want to use pandas so that I can avoid computationally wasteful for loops. Given a .txt file of words, say:

"acb" "bca" "foo" "oof" "spaniel"

I want to put them in a DataFrame and then group them into lists of anagrams; I can remove duplicate rows later.

So far I have the code:

import pandas as pd

wordlist = pd.read_csv('data/example.txt', sep='\r', header=None, index_col=None, names=['word'])
wordlist = wordlist.drop_duplicates(keep='first')
wordlist['split'] = ''
wordlist['anagrams'] = ''

for index, row in wordlist.iterrows() :
    row['split'] = list(row['word'])

wordlist = wordlist.groupby('word')[('split')].apply(list)
print(wordlist)

How do I group by a set so that pandas knows that

[[a, b, c]]
[[b, a, c]]

are the same?


Answer 1:


I think you can use sorted lists:

df['a'] = df['word'].apply(lambda x: sorted(list(x)))
print (df)

      word                      a
0      acb              [a, b, c]
1      bca              [a, b, c]
2      foo              [f, o, o]
3      oof              [f, o, o]
4  spaniel  [a, e, i, l, n, p, s]
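
A column of Python lists cannot be used directly as a groupby key, because lists are unhashable. One way around this, offered here as a minimal sketch and not as part of the original answer, is to join the sorted letters back into a string and group on that. The column name `key` and the variable `anagram_groups` are illustrative assumptions, and the DataFrame is built inline instead of being read from the question's file.

```python
import pandas as pd

# Sample words from the question, built inline for illustration
df = pd.DataFrame({"word": ["acb", "bca", "foo", "oof", "spaniel"]})

# Join the sorted letters into a string so the key is hashable
# ("key" is a hypothetical column name)
df["key"] = df["word"].apply(lambda w: "".join(sorted(w)))

# Collect all words that share the same sorted-letter key
groups = df.groupby("key")["word"].apply(list)

# Keep only keys shared by more than one word, i.e. real anagram groups
anagram_groups = groups[groups.map(len) > 1]
print(anagram_groups)
```

Words with no anagram partner in the list, such as "spaniel", end up in a group of size one and are dropped by the final filter.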

Another solution, which only finds words whose reversal is also in the list (a special case of anagrams):

#reverse strings
df['reversed'] = df['word'].str[::-1]

#reshape
s = df.stack()
#get all dupes - anagrams
s1 = s[s.duplicated(keep=False)]
print (s1)
0  word        acb
   reversed    bca
1  word        bca
   reversed    acb
2  word        foo
   reversed    oof
3  word        oof
   reversed    foo
dtype: object

#select the values under the second index level 'word'
s2 = s1.loc[pd.IndexSlice[:, 'word']]
print (s2)
0    acb
1    bca
2    foo
3    oof
dtype: object
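
The reversal check and the sorted-letter key do not always agree: a pair such as "listen" and "silent" are anagrams without being reversals of each other. The sketch below compares the two checks on a small made-up word list; the names `reversed_hit` and `anagram_hit` are illustrative assumptions, not taken from the answer.

```python
import pandas as pd

# Made-up word list: one reversal pair and one non-reversal anagram pair
df = pd.DataFrame({"word": ["listen", "silent", "foo", "oof"]})

# Reversal check: is the reversed word also present in the list?
reversed_hit = df["word"].str[::-1].isin(df["word"])

# Sorted-letter check: does another word share the same sorted letters?
key = df["word"].apply(lambda w: "".join(sorted(w)))
anagram_hit = key.duplicated(keep=False)

print(df.assign(reversed_hit=reversed_hit, anagram_hit=anagram_hit))
```

Here the reversal check flags only "foo" and "oof", while the sorted-letter check flags all four words.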


Source: https://stackoverflow.com/questions/48323981/df-groupby-set-comparison
