Aggregating string columns using pandas GroupBy

我的梦境 提交于 2020-01-23 11:08:38

问题


I have a DF such as the following:

df =

vid   pos      value       sente
1     a         A           21
2     b         B           21
3     b         A           21
3     a         A           21
1     d         B           22
1     a         C           22
1     a         D           22
2     b         A           22
3     a         A           22

Now I want to combine all rows with the same value for sente and vid into one row with the values for value joined by an " "

df2 =

vid   pos      value       sente
1     a         A           21
2     b         B           21
3     b a       A A         21
1     d a a     B C D       22
2     b         A           22
3     a         A           22

I suppose a modification of this should do the trick:

df2 = df.groupby["sente"].agg(lambda x: " ".join(x))

But I can't seem to figure out how to add the second column to the statement.


回答1:


Groupers can be passed as lists. Furthermore, you can simplify your solution a bit by ridding your code of the lambda—it isn't needed.

df.groupby(['vid', 'sente'], as_index=False, sort=False).agg(' '.join)

   vid  sente    pos  value
0    1     21      a      A
1    2     21      b      B
2    3     21    b a    A A
3    1     22  d a a  B C D
4    2     22      b      A
5    3     22      a      A

Some other notes: specifying as_index=False means your groupers will be present as columns in the result (and not as the index, as is the default). Furthermore, sort=False will preserve the original order of the columns.




回答2:


As of this edit, @cᴏʟᴅsᴘᴇᴇᴅ's answer is way better.

Fun Way! Only works because single char values

df.set_index(['sente', 'vid']).sum(level=[0, 1]).applymap(' '.join).reset_index()


   sente  vid    pos  value
0     21    1      a      A
1     21    2      b      B
2     21    3    b a    A A
3     22    1  d a a  B C D
4     22    2      b      A
5     22    3      a      A

somewhat ok answer

df.set_index(['sente', 'vid']).groupby(level=[0, 1]).apply(
    lambda d: pd.Series(d.to_dict('l')).str.join(' ')
).reset_index()

   sente  vid    pos  value
0     21    1      a      A
1     21    2      b      B
2     21    3    b a    A A
3     22    1  d a a  B C D
4     22    2      b      A
5     22    3      a      A

not recommended

df.set_index(['sente', 'vid']).add(' ') \
  .sum(level=[0, 1]).applymap(str.strip).reset_index()

   sente  vid    pos  value
0     21    1      a      A
1     21    2      b      B
2     21    3    b a    A A
3     22    1  d a a  B C D
4     22    2      b      A
5     22    3      a      A


来源:https://stackoverflow.com/questions/50357837/aggregating-string-columns-using-pandas-groupby

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!