Questions about pandas: expanding multivalued column, inverting and grouping

无人及你 2021-01-06 18:32

I was looking into pandas to do some simple calculations on NLP and text mining but I couldn't quite grasp how to do them.

Suppose I have the following data frame,

2 Answers

自闭症患者 (original poster)

2021-01-06 19:08

It might be easier to create the expanded version at the time you create shingles. This question shows how you can use groupby to do this sort of expansion. Here's an example of what you can do after creating the "first name" column:

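import pandas

def shingles(table, n=3):
    # Each group holds one person, so read scalar values from its first row.
    word = table['first name'].iloc[0]
    # All length-n character substrings of the first name.
    shingles = [word[i:i + n] for i in range(len(word) - n + 1)]
    cols = {col: table[col].iloc[0] for col in table.columns}
    # The scalar values are broadcast across one row per shingle.
    cols['shingle'] = shingles
    return pandas.DataFrame(cols)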

>>> df.groupby('name', group_keys=False).apply(shingles)
  first name gender          name shingle
0       Jane      F      Jane Doe     Jan
1       Jane      F      Jane Doe     ane
0       John      M   John Cusack     Joh
1       John      M   John Cusack     ohn
0       John      M      John Doe     Joh
1       John      M      John Doe     ohn
0       Mary      F  Mary Poppins     Mar
1       Mary      F  Mary Poppins     ary

(I grouped by name here rather than first name just in case there are duplicate first names, but it assumes the full name is unique.)

From there you should be able to group and count whatever you like.
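As a rough sketch of that last step, here is one way the grouping and counting might look. The sample frame is reconstructed from the output shown above, since the question's original data is not reproduced here, so treat it as an assumption. The helper name char_shingles and the variable names are made up for illustration, and this version builds the expanded frame with DataFrame.explode rather than the groupby/apply approach above.

```python
import pandas as pd

# Sample data reconstructed from the output above (an assumption,
# not the original poster's actual data frame).
df = pd.DataFrame({
    "name": ["Jane Doe", "John Cusack", "John Doe", "Mary Poppins"],
    "gender": ["F", "M", "M", "F"],
})
df["first name"] = df["name"].str.split().str[0]


def char_shingles(word, n=3):
    """Return every length-n substring of word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# Attach one list of shingles per row, then expand to one row per shingle.
expanded = (
    df.assign(shingle=df["first name"].map(char_shingles))
      .explode("shingle")
      .reset_index(drop=True)
)

# Number of rows (people) in which each shingle occurs.
shingle_counts = expanded.groupby("shingle").size()

# The same counts, broken down by gender.
counts_by_gender = expanded.groupby(["gender", "shingle"]).size()

print(shingle_counts)
print(counts_by_gender)
```

DataFrame.explode needs pandas 0.25 or later. On older versions, the groupby/apply expansion above should give an equivalent frame to count on.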
