python pandas merge two or more lines of text into one line

Submitted by 江枫思渺然 on 2019-12-08 03:43:45

Question


I have a data frame with text data like the one below,

    name | address                  | number 
1   Bob    bob                        No.56
2          @gmail.com           
3   Carly  carly@world.com            No.90
4   Gorge  greg@yahoo     
5          .com                   
6                                     No.100

and I want to turn it into this frame.

    name | address               | number 
1   Bob    bob@gmail.com           No.56
2   Carly  carly@world.com         No.90                 
3   Gorge  greg@yahoo.com          No.100

I am using pandas to read the file, but I am not sure how to use merge or concat for this.


Answer 1:


If the name column consists of unique values,

print(df)

    name          address  number
0    Bob              bob   No.56
1    NaN       @gmail.com     NaN
2  Carly  carly@world.com   No.90
3  Gorge       greg@yahoo     NaN
4    NaN             .com     NaN
5    NaN              NaN  No.100

df['name'] = df['name'].ffill()
print(df.fillna('').groupby(['name'], as_index=False).sum())

    name          address  number
0    Bob    bob@gmail.com   No.56
1  Carly  carly@world.com   No.90
2  Gorge   greg@yahoo.com  No.100
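The approach above relies on each name being unique. If the same name can appear in more than one record, one possible workaround is to group on a running record counter instead of on the name itself. The sketch below assumes that every record starts on a row where name is present. The sample data and the `record_id` helper are made up for illustration, and `''.join` is used in place of `sum()` as the aggregation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two different people named Bob, each split over two rows.
df = pd.DataFrame({
    "name": ["Bob", np.nan, "Carly", "Bob", np.nan],
    "address": ["bob", "@gmail.com", "carly@world.com", "bob2@yahoo", ".com"],
    "number": ["No.56", np.nan, "No.90", np.nan, "No.100"],
})

# A new record starts wherever name is present; cumsum() turns those
# start markers into a per-record id (1, 1, 2, 3, 3).
record_id = df["name"].notna().cumsum().rename("record_id")

merged = (
    df.fillna("")            # NaN -> '' so that the pieces can be joined
      .groupby(record_id)    # group by record, not by name
      .agg("".join)          # concatenate the fragments in each column
      .reset_index(drop=True)
)
print(merged)
```

Because the grouping key is the counter rather than the name, the two Bob records stay separate.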

To extend the code above to more complicated data, you may need things like ffill(), bfill(), [::-1], .groupby('name').apply(lambda x: ' '.join(x['address'])), strip(), lstrip(), rstrip() and replace().
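As one illustration of such an extension, the sketch below trims stray whitespace from each fragment before joining. The `join_fragments` helper and the extra spaces in the sample data are assumptions added for the example and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data, with some stray spaces added.
df = pd.DataFrame({
    "name": ["Bob", np.nan, "Carly", "Gorge", np.nan, np.nan],
    "address": [" bob", "@gmail.com ", "carly@world.com", "greg@yahoo", " .com", np.nan],
    "number": ["No.56", np.nan, "No.90", np.nan, np.nan, "No.100"],
})

df["name"] = df["name"].ffill()

def join_fragments(parts):
    # Drop missing pieces, trim whitespace around each fragment, then concatenate.
    return "".join(str(p).strip() for p in parts.dropna())

# sort=False keeps the groups in their order of first appearance.
result = df.groupby("name", sort=False).agg(join_fragments).reset_index()
print(result)
```

Using a named function rather than `sum()` makes it easy to change the separator or add further cleanup such as replace().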




Answer 2:


If you want to condense a data frame of six rows (with possible NaN entries in each column) into one row per user, there may be no direct pandas method for that.

You will need some code to fill in the values of the name column, so that pandas knows the split rows bob and @gmail.com belong to the same user, Bob.

You can fill each empty entry in the name column with the preceding user's name using the fillna or ffill methods; see pandas dataframe missing data.

df['name'] = df['name'].ffill()

# gives
    name          address  number
0    Bob              bob   No.56
1    Bob       @gmail.com
2  Carly  carly@world.com   No.90
3  Gorge       greg@yahoo
4  Gorge             .com
5  Gorge                   No.100

Then you can use groupby with sum as the aggregation function; for string columns, sum concatenates the values within each group.

df.groupby(['name']).sum().reset_index()

# gives
    name          address  number
0    Bob    bob@gmail.com   No.56
1  Carly  carly@world.com   No.90
2  Gorge   greg@yahoo.com  No.100

You may find it useful to convert between NaN and white space; see Replacing blank values (white space) with NaN in pandas and pandas.DataFrame.fillna.
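A minimal sketch of that round trip is shown below, assuming the file was read with blank or whitespace-only strings where values are missing. The sample frame is made up for illustration, and `''.join` is used here in place of `sum()` as the aggregation.

```python
import numpy as np
import pandas as pd

# Hypothetical input where missing cells are blank or whitespace-only strings.
df = pd.DataFrame({
    "name": ["Bob", "  ", "Carly", "Gorge", "", ""],
    "address": ["bob", "@gmail.com", "carly@world.com", "greg@yahoo", ".com", ""],
    "number": ["No.56", "", "No.90", "", "", "No.100"],
})

# Blank -> NaN, so that ffill() can tell which names are missing.
df = df.replace(r"^\s*$", np.nan, regex=True)
df["name"] = df["name"].ffill()

# NaN -> '' again, so that every remaining cell is a string that can be joined.
result = df.fillna("").groupby("name").agg("".join).reset_index()
print(result)
```

The first conversion is what makes ffill() work, since ffill() only fills NaN and leaves empty strings untouched.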



Source: https://stackoverflow.com/questions/42240022/python-pandas-merge-two-or-more-lines-of-text-into-one-line
