Using Boolean Logic to clean DF in pandas

不打扰是莪最后的温柔 submitted on 2019-12-23 05:13:07

Question


df

shape   square
shape   circle
animal   NaN
NaN dog
NaN cat
NaN fish
color   red
color   blue

desired_df

shape   square
shape   circle
animal  dog
animal  cat
animal  fish
color   red
color   blue

I have a df that contains information that needs to be normalized.

I have noticed a pattern that indicates how to join the columns and normalize the data.

If a row has Col1 != NaN and Col2 == NaN, and the rows directly after it have Col1 == NaN and Col2 != NaN, then the values from Col1 and Col2 should be joined. This continues until reaching a row where Col1 != NaN and Col2 != NaN.

Is there a way to solve this in pandas?

The first step I am thinking of is to create an additional column containing True/False values to determine which rows to join. However, once I have done that, I am not sure how to assign the value in Col1 to all of the relevant values in Col2.
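The helper-column idea can be sketched roughly as follows. This is a minimal sketch, not a definitive solution: it assumes the two columns are labeled 0 and 1, and the names is_header and cleaned are made up for illustration. Rows that carry a label but no value are flagged, the label is forward-filled over the rows beneath it, and the flagged rows are then dropped.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the column labels 0 and 1 are an assumption.
df = pd.DataFrame({
    0: ["shape", "shape", "animal", np.nan, np.nan, np.nan, "color", "color"],
    1: ["square", "circle", np.nan, "dog", "cat", "fish", "red", "blue"],
})

# Helper column (hypothetical name): True for rows that have a label
# in column 0 but no value in column 1.
is_header = df[0].notna() & df[1].isna()

# Carry each label down over the rows that are missing one.
df[0] = df[0].ffill()

# Drop the flagged rows, which hold a label without a value.
cleaned = df[~is_header].reset_index(drop=True)
print(cleaned)
```

Because the flagged rows are removed explicitly, this variant does not depend on deduplication, so repeated label/value pairs elsewhere in the data would be left alone.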

Any suggestions on how to arrive at the desired result?


Answer 1:


I struggle to follow the pattern you describe, but if it is only a heuristic, you can instead use pd.Series.ffill and pd.Series.bfill to reach your desired result:

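# Carry each label in column 0 down to the rows below it
df[0] = df[0].ffill()
# Pull each value in column 1 up into the empty row above it
df[1] = df[1].bfill()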

Filling leaves the first row of each filled group duplicated (here, "animal, dog" appears twice), so drop duplicates:

df = df.drop_duplicates()

print(df)

        0       1
0   shape  square
1   shape  circle
2  animal     dog
4  animal     cat
5  animal    fish
6   color     red
7   color    blue
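To reproduce the result above end to end, here is a self-contained sketch. The DataFrame construction is an assumption based on the data shown in the question (integer column labels 0 and 1, NaN for the missing cells), and the name result is chosen for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; the exact construction is an assumption.
df = pd.DataFrame({
    0: ["shape", "shape", "animal", np.nan, np.nan, np.nan, "color", "color"],
    1: ["square", "circle", np.nan, "dog", "cat", "fish", "red", "blue"],
})

# Forward-fill the labels and back-fill the values, as in the answer.
df[0] = df[0].ffill()
df[1] = df[1].bfill()

# Rows 2 and 3 are now both ("animal", "dog"); keep only the first.
result = df.drop_duplicates()
print(result)
```

Note that drop_duplicates removes every repeated (label, value) pair, not only the one created by the fill. If the data can legitimately contain the same pair more than once, those rows would be dropped as well.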


Source: https://stackoverflow.com/questions/50965844/using-boolean-logic-to-clean-df-in-pandas
