Question
I am dealing with a pandas data frame as shown below.
id x1 y1
0 2 some_val some_val
1 2 some_val some_val
2 2 some_val some_val
3 2 some_val some_val
4 2 some_val some_val
5 0 0 0
6 3 some_val some_val
7 3 some_val some_val
8 0 0 0
9 5 some_val some_val
10 5 some_val some_val
11 5 some_val some_val
12 0 0 0
13 6 some_val some_val
14 6 some_val some_val
15 6 some_val some_val
16 6 some_val some_val
My original data frame did not contain the all-zero rows. As a project requirement, I had to insert an all-zero row whenever the "id" changes.
Now I want to delete all rows of any "id" that has 3 or fewer rows. In the data frame above, that means deleting all rows for ids 3 and 5. The resulting data frame should look like this:
id x1 y1
0 2 some_val some_val
1 2 some_val some_val
2 2 some_val some_val
3 2 some_val some_val
4 2 some_val some_val
5 0 0 0
6 6 some_val some_val
7 6 some_val some_val
8 6 some_val some_val
9 6 some_val some_val
Please suggest a way to obtain this result.
Answer 1:
The simplest approach is to remove the zero rows first, because they would interfere with the count if there are more than 3 of them. Then group by the id, filter out the small groups, and add the zero rows back as in the other question/answer.
d1 = df.query('ProjID != 0').groupby('ProjID').filter(lambda g: len(g) > 3)
d1
ProjID Xcoord Ycoord
0 2 -7.863509 5.221327
1 2 some_val some_val
2 2 some_val some_val
3 2 some_val some_val
4 2 some_val some_val
13 6 some_val some_val
14 6 some_val some_val
15 6 some_val some_val
16 6 some_val some_val
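To reproduce the filtering step, here is a minimal self-contained sketch. The coordinate values are invented placeholders (the question only shows some_val), and the column names follow this answer rather than the question's id/x1/y1.

```python
import pandas as pd

# Hypothetical data: group sizes match the question, coordinate values are invented.
ids = [2] * 5 + [0] + [3] * 2 + [0] + [5] * 3 + [0] + [6] * 4
df = pd.DataFrame({
    "ProjID": ids,
    "Xcoord": [0.0 if i == 0 else 1.5 for i in ids],
    "Ycoord": [0.0 if i == 0 else 2.5 for i in ids],
})

# Drop the zero separator rows, then keep only groups with more than 3 rows.
d1 = df[df["ProjID"] != 0].groupby("ProjID").filter(lambda g: len(g) > 3)
print(d1.index.tolist())  # [0, 1, 2, 3, 4, 13, 14, 15, 16]
```

The original index labels are preserved, which is why the surviving rows are still numbered 0 to 4 and 13 to 16.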
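Then add the zero rows back:
import numpy as np
pidv = d1.ProjID.values
# True on the last row of each ProjID run (the final row always counts)
pid_chg = np.append(pidv[:-1] != pidv[1:], True)
# repeat the last row of each run once, leaving a slot for the zero row
i = d1.index.repeat(pid_chg + 1)
d2 = d1.loc[i, :].copy()
# the repeated copies become the all-zero separator rows
d2.loc[i.duplicated()] = 0
d2.reset_index(drop=True)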
ProjID Xcoord Ycoord
0 2 -7.863509 5.221327
1 2 some_val some_val
2 2 some_val some_val
3 2 some_val some_val
4 2 some_val some_val
5 0 0 0
6 6 some_val some_val
7 6 some_val some_val
8 6 some_val some_val
9 6 some_val some_val
10 0 0 0
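Note that the output above ends with a zero row (index 10), while the result requested in the question does not. The sketch below wraps the same idea in a function and works on row positions instead of index labels. The function name add_zero_separators and its trailing flag are hypothetical, introduced here only for illustration, and the data values are made up.

```python
import numpy as np
import pandas as pd


def add_zero_separators(d1, key="ProjID", trailing=True):
    """Return a copy of d1 with an all-zero row after each run of equal `key` values.

    `trailing` is a hypothetical option, not part of the answer above: set it to
    False to skip the zero row after the final group.
    """
    if d1.empty:
        return d1.copy()
    vals = d1[key].to_numpy()
    # True on the last row of each run; the final row always ends a run.
    run_end = np.append(vals[:-1] != vals[1:], True)
    if not trailing:
        run_end[-1] = False
    # Row positions, with each run-ending row listed twice.
    pos = np.repeat(np.arange(len(d1)), run_end + 1)
    out = d1.iloc[pos].reset_index(drop=True)
    # The second copy of a repeated position becomes the zero row.
    is_dup = np.append(False, pos[1:] == pos[:-1])
    out.loc[is_dup, :] = 0
    return out


# Made-up values standing in for the filtered frame d1.
d1 = pd.DataFrame({
    "ProjID": [2, 2, 6, 6, 6],
    "Xcoord": [1.0, 1.1, 2.0, 2.1, 2.2],
    "Ycoord": [3.0, 3.1, 4.0, 4.1, 4.2],
})

with_trailing = add_zero_separators(d1)
without_trailing = add_zero_separators(d1, trailing=False)
print(with_trailing["ProjID"].tolist())     # [2, 2, 0, 6, 6, 6, 0]
print(without_trailing["ProjID"].tolist())  # [2, 2, 0, 6, 6, 6]
```

Working on positions rather than labels means the helper does not depend on d1 having a unique index.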
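Answer 2:
If your DataFrame is named df, you can filter it with a boolean condition:
df = df[df['col'] != value]  # any comparison works: ==, !=, <, <=, >, >=
Specifically, in your case:
df = df[df['ProjID'] != 3]
The same works for 5. You can combine both filters in a single step with the & operator (not Python's and), wrapping each condition in parentheses.
This is known as boolean indexing.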
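As a small illustration of combining the two conditions, the sketch below uses a made-up frame. It also shows isin, which is a swapped-in alternative that scales better when there are many ids to exclude. Keep in mind that hard-coding 3 and 5 only works when you already know which ids are too small; it does not apply the row-count rule itself.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "ProjID": [2, 2, 3, 3, 5, 6],
    "Xcoord": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Element-wise "and" uses & with each condition in parentheses.
kept = df[(df["ProjID"] != 3) & (df["ProjID"] != 5)]

# Equivalent, and easier to extend to a longer list of ids.
kept_isin = df[~df["ProjID"].isin([3, 5])]

print(kept["ProjID"].tolist())  # [2, 2, 6]
```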
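Answer 3:
You can use groupby with filter to keep only the IDs that have more than three rows, then use the resulting index to select from df. Using loc selects by index label, so it also works when df does not have the default integer index.
filtered = df.groupby('ProjID').Xcoord.filter(lambda x: x.count() > 3)
df.loc[filtered.index]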
ProjID Xcoord Ycoord
0 2 -7.863509 5.221327
1 2 some_val some_val
2 2 some_val some_val
3 2 some_val some_val
4 2 some_val some_val
13 6 some_val some_val
14 6 some_val some_val
15 6 some_val some_val
16 6 some_val some_val
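The result above drops the zero rows only because the ProjID 0 group happens to have exactly three rows. A variant that does not rely on this, sketched below with invented coordinate values, computes each row's group size with transform("size") and excludes the zero rows explicitly. This is an alternative technique, not the method used in the answer.

```python
import pandas as pd

# Hypothetical data: group sizes match the question, coordinate values are invented.
ids = [2] * 5 + [0] + [3] * 2 + [0] + [5] * 3 + [0] + [6] * 4
df = pd.DataFrame({
    "ProjID": ids,
    "Xcoord": [float(i) for i in ids],
    "Ycoord": [float(i) * 2 for i in ids],
})

# Size of the group each row belongs to, aligned with df's index.
sizes = df.groupby("ProjID")["ProjID"].transform("size")

# Keep rows from groups with more than 3 rows, excluding the zero separators.
kept = df[(df["ProjID"] != 0) & (sizes > 3)]
print(kept.index.tolist())  # [0, 1, 2, 3, 4, 13, 14, 15, 16]
```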
Source: https://stackoverflow.com/questions/42985019/how-to-delete-specific-rows-from-pandas-data-frame