How to insert a row in a data frame under specific conditions?

杀马特。学长 韩版系。学妹 提交于 2019-12-24 10:47:20

问题


I understand that pandas is designed to load completely filled dataframes, but I want to add single rows to to the data frame under specific condition. Following is the df I am dealing with

       ProjID      Xcoord      Ycoord
    0  2           some_val    some_val
    1  2           some_val    some_val
    2  2           some_val    some_val
    3  2           some_val    some_val
    4  2           some_val    some_val
    5  3           some_val    some_val
    6  3           some_val    some_val
    7  5           some_val    some_val
    8  5           some_val    some_val
    9  5           some_val    some_val

What I want is to insert a row in the df with value 0 for every column, whenever the ProjID changes. Below is the required df:

      ProjID      Xcoord      Ycoord
    0  2           some_val    some_val
    1  2           some_val    some_val
    2  2           some_val    some_val
    3  2           some_val    some_val
    4  2           some_val    some_val
    5  0           0           0 
    6  3           some_val    some_val
    7  3           some_val    some_val
    8  0           0           0 
    9  5           some_val    some_val
   10  5           some_val    some_val
   11  5           some_val    some_val
   12  0           0           0

Basically a row with 0 values is inserted everytime, the ProjID changes. I was trying to write a for loop where the values in ProjID col is checked with previous row value of the ProjID col, if they are same, then it will move ahead, if they are not same, then it should insert a row with all 0 values. But, I am unable to implement it. Also, I have no idea how the index column will behave.

Please let me know if this kind of insertion of a row is possible in a df, and how do I do it. Thanks for any help.


回答1:


You can group data frame by ProjID, and use the append method to insert a row to each sub data frame at the end, this assumes all same ProjID have been sorted together before hand:

(df.groupby("ProjID", group_keys=False)
   .apply(lambda g: g.append(pd.Series(0, g.columns), ignore_index=True))
   .reset_index(drop=True))




回答2:


I was determined to find a different answer after @Psidom and I thought of the same exact answer and he was quicker. So, to make myself feel better, I came up with the below answer.

Difference between mine and @Psidom's. When using groupby all ProjID will be grouped together, even if they are not next to each other. If they are always expected to be next to each other, then it doesn't matter. However, if you want to count 111222111 as two separate changes, then this solution accounts for that.

pidv = df.ProjID.values
pid_chg = np.append(pidv[:-1] != pidv[1:], True)

i = df.index.repeat(pid_chg + 1)

d1 = df.loc[i, :].copy()

d1.loc[i.duplicated()] = 0

d1.reset_index(drop=True)

    ProjID    Xcoord    Ycoord
0        2  some_val  some_val
1        2  some_val  some_val
2        2  some_val  some_val
3        2  some_val  some_val
4        2  some_val  some_val
5        0         0         0
6        3  some_val  some_val
7        3  some_val  some_val
8        0         0         0
9        5  some_val  some_val
10       5  some_val  some_val
11       5  some_val  some_val
12       0         0         0

timing over sample data



来源:https://stackoverflow.com/questions/42840279/how-to-insert-a-row-in-a-data-frame-under-specific-conditions

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!