Python extracting new dataframe

野趣味 2020-12-22 14:13

I have a dataframe:

  topic  student level 
    1      a       1     
    1      b       2     
    1      a       3     
    2      a       1     
    2             


        
1 answer
  • 2020-12-22 14:41

    This assumes your data is already sorted by topic, student, and then level, with a default 0..n-1 index. If not, sort it and reset the index first.
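    A minimal sketch of that preparation step, assuming a small made-up frame (the sample values and the name `sorted_df` are hypothetical, since the question's data is not fully shown). The sort keys follow the order named above; adjust them if your rows should be ordered differently. Resetting the index matters because the code that follows compares each row with the row directly above it.

```python
import pandas as pd

# Hypothetical sample data; not the asker's actual frame.
df = pd.DataFrame({
    "topic":   [2, 1, 1, 1],
    "student": ["a", "b", "a", "a"],
    "level":   [1, 2, 3, 1],
})

# Sort by the keys named above, then rebuild the index as 0..n-1
# so that "the row above" is always the previous label.
sorted_df = df.sort_values(["topic", "student", "level"]).reset_index(drop=True)
print(sorted_df)
```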

    import pandas as pd

    # Pair each row with the student in the row above (NaN for the first row).
    # A row counts as a reply when its level is greater than 1.
    # (shift() replaces the original df.ix lookup; .ix was removed in pandas 1.0.)
    df_count = pd.DataFrame({
        'st_source': df['student'].shift(),
        'st_dest': df['student'],
        'reply_count': (df['level'] > 1).astype(int),
    })

    # Aggregate and sum all counts belonging to a source-dest pair, then remove rows with the same source and dest.
    df_count = df_count.groupby(['st_source', 'st_dest']).sum().reset_index()[lambda x: x.st_source != x.st_dest]

    print(df_count)
    Out[218]: 
      st_source st_dest  reply_count
    1         a       b            4
    2         b       a            2
    3         b       c            1
    4         c       a            1
    5         c       b            1
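    For anyone who wants to try the idea end to end, here is a self-contained sketch on made-up data. The sample frame and the names `pairs` and `result` are hypothetical, not taken from the question, so the numbers will not match the output above.

```python
import pandas as pd

# Hypothetical sample data, already sorted so each reply follows the post it answers.
df = pd.DataFrame({
    "topic":   [1, 1, 1, 2, 2, 2],
    "student": ["a", "b", "a", "a", "c", "b"],
    "level":   [1, 2, 3, 1, 2, 3],
})

# Source is the student in the row above, dest is the current student.
# Rows with level 1 open a thread, so they contribute 0 replies.
pairs = pd.DataFrame({
    "st_source": df["student"].shift(),
    "st_dest": df["student"],
    "reply_count": (df["level"] > 1).astype(int),
})

# Sum per source-dest pair (the first row's NaN source is dropped by groupby),
# then discard pairs where a student follows themselves.
result = pairs.groupby(["st_source", "st_dest"], as_index=False)["reply_count"].sum()
result = result[result["st_source"] != result["st_dest"]]
print(result)
```

    Using `shift()` instead of a row-wise `apply` keeps the comparison vectorized and does not depend on the index labels being consecutive integers.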
    