Take a random sample based on groups

说谎 2020-11-28 13:23

I have a data frame df with almost 50,000 rows spread across 15 different IDs (each ID has thousands of observations). df looks like:

        ID  Year    Temp    ph
1           


        
8 answers
  •  暗喜 (original poster)
    2020-11-28 13:54

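    library(data.table) #1
    df <- as.data.table(df) #2
    # Within each ID, label every row 1 with probability 500/.N, otherwise 2.
    # Rows are labelled independently, so this keeps about 500 rows per ID,
    # not exactly 500, and it needs every ID to have at least 500 rows.
    df[, group_num := sample(2, .N, replace = TRUE, prob = c(500, .N - 500) / .N), by = "ID"] #3
    df_sample <- df[group_num == 1, ] #4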
    

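    Alternatively, replace lines #3 and #4 with the following, which gives every row a random rank within its ID and keeps exactly 500 rows per ID (or all rows of an ID that has fewer than 500):

    df[, random_num := sample(.N, .N), by = "ID"]
    df_sample <- df[random_num <= 500, ]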
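
The same per-ID sampling can also be done without a helper column. The following is only a minimal sketch under stated assumptions: the toy table stands in for the real df (its IDs, sizes, and values are made up), and n_per_id is a hypothetical name for the sample size.

```r
library(data.table)

set.seed(42)  # only so the illustration is reproducible

# Toy stand-in for df: 3 IDs with 2,000 rows each (made-up values)
df <- data.table(
  ID   = rep(c("A", "B", "C"), each = 2000),
  Year = sample(2000:2020, 6000, replace = TRUE),
  Temp = rnorm(6000, mean = 15, sd = 5),
  ph   = runif(6000, min = 6, max = 9)
)

n_per_id <- 500  # hypothetical name for the per-ID sample size

# Within each ID, draw row positions without replacement;
# min() guards against IDs that have fewer than n_per_id rows
df_sample <- df[, .SD[sample(.N, min(.N, n_per_id))], by = ID]

# Count the sampled rows per ID
df_sample[, .N, by = ID]
```

Because the rows are drawn directly inside each group, no group_num or random_num column is left behind in df.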
    
