I have a df of almost 50,000 rows spread across 15 different IDs (each ID has thousands of observations). df looks like:
ID Year Temp ph 1
library(data.table)  #1
df <- data.table(df)  #2
df[, group_num := sample(2, .N, replace = TRUE, prob = c(500, .N - 500) / .N), by = "ID"]  #3
df_sample <- df[group_num == 1, ]  #4
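Line #3 puts each row in group 1 with probability 500/.N, so every ID ends up with roughly, not exactly, 500 rows. To get exactly 500 rows per ID, you can change lines #3 and #4 to: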
df[, random_num := sample(.N, .N), by = "ID"]
df_sample <- df[random_num <= 500, ]
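
If you would rather not add a helper column, the same per-ID draw can be sketched in one step with .SD. This is only an illustration under assumptions: the data below are made up to mimic the shape described in the question (15 IDs, a few thousand rows each), and n_per_id is a hypothetical name for the sample size.

```r
library(data.table)

set.seed(42)  # only so this illustration is reproducible

# Hypothetical stand-in for df: 15 IDs with 3,000 rows each (values are made up)
df <- data.table(
  ID   = rep(1:15, each = 3000),
  Year = sample(1990:2020, 15 * 3000, replace = TRUE),
  Temp = rnorm(15 * 3000, mean = 15, sd = 5),
  ph   = runif(15 * 3000, min = 6, max = 9)
)

n_per_id <- 500L  # assumed target sample size per ID

# Within each ID, pick n_per_id row positions without replacement;
# min() keeps every row of an ID that has fewer than n_per_id rows
df_sample <- df[, .SD[sample(.N, min(n_per_id, .N))], by = ID]

# Number of sampled rows per ID
df_sample[, .N, by = ID]
```

On very large tables the helper-column version above may be faster, since .SD subsetting by group has some overhead.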