How to split data into training/testing sets using sample function

猫巷女王i 2020-11-22 10:43

I've just started using R and I'm not sure how to incorporate my dataset with the following sample code:

sample(x, size, replace = FALSE, prob = NULL)
         


        
24 Answers
  •  爱一瞬间的悲伤
    2020-11-22 11:02

    After looking through all the different methods posted here, I didn't see anyone utilize TRUE/FALSE to select and unselect data. So I thought I would share a method utilizing that technique.

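    # one TRUE/FALSE flag per row; TRUE is drawn with probability 0.75
    n = nrow(dataset)
    split = sample(c(TRUE, FALSE), n, replace=TRUE, prob=c(0.75, 0.25))
    
    # rows flagged TRUE form the training set; the rest form the testing set
    training = dataset[split, ]
    testing = dataset[!split, ]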
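
    Because the logical-vector draw above flags each row independently, the resulting split is only approximately 75/25. If an exact split size is needed, one common alternative is to sample row positions without replacement. A minimal sketch, assuming the built-in `mtcars` data frame stands in for `dataset` and an arbitrary seed of 42 (both are illustrative choices, not part of the original answer):

    ```r
    # Stand-in data: the built-in mtcars data frame (assumption for illustration)
    dataset <- mtcars

    set.seed(42)  # arbitrary seed so the split is reproducible

    n <- nrow(dataset)
    n_train <- floor(0.75 * n)  # exact number of training rows

    # draw n_train distinct row positions (replace = FALSE is the default)
    train_idx <- sample(seq_len(n), size = n_train)

    training <- dataset[train_idx, ]   # positive indices keep these rows
    testing  <- dataset[-train_idx, ]  # negative indices drop them

    nrow(training)
    nrow(testing)
    ```

    The trade-off is that this version fixes the subset sizes in advance, whereas the TRUE/FALSE version lets them vary from run to run.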
    

    Explanation

    There are multiple ways of selecting data in R. Most commonly, people use positive indices to select elements and negative indices to drop them. However, the same result can be achieved with a logical vector, where TRUE selects an element and FALSE drops it.

    Consider the following example.

    # let's explore ways to select every other element
    data = c(1, 2, 3, 4, 5)
    
    
    # using positive indices to select wanted elements
    data[c(1, 3, 5)]
    [1] 1 3 5
    
    # using negative indices to remove unwanted elements
    data[c(-2, -4)]
    [1] 1 3 5
    
    # using booleans to select wanted elements
    data[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
    [1] 1 3 5
    
    # R recycles the TRUE/FALSE vector when it is shorter than the vector being indexed
    data[c(TRUE, FALSE)]
    [1] 1 3 5
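
    The same logical indexing applies to the rows of a data frame, and `!` flips every TRUE/FALSE value, which is why the two subsets of the split never overlap. A small sketch, using the built-in `mtcars` data frame as a stand-in for `dataset` and the hypothetical name `keep` for the logical vector (both are assumptions for illustration):

    ```r
    dataset <- mtcars  # stand-in data (assumption for illustration)

    set.seed(1)  # arbitrary seed for reproducibility
    keep <- sample(c(TRUE, FALSE), nrow(dataset), replace = TRUE, prob = c(0.75, 0.25))

    training <- dataset[keep, ]
    testing  <- dataset[!keep, ]  # `!` inverts each flag, selecting the remaining rows

    # every row lands in exactly one of the two subsets
    nrow(training) + nrow(testing) == nrow(dataset)

    # which() converts the logical vector to the equivalent positive indices
    identical(dataset[which(keep), ], training)
    ```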
    
