Applying a function (ks.test) between two data frames column-wise in R

时光取名叫无心 2020-12-18 03:41

My simple question is: How do you do a ks.test between two data frames column by column?

E.g., we have two data frames:

D1 <- data.fra         


        
2 Answers
  • 2020-12-18 03:52

    A tidyverse solution using the map function from the purrr package together with the tidy function from the broom package:

    library(purrr)
    library(broom)
    
    # Data posted by @TUSHAr
    set.seed(12)
    D1 <- data.frame(A = rnorm(n = 30, mean = 5, sd = 2.5), 
                     B = rnorm(n = 30, mean = 4.5, sd = 2.2), 
                     C = rnorm(n = 30, mean = 2.5, sd = 12))
    D2 <- data.frame(A = rnorm(n = 30, mean = 5, sd = 2.49), 
                     B = rnorm(n = 30, mean = 4.4, sd = 2.2), 
                     C = rnorm(n = 30, mean = 2, sd = 12))
    
    # Loop through each column
    result <- colnames(D1) %>%
      set_names() %>% 
      # apply `ks.test` function for each column pair
      map(~ ks.test(D1[, .x], D2[, .x])) %>%
      # extract test results using `tidy` then bind them together by rows
      map_dfr(., broom::tidy, .id = "parameter")
    result
    
    #> # A tibble: 3 x 5
    #>   parameter statistic p.value method                           alternative
    #>   <chr>         <dbl>   <dbl> <chr>                            <chr>      
    #> 1 A             0.167   0.808 Two-sample Kolmogorov-Smirnov t~ two-sided  
    #> 2 B             0.2     0.594 Two-sample Kolmogorov-Smirnov t~ two-sided  
    #> 3 C             0.233   0.393 Two-sample Kolmogorov-Smirnov t~ two-sided
    

    Created on 2018-08-24 by the reprex package (v0.2.0.9000).

  • 2020-12-18 04:15

    Create two data frames, D1 and D2, with random numbers and the same column names:

    set.seed(12)
    D1 = data.frame(A=rnorm(n = 30,mean = 5,sd = 2.5),B=rnorm(n = 30,mean = 4.5,sd = 2.2),C=rnorm(n = 30,mean = 2.5,sd = 12))
    D2 = data.frame(A=rnorm(n = 30,mean = 5,sd = 2.49),B=rnorm(n = 30,mean = 4.4,sd = 2.2),C=rnorm(n = 30,mean = 2,sd = 12))
    

    Now loop over the column names and use each one to pull the matching column from D1 and D2, running ks.test on each pair of columns:

    col.names = colnames(D1)
    lapply(col.names,function(t,d1,d2){ks.test(d1[,t],d2[,t])},D1,D2)
    
    #[[1]]
    
    #Two-sample Kolmogorov-Smirnov test
    
    #data:  d1[, t] and d2[, t]
    #D = 0.167, p-value = 0.81
    #alternative hypothesis: two-sided
    
    
    #[[2]]
    
    #Two-sample Kolmogorov-Smirnov test
    
    #data:  d1[, t] and d2[, t]
    #D = 0.233, p-value = 0.39
    #alternative hypothesis: two-sided
    
    
    #[[3]]
    
    #Two-sample Kolmogorov-Smirnov test
    
    #data:  d1[, t] and d2[, t]
    #D = 0.2, p-value = 0.59
    #alternative hypothesis: two-sided
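
The list returned above is unnamed, so each result has to be matched back to its column by position. A minimal base R sketch, assuming the same D1 and D2 as above, that keeps the column names on the results and collects the statistic and p-value into one data frame (the names `tests` and `summary_df` are made up for illustration):

```r
set.seed(12)
D1 <- data.frame(A = rnorm(30, mean = 5, sd = 2.5),
                 B = rnorm(30, mean = 4.5, sd = 2.2),
                 C = rnorm(30, mean = 2.5, sd = 12))
D2 <- data.frame(A = rnorm(30, mean = 5, sd = 2.49),
                 B = rnorm(30, mean = 4.4, sd = 2.2),
                 C = rnorm(30, mean = 2, sd = 12))

col.names <- colnames(D1)

# Run ks.test per column and keep the column names on the result list
tests <- setNames(
  lapply(col.names, function(nm) ks.test(D1[[nm]], D2[[nm]])),
  col.names
)

# Pull the D statistic and p-value out of each "htest" object
summary_df <- data.frame(
  column    = names(tests),
  statistic = vapply(tests, function(x) unname(x$statistic), numeric(1)),
  p.value   = vapply(tests, function(x) x$p.value, numeric(1)),
  row.names = NULL
)
summary_df
```

Indexing with `[[nm]]` always returns a plain vector, which is what ks.test expects for both samples.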
    

    With the names used in the question, the following code should work:

    col.names = colnames(S)
    lapply(col.names,function(t,d1,d2){ks.test(d1[,t],d2[,t])},D,S)
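
This assumes every column of S also exists in D. A hedged sketch, using made-up stand-ins for D and S since the question's actual data is not shown, that restricts the loop to the columns present in both data frames and returns only the p-values:

```r
# Hypothetical stand-ins for the question's D and S
set.seed(1)
D <- data.frame(x = rnorm(20), y = rnorm(20), z = rnorm(20))
S <- data.frame(y = rnorm(25), x = rnorm(25), w = rnorm(25))

# Only columns present in both data frames can be compared
common <- intersect(colnames(D), colnames(S))

# Named vector of p-values, one per shared column
p_values <- vapply(common,
                   function(nm) ks.test(D[[nm]], S[[nm]])$p.value,
                   numeric(1))
p_values
```

Matching by name rather than position means the column order may differ between the two data frames, and the two samples do not need the same number of rows.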
    