Split a string vector at whitespace

独厮守ぢ 2020-12-08 09:35

I have the following vector:

tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2", 
          "1530 1", "1540 2", "1540 1")


        
9 Answers
  •  無奈伤痛
    2020-12-08 10:22

    Just to add two more options: stringr::str_split() and data.table::tstrsplit().

    1) Using stringr::str_split()

    # data posted above by the asker
    tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2", 
              "1530 1", "1540 2", "1540 1")
    
    library(stringr)
    
    as.integer(
      str_split(string = tmp3, 
                pattern = "[[:space:]]", 
                simplify = TRUE)[, 2] 
    )
    #>  [1] 2 1 2 1 2 1 2 1 2 1
    

    simplify = TRUE tells str_split() to return a character matrix instead of a list. The [, 2] then selects the second column of that matrix, which holds the values after the space.
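
    To make the matrix step concrete, here is a small sketch that keeps the intermediate result before indexing it. The variable name parts is only for illustration and is not part of the original answer.

```r
library(stringr)

# same data as in the question
tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2",
          "1530 1", "1540 2", "1540 1")

# simplify = TRUE gives one row per input string, one column per piece
parts <- str_split(tmp3, pattern = "[[:space:]]", simplify = TRUE)

dim(parts)       # number of rows and columns of the character matrix
parts[1:3, ]     # first three rows: the code in column 1, the flag in column 2

# column 2 is still character, so convert it explicitly
second <- as.integer(parts[, 2])
second
```

    Keeping the matrix around also makes it easy to pull out the first column (parts[, 1]) if it is needed later.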

    2) Using data.table::tstrsplit()

    library(data.table)
    
    as.data.table(tmp3)[, tstrsplit(tmp3, split = "[[:space:]]", type.convert = TRUE)][, V2]
    #>  [1] 2 1 2 1 2 1 2 1 2 1
    

    type.convert = TRUE is what converts the result to integer here. Use it with care on other datasets, because it guesses a type for each column. The [, V2] indexing plays the same role as [, 2] above: it selects the second column of the returned data.table, which contains the values the asker wants, as integers.
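
    As a rough illustration of why the automatic conversion needs care, the sketch below uses base R's type.convert() on made-up strings with leading zeros. The codes vector is hypothetical, and the sketch assumes, as the data.table documentation describes, that tstrsplit(type.convert = TRUE) applies this kind of conversion to each output column.

```r
# hypothetical input: the first field is an identifier with leading zeros
codes <- c("007 a", "010 b")

# split at whitespace with base R and take the first piece of each string
pieces <- strsplit(codes, "[[:space:]]")
first  <- vapply(pieces, `[`, character(1), 1)

first                                   # still character, zeros intact

# automatic conversion turns the identifiers into integers
converted <- type.convert(first, as.is = TRUE)
converted
class(converted)
```

    If a column holds identifiers rather than numbers, it may be safer to leave type.convert off and convert only the columns that are known to be numeric.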

    sessionInfo()
    #> R version 4.0.0 (2020-04-24)
    #> Platform: x86_64-w64-mingw32/x64 (64-bit)
    #> Running under: Windows 10 x64 (build 18362)
    #> 
    #> Matrix products: default
    #> 
    #> locale:
    #> [1] LC_COLLATE=English_United States.1252 
    #> [2] LC_CTYPE=English_United States.1252   
    #> [3] LC_MONETARY=English_United States.1252
    #> [4] LC_NUMERIC=C                          
    #> [5] LC_TIME=English_United States.1252    
    #> 
    #> attached base packages:
    #> [1] stats     graphics  grDevices utils     datasets  methods   base     
    #> 
    #> loaded via a namespace (and not attached):
    #>  [1] compiler_4.0.0  magrittr_1.5    tools_4.0.0     htmltools_0.4.0
    #>  [5] yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6   rmarkdown_2.1  
    #>  [9] highr_0.8       knitr_1.28      stringr_1.4.0   xfun_0.13      
    #> [13] digest_0.6.25   rlang_0.4.6     evaluate_0.14
    

    Created on 2020-05-06 by the reprex package (v0.3.0)
