strsplit and lapply

Submitted by 给你一囗甜甜゛ on 2019-12-02 04:54:32

1) strapply

1a) scalar Here is a one-liner using strapply from the gsubfn package:

library(gsubfn)
x <- '"12,34,567"'

strapply(x, "\\d+", as.numeric, simplify = c)
## [1]  12  34 567

1b) vectorized A vectorized version is even simpler -- just remove the simplify=c like this:

v <- c('"1,2,3"', '"8,9"') # test data
strapply(v, "\\d+", as.numeric)

2) gsub and scan

2a) scalar Here is a one-liner using gsub and scan:

scan(text = gsub('"', '', x), what = 0, sep = ",")
## Read 3 items
## [1]  12  34 567

2b) vectorized A vectorized version would involve lapply-ing over the components:

lapply(v, function(x) scan(text = gsub('"', '', x), what = 0, sep = ","))

3) strsplit

3a) scalar Here is a strsplit solution. Note that we split on both " and , so the leading quote produces an empty first element, which [-1] drops:

as.numeric(strsplit(x, '[",]')[[1]][-1])
## [1]  12  34 567

3b) vectorized A vectorized solution would, again, involve lapply-ing over the components:

lapply(v, function(x) as.numeric(strsplit(x, '[",]')[[1]][-1]))

3c) vectorized - simpler Or, slightly simpler, strip the quotes first and then split on commas:

lapply(strsplit(gsub('"', '', v), split = ","), as.numeric)
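If the 3c approach is needed in several places, it can be wrapped in a small helper. This is only a sketch using base R; the name parse_quoted_numbers is made up for illustration and does not come from any package.

```r
# Hypothetical helper (name chosen for illustration): strip double quotes,
# split on commas, and convert each piece to numeric.
parse_quoted_numbers <- function(v) {
  no_quotes <- gsub('"', '', v, fixed = TRUE)
  lapply(strsplit(no_quotes, split = ",", fixed = TRUE), as.numeric)
}

v <- c('"1,2,3"', '"8,9"')  # same test data as in 1b)
res <- parse_quoted_numbers(v)
str(res)
```

The result is a list with one numeric vector per input string.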

I think your problem may stem from your source data. In any case, if you want to work with numbers, you will have to get rid of the quotes. I recommend gsub.

> x <- '"1,3,5"'
> x
[1] "\"1,3,5\""
> x <- gsub("\"", "", x)
> x
[1] "1,3,5"
> as.numeric(unlist(strsplit(x, ",")))
[1] 1 3 5
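
To see why the quotes have to go, here is a small sketch of what happens when they are left in. The pieces that still carry a quote character cannot be parsed as numbers and come back as NA. The variable names are only for illustration.

```r
x <- '"1,3,5"'

# Splitting without removing the quotes leaves them attached to the
# first and last pieces.
parts <- strsplit(x, ",")[[1]]

# suppressWarnings() hides the "NAs introduced by coercion" warning.
with_quotes <- suppressWarnings(as.numeric(parts))

# Removing the quotes first lets every piece convert cleanly.
without_quotes <- as.numeric(strsplit(gsub('"', '', x), ",")[[1]])

print(with_quotes)
print(without_quotes)
```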

Try this:

x <-  "12,34,77"
sapply(strsplit(x, ",")[[1]], as.numeric, USE.NAMES=FALSE)
[1] 12 34 77

Since strsplit() returns a list of character vectors (one per input string), you need to extract the first element and pass it to sapply().
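
The shape of the strsplit() result can be checked directly. A minimal sketch using only base R; it also shows that, because as.numeric() is vectorized, the extracted element can be converted without sapply() if preferred:

```r
x <- "12,34,77"
parts <- strsplit(x, ",")

is.list(parts)   # strsplit() returns a list
length(parts)    # one element per input string
parts[[1]]       # the character vector of pieces for the first string

# as.numeric() works on the whole character vector at once.
nums <- as.numeric(parts[[1]])
nums
```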


If, however, your string really contains embedded quotes, you need to remove them first. You can use gsub() for this:

x <-  '"12,34,77"'
sapply(strsplit(gsub('"', '', x), ",")[[1]], as.numeric, USE.NAMES=FALSE)
[1] 12 34 77

As has already been pointed out, you need to regex out the quotation marks first.

The destring function in the taRifx package will do that (remove any non-numeric characters) and then coerce to numeric:

test <- '"12,34,77"'
library(taRifx)
lapply(strsplit(test,","),destring)
[[1]]
[1] 12 34 77
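
If installing taRifx is not an option, a rough base R stand-in can be sketched. It assumes that "non-numeric" means anything other than digits, a decimal point, or a minus sign; it is an approximation for illustration, not the package's actual implementation, and strip_to_numeric is a made-up name.

```r
test <- '"12,34,77"'

# Hypothetical stand-in for destring: drop every character that is not
# a digit, a dot, or a minus sign, then coerce to numeric.
strip_to_numeric <- function(s) as.numeric(gsub("[^0-9.-]", "", s))

res <- lapply(strsplit(test, ","), strip_to_numeric)
res
```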