split character data into numbers and letters


I have a vector of character data. Most of the elements in the vector consist of one or more letters followed by one or more numbers. I wish to split each element in the

7 Answers

广开言路

2020-12-02 16:25
Late answer, but another option is to use strsplit with a regex pattern which uses lookarounds to find the boundary between numbers and letters:
```
var <- "ABC123"
strsplit(var, "(?=[A-Za-z])(?<=[0-9])|(?=[0-9])(?<=[A-Za-z])", perl=TRUE)
[[1]]
[1] "ABC" "123"
```
The above pattern will match (but not consume) when either the previous character is a letter and the following character is a number, or vice-versa. Note that we use strsplit in Perl mode to access lookarounds.
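
To show how the same lookaround idea extends from one string to a whole vector, here is a minimal base R sketch. The sample vector and the names `parts` and `split.df` are assumptions made for illustration, and the pattern is reduced to the letter-then-digit direction only.

```r
# Hypothetical sample vector; every element is letters followed by digits.
my.data <- c("b11", "ccc20", "ffffd13")

# Split at the zero-width boundary between a letter and a digit.
parts <- strsplit(my.data, "(?<=[A-Za-z])(?=[0-9])", perl = TRUE)

# Stack the two-element pieces into a data frame, one row per input element.
split.df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(split.df) <- c("char", "digit")
split.df$digit <- as.numeric(split.df$digit)
split.df
```

This assumes every element really contains both a letter part and a digit part. An element such as "aaa" yields a single piece, and `rbind` would then recycle it into both columns, so such elements need separate handling.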

旧巷少年郎

2020-12-02 16:29
For your regex you have to use:
```
gsub("[[:digit:]]","",my.data)
```
The [:digit:] character class only makes sense inside a set of [].
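
As a small sketch of the bracket rule, the same POSIX class can be negated inside the outer brackets to keep the digits instead of removing them. The sample vector and the names `letters.only` and `digits.only` are assumptions for illustration.

```r
# Hypothetical sample vector.
my.data <- c("aaa", "b11", "ccc20")

# Remove every digit: the POSIX class sits inside an outer bracket expression.
letters.only <- gsub("[[:digit:]]", "", my.data)

# Remove everything that is not a digit by negating the bracket expression.
digits.only <- gsub("[^[:digit:]]", "", my.data)

letters.only
digits.only
```

Elements without digits come back as empty strings in `digits.only`, which `as.numeric()` would turn into `NA`.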

余生分开走

2020-12-02 16:37

You can also use colsplit from reshape2 to split your vector into character and digit columns in one step:
```
library(reshape2)

colsplit(my.data, "(?<=\\p{L})(?=[\\d+$])", c("char", "digit"))
```
Result:
```
   char digit
1   aaa    NA
2     b    11
3     b    21
4     b   101
5     b   111
6   ccc     1
7 ffffd     1
8   ccc    20
9 ffffd    13
```
Data:
```
my.data <- c("aaa", "b11", "b21", "b101", "b111", "ccc1", "ffffd1", "ccc20", "ffffd13")
```


猫巷女王i

2020-12-02 16:40

With stringr, if you like (and slightly different from the answer to the other question):
```
# load library
library(stringr)

# load data
my.data <- c("aaa", "b11", "b21", "b101", "b111", "ccc1", "ffffd1", "ccc20", "ffffd13")

# extract numbers only
my.data.num <- as.numeric(str_extract(my.data, "[0-9]+"))

# check output
my.data.num
[1]  NA  11  21 101 111   1   1  20  13

# extract letters only
# ("[A-Za-z]" avoids the punctuation that a range written as A-z would also match)
my.data.cha <- str_extract(my.data, "[A-Za-z]+")

# check output
my.data.cha
[1] "aaa"   "b"     "b"     "b"     "b"     "ccc"   "ffffd" "ccc"   "ffffd"
```


别跟我提以往

2020-12-02 16:44
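```
# keep only the digits by removing every non-digit character
mydata.nub <- gsub("\\D", "", my.data)

# keep only the text by removing every digit
mydata.text <- gsub("\\d", "", my.data)
```
This works well, and it also separates numbers from text even when digits appear between the letters.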
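
To illustrate the case of digits sitting between letters, here is a short sketch. The vector `mixed` and the result names are hypothetical and not part of the original data.

```r
# Hypothetical inputs with digits interleaved between letters.
mixed <- c("ab12cd34", "x9y")

# Drop all non-digits; the remaining digit runs are concatenated.
mixed.num <- gsub("\\D", "", mixed)

# Drop all digits; the remaining letter runs are concatenated.
mixed.text <- gsub("\\d", "", mixed)

mixed.num
mixed.text
```

Because the removed characters simply vanish, separate digit runs such as "12" and "34" are joined into "1234". If the position of each run matters, a splitting approach is the better fit.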

隐瞒了意图╮

2020-12-02 16:45

Since none of the previous answers use tidyr::separate, here it goes:
```
library(tidyr)

df <- data.frame(mycol = c("APPLE348744", "BANANA77845", "OATS2647892", "EGG98586456"))

# split each value at the boundary between the last letter and the first digit
df %>%
  separate(mycol,
           into = c("text", "num"),
           sep = "(?<=[A-Za-z])(?=[0-9])"
           )
```
