strsplit

R: how to display the first n characters from a string of words

[亡魂溺海] submitted on 2019-12-05 05:40:16
I have the following string:

    Getty <- "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal."

I want to display the first 10 characters. So I began by splitting the string into individual characters:

    split <- strsplit(Getty, split="")
    split

I get all the individual characters at this point. Then I make a substring of the first 10 characters:

    first.10 <- substr(split, start=1, stop=10)
    first.10

And here is the output:

    "c(\"F\", \"o\""

I am not understanding why this prints
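A likely explanation for the output in the question: `strsplit` returns a list, and `substr` coerces its first argument with `as.character`, which turns the list element into the deparsed text `c("F", "o", ...)`. The first 10 characters of that text are what gets printed. The following is a minimal sketch, using a shortened copy of the string, of two ways to get the intended result; the name `first.10.split` is introduced here for illustration only.

```r
Getty <- "Four score and seven years ago our fathers brought forth"

# substr() works directly on the string; no splitting is needed.
first.10 <- substr(Getty, start = 1, stop = 10)
first.10

# If the characters are split first, take the character vector out of the
# list with [[1]] and paste the first 10 elements back together.
chars <- strsplit(Getty, split = "")[[1]]
first.10.split <- paste(chars[1:10], collapse = "")
first.10.split
```

Both approaches should give the same 10-character string; the first one avoids the list entirely.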

R: strsplit on backslash (\\)

蹲街弑〆低调 submitted on 2019-12-04 06:26:38
I am trying to extract the part of the string before the first backslash but I can't seem to get it to work properly. I have tried multiple ways of getting it to work, based on the manual page for strsplit and after searching online. In my actual situation the strings are in a dataframe which I get from a database connection, but I can simplify the situation with the following:

    > strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\",fixed=TRUE)
    [[1]]
    [1] "BLAAT1\022E:" "BLAAT2" "BLAAT3"
    > strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\",fixed=FALSE)
    Error in strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3", "\\",
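Two things seem to be going on in the question. In an R string literal, `"\022"` is an octal escape for a single control character, so there is no backslash at that position; only `"\\"` denotes a literal backslash. And when the pattern is treated as a regular expression, a lone backslash is an incomplete escape, so it has to be written as `"\\\\"`. The sketch below uses a hypothetical value `x` that really does contain backslashes; the name `before` is also an assumption made for illustration.

```r
x <- "BLAAT1\\BLAAT2\\BLAAT3"   # hypothetical value; each "\\" is one backslash

# Literal matching: the pattern is a single backslash.
strsplit(x, "\\", fixed = TRUE)[[1]]

# Regex matching: the backslash must itself be escaped inside the regex.
strsplit(x, "\\\\")[[1]]

# Keep only the part before the first backslash.
before <- sub("\\\\.*$", "", x)
before

# "\022" is an octal escape: one character, not a backslash plus digits.
nchar("\022")
```

If the real data contains a control character rather than a backslash, splitting on the backslash will never separate that part, which would match the first result shown in the question.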

using strsplit and subset in dplyr and mutate

我与影子孤独终老i submitted on 2019-12-04 03:45:37
I have a data table with one string column. I'd like to create another column that is a subset of this column using strsplit.

    dat <- data.table(labels=c('a_1','b_2','c_3','d_4'))

The output I want is

    label  sub_label
    a_1    a
    b_2    b
    c_3    c
    d_4    d

I've tried the following, but neither seems to work.

    dat %>% mutate( sub_labels=strsplit(as.character(labels), "_")[[1]][1] )
    # gives a column whose values are all "a"

This one, which seems logical to me,

    dat %>% mutate( sub_labels=sapply(strsplit(as.character(labels), "_"), function(x) x[[1]][1]) )

gives an error

    Error: Don't know how to handle type
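The first attempt returns "a" everywhere because `[[1]][1]` picks the first piece of the first row only, and that single value is recycled down the column. A minimal sketch of taking the first piece per row follows. It uses a base R `data.frame` instead of `data.table` and `dplyr` so that it is self-contained; the column name `sub_labels2` is an assumption for illustration. The same right-hand sides could presumably be used inside `mutate()`, but that is not verified here.

```r
dat <- data.frame(labels = c("a_1", "b_2", "c_3", "d_4"),
                  stringsAsFactors = FALSE)

# strsplit() returns one list element per row; take the first piece of each.
pieces <- strsplit(as.character(dat$labels), "_", fixed = TRUE)
dat$sub_labels <- vapply(pieces, function(p) p[1], character(1))

# A vectorised alternative: drop everything from the first "_" onwards.
dat$sub_labels2 <- sub("_.*$", "", dat$labels)
dat
```

`vapply` with `character(1)` guarantees a plain character vector, which avoids handing a list column to later steps.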

Using strsplit() in R, ignoring anything in parentheses

我的梦境 submitted on 2019-12-04 01:31:36
I'm trying to use strsplit() in R to break a string into pieces based on commas, but I don't want to split up anything in parentheses. I think the answer is a regex but I'm struggling to get the code right. So for example:

    x <- "This is it, isn't it (well, yes)"
    > strsplit(x, ", ")
    [[1]]
    [1] "This is it" "isn't it (well" "yes)"

When what I would like is:

    [1] "This is it" "isn't it (well, yes)"

akrun: We can use a PCRE regex to FAIL any , that follows a ( and comes before the ), and split on , followed by zero or more spaces ( \\s* ):

    strsplit(x, '\\([^)]+,(*SKIP)(*FAIL)|,\\s*', perl=TRUE)[[1]]
    #[1] "This is
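As an alternative to the `(*SKIP)(*FAIL)` pattern quoted above, a negative lookahead can express the same idea. This is a different technique, sketched under the assumption that parentheses are balanced and not nested: a comma is a split point unless a closing parenthesis follows before any other parenthesis.

```r
x <- "This is it, isn't it (well, yes)"

# Split on a comma plus optional spaces, unless a ")" follows before any
# other parenthesis, which would mean the comma sits inside "(...)".
parts <- strsplit(x, ",\\s*(?![^()]*\\))", perl = TRUE)[[1]]
parts
```

With nested or unbalanced parentheses this lookahead would misjudge some commas, so a small parser would be a safer choice for such input.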

How to split a string on first number only

旧城冷巷雨未停 submitted on 2019-12-04 01:09:43
Question: I have a dataset with street addresses that are formatted very differently. For example:

    d <- c("street1234", "Street 423", "Long Street 12-14", "Road 18A", "Road 12 - 15", "Road 1/2")

From this I want to create two columns: 1. X, with the street address, and 2. Y, with the number plus everything that follows. Like this:

    X            Y
    Street       1234
    Street       423
    Long Street  12-14
    Road         18A
    Road         12 - 15
    Road         1/2

Until now I have tried strsplit and followed some similar questions here, for example: strsplit(d,
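One way to split at the first digit without `strsplit` is to locate that digit with `regexpr` and cut the string there with `substr`. This is a sketch, not the answer from the original thread; the handling of entries without any digit (everything goes to `X`) is an assumption.

```r
d <- c("street1234", "Street 423", "Long Street 12-14",
       "Road 18A", "Road 12 - 15", "Road 1/2")

# Position of the first digit in each string (-1 when there is none).
pos <- as.integer(regexpr("[0-9]", d))

# Assumed behaviour: with no digit, keep the whole string in X.
no_digit <- pos == -1L
pos[no_digit] <- nchar(d)[no_digit] + 1L

X <- trimws(substr(d, 1, pos - 1))   # text before the first digit
Y <- substr(d, pos, nchar(d))        # first digit and everything after it
data.frame(X, Y)
```

Note that this keeps the original capitalisation, so the first entry yields "street" rather than "Street".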

strsplit inconsistent with gregexpr

允我心安 submitted on 2019-12-04 00:15:11
A comment on my answer to this question suggests a regex that should give the desired result using strsplit, but it does not, even though it seems to correctly match the first and last commas in a character vector. This can be shown using gregexpr and regmatches. So why does strsplit split on each comma in this example, even though regmatches only returns two matches for the same regex?

    # We would like to split on the first comma and
    # the last comma (positions 4 and 13 in this string)
    x <- "123,34,56,78,90"
    # Splits on every comma. Must be wrong.
    strsplit( x , '^\\w+\\K,|,(?=\\w+$)' , perl = TRUE )[[1]]
    #[1] "123"
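The behaviour is probably explained by the algorithm described in the `strsplit` documentation: after each match, the match and everything to its left are removed, and the search starts again on the remainder. An anchor such as `^` therefore matches at the start of every remainder, not only at the start of the original string. A sketch of a different technique that avoids this, using one anchored match with capture groups instead of `strsplit`:

```r
x <- "123,34,56,78,90"

# Capture the text before the first comma, between the first and the last
# comma, and after the last comma, in a single anchored match.
m <- regexec("^([^,]*),(.*),([^,]*)$", x)
parts <- regmatches(x, m)[[1]][-1]   # drop the full match, keep the groups
parts
```

Because the whole string is matched once, the anchors refer to the original string, which is what the question intended.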

Non character argument in R string split function (strsplit)

若如初见. submitted on 2019-12-03 22:08:12
This works:

    x <- "0.466:1.187:2.216:1.196"
    y <- as.numeric(unlist(strsplit(x, ":")))

Values of blat$LRwAvg all look like x above, but this doesn't work:

    for (i in 1:50){
      y <- as.numeric(unlist(strsplit(blat$LRwAvg[i], "\\:")))
      blat$meanLRwAvg[i]=mean(y)
    }

Because of:

    Error in strsplit(blat$LRwAvg[i], "\:") : non-character argument

It doesn't matter if I have one, two or no backslashes. What's my problem? (Not generally, I mean in this special task, technically.)

As agstudy implied, adding

    blat$LRwAvg <- as.character(blat$LRwAvg)

before the loop fixed it:

    blat$meanLRwAvg <- blat$gtFrqAvg #or some other
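The error suggests that the column is a factor rather than a character vector, which is consistent with the `as.character` fix mentioned above. Below is a minimal sketch with a hypothetical two-row stand-in for `blat` (the real data is not shown in the question); it converts the column once and replaces the loop with `vapply`.

```r
# Hypothetical stand-in for blat: the column is a factor, not character.
blat <- data.frame(LRwAvg = factor(c("0.466:1.187:2.216:1.196", "1:2:3")))

# Convert once, split every row, then average each row's pieces.
pieces <- strsplit(as.character(blat$LRwAvg), ":", fixed = TRUE)
blat$meanLRwAvg <- vapply(pieces, function(p) mean(as.numeric(p)), numeric(1))
blat
```

Using `fixed = TRUE` sidesteps the question of how many backslashes the pattern needs, since the colon is then matched literally.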

Split 1 Column into 2 Columns in a Dataframe [duplicate]

半腔热情 submitted on 2019-12-02 23:52:07
Question: This question already has answers here: Split data frame string column into multiple columns (14 answers). Closed 6 years ago.

Here's my data frame.

    > data
      Manufacturers
    1 Audi,RS5
    2 BMW,M3
    3 Cadillac,CTS-V
    4 Lexus,ISF

So I would want to split the manufacturers and the models, like this:

    > data
      Manufacturers Models
    1 Audi          RS5
    2 BMW           M3
    3 Cadillac      CTS-V
    4 Lexus         ISF

I would appreciate any help on this question. Thanks a lot.

Answer 1: Some sample data. You could use a character vector, but I'll use
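A minimal base R sketch of one way to do this split, assuming every row contains exactly one comma (this is not necessarily the approach of the truncated answer above): split each value, bind the pieces into a two-column matrix, and assign the columns back.

```r
data <- data.frame(
  Manufacturers = c("Audi,RS5", "BMW,M3", "Cadillac,CTS-V", "Lexus,ISF"),
  stringsAsFactors = FALSE
)

# Each row splits into two pieces; rbind stacks them into a 4 x 2 matrix.
parts <- do.call(rbind, strsplit(data$Manufacturers, ",", fixed = TRUE))

data$Manufacturers <- parts[, 1]
data$Models <- parts[, 2]
data
```

If some rows had a different number of commas, `rbind` would recycle values silently, so the one-comma assumption is worth checking first.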

splitting string expression at multiple delimiters in R

匆匆过客 submitted on 2019-12-02 20:26:21
Question: I am trying to parse some math expressions in R, and I would therefore like to split them at multiple delimiters (+, -, *, /, -(, +(, ), )+ etc.) so that I get the list of symbolic variables contained in the expression. So e.g. I would like 2*(x1+x2-3*x3) to return "x1", "x2", "x3". Is there a good way of doing it? Thanks.

Answer 1: There's probably a cleaner way of doing this, but does this cover your use case(s)?

    eqn = "3 + 2*(x1+x2-3*x3 - x1/x3) - 5"
    vars = unlist(strsplit(eqn, split="[-+*/)( ]|[^x][0
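When the expression is valid R syntax, as in the examples here, R's own parser can be used instead of splitting on delimiters. This is a different technique from the `strsplit` answer above, sketched under that assumption: `parse` turns the text into an expression and `all.vars` lists the variable names it contains.

```r
eqn <- "3 + 2*(x1+x2-3*x3 - x1/x3) - 5"

# Parse the text as R code and collect the distinct variable names.
vars <- all.vars(parse(text = eqn))
vars
```

Numbers and operators are ignored automatically, and each variable is reported once even if it appears several times.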

How to vectorize R strsplit?

给你一囗甜甜゛ submitted on 2019-12-02 18:21:38
When creating functions that use strsplit, vector inputs do not behave as desired, and sapply needs to be used. This is due to the list output that strsplit produces. Is there a way to vectorize the process, that is, so the function produces the correct element in the list for each of the elements of the input? For example, to count the lengths of words in a character vector:

    words <- c("a","quick","brown","fox")
    > length(strsplit(words,""))
    [1] 4 # The number of words (length of the list)
    > length(strsplit(words,"")[[1]])
    [1] 1 # The length of the first word only
    > sapply(words,function (x)
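`strsplit` is already vectorised over its input; what needs care is the step applied to the resulting list. For the word-length example, `lengths` (available in R 3.2.0 and later) returns the length of each list element in one call, and `nchar` gives the same counts without splitting at all. A minimal sketch, with `word_lengths` as an illustrative name:

```r
words <- c("a", "quick", "brown", "fox")

# One list element per word; lengths() measures each element.
word_lengths <- lengths(strsplit(words, ""))
word_lengths

# For character counts specifically, nchar() is already vectorised.
nchar(words)
```

For other per-element operations, `vapply` over the list with a declared return type is a reasonable general pattern.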