strsplit

Split strings on third white space from the right

拥有回忆 posted on 2019-12-24 14:52:42
Question: I would like to split a series of strings on the third white space from the right. The number of white spaces varies among strings, but each string has at least three white spaces. Here are two example strings:

    strings <- c('abca eagh ijkl mnop', 'dd1 ss j, ll bb aa')

I would like:

    [1] 'abca', 'eagh ijkl mnop'
    [2] 'dd1 ss j,', 'll bb aa'

The closest I have been able to come is:

    strsplit(strings, split = "(?<=\\S)(?=\\s(.*)\\s(.*)\\s(.*)$)", perl = TRUE)

which returns:

    [[1]]
    [1] "abca" " eagh"
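One possible approach, offered as a sketch rather than as the accepted answer: consume the whitespace character itself and use a lookahead that allows exactly two further whitespace characters before the end of the string. The name `parts` is introduced here only for illustration, and the sketch assumes the strings have no trailing whitespace.

```r
strings <- c("abca eagh ijkl mnop", "dd1 ss j, ll bb aa")

# Split on a whitespace character that is followed by exactly two more
# whitespace characters before the end of the string.
parts <- strsplit(strings, "\\s(?=\\S*\\s\\S*\\s\\S*$)", perl = TRUE)
print(parts)
```

Because the pattern consumes the whitespace it splits on, neither piece keeps a leading or trailing space.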

R Strsplit keep delimiter in second element

别来无恙 posted on 2019-12-24 08:43:33
Question: I have been trying to solve this little issue for almost 2 hours, but without success. I simply want to separate a string by the delimiter: one space followed by any character. In the second element I want to keep the delimiter, whereas in the first element it shall not appear. Example:

    x <- "123123 123 A123"
    strsplit(x," [A-Z]")

results in:

    "123123 123" "A123"

However, this does not keep the letter A in the second element. I have tried using

    strsplit(x,"(?<=[A-Z])",perl=T)

but this does not
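A minimal sketch of one way to keep the letter: match only the space, and check for the capital letter with a lookahead so that it is not consumed. The name `pieces` is hypothetical.

```r
x <- "123123 123 A123"

# The space is consumed by the split; the capital letter is only looked at,
# so it stays at the start of the second element.
pieces <- strsplit(x, " (?=[A-Z])", perl = TRUE)[[1]]
print(pieces)
```

The space before "123" is left alone because it is followed by a digit, not a capital letter.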

Reorder words in each element of a vector

最后都变了- posted on 2019-12-24 00:27:04
Question: I'd like to change the word order for each element in a vector. Specifically, I'd like to make another vector in which the first word becomes the last word, for elements that differ in length.

Data

    metadata1 <- c("reference1 an organism", "reference2 another organism here", "reference3 yet another organism is here")

Desired outcome

    metadata2 <- c("an organism reference1", "another organism here reference2", "yet another organism is here reference3")

My attempt

    metadata2 <- lapply
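As a hedged alternative to splitting and re-pasting, a single `sub()` call with two capture groups can move the first word to the end. This sketch assumes every element contains at least two words separated by whitespace.

```r
metadata1 <- c("reference1 an organism",
               "reference2 another organism here",
               "reference3 yet another organism is here")

# Group 1 is the first word, group 2 is everything after the first run of
# whitespace; the replacement swaps them.
metadata2 <- sub("^(\\S+)\\s+(.*)$", "\\2 \\1", metadata1)
print(metadata2)
```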

R strsplit before ( and after ) keeping both delimiters

喜夏-厌秋 posted on 2019-12-23 12:55:57
Question: I have a string that looks like the following:

    x <- "01(01)121210(01)0001"

I want to split this into a vector so that I get the following:

    [1] "0" "1" "(01)" "1" "2" "1" "2" "1" "0" "(01)" "0" "0" "0" "1"

The (|) could be [|] or {|}, and the number of digits between the brackets can be 2 or more. I've been trying to do this by separating on the brackets first:

    unlist(strsplit(x, "(?<=[\\]\\)\\}])", perl=T))
    [1] "01(01)" "121210(01)" "0001"

or

    unlist(strsplit(x, "(?<=[\\[\\(\\{])", perl=T))
    [1]
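Rather than splitting, one option is to describe the tokens themselves and extract them with `gregexpr()` and `regmatches()`. The sketch below assumes that brackets are never nested and are always closed by the matching bracket type; `tokens` is an illustrative name.

```r
x <- "01(01)121210(01)0001"

# A token is a bracketed group ((...), [...] or {...}) or a single digit.
pattern <- "\\([^)]*\\)|\\[[^\\]]*\\]|\\{[^}]*\\}|[0-9]"
tokens <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(tokens)
```

The bracketed alternatives come first in the pattern, so digits inside brackets are captured as part of the group and not one by one.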

How to count the factors in ordered sequence

百般思念 posted on 2019-12-23 10:26:31
Question: I have a dataframe df:

    userID  Score  Task_Alpha  Task_Beta  Task_Charlie  Task_Delta
    3108    -8.00  Easy        Easy       Easy          Easy
    3207     3.00  Hard        Easy       Match         Match
    3350     5.78  Hard        Easy       Hard          Hard
    3961    10.00  Easy        NA         Hard          Hard
    4021    10.00  Easy        Easy       NA            Hard

1. userID is a factor variable
2. Score is numeric
3. All the 'Task_' features are factor variables with possible values 'Hard', 'Easy', 'Match' or NA

I want to count the possible transitions between the Task_ features. For reference, the possible transitions are:

Assigning results of strsplit to multiple columns of data frame

烂漫一生 posted on 2019-12-22 08:48:14
Question: I am trying to split a character vector into three different vectors, inside a data frame. My data is something like:

    > df <- data.frame(filename = c("Author1 (2010) Title of paper", "Author2 et al (2009) Title of paper", "Author3 & Author4 (2004) Title of paper"), stringsAsFactors = FALSE)

And I would like to split those three pieces of information (authors, year, title) into three different columns, so that it would be:

    > df
      filename               author   year  title
    1 Author1 (2010) Title1  Author1  2010  Title1
    2
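A sketch of one way to do this with base R: capture the three fields with `regexec()` and `regmatches()`, then bind the captures into a matrix and assign its columns. It assumes every file name follows the "authors (YYYY) title" layout; a row that does not match would silently drop out of the matrix. The names `m` and `parts` are illustrative.

```r
df <- data.frame(filename = c("Author1 (2010) Title of paper",
                              "Author2 et al (2009) Title of paper",
                              "Author3 & Author4 (2004) Title of paper"),
                 stringsAsFactors = FALSE)

# Capture groups: authors, four-digit year in parentheses, title.
m <- regmatches(df$filename,
                regexec("^(.*) \\(([0-9]{4})\\) (.*)$", df$filename))

# Each list element holds the full match followed by the three captures.
parts <- do.call(rbind, m)
df$author <- parts[, 2]
df$year   <- as.integer(parts[, 3])
df$title  <- parts[, 4]
print(df)
```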

R: how to display the first n characters from a string of words

风格不统一 posted on 2019-12-22 04:59:50
Question: I have the following string:

    Getty <- "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal."

I want to display the first 10 characters. So I began by splitting the string into individual characters:

    split <- strsplit(Getty, split="")
    split

I get all the individual characters at this point. Then I make a substring of the first 10 characters.

    first.10 <- substr(split, start
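For reference, `substr()` works directly on the original string, so the split into single characters is not needed for this task. A minimal sketch, with the sentence shortened for brevity:

```r
Getty <- "Four score and seven years ago our fathers brought forth on this continent a new nation"

# Characters 1 through 10 of the unsplit string.
first.10 <- substr(Getty, start = 1, stop = 10)
print(first.10)

# If the characters are wanted individually, index the strsplit() result.
chars <- strsplit(Getty, split = "")[[1]][1:10]
```

Passing the list returned by `strsplit()` to `substr()` would coerce the whole list to a single string first, which is usually not what is intended.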

strsplit by row and distribute results by column in data.frame

ⅰ亾dé卋堺 posted on 2019-12-22 04:08:16
Question: So I have the data.frame

    dat = data.frame(x = c('Sir Lancelot the Brave', 'King Arthur', 'The Black Knight', 'The Rabbit'), stringsAsFactors=F)

    > dat
                           x
    1 Sir Lancelot the Brave
    2            King Arthur
    3       The Black Knight
    4             The Rabbit

And I want to transform it into the data frame

    > dat2
                           x    1        2      3     4
    1 Sir Lancelot the Brave  Sir Lancelot    the Brave
    2            King Arthur King   Arthur
    3       The Black Knight  The    Black Knight
    4             The Rabbit  The   Rabbit

strsplit returns the data as a list

    sbt <- strsplit(dat$x, " ")
    > sbt
    [[1]]
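One way to turn the ragged list into columns, sketched under the assumption that missing words may be filled with NA (the desired output above appears to leave them blank): pad every split to the longest length, stack the results into a matrix, and bind it to the original data frame. The column names V1 to V4 are whatever `as.data.frame()` assigns, not names from the question.

```r
dat <- data.frame(x = c("Sir Lancelot the Brave", "King Arthur",
                        "The Black Knight", "The Rabbit"),
                  stringsAsFactors = FALSE)

sbt <- strsplit(dat$x, " ")
n <- max(lengths(sbt))

# Pad each word vector with NA up to the longest length, one row per string.
padded <- t(sapply(sbt, function(w) c(w, rep(NA_character_, n - length(w)))))

dat2 <- cbind(dat, as.data.frame(padded, stringsAsFactors = FALSE))
print(dat2)
```

Replacing `NA_character_` with `""` would give blank cells instead of NA.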

R: strsplit on backslash (\)

一曲冷凌霜 posted on 2019-12-21 16:22:42
Question: I am trying to extract the part of the string before the first backslash, but I can't seem to get it to work properly. I have tried multiple ways of getting it to work, based on the manual page for strsplit and after searching online. In my actual situation the strings are in a dataframe which I get from a database connection, but I can simplify the situation with the following:

    > strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\",fixed=TRUE)
    [[1]]
    [1] "BLAAT1\022E:" "BLAAT2" "BLAAT3"
    > strsplit(
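A point worth checking in the simplified example: inside an R string literal, `\022` is an octal escape for a single control character, not a backslash followed by "022", so there is nothing for strsplit to split on at that position. The sketch below assumes the real data contain an actual backslash there, which in a literal must be written as `\\`.

```r
# "\022" is one character (octal 22), not a backslash plus three digits.
stopifnot(nchar("\022") == 1)

# Assumed real data: a genuine backslash after BLAAT1, written as "\\".
x <- "BLAAT1\\022E:\\BLAAT2\\BLAAT3"

# With fixed = TRUE the split string "\\" is one literal backslash.
before <- strsplit(x, "\\", fixed = TRUE)[[1]][1]
print(before)

# Equivalent with a regex: drop everything from the first backslash on.
before2 <- sub("\\\\.*$", "", x)
```

Strings read from a database connection are not parsed as R literals, so the escape issue arises only when the value is typed into code by hand.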

strsplit inconsistent with gregexpr

我的未来我决定 posted on 2019-12-21 07:41:03
Question: A comment on my answer to this question suggests an approach that should give the desired result using strsplit, but it does not, even though it seems to correctly match the first and last commas in a character vector. This can be proved using gregexpr and regmatches. So why does strsplit split on each comma in this example, even though regmatches only returns two matches for the same regex?

    # We would like to split on the first comma and
    # the last comma (positions 4 and 13 in this string)
    x <- "123,34,56,78,90"
    #
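The regex from the comment is not shown in this excerpt, but the help page for strsplit documents a relevant detail: after each match, the match and everything to its left are removed, and the search is repeated on what remains. A pattern that depends on its position in the whole string can therefore match again in the shortened remainder. A sketch that avoids the issue by locating the commas first and cutting with `substr()`; the names `commas`, `first`, `last` and `pieces` are illustrative:

```r
x <- "123,34,56,78,90"

# Positions of all commas in the full string (4, 7, 10, 13 here).
commas <- gregexpr(",", x, fixed = TRUE)[[1]]
first <- commas[1]
last  <- commas[length(commas)]

# Cut around the first and last comma only.
pieces <- c(substr(x, 1, first - 1),
            substr(x, first + 1, last - 1),
            substr(x, last + 1, nchar(x)))
print(pieces)
```

This assumes the string contains at least two commas.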