stringr

Splitting a text in one column into many columns for each row [duplicate]

别等时光非礼了梦想, posted on 2019-12-02 03:41:18
Question: This question already has answers here: Splitting a dataframe string column into multiple different columns [duplicate] (4 answers). Closed 10 months ago.

I have the following dataset:

    Class  Range   Value
    A      6 - 8   19
    B      1 - 3   14
    C      5 - 16  10
    D      4 - 7   5

I want to split the range for each class into two columns. To do that, I used the function str_split_fixed as follows:

    merge(data, str_split_fixed(data[, 2], " - ", 2))

and I even tried:

    merge(data, str_split_fixed(data$Range, " - ", 2))

But
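The question text is cut off above, so the exact failure is not known. One possible approach, sketched here, is to assign the two pieces returned by str_split_fixed directly as new columns rather than passing them to merge, which performs a join instead of binding columns side by side. The column names Min and Max are hypothetical, and the data frame is rebuilt from the table in the question.

```r
library(stringr)

# Data frame rebuilt from the table in the question.
data <- data.frame(
  Class = c("A", "B", "C", "D"),
  Range = c("6 - 8", "1 - 3", "5 - 16", "4 - 7"),
  Value = c(19, 14, 10, 5),
  stringsAsFactors = FALSE
)

# str_split_fixed returns a character matrix with one column per piece.
parts <- str_split_fixed(data$Range, " - ", 2)

# Min and Max are illustrative names, not taken from the original post.
data$Min <- as.numeric(parts[, 1])
data$Max <- as.numeric(parts[, 2])
```

Converting with as.numeric is optional; it is included on the assumption that the bounds will be used as numbers afterwards.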

Convert HTML Entity to proper character R

拈花ヽ惹草, posted on 2019-12-02 03:06:28
Does anyone know of a generic function in R that can convert an HTML entity such as &auml; to its Unicode character ä? I have seen some functions that take in ä and convert it to a plain character. Any help would be appreciated. Thanks.

Edit: Below is one record of the data, of which I probably have over 1 million. Is there an easier solution than reading the data into a massive vector and changing each element in turn?

    wine/name: 1999 Domaine Robert Chevillon Nuits St. Georges 1er Cru Les Vaucrains
    wine/wineId: 43163
    wine/variant: Pinot Noir
    wine/year: 1999
    review/points: N/A
    review/time: 1337385600
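As a minimal sketch of the idea, and not a complete decoder, the base R function below replaces a handful of named entities using a lookup table. The function name decode_entities and the contents of the table are assumptions made for illustration; a real data set would need a fuller table or a dedicated HTML parser. Because gsub is vectorised, the whole character vector is processed in one call per entity rather than element by element.

```r
# Hypothetical helper: decodes only the entities listed in `table`.
decode_entities <- function(x,
                            table = c("&auml;" = "\u00E4",
                                      "&ouml;" = "\u00F6",
                                      "&uuml;" = "\u00FC",
                                      "&amp;"  = "&")) {
  # "&amp;" is kept last so that "&amp;auml;" is not decoded twice.
  for (entity in names(table)) {
    x <- gsub(entity, table[[entity]], x, fixed = TRUE)
  }
  x
}

decoded <- decode_entities(c("M&auml;rz &amp; Co", "no entities here"))
```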

Removing characters after a EURO symbol in R

烂漫一生, posted on 2019-12-02 01:58:27
I have a euro symbol saved in the "euro" variable:

    euro <- "\u20AC"
    euro
    #[1] "€"

And the "eurosearch" variable contains "services as defined in this SOW at a price of € 15,896.80 (if executed fro".

    eurosearch
    [1] "services as defined in this SOW at a price of € 15,896.80 (if executed fro"

I want the characters after the euro symbol, which are "15,896.80 (if executed fro". I am using this code:

    gsub("^.*[euro]","",eurosearch)

But I'm getting an empty result. How can I obtain the expected output?

Answer (Wiktor Stribiżew): You can use variables in the pattern by just concatenating strings using paste0: euro <- "€"
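The answer is truncated here, so the following is only a sketch of the paste0 idea it names, not the answer's actual code. Inside a quoted pattern, [euro] is a character class matching any of the letters e, u, r or o, so the greedy ^.* runs to the final "o" of the string and everything is removed. Building the pattern with paste0 inserts the value of the variable instead; the trailing \\s* is an added assumption to drop the space after the symbol.

```r
euro <- "\u20AC"
eurosearch <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"

# Pattern built from the variable: everything up to the euro sign,
# plus any whitespace that follows it.
pattern <- paste0("^.*", euro, "\\s*")

after_euro <- sub(pattern, "", eurosearch)
```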

Counting whole word/number occurrences with str_count in R

浪子不回头ぞ, posted on 2019-12-02 01:26:19
Similar to this case, I would like to count the number of occurrences of multiple words and numbers in a vector of sentences with str_count from the stringr package. But I noticed that not only whole numbers are counted but also partial numbers. For example:

    df <- c("honda civic 1988 with new lights","toyota auris 4x4 140000 km","nissan skyline 2.0 159000 km")
    keywords <- c("honda","civic","toyota","auris","nissan","skyline","1988","1400","159")
    library(stringr)
    number_of_keywords_df <- str_count(df, paste(keywords, collapse='|'))

Here I receive a vector for number_of_keywords_df of 3
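One way to count only whole words and numbers, sketched here on the assumption that the keywords contain no regex metacharacters, is to wrap the alternation in word boundaries (\\b). With the boundaries in place, "1400" no longer matches inside "140000" and "159" no longer matches inside "159000".

```r
library(stringr)

df <- c("honda civic 1988 with new lights",
        "toyota auris 4x4 140000 km",
        "nissan skyline 2.0 159000 km")
keywords <- c("honda", "civic", "toyota", "auris", "nissan",
              "skyline", "1988", "1400", "159")

# Non-capturing group around the alternation, anchored by \\b on both sides.
pattern <- paste0("\\b(?:", paste(keywords, collapse = "|"), ")\\b")

number_of_keywords_df <- str_count(df, pattern)
```

For this input the counts are 3, 2 and 2, since the partial numbers in the second and third sentences are no longer counted.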

Substring extraction from vector in R

别说谁变了你拦得住时间么, posted on 2019-12-01 22:59:02
I am trying to extract substrings from an unstructured text. For example, assume a vector of country names:

    countries <- c("United States", "Israel", "Canada")

How do I go about passing this vector of character values to extract exact matches from unstructured text?

    text.df <- data.frame(ID = c(1:5), text = c("United States is a match", "Not a match", "Not a match", "Israel is a match", "Canada is a match"))

In this example, the desired output would be:

    ID  text
    1   United States
    4   Israel
    5   Canada

So far I have been working with gsub, where I remove all non-matches and then eliminate then remove
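A possible alternative to the gsub route, sketched here with stringr, is to collapse the country names into a single alternation and extract the first match per row with str_extract, which returns NA where nothing matches. This assumes the names contain no regex metacharacters; the object names pattern, matched and result are illustrative.

```r
library(stringr)

countries <- c("United States", "Israel", "Canada")
text.df <- data.frame(
  ID = 1:5,
  text = c("United States is a match", "Not a match", "Not a match",
           "Israel is a match", "Canada is a match"),
  stringsAsFactors = FALSE
)

# One alternation of all names, bounded so only whole names match.
pattern <- paste0("\\b(?:", paste(countries, collapse = "|"), ")\\b")

# NA for rows without a match; those rows are dropped below.
matched <- str_extract(text.df$text, pattern)
keep <- !is.na(matched)

result <- data.frame(ID = text.df$ID[keep], text = matched[keep],
                     stringsAsFactors = FALSE)
```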

Why is this end of line (\\b) not recognised as a word boundary in stringr/ICU and Perl

◇◆丶佛笑我妖孽, posted on 2019-12-01 18:06:26
Using stringr I tried to detect a € sign at the end of a string as follows:

    str_detect("my text €", "€\\b") # FALSE

Why is this not working? It is working in the following cases:

    str_detect("my text a", "a\\b") # TRUE - letter instead of €
    grepl("€\\b", "2009in €") # TRUE - base R solution

But it also fails in perl mode:

    grepl("€\\b", "2009in €", perl=TRUE) # FALSE

So what is wrong with the €\\b regex? The regex €$ is working in all cases...

When you use base R regex functions without perl=TRUE, the TRE regex flavor is used. It appears that TRE word boundary: When used after a non-word
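The behaviour reported for perl mode can be reproduced with the short sketch below. In PCRE, \\b only matches between a word character and a non-word character (or the string edge), and € is not a word character, so there is no boundary between € and the end of the string. The end anchor is the fix mentioned in the question; the negative lookahead is an added alternative, not something taken from the truncated answer.

```r
x <- "my text \u20AC"   # "\u20AC" is the euro sign

# No word character on either side of the position after the euro sign.
with_boundary <- grepl("\u20AC\\b", x, perl = TRUE)

# Anchoring to the end of the string, as noted in the question.
with_anchor <- grepl("\u20AC$", x, perl = TRUE)

# Alternative: require that no word character follows the euro sign.
with_lookahead <- grepl("\u20AC(?!\\w)", x, perl = TRUE)
```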

An error I can't understand. “Promise already under evaluation…” [duplicate]

血红的双手。, posted on 2019-12-01 13:25:40
This question already has an answer here: promise already under evaluation: recursive default argument reference or earlier problems? (2 answers)

I am trying to write a function that finds a pattern in names, with the help of the stringr package. My function looks like the following:

    namezz=function(thepatternx,data=data,column=Name){
      library(stringr)
      thepattern=as.character(quote(thepatternx))
      pattern <- thepattern
      strings <- data$column ##data$column is a character vector
      found=str_detect(strings, pattern)
      yez= rownames(data[which(found==TRUE),])
      hhh=as.numeric(yez)+1
      return(hhh)
    }

When I call the
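The linked duplicate points at a recursive default argument, which here is data=data: when the default is evaluated, it looks up data inside the function and finds the same unevaluated argument. The sketch below first reproduces that error in isolation and then shows one hypothetical rewrite. The name find_name_rows, the choice to pass the column name as a string, and the sample data frame are all assumptions made for illustration.

```r
library(stringr)

# Reproduce the error in isolation: the default value refers to itself.
f <- function(data = data) data
err <- tryCatch(f(), error = function(e) conditionMessage(e))

# Hypothetical rewrite: no self-referencing default, the pattern is used
# as given, and the column is looked up by name with [[ ]] instead of $.
find_name_rows <- function(pattern, data, column = "Name") {
  strings <- as.character(data[[column]])
  found <- str_detect(strings, pattern)
  # Same arithmetic as the original: matching row names plus one.
  as.numeric(rownames(data)[found]) + 1
}

people <- data.frame(Name = c("Anna", "Bob", "Annette"),
                     stringsAsFactors = FALSE)
rows <- find_name_rows("^Ann", people)
```

The as.numeric conversion assumes default numeric row names, as the original function does.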

Remove everything before the last space

痞子三分冷, posted on 2019-12-01 12:30:30
I have the following string. I tried to remove everything before the last space, but it seems I can't achieve it. I tried to follow this post: Use gsub remove all string before first white space in R

    str <- c("Veni vidi vici")
    gsub("\\s*","\\1",str)
    "Venividivici"

What I want is to have only the "vici" string left after removing everything before the last space.

Your gsub("\\s*","\\1",str) code replaces each occurrence of 0 or more whitespaces with a reference to the capturing group #1 value (which is an empty string since you have not specified any capturing group in the pattern). You want to
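The answer is cut off at this point, so the following is a sketch of one common way to get the stated result rather than the answer's own code. A greedy .* consumes as much as possible before the whitespace it must still match, so the match ends at the last space and sub removes everything up to and including it.

```r
str <- c("Veni vidi vici")

# Greedy ".*" runs to the last whitespace character; the match is removed.
last_word <- sub("^.*\\s", "", str)
```

A string without any whitespace is returned unchanged, because the pattern then does not match.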

Remove everything before the last space

蹲街弑〆低调, posted on 2019-12-01 11:30:28
Question: I have the following string. I tried to remove everything before the last space, but it seems I can't achieve it. I tried to follow this post: Use gsub remove all string before first white space in R

    str <- c("Veni vidi vici")
    gsub("\\s*","\\1",str)
    "Venividivici"

What I want is to have only the "vici" string left after removing everything before the last space.

Answer 1: Your gsub("\\s*","\\1",str) code replaces each occurrence of 0 or more whitespaces with a reference to the capturing group #1

Difference between `paste`, `str_c`, `str_join`, `stri_join`, `stri_c`, `stri_paste`?

北城余情, posted on 2019-12-01 11:23:00
What are the differences between all of these functions that seem very similar?

stri_join, stri_c, and stri_paste come from package stringi and are pure aliases. str_c comes from stringr and is just stringi::stri_join with the parameter ignore_null hardcoded to TRUE, while stringi::stri_join has it set to FALSE by default. stringr::str_join is a deprecated alias for str_c. See:

    library(stringi)
    identical(stri_join, stri_c)
    # [1] TRUE
    identical(stri_join, stri_paste)
    # [1] TRUE
    library(stringr)
    str_c
    # function (..., sep = "", collapse = NULL)
    # {
    # stri_c(..., sep = sep, collapse = collapse,
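The effect of ignore_null described above can be seen with a NULL argument, as in the small sketch below. The NA comparison at the end is an addition that is not part of the quoted answer: base paste0 turns NA into the text "NA", whereas str_c propagates the missing value. The variable names are illustrative.

```r
library(stringi)
library(stringr)

# NULL (zero-length) arguments
base_null     <- paste0("a", NULL, "b")    # base R treats NULL as ""
stringi_null  <- stri_join("a", NULL, "b") # ignore_null = FALSE by default
stringi_null2 <- stri_join("a", NULL, "b", ignore_null = TRUE)
stringr_null  <- str_c("a", NULL, "b")     # NULL arguments are dropped

# Missing values
base_na    <- paste0("a", NA)
stringr_na <- str_c("a", NA)
```

With the default ignore_null = FALSE, stri_join returns a zero-length character vector as soon as any argument has length zero, which is the difference from str_c that the answer describes.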