stringr | 易学教程

Splitting a text in one column into many columns for each row [duplicate]

Question: This question already has answers here: Splitting a dataframe string column into multiple different columns [duplicate] (4 answers). Closed 10 months ago.

I have the following dataset:

Class  Range   Value
A      6 - 8   19
B      1 - 3   14
C      5 - 16  10
D      4 - 7   5

I want to split the range for each class into two columns. To do that, I used the function str_split_fixed as follows: merge(data, str_split_fixed(data[, 2], " - ", 2)) and I even tried: merge(data, str_split_fixed(data$Range, " - ", 2)) But
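The excerpt stops before any answer, so the following is only a sketch of one common approach, not the accepted solution. It assumes the goal is two numeric columns; the names Low and High are hypothetical. It uses base R's strsplit so that it runs without extra packages. str_split_fixed also returns a character matrix, which is why binding its columns to the data frame tends to work where merge() does not.

```r
# Rebuild the dataset from the question
data <- data.frame(
  Class = c("A", "B", "C", "D"),
  Range = c("6 - 8", "1 - 3", "5 - 16", "4 - 7"),
  Value = c(19, 14, 10, 5),
  stringsAsFactors = FALSE
)

# Split every range on the literal separator " - " and stack the pieces
# into a two-column character matrix (one row per input row)
parts <- do.call(rbind, strsplit(data$Range, " - ", fixed = TRUE))

# Attach the pieces as new columns (Low/High are illustrative names)
data$Low  <- as.numeric(parts[, 1])
data$High <- as.numeric(parts[, 2])

print(data)
```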

Convert HTML Entity to proper character R

Does anyone know of a generic function in R that can convert an HTML entity such as &auml; to its Unicode character (ä)? I have seen some functions that take in a character such as â and convert it to a plain character. Any help would be appreciated. Thanks.

Edit: Below is one record of the data, of which I probably have over 1 million. Is there an easier solution than reading the data into a massive vector and changing the records element by element? wine/name: 1999 Domaine Robert Chevillon Nuits St. Georges 1er Cru Les Vaucrains wine/wineId: 43163 wine/variant: Pinot Noir wine/year: 1999 review/points: N/A review/time: 1337385600
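No answer is included in this excerpt. As a rough sketch only: because gsub is vectorised, a small lookup table can be applied to a whole character vector at once, with no explicit loop over records. The table entities and the helper decode_entities are hypothetical names, and the table covers only three entities for illustration; a full decoder would need the complete HTML entity list or an HTML parser.

```r
# Illustrative lookup table: entity -> character (far from complete).
# "&amp;" comes last so that "&amp;auml;" decodes to the text "&auml;".
entities <- c(
  "&auml;"  = "\u00E4",
  "&acirc;" = "\u00E2",
  "&amp;"   = "&"
)

# Hypothetical helper: replaces each known entity across the whole vector
decode_entities <- function(x, map = entities) {
  for (name in names(map)) {
    x <- gsub(name, map[[name]], x, fixed = TRUE)
  }
  x
}

decoded <- decode_entities(c("Ch&acirc;teau", "M&auml;rz &amp; Co", "plain text"))
print(decoded)
```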

Removing characters after a EURO symbol in R

I have a euro symbol saved in the "euro" variable: euro <- "\u20AC" euro #[1] "€" The "eurosearch" variable contains "services as defined in this SOW at a price of € 15,896.80 (if executed fro". eurosearch [1] "services as defined in this SOW at a price of € 15,896.80 (if executed fro" I want the characters after the euro symbol, which are "15,896.80 (if executed fro". I am using this code: gsub("^.*[euro]","",eurosearch) but I'm getting an empty result. How can I obtain the expected output?

Answer (Wiktor Stribiżew): You can use variables in the pattern by just concatenating strings using paste0: euro <- "€"
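The answer is cut off here, so the following is a sketch of the paste0 idea it describes rather than the answerer's exact code. In the original attempt, [euro] is a character class that matches any of the letters e, u, r or o. Because the string ends in "fro", the greedy ^.*[euro] consumes the whole string and the result is empty. Pasting the variable's value into the pattern avoids that. The trailing \\s* is an added assumption to drop the space after the symbol.

```r
euro <- "\u20AC"
eurosearch <- paste0(
  "services as defined in this SOW at a price of ",
  euro,
  " 15,896.80 (if executed fro"
)

# Build the pattern from the variable's value instead of writing [euro]
pattern <- paste0("^.*", euro, "\\s*")

# Remove everything up to and including the euro sign and the spaces after it
after_euro <- sub(pattern, "", eurosearch)
print(after_euro)
```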

Counting whole word/number occurrences with str_count in R

Similar to this case, I would like to count the number of occurrences of multiple words and numbers in a vector of sentences with str_count from the stringr package. But I noticed that partial numbers are counted as well as whole numbers. For example: df <- c("honda civic 1988 with new lights","toyota auris 4x4 140000 km","nissan skyline 2.0 159000 km") keywords <- c("honda","civic","toyota","auris","nissan","skyline","1988","1400","159") library(stringr) number_of_keywords_df <- str_count(df, paste(keywords, collapse='|')) Here I receive a vector for number_of_keywords_df of 3
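The excerpt ends before any answer. One way to count only whole words and numbers, shown here as a sketch, is to wrap the alternation in \\b word boundaries, so that "1400" no longer matches inside "140000". The code below uses base R so that it runs without stringr installed; the same pattern string should also work as the second argument to str_count.

```r
df <- c("honda civic 1988 with new lights",
        "toyota auris 4x4 140000 km",
        "nissan skyline 2.0 159000 km")
keywords <- c("honda", "civic", "toyota", "auris", "nissan", "skyline",
              "1988", "1400", "159")

# Alternation of all keywords, anchored on both sides by word boundaries
pattern <- paste0("\\b(?:", paste(keywords, collapse = "|"), ")\\b")

# Count matches per sentence (base R stand-in for str_count(df, pattern))
number_of_keywords_df <- lengths(regmatches(df, gregexpr(pattern, df, perl = TRUE)))
print(number_of_keywords_df)
```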

Substring extraction from vector in R

I am trying to extract substrings from unstructured text. For example, assume a vector of country names: countries <- c("United States", "Israel", "Canada") How do I pass this vector of character values to extract exact matches from unstructured text? text.df <- data.frame(ID = c(1:5), text = c("United States is a match", "Not a match", "Not a match", "Israel is a match", "Canada is a match")) In this example, the desired output would be: ID text 1 United States 4 Israel 5 Canada So far I have been working with gsub by where I remove all non-matches and then eliminate then remove
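The excerpt is truncated before any answer. A minimal sketch of one approach: collapse the country names into a single alternation pattern, find the first match in each row, and keep only the rows that matched. It assumes the names contain no regex metacharacters, and the variable names pattern, hit and result are illustrative.

```r
countries <- c("United States", "Israel", "Canada")
text.df <- data.frame(
  ID = 1:5,
  text = c("United States is a match", "Not a match", "Not a match",
           "Israel is a match", "Canada is a match"),
  stringsAsFactors = FALSE
)

# One pattern matching any of the country names
pattern <- paste(countries, collapse = "|")

# Position of the first match in each row (-1 where there is none)
hit <- regexpr(pattern, text.df$text)

# Keep matching rows only; regmatches() returns just the matched substrings
result <- data.frame(
  ID = text.df$ID[hit > 0],
  text = regmatches(text.df$text, hit),
  stringsAsFactors = FALSE
)
print(result)
```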

Why is this end of line (\\b) not recognised as word boundary in stringr/ICU and Perl

Using stringr, I tried to detect a € sign at the end of a string as follows: str_detect("my text €", "€\\b") # FALSE Why is this not working? It works in the following cases: str_detect("my text a", "a\\b") # TRUE - letter instead of € grepl("€\\b", "2009in €") # TRUE - base R solution But it also fails in perl mode: grepl("€\\b", "2009in €", perl=TRUE) # FALSE So what is wrong with the €\\b regex? The regex €$ works in all cases...

Answer: When you use base R regex functions without perl=TRUE, the TRE regex flavor is used. It appears that TRE word boundary: When used after a non-word
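The answer is cut off above. As a hedged illustration of the PCRE behaviour the question reports: \b matches only where a word character sits on one side and a non-word character or string edge on the other. € is not a word character, so there is no boundary between it and the end of the string. The checks below use only base R with perl = TRUE. The $ anchor is the alternative the question itself mentions.

```r
euro <- "\u20AC"
txt <- paste0("my text ", euro)

# Euro sign followed by the end of the string: non-word on both sides, no \b
euro_boundary <- grepl(paste0(euro, "\\b"), txt, perl = TRUE)

# A letter followed by the end of the string: word character then edge, so \b matches
letter_boundary <- grepl("a\\b", "my text a", perl = TRUE)

# Anchoring with $ does not depend on word characters
euro_at_end <- grepl(paste0(euro, "$"), txt, perl = TRUE)

print(c(euro_boundary, letter_boundary, euro_at_end))
```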

An error I can't understand. “Promise already under evaluation…” [duplicate]

This question already has an answer here: promise already under evaluation: recursive default argument reference or earlier problems? (2 answers)

I am trying to write a function that finds a pattern in names, with the help of the stringr package. My function looks like the following: namezz=function(thepatternx,data=data,column=Name){ library(stringr) thepattern=as.character(quote(thepatternx)) pattern <- thepattern strings <- data$column ##data$column is a character vector found=str_detect(strings, pattern) yez= rownames(data[which(found==TRUE),]) hhh=as.numeric(yez)+1 return(hhh) } When I call the
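The excerpt ends before the error is shown. As the linked duplicate's title suggests, a default such as data=data refers to itself, which is a typical trigger for "promise already under evaluation". The sketch below is one possible rewrite, not the accepted answer. It drops the self-referencing default, takes the pattern and column name as plain strings, and uses data[[column]] because data$column looks for a column literally named "column". It uses base grepl in place of str_detect so that it runs without stringr, and it keeps the original's "+ 1" offset as-is.

```r
# Hypothetical rewrite of namezz: no recursive default, string arguments
namezz <- function(pattern, data, column = "Name") {
  strings <- data[[column]]          # look the column up by its name
  found <- grepl(pattern, strings)   # TRUE where the pattern occurs
  as.numeric(rownames(data)[found]) + 1
}

# Illustrative data (not from the question)
people <- data.frame(Name = c("Anna", "Bob", "Annabel"), stringsAsFactors = FALSE)
matches <- namezz("Ann", people)
print(matches)
```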

Remove everything before the last space

Question: I have the following string. I tried to remove everything before the last space, but I can't seem to achieve it. I tried to follow the post "Use gsub remove all string before first white space in R": str <- c("Veni vidi vici") gsub("\\s*","\\1",str) "Venividivici" What I want is to have only the "vici" string left after removing everything before the last space.

Answer 1: Your gsub("\\s*","\\1",str) code replaces each occurrence of 0 or more whitespaces with a reference to the capturing group #1 value (which is an empty string since you have not specified any capturing group in the pattern). You want to
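The answer is truncated at this point. A minimal sketch of one way to get the desired result, not necessarily the answerer's own pattern: a greedy .* followed by a whitespace character consumes everything up to and including the last space, and sub replaces that prefix with an empty string.

```r
str <- c("Veni vidi vici")

# Greedy ".*" runs to the last whitespace character, so only the final word is left
last_word <- sub(".*\\s", "", str)
print(last_word)
```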

Difference between `paste`, `str_c`, `str_join`, `stri_join`, `stri_c`, `stri_paste`?

What are the differences between all of these functions that seem very similar?

stri_join, stri_c, and stri_paste come from the stringi package and are pure aliases. str_c comes from stringr and is just stringi::stri_join with the parameter ignore_null hardcoded to TRUE, while stringi::stri_join has it set to FALSE by default. stringr::str_join is a deprecated alias for str_c. See:

library(stringi)
identical(stri_join, stri_c)
# [1] TRUE
identical(stri_join, stri_paste)
# [1] TRUE
library(stringr)
str_c
# function (..., sep = "", collapse = NULL)
# {
# stri_c(..., sep = sep, collapse = collapse,