gsub

how to remove words of specific length in a string in R?

守給你的承諾、 posted on 2019-12-01 03:44:11
Question: I want to remove words shorter than 3 characters from a string. For example, my input is str <- c("hello RP have a nice day") and I want my output to be str <- c("hello have nice day"). Please help. Answer 1: Try this: gsub('\\b\\w{1,2}\\b','',str) [1] "hello  have  nice day" EDIT: \b is a word boundary. If you need to drop the extra spaces, change it to: gsub('\\b\\w{1,2}\\s','',str) or gsub('(?<=\\s)(\\w{1,2}\\s)','',str,perl=T) Answer 2: Or use str_extract_all to extract all words of length >= 3 and paste: library(stringr)
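The word-boundary approach from the first answer can be wrapped into a small helper that also tidies the leftover spaces. This is only a sketch: the function name `remove_short_words`, the `max_len` parameter, and the variable `x` are illustrative and do not come from the original post.

```r
# Sketch: drop words of up to `max_len` characters, then clean up whitespace.
# `remove_short_words` and `max_len` are hypothetical names used for illustration.
remove_short_words <- function(x, max_len = 2) {
  # Build a pattern such as \b\w{1,2}\b for the requested length
  pattern <- sprintf("\\b\\w{1,%d}\\b", max_len)
  out <- gsub(pattern, "", x)
  # Collapse the runs of whitespace left behind and trim both ends
  trimws(gsub("[[:space:]]+", " ", out))
}

x <- "hello RP have a nice day"
remove_short_words(x)
```

Doing the whitespace cleanup as a separate step keeps the word pattern simple and also handles a short word at the very end of the string, where no trailing space follows it.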

gsub speed vs pattern length

时光怂恿深爱的人放手 posted on 2019-12-01 02:07:55
I've been using gsub extensively lately, and I noticed that short patterns run faster than long ones, which is not surprising. Here's a fully reproducible example: library(microbenchmark) set.seed(12345) n = 0 rpt = seq(20, 1461, 20) msecFF = numeric(length(rpt)) msecFT = numeric(length(rpt)) inp = rep("aaaaaaaaaa",15000) for (i in rpt) { n = n + 1 print(n) patt = paste(rep("a", rpt[n]), collapse = "") #time = microbenchmark(func(count[1:10000,12], patt, "b"), times = 10) timeFF = microbenchmark(gsub(patt, "b", inp, fixed=F), times = 10) msecFF[n] = mean(timeFF$time)/1000000. timeFT =

replace string in dataframe

爷,独闯天下 posted on 2019-11-30 20:56:24
I'm trying to replace a certain string in a large data.frame. I found the following solution, but gsub doesn't preserve the original data.frame layout. How can I replace a string without changing the layout of the df? Consider this example: test<-data.frame(a=c("a","b","c","d"),b=c("a","e","g","h"),c=c("i","j","k","a")) gsub("a","new",test) Thanks. You will want to lapply through your data.frame, testing for character or factor entries and then applying gsub appropriately. The result will be a list, but as.data.frame fixes this. test$val <- 1:4 # a non
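The lapply approach described in the answer could look roughly like the following. It is a sketch under assumptions rather than the answerer's exact code: the names `replaced` and `result` are made up here, and `stringsAsFactors = FALSE` is added so the columns stay character vectors.

```r
# Example data from the question, plus a numeric column that gsub should not touch
test <- data.frame(a = c("a", "b", "c", "d"),
                   b = c("a", "e", "g", "h"),
                   c = c("i", "j", "k", "a"),
                   stringsAsFactors = FALSE)
test$val <- 1:4

# Apply gsub column by column, but only to character or factor columns;
# every other column is returned unchanged. (`replaced` is an illustrative name.)
replaced <- lapply(test, function(col) {
  if (is.character(col) || is.factor(col)) gsub("a", "new", col) else col
})

# lapply returns a list, so convert it back to a data.frame
result <- as.data.frame(replaced, stringsAsFactors = FALSE)
result
```

Note that gsub returns a character vector even when given a factor, so factor columns come back as character in this sketch.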

Modifying a character in a string in Lua

☆樱花仙子☆ posted on 2019-11-30 20:32:18
Is there any way to replace a character at position N in a string in Lua? This is what I've come up with so far: function replace_char(pos, str, r) return str:sub(1, pos - 1) .. r .. str:sub(pos + 1, str:len()) end str = replace_char(2, "aaaaaa", "X") print(str) I can't use gsub either, as that would replace every capture, not just the capture at position N. RBerteig: Strings in Lua are immutable. That means that any solution that replaces text in a string must end up constructing a new string with the desired content. For the specific case of replacing a single character with some other

Regex issue in gsub

折月煮酒 posted on 2019-11-30 17:27:46
I have defined vec <- "5f 110y, Fast" and gsub("[\\s0-9a-z]+,", "", vec) gives "5f  Fast". I would have expected it to give " Fast", since everything before the comma should get matched by the regex. Can anyone explain to me why this is not the case? You should keep in mind that, in TRE regex patterns, you cannot use regex escapes like \s, \d, \w inside bracket expressions. So the regex in your case, "[\\s0-9a-z]+,", matches 1 or more \, s, digits and lowercase ASCII letters, and then a single comma. You may use POSIX character classes instead, like [:space:] (any whitespace) or [:blank:] (horizontal whitespace):
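A short sketch of the difference, using the `vec` value from the question. The result variable names (`tre_escape`, `tre_posix`, `pcre_escape`) are illustrative, and the `perl = TRUE` variant is shown as one more option that is not taken from the text above.

```r
vec <- "5f 110y, Fast"

# Default (TRE) engine: inside [...] the "\\s" is read as a literal backslash
# and the letter s, so the space after "5f" stops the match
tre_escape <- gsub("[\\s0-9a-z]+,", "", vec)

# POSIX character class inside the bracket expression: whitespace is matched
tre_posix <- gsub("[[:space:]0-9a-z]+,", "", vec)

# Alternative assumption: with the PCRE engine, \s is understood inside [...]
pcre_escape <- gsub("[\\s0-9a-z]+,", "", vec, perl = TRUE)

print(c(tre_escape, tre_posix, pcre_escape))
```

Both the POSIX-class version and the PCRE version remove everything up to and including the comma, leaving " Fast".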

Remove punctuation but keeping emoticons?

不打扰是莪最后的温柔 posted on 2019-11-30 12:45:55
Is it possible to remove all punctuation while keeping emoticons such as :-( :) :D :p? structure(list(text = structure(c(4L, 6L, 1L, 2L, 5L, 3L), .Label = c("ãããæããããéãããæãããInappropriate announce:-(", "@AirAsia your direct debit (Maybank) payment gateways is not working. Is it something you are working to fix?", "@AirAsia Apart from the slight delay and shortage of food on our way back from Phuket, both flights were very smooth. Kudos :)", "RT @AirAsia: ØØÙØÙÙÙÙ ÙØØØ ØØØÙ ÙØØØØÙ ØØØØÙÙÙí í Now you can enjoy a #great :D breakfast onboard with our new breakfast meals! :D", "xdek ke

Lua string.gsub with a hyphen

此生再无相见时 posted on 2019-11-30 08:37:39
Question: I have two strings, and each string has many lines like the following: value_1 = "DEFAULT-VLAN" value_2 = "WAN" data = "HOSTNAME = DEFAULT-VLAN" result = string.gsub(data,value_1,value_2) print(result) Result: data = "HOSTNAME = DEFAULT-VLAN" When the hyphen ("-") is deleted from the value, it works. Is there an easy way to solve this? Thanks! Answer 1: - is a magic character in Lua patterns. You need to escape it. Change value_1 = "DEFAULT-VLAN" to: value_1 = "DEFAULT%-VLAN" Answer 2: This is because

Ruby regex- does gsub store what it matches?

醉酒当歌 posted on 2019-11-30 08:18:37
If I use .gsub(/matchthisregex/,"replace_with_this"), does gsub store what it matches with the regex somewhere? I'd like to use what it matches in my replacement string. For example, something like "replace_with_" + matchedregexstring + "this" in my above example, where matchedregexstring would be the stored match from gsub. Sorry if that was confusing, I don't know how else to word it. From the fine manual: If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \d, where d is a group number, or \k

Making gsub only replace entire words?

岁酱吖の posted on 2019-11-30 08:17:55
Question: (I'm using R.) For a list of words called "goodwords.corpus", I am looping through the documents in a corpus and replacing each of the words on the list "goodwords.corpus" with the word + a number. So, for example, if the word "good" is on the list and "goodnight" is NOT on the list, then this document: I am having a good time goodnight would turn into: I am having a good 1234 time goodnight I'm using this code (EDIT: made this reproducible): goodwords.corpus <- c("good") test <- "I
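One way to restrict gsub to whole words is to wrap each word in \b word-boundary anchors. The snippet below is a hedged sketch of that idea, not the asker's code: the helper `append_number`, the variable `doc`, and the fixed number 1234 are assumptions made for illustration, and words containing regex metacharacters would need escaping first.

```r
goodwords.corpus <- c("good")
# `doc` is an illustrative stand-in for one document from the corpus
doc <- "I am having a good time goodnight"

# Hypothetical helper: append `number` after every whole-word occurrence
# of each word in `words`
append_number <- function(text, words, number) {
  for (w in words) {
    # \b on both sides prevents matching "good" inside "goodnight"
    text <- gsub(paste0("\\b", w, "\\b"), paste(w, number), text)
  }
  text
}

append_number(doc, goodwords.corpus, 1234)
```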

Remove special characters from data frame

房东的猫 posted on 2019-11-30 07:27:29
I have a matrix that contains the string "Energy per �m". Before the 'm' is a diamond-shaped symbol with a question mark in it; I don't know what it is. I have tried to get rid of it by using this on the column of the matrix: a=gsub('Energy per �m','',a) [using copy/paste for the first argument of gsub], but it does not work [unexpected symbol in "a=rep(5,Energy per"]. When I try to extract something from the original matrix with grepl, I get: 46: In grepl("ref. value", raw$parameter) : input string 15318 is invalid in this locale How can I get rid of all symbols of this kind? I would like to