string-matching

Match and replace emoticons in string - what is the most efficient way?

狂风中的少年 posted on 2019-12-17 20:55:41
Question: Wikipedia defines a lot of possible emoticons people can use. I want to match this list to words in a string. I now have this:

    $string = "Lorem ipsum :-) dolor :-| samet";
    $emoticons = array(
        '[HAPPY]' => array(' :-) ', ' :) ', ' :o) '), //etc...
        '[SAD]' => array(' :-( ', ' :( ', ' :-| ')
    );
    foreach ($emoticons as $emotion => $icons) {
        $string = str_replace($icons, " $emotion ", $string);
    }
    echo $string;

Output:

    Lorem ipsum [HAPPY] dolor [SAD] samet

So in principle this works. However, I have

Using grep to subset rows from a data.table, comparing row content

佐手、 posted on 2019-12-17 16:39:39
Question:

    DT <- data.table(num=c("20031111","1112003","23423","2222004"),y=c("2003","2003","2003","2004"))
    > DT
            num    y
    1: 20031111 2003
    2:  1112003 2003
    3:    23423 2003
    4:  2222004 2004

I want to compare the contents of the two cells and perform an action based on the boolean value. For instance, if "num" matches the year, create a column x holding that value. I thought about subsetting based on grep, and that works, but it naturally checks the whole column every time, which seems wasteful: DT[grep(y,num)] # works with

Search for string allowing for one mismatch in any location of the string

送分小仙女□ posted on 2019-12-17 10:37:41
Question: I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 and need to look for each sequence in the entire genome (the Toxoplasma gondii parasite). I am not sure how large the genome is, but it is much longer than 230,000 sequences. I need to look for each of my sequences of 25 characters, for example (AGCCTCCCATGATTGAACAGATCAT). The genome is formatted as a continuous string, i.e.

Regular Expression Match to test for a valid year

五迷三道 posted on 2019-12-17 08:54:44
Question: Given a value, I want to validate it to check whether it is a valid year. My criterion is simple: the value should be an integer with 4 characters. I know this is not the best solution, as it will not allow years before 1000 and will allow years such as 5000. This criterion is adequate for my current scenario. What I came up with is \d{4}$ While this works, it also allows negative values. How do I ensure that only positive integers are allowed?
Answer 1: You need to add a start anchor ^ as: ^\d{4}$
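To make the effect of the start anchor concrete, here is a minimal Python sketch comparing the pattern from the question with the one from the answer. The function name `is_valid_year` and the sample inputs are assumptions made for illustration; they do not appear in the original thread.

```python
import re

# Pattern from the question: only anchored at the end of the string.
UNANCHORED = re.compile(r"\d{4}$")
# Pattern from the answer: anchored at both the start and the end.
ANCHORED = re.compile(r"^\d{4}$")


def is_valid_year(value: str) -> bool:
    """Return True only if the whole value is exactly four digits."""
    # is_valid_year is a hypothetical helper name, not from the thread.
    return ANCHORED.search(value) is not None


if __name__ == "__main__":
    for candidate in ["2019", "-2019", "12019", "999"]:
        # Print whether each pattern accepts the candidate.
        print(candidate, bool(UNANCHORED.search(candidate)), is_valid_year(candidate))
```

Without the start anchor, "-2019" and "12019" are accepted because their last four characters are digits. Note that in Python `$` also matches just before a trailing newline, so `re.fullmatch(r"\d{4}", value)` is the stricter choice if that matters.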

agrep: only return best match(es)

狂风中的少年 posted on 2019-12-17 06:38:19
Question: I'm using the 'agrep' function in R, which returns a vector of matches. I would like a function similar to agrep that only returns the best match, or the best matches if there are ties. Currently, I am doing this using the 'sdist()' function from the package 'cba' on each element of the resulting vector, but this seems very redundant. Edit: here is the function I'm currently using. I'd like to speed it up, as it seems redundant to calculate the distance twice.

    library(cba)
    word <- 'test'
    words <- c(

How to search a specific value in all tables (PostgreSQL)?

╄→尐↘猪︶ㄣ posted on 2019-12-16 23:53:09
Question: Is it possible to search every column of every table for a particular value in PostgreSQL? A similar question is available here for Oracle.
Answer 1: How about dumping the contents of the database, then using grep?

    $ pg_dump --data-only --inserts -U postgres your-db-name > a.tmp
    $ grep United a.tmp
    INSERT INTO countries VALUES ('US', 'United States');
    INSERT INTO countries VALUES ('GB', 'United Kingdom');

The same utility, pg_dump, can include column names in the output. Just change --inserts to

Match string not containing a certain phrase

丶灬走出姿态 posted on 2019-12-14 03:19:28
Question: I need to find all instances of the word "confidential" in a message, except when it is used in the phrase "confidential and proprietary", in which case it is OK and I don't need to pick it up through regex. Thanks all in advance! -P
Answer 1: Using word boundaries \b is also an option here: \bconfidential\b(?! and proprietary\b)
Answer 2: You can use negative lookaround (http://www.regular-expressions.info/lookaround.html). This regex will match: (confidential) (?!and proprietary) if your engine support
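As a rough illustration of the word-boundary pattern from Answer 1, the Python sketch below collects the positions of every "confidential" that is not followed by " and proprietary". The helper name `find_flagged`, the case-insensitive flag, and the sample messages are assumptions added here, not part of the original answers.

```python
import re
from typing import List

# Pattern from Answer 1. re.IGNORECASE is an assumption added for this
# sketch; the original answer does not say how case should be handled.
PATTERN = re.compile(r"\bconfidential\b(?! and proprietary\b)", re.IGNORECASE)


def find_flagged(message: str) -> List[int]:
    """Return the start offsets of 'confidential' outside the allowed phrase."""
    # find_flagged is a hypothetical helper name used only for illustration.
    return [match.start() for match in PATTERN.finditer(message)]


if __name__ == "__main__":
    print(find_flagged("This is confidential and proprietary."))
    print(find_flagged("Keep this confidential."))
```

The lookahead consumes no characters, so the match itself covers only the word "confidential", which keeps the reported offsets easy to use for highlighting or replacement.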

Greasemonkey: look for a string between two elements [closed]

社会主义新天地 posted on 2019-12-13 10:57:57
Question: It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 7 years ago. I'm writing a Greasemonkey script, which should do a simple thing: if there's a certain string between the <h2> element (always only one on the page) and the first occurrence of an <h3> element (can be several of

How to search a string with all possible combinations in Java?

青春壹個敷衍的年華 posted on 2019-12-13 09:28:21
Question: How can I implement string matching against all possible combinations of a given key in Java, just like Android Studio does? Is any regex pattern available?
Answer 1: You do not need a regex for this, because a greedy algorithm will do. You can match a string against a pattern in O(n+p), where n is the length of the string and p is the length of the pattern, by following a very simple strategy: for each character of the pattern, look for a matching character in the string starting at the current index. If you find a
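The greedy strategy described in the answer can be sketched in Python as follows. The excerpt is cut off before it finishes, so the handling of a found or missing character (advance past the hit, or report no match) is an assumption about how the strategy continues. The function name `matches_in_order` and the case-sensitive comparison are also assumptions, not taken from the answer.

```python
def matches_in_order(text: str, pattern: str) -> bool:
    """Greedy check that the pattern's characters occur in text, in order.

    Assumed continuation of the truncated answer: after a character is
    found, the search resumes just past it; if a character cannot be
    found, the match fails. Comparison is case-sensitive here.
    """
    index = 0  # current position in text
    for ch in pattern:
        found = text.find(ch, index)  # search from the current index onward
        if found == -1:
            return False  # this pattern character never appears again
        index = found + 1  # continue after the matched character
    return True


if __name__ == "__main__":
    print(matches_in_order("StringBuilder", "SBld"))
    print(matches_in_order("StringBuilder", "dB"))
```

Each character of the text is scanned at most once across all `find` calls, which is what gives the O(n+p) bound mentioned in the answer.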

How to implement a string matching algorithm with Hadoop?

跟風遠走 posted on 2019-12-13 09:10:19
Question: I want to implement a string matching (Boyer-Moore) algorithm using Hadoop. I just started using Hadoop, so I have no idea how to write a Hadoop program in Java. All the sample programs that I have seen so far are word counting examples, and I couldn't find any sample programs for string matching. I tried searching for tutorials that teach how to write Hadoop applications using Java but couldn't find any. Can you suggest some tutorials where I can learn how to write Hadoop applications