string-matching | 易学教程

Use Java Regex to find multiple matching words in a sentence

Question: I have a sentence and a set of words, say: Mayweather, undefeated … etc. I want to check whether the sentence contains any of these words (matching whole words only, ignoring full stops, commas and new lines) and, if it does, display a few words before and after each matching word, maybe by using String.format(). Here’s my code, which seems to be working OK but not exactly how I want it: String sentence = "Floyd Mayweather Jr is an American
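A minimal sketch of the task described in the question, written in Python rather than Java and not based on the asker's (truncated) code. The function name `keyword_contexts`, the `window` parameter and the sample text are all assumptions made for illustration.

```python
import re

def keyword_contexts(sentence, keywords, window=2):
    """Return (before, match, after) tuples for every keyword hit."""
    # Keep word characters only, so full stops, commas and newlines are ignored.
    words = re.findall(r"\w+", sentence)
    wanted = {k.lower() for k in keywords}
    results = []
    for i, word in enumerate(words):
        if word.lower() in wanted:
            before = words[max(0, i - window):i]
            after = words[i + 1:i + 1 + window]
            results.append((" ".join(before), word, " ".join(after)))
    return results

# Hypothetical sample text, not the sentence from the question.
sample = "Floyd Mayweather Jr is a boxer.\nHe retired undefeated, they say."
for before, match, after in keyword_contexts(sample, ["Mayweather", "undefeated"]):
    print(f"...{before} [{match}] {after}...")
```

Tokenizing first and matching on whole tokens avoids partial-word hits; the same idea carries over to Java with a word-boundary regex.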

Alternative to vlookup with exact and approximate match doesn't work

Question: Cell A1: 0553400710 Cell A2: John Cell B1: ['0553400710', '0553439406'] Note: Cell B1 has a fixed format of ['number','number','number',...... ]. A1 and A2 are user input values. I want to match 0553400710 in cell A1 with ['0553400710', '0553439406'] in cell B1. If it matches, I want to return A2: John. Is it possible? VLOOKUP did not work, by the way. I am looking for a technique that takes advantage of the fixed format. Picture 1: This is the formula I have tried Picture 2: This

Lua - How to find a substring with 1 or 2 characters discrepancy

Question: Say I have a string local a = "Hello universe". I find the substring "universe" with a:find("universe"). Now, suppose the string is local a = "un#verse". The string to be searched for is universe, but the substring differs by a single character, so obviously Lua does not find it. How do I make the function find the string even if there is a discrepancy of a single character? Answer 1: If you know where the character would be, use . instead of that character: a:find("un.verse") However, it looks like you're
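When the position of the differing character is not known, one option is a sliding-window comparison that tolerates a fixed number of mismatches. The sketch below is in Python rather than Lua and is not taken from the answer; `find_approx` and `max_mismatches` are hypothetical names. It assumes the discrepancy is a substitution (same length), not an inserted or deleted character.

```python
def find_approx(text, pattern, max_mismatches=1):
    """Return the 0-based index of the first window of len(pattern) in text
    that differs from pattern in at most max_mismatches positions, or -1."""
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, try the next window
        else:
            # The inner loop finished without exceeding the limit.
            return start
    return -1

print(find_approx("Hello un#verse", "universe"))     # allows one substitution
print(find_approx("Hello un##erse", "universe", 2))  # allows two substitutions
```

This runs in O(n·m) time, which is usually fine for short patterns; handling insertions and deletions would need an edit-distance approach instead.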

Count the maximum of consecutive letters in a string

Question: I have this vector: vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-", "-X--X--X", "X-----XX", "X----X-X", "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X") I want to find the maximum number of consecutive times X appears in each string. So my expected vector would be: 4, 1, 2, 1, 2, 1, 2, 1, 2, 2, 3, 2 Answer 1: In base R, we can split each string into separate characters and then use rle to find the max consecutive length for "X". sapply(strsplit(vector, ""), function(x) { inds = rle(x) max
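The run-length idea from the answer can be sketched in Python as well, where `itertools.groupby` plays roughly the role of R's `rle`. This is an illustrative translation, not the answer's code; `max_run` is a hypothetical helper name.

```python
from itertools import groupby

def max_run(s, ch="X"):
    """Length of the longest run of ch in s (0 if ch never appears)."""
    # groupby yields one group per run of identical consecutive characters.
    return max((sum(1 for _ in run) for key, run in groupby(s) if key == ch),
               default=0)

vector = ["XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-", "-X--X--X",
          "X-----XX", "X----X-X", "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X"]

print([max_run(s) for s in vector])
# [4, 1, 2, 1, 2, 1, 2, 1, 2, 2, 3, 2]
```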

TCL string match vs regexps

Question: Is it right that we should avoid using regexp because it is slow, and use string operations instead? Are there cases where both can be used but regexp is better? Answer 1: You should use the appropriate tool for the job. That means you should not avoid regex; you should use it when it is necessary. If you are just searching for a fixed sequence of characters, use string operations. If you are searching for a pattern, then use regular expressions. Example Search for the word "Foo". use string
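The same distinction can be illustrated outside Tcl. The sketch below uses Python, with a made-up sample string, to contrast a fixed-substring test with a genuine pattern search; it says nothing about the relative speed of Tcl's `string match` and `regexp`.

```python
import re

# Hypothetical sample text for illustration.
text = "Call Foo, then Foo42 and Foo7."

# Fixed sequence of characters: a plain substring test is enough.
has_foo = "Foo" in text

# A pattern ("Foo" followed by one or more digits): this needs a regular expression.
numbered = re.findall(r"Foo\d+", text)

print(has_foo, numbered)
```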

difflib on Ruby [closed]

Question: Is there a library similar to Python's difflib for Ruby? In particular, I need one that has a method similar to difflib.get_close_matches. Any recommendations? Answer 1: After some research, I suggest using amatch or SimMetrics (with JRuby) and manually implementing the get_close_matches method. Both libs offer
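For reference, this is roughly what a manual implementation has to reproduce. The sketch below calls Python's `difflib.get_close_matches` next to a simplified re-implementation built on `SequenceMatcher.ratio()`; `close_matches` is a hypothetical name, and the real function applies extra fast-path checks that are omitted here. A Ruby port would swap in whatever similarity score the chosen library provides.

```python
from difflib import SequenceMatcher, get_close_matches

def close_matches(word, candidates, n=3, cutoff=0.6):
    """Simplified stand-in for difflib.get_close_matches."""
    # Score every candidate against the word, mirroring difflib's argument order.
    scored = [(SequenceMatcher(None, c, word).ratio(), c) for c in candidates]
    # Keep only candidates at or above the similarity cutoff.
    good = [(score, c) for score, c in scored if score >= cutoff]
    # Best score first; return at most n candidates.
    good.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in good[:n]]

words = ["ape", "apple", "peach", "puppy"]
print(get_close_matches("appel", words))
print(close_matches("appel", words))
```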

Efficient string suffix detection

Question: I am working with PySpark on a huge dataset, where I want to filter the data frame based on strings in another data frame. For example,
dd = spark.createDataFrame(["something.google.com","something.google.com.somethingelse.ac.uk","something.good.com.cy", "something.good.com.cy.mal.org"], StringType()).toDF('domains')
+----------------------------------------+
|domains                                 |
+----------------------------------------+
|something.google.com                    |
|something.google.com.somethingelse.ac.uk|
|something
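A minimal plain-Python sketch of suffix detection on dotted domain names, not a PySpark solution. It assumes the second data frame holds domain suffixes; the `suffixes` set and the helper name `has_known_suffix` are hypothetical, since the excerpt does not show that data. Each domain costs one set lookup per label instead of one comparison per suffix.

```python
def has_known_suffix(domain, suffixes):
    """True if some trailing run of dot-separated labels is in suffixes."""
    labels = domain.split(".")
    # Try every trailing label sequence, e.g. "a.b.c" -> "a.b.c", "b.c", "c".
    return any(".".join(labels[i:]) in suffixes for i in range(len(labels)))

# Hypothetical suffix list; the question's second data frame is not shown.
suffixes = {"google.com", "good.com.cy"}

domains = [
    "something.google.com",
    "something.google.com.somethingelse.ac.uk",
    "something.good.com.cy",
    "something.good.com.cy.mal.org",
]

print([has_known_suffix(d, suffixes) for d in domains])
# [True, False, True, False]
```

Splitting on dots also avoids false positives such as "notgoogle.com" matching the suffix "google.com", which a raw string-suffix test would accept.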

Understanding the Knuth-Morris-Pratt (KMP) Failure Function

Question: I've been reading the Wikipedia article about the Knuth-Morris-Pratt algorithm and I'm confused about how the values are found in the jump/partial match table.
i    |  0  1  2  3  4  5  6
W[i] |  A  B  C  D  A  B  D
T[i] | -1  0  0  0  0  1  2
Can someone explain the shortcut rule more clearly? The sentence "let us say that we discovered a proper suffix which is a proper prefix and ending at W[2] with length 2 (the maximum possible)" is confusing. If the proper suffix ends at W[2] wouldn't it be size of 3
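One way to see where the table values come from is to compute them. The sketch below builds the table in the convention shown in the question, where T[0] = -1 and, for i ≥ 1, T[i] is the length of the longest proper prefix of W[0..i-1] that is also a suffix of it. The function name `kmp_table` is an assumption; other presentations of KMP use slightly different table conventions.

```python
def kmp_table(w):
    """Partial-match table with T[0] = -1, matching the table in the question."""
    t = [0] * len(w)
    if not w:
        return t
    t[0] = -1
    pos, cnd = 2, 0  # pos: entry being filled; cnd: length of current candidate prefix
    while pos < len(w):
        if w[pos - 1] == w[cnd]:
            # The candidate prefix extends by one character.
            cnd += 1
            t[pos] = cnd
            pos += 1
        elif cnd > 0:
            # Fall back to the next shorter prefix that is also a suffix.
            cnd = t[cnd]
        else:
            # No prefix matches at all.
            t[pos] = 0
            pos += 1
    return t

print(kmp_table("ABCDABD"))
# [-1, 0, 0, 0, 0, 1, 2]
```

For example, T[6] = 2 because the text before position 6 is "ABCDAB", whose suffix "AB" is also its prefix "AB".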

How to compare and convert emoji characters in C#

Question: I am trying to figure out how to check if a string contains a specific emoji. For example, look at the following two emoji: Bicyclist: http://unicode.org/emoji/charts/full-emoji-list.html#1f6b4 US Flag: http://unicode.org/emoji/charts/full-emoji-list.html#1f1fa_1f1f8 Bicyclist is U+1F6B4, and the US flag is U+1F1FA U+1F1F8. However, the emoji to check for are provided to me in an array like this, with just the numerical value in strings: var checkFor = new string[] {"1F6B4","1F1FA-1F1F8"};
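The core step is turning each hyphen-separated list of hex code points into an actual string, after which an ordinary substring test works. The sketch below shows that step in Python rather than C#; `to_emoji` and the sample `text` are hypothetical. In C#, where strings are UTF-16, `char.ConvertFromUtf32` should play a similar role to `chr` here.

```python
def to_emoji(code_points):
    """Convert e.g. "1F1FA-1F1F8" into the string made of those code points."""
    return "".join(chr(int(cp, 16)) for cp in code_points.split("-"))

check_for = ["1F6B4", "1F1FA-1F1F8"]

# Hypothetical input containing the bicyclist emoji but not the US flag.
text = "Ride \U0001F6B4 home"

found = [cp for cp in check_for if to_emoji(cp) in text]
print(found)
```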

How can I match words regardless of tense or form?

Question: I am currently working on a script that runs through a document, pulls out all keywords, and then attempts to match these keywords with those found in other documents. There are some specifics that complicate this, but they are not very pertinent to my question. Basically I would like to be able to match words regardless of the tense in which they appear. For example: If given the strings "swim", "swam", and "swimming", I would like a program that can recognize that these are all the same