string-matching

How can I implement a wildcard in MATLAB's ismember function?

情到浓时终转凉″ posted on 2019-12-04 22:05:25
How can I implement something like this in MATLAB: ismember(file_names,['*.mp4'])?

I would do that with regexp, like this: result = ~cellfun(@isempty,(regexp(file_names,'\.mp4$'))); For example, file_names = {'aaa.mp4','bbb.mp3'}; gives result = 1 0

Using regular expressions (regexp): this can be easily achieved with regexp: tf = ~cellfun('isempty', regexp(file_names, '.*\.mp4')); If you want to anchor the pattern to the beginning or the end of the filename, add a caret ( ^ ) or a dollar sign ( $ ) respectively, for instance: %// Match pattern at the beginning of the
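For comparison, the same wildcard-style filter can be sketched in Python. This is only an illustration and not part of the original answers: the helper names match_glob and match_regex are hypothetical, and the sample list mirrors the file_names example above.

```python
import fnmatch
import re


def match_glob(names, pattern):
    """Return one boolean per name, True where the shell-style pattern matches."""
    # fnmatchcase compares case-sensitively on every platform.
    return [fnmatch.fnmatchcase(name, pattern) for name in names]


def match_regex(names, pattern):
    """Return one boolean per name, True where the regular expression is found."""
    compiled = re.compile(pattern)
    return [compiled.search(name) is not None for name in names]


file_names = ["aaa.mp4", "bbb.mp3"]
print(match_glob(file_names, "*.mp4"))     # glob-style wildcard
print(match_regex(file_names, r"\.mp4$"))  # regex anchored at the end of the name
```

The glob form is closest to the '*.mp4' notation in the question, while the regex form corresponds to the anchored '\.mp4$' pattern used in the first answer.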

How to improve PHP string match with similar_text()?

妖精的绣舞 posted on 2019-12-04 18:07:48
I am using PHP's similar_text() to compare two strings, but the results are not good enough: the best I get is 80.95% for a match that I'd like to see scored at 100%. What other functions can I use to reduce the strings to their core?

<!-- Overcast, Rain or Showers compared Overcast, Rain or Showers is 80.9523809524 -->
<!-- Overcast, Risk of Rain or Showers compared Overcast, Rain or Showers is 86.2068965517 -->
<!-- Overcast, Chance of Rain or Showers compared Overcast, Rain or Showers is 83.3333333333 -->

Levenshtein distance: http://php.net/manual/en/function
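One thing that may be worth trying before switching metrics is normalizing both strings first. The sketch below is in Python and rests on an assumption the snippet cannot confirm: that strings which look identical yet score below 100% differ only in case or whitespace. The helper names normalize and similarity_percent are hypothetical, and difflib's ratio is not the same measure as similar_text().

```python
import difflib
import re


def normalize(text):
    """Lower-case the text, trim it, and collapse runs of whitespace to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def similarity_percent(a, b):
    """Similarity of the two normalized strings as a percentage (0 to 100)."""
    matcher = difflib.SequenceMatcher(None, normalize(a), normalize(b))
    return 100.0 * matcher.ratio()


# Differences in spacing and case disappear after normalization.
print(similarity_percent("Overcast,  Rain or\tShowers ", "overcast, rain or showers"))
# Genuinely different wording still scores below 100.
print(similarity_percent("Overcast, Risk of Rain or Showers", "Overcast, Rain or Showers"))
```

If the remaining gap comes from filler words such as "Risk of" or "Chance of", removing a list of such words inside normalize would be a further step, but which words count as filler depends on the data.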

Hashing n-grams by cyclic polynomials - java implementation

南笙酒味 posted on 2019-12-04 18:04:54
I'm solving a problem that involves the Rabin–Karp string search algorithm. This algorithm requires a rolling hash to be faster than naive search. This article describes how to implement a rolling hash. I implemented the "Rabin-Karp rolling hash" without problems and found a few implementations, but the article also mentions computational complexity and says that hashing n-grams by cyclic polynomials is preferred. It links to a BuzHash implementation of this technique, but I wonder how it can be used to build an n-gram hash on top of it. I want to have something like this or CPHash cp = new CPHash(
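A minimal Python sketch of hashing n-grams by cyclic polynomials follows. It is not the BuzHash code the question links to: the class name CyclicHash, the function ngram_hashes, the 32-bit word size, and the seeded random byte table are all assumptions made for illustration. The idea is that each byte maps to a random word, the hash of a window is the XOR of those words rotated by their distance from the end of the window, and sliding the window needs only one rotation and two XORs.

```python
import random

WORD_BITS = 32                 # assumed word size for this sketch
MASK = (1 << WORD_BITS) - 1


def rotl(x, r):
    """Rotate a WORD_BITS-wide integer left by r bits."""
    r %= WORD_BITS
    return ((x << r) | (x >> (WORD_BITS - r))) & MASK


class CyclicHash:
    """Hypothetical rolling hash over byte n-grams using cyclic polynomials."""

    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        self.n = n
        # One random word per possible byte value.
        self.table = [rng.getrandbits(WORD_BITS) for _ in range(256)]

    def hash_window(self, window):
        """Hash a window from scratch: rotate, then XOR in each byte's word."""
        h = 0
        for byte in window:
            h = rotl(h, 1) ^ self.table[byte]
        return h

    def roll(self, h, outgoing, incoming):
        """Slide the window by one byte in constant time."""
        return rotl(h, 1) ^ rotl(self.table[outgoing], self.n) ^ self.table[incoming]


def ngram_hashes(data, n, seed=0):
    """Return the hash of every length-n window of the bytes object data."""
    if len(data) < n:
        return []
    hasher = CyclicHash(n, seed)
    h = hasher.hash_window(data[:n])
    hashes = [h]
    for i in range(n, len(data)):
        h = hasher.roll(h, data[i - n], data[i])
        hashes.append(h)
    return hashes


print(ngram_hashes(b"abracadabra", 3))
```

Because rotation distributes over XOR, the rolled value equals the hash computed from scratch for the same window, so equal n-grams always receive equal hashes.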

Damerau–Levenshtein distance (Edit Distance with Transposition) c implementation

偶尔善良 posted on 2019-12-04 18:01:49
Question: I implemented the Damerau–Levenshtein distance in C++, but it does not give the correct output for the input (pantera, aorta): the correct output is 4, but my code gives 5. int editdist(string s,string t,int n,int m) { int d1,d2,d3,cost; int i,j; for(i=0;i<=n;i++) { for(j=0;j<=m;j++) { if(s[i+1]==t[j+1]) cost=0; else cost=1; d1=d[i][j+1]+1; d2=d[i+1][j]+1; d3=d[i][j]+cost; d[i+1][j+1]=minimum(d1,d2,d3); if(i>0 && j>0 && s[i+1]==t[j] && s[i]==t[j+1] ) //transposition { d[i+1][j+1]=min(d[i+1][j+1],d[i-1]
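For reference, here is a Python sketch of the unrestricted Damerau–Levenshtein distance, in which a transposition may span characters that are deleted or inserted in between. It is not a fix of the truncated C++ above, and the function name damerau_levenshtein is chosen here for illustration. As far as I can tell, the restricted variant, which only swaps characters that are adjacent in both strings, gives 5 for (pantera, aorta), while the unrestricted variant below gives 4.

```python
def damerau_levenshtein(a, b):
    """Unrestricted Damerau-Levenshtein distance between strings a and b."""
    max_dist = len(a) + len(b)
    last_row = {}  # last row (1-based) of a in which each character was seen

    # The table carries an extra sentinel row and column holding max_dist.
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = max_dist
    for i in range(len(a) + 1):
        d[i + 1][0] = max_dist
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[0][j + 1] = max_dist
        d[1][j + 1] = j

    for i in range(1, len(a) + 1):
        last_col = 0  # last column in this row where the characters matched
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                            # substitution or match
                d[i + 1][j] + 1,                           # insertion
                d[i][j + 1] + 1,                           # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),   # transposition
            )
        last_row[a[i - 1]] = i

    return d[len(a) + 1][len(b) + 1]


print(damerau_levenshtein("pantera", "aorta"))
```

The transposition term charges one unit for the swap plus one unit for every character skipped between the two swapped positions, which is what lets "ter" become "rt" in two operations.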

Searching one Python dataframe / dictionary for fuzzy matches in another dataframe

ぐ巨炮叔叔 posted on 2019-12-04 17:01:46
I have the following pandas dataframe with 50,000 unique rows and 20 columns (included is a snippet of the relevant columns):

df1:
   PRODUCT_ID    PRODUCT_DESCRIPTION
0  165985858958  "Fish Burger with Lettuce"
1  185965653252  "Chicken Salad with Dressing"
2  165958565556  "Pork and Honey Rissoles"
3  655262522233  "Cheese, Ham and Tomato Sandwich"
4  857485966653  "Coleslaw with Yoghurt Dressing"
5  524156285551  "Lemon and Raspberry Cheesecake"

I also have the following dataframe (which I also have saved in dictionary form) which has 2 columns and 20,000 unique rows: df2 (also saved as dict_2) PROD_ID PROD
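As a rough illustration of fuzzy lookup between two sets of descriptions, the sketch below uses only Python's standard difflib module on plain lists. The helper name best_match and the 0.6 cutoff are assumptions, and the queries are made up because the second table is not shown in full.

```python
import difflib

# Descriptions taken from the df1 snippet above.
df1_descriptions = [
    "Fish Burger with Lettuce",
    "Chicken Salad with Dressing",
    "Pork and Honey Rissoles",
    "Cheese, Ham and Tomato Sandwich",
    "Coleslaw with Yoghurt Dressing",
    "Lemon and Raspberry Cheesecake",
]


def best_match(query, candidates, cutoff=0.6):
    """Return the candidate closest to query, or None if nothing reaches cutoff."""
    # Compare case-insensitively but return the candidate in its original form.
    lowered = {candidate.lower(): candidate for candidate in candidates}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Hypothetical queries standing in for rows of the second table.
for query in ["pork and honey rissole", "zzzz"]:
    print(query, "->", best_match(query, df1_descriptions))
```

Comparing every one of 20,000 rows against 50,000 candidates this way means about a billion string comparisons, so in practice some pre-filtering (for example by shared words) would probably be needed first.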

How to check if matching text is found in a string in Lua?

若如初见. posted on 2019-12-04 14:59:36
Question: I need to make a conditional that is true if a particular matching text is found at least once in a string of text, e.g.:

    str = "This is some text containing the word tiger."
    if string.match(str, "tiger") then
      print ("The word tiger was found.")
    else
      print ("The word tiger was not found.")

How can I check if the text is found somewhere in the string?

Answer 1: You can use either string.match or string.find. I personally use string.find() myself. Also, you need to specify end of your if-else

Efficient way to check if a given string is equivalent to at least one string in the given set of strings

我们两清 posted on 2019-12-04 11:18:06
Given a set of strings, say "String1", "String2",..., "StringN", what is the most efficient way in C++ to determine (return true or false) whether a given string s matches any of the strings in the set? Can Boost.Regex be used for this task?

std::unordered_set would provide the most efficient look-up (constant time on average).

    #include <unordered_set>
    #include <string>
    #include <cassert>

    int main() {
        std::unordered_set<std::string> s = {"Hello", "Goodbye", "Good morning"};
        assert(s.find("Goodbye") != s.end());
        assert(s.find("Good afternoon") == s.end());
        return 0;
    }

You can put all your

Speeding up a “closest” string match algorithm

时光怂恿深爱的人放手 posted on 2019-12-04 08:34:25
I am currently processing a very large database of locations and trying to match them with their real-world coordinates. To achieve this, I have downloaded the geoname dataset, which contains a lot of entries. It gives possible names and lat/long coordinates. To speed up the process, I have reduced the huge CSV file from 1.6 GB to 0.450 GB by removing entries that do not make sense for my dataset. However, it still contains 4 million entries. Now I have many entries such as: Slettmarkmountains seen from my camp site in Jotunheimen, Norway, last week Adventuring in Fairy Glen,

bash script to check file name begins with expected string

↘锁芯ラ posted on 2019-12-04 07:52:53
Running on OS X with a bash script:

    sourceFile=`basename $1`
    shopt -s nocasematch
    if [[ "$sourceFile" =~ "adUsers.txt" ]]; then echo success ; else echo fail ; fi

The above works, but what if the user sources a file called adUsers_new.txt? I tried:

    if [[ "$sourceFile" =~ "adUsers*.txt" ]]; then echo success ; else echo fail ; fi

But the wildcard doesn't work in this case. I'm writing this script to allow the user to have different iterations of the source file name, which must begin with aduser and have the .txt file extension.

In bash, you can get the first 7 characters of a shell

How to search a string of key/value pairs in Java

社会主义新天地 posted on 2019-12-04 06:39:00
I have a String that's formatted like this: "key1=value1;key2=value2;key3=value3" for any number of key/value pairs. I need to check that a certain key exists (let's say it's called "specialkey"). If it does, I want the value associated with it. If there are multiple "specialkey"s set, I only want the first one. Right now, I'm looking for the index of "specialkey". I take a substring starting at that index, then look for the index of the first = character. Then I look for the index of the first ; character. The substring between those two indices gives me the value associated with "specialkey"
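An alternative to index arithmetic is to split the string into pairs and compare whole keys. The Python sketch below illustrates that idea under the format described above; the function name first_value is hypothetical. Comparing whole keys also avoids matching a longer key that merely contains "specialkey", which a plain index search for the substring could do.

```python
def first_value(pairs, key):
    """Return the value of the first occurrence of key, or None if it is absent."""
    for item in pairs.split(";"):
        # partition splits on the first "=" only, so values may contain "=".
        name, separator, value = item.partition("=")
        if separator and name == key:
            return value
    return None


text = "key1=value1;specialkey=first;key3=value3;specialkey=second"
print(first_value(text, "specialkey"))
print(first_value(text, "missing"))
```

The loop stops at the first match, so later duplicates of the key are ignored, as the question requires.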