string-matching

Fuzzy string matching in R

跟風遠走 submitted on 2019-12-24 03:54:25
Question: I have two datasets with more than 100K rows each. I would like to merge them using fuzzy string matching on one column ('movie title') as well as the release date. I am providing a sample from both datasets below.
dataset-1:
       itemid  userid  rating  time        title               release_date
99991  1673    835     3       1998-03-27  mirage              1995
99992  1674    840     4       1998-03-29  mamma roma          1962
99993  1675    851     3       1998-01-08  sunchaser, the      1996
99994  1676    851     2       1997-10-01  war at home, the    1996
99995  1677    854     3       1997-12-22  sweet nothing       1995
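The question is about R, but the general approach can be sketched in Python with the standard library's difflib: normalize the titles (the sample stores "sunchaser, the" with the article moved to the end), compare each title only with candidates from the same release year, and keep the best-scoring title when it clears a threshold. The 0.85 threshold, the `normalize_title`/`fuzzy_join` names, and the assumption that the second dataset also provides (title, year) pairs are all hypothetical, because the second sample is not shown.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lower-case, collapse whitespace, and move a trailing ", the" to the front."""
    t = " ".join(title.lower().split())
    if t.endswith(", the"):
        t = "the " + t[: -len(", the")]
    return t


def fuzzy_join(left, right, threshold=0.85):
    """Yield (left_row, right_row, score) triples; rows are (title, year) tuples.

    Each left row is compared only with right rows from the same year and is
    paired with the most similar title if its score reaches `threshold`.
    """
    by_year = defaultdict(list)
    for title, year in right:
        by_year[year].append((normalize_title(title), (title, year)))

    for title, year in left:
        key = normalize_title(title)
        best, best_score = None, 0.0
        for cand_key, cand_row in by_year.get(year, []):
            score = SequenceMatcher(None, key, cand_key).ratio()
            if score > best_score:
                best, best_score = cand_row, score
        if best is not None and best_score >= threshold:
            yield (title, year), best, best_score


left = [("sunchaser, the", 1996), ("mirage", 1995), ("mamma roma", 1962)]
right = [("The Sunchaser", 1996), ("Mirage", 1995)]
for match in fuzzy_join(left, right):
    print(match)
```

Grouping by year is what keeps this tractable: an all-pairs comparison of two 100K-row tables would be about 10^10 similarity computations, while blocking limits each title to the same-year candidates.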

Aho-Corasick text matching on whole words?

笑着哭i submitted on 2019-12-24 00:58:20
Question: I'm using Aho-Corasick text matching and wonder whether it could be altered to match terms instead of characters. In other words, I want terms, rather than characters, to be the basis of matching. For example, with the search query "He" and the sentence "Hello world", Aho-Corasick will match "he" in "hello world" ending at index 2, but I would prefer to have no match. By "terms" I mean words rather than characters.

Answer 1: One way to do this would be to use Aho-Corasick as usual, then
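The answer above is cut off, so the following sketch takes a different route, named plainly here: run Aho-Corasick over a sequence of words instead of characters, so every trie edge is labeled with a whole word and "he" can never match inside "hello". The `\w+` tokenizer, the lower-casing, and the class name are assumptions; adjust the tokenizer to whatever counts as a term in your data.

```python
import re
from collections import deque


def tokenize(text):
    """Split text into lower-cased word tokens (assumed definition of a 'term')."""
    return re.findall(r"\w+", text.lower())


class WordAhoCorasick:
    """Aho-Corasick automaton whose alphabet is whole words, not characters."""

    def __init__(self, phrases):
        self.goto = [{}]   # goto[state][word] -> child state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # phrases recognised on reaching each state
        for phrase in phrases:
            words = tokenize(phrase)
            if not words:
                continue  # an empty phrase would match everywhere
            state = 0
            for word in words:
                if word not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][word] = len(self.goto) - 1
                state = self.goto[state][word]
            self.out[state].append(phrase)

        # Breadth-first pass: depth-1 states fail to the root; deeper states
        # follow their parent's failure links until the word can be extended.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for word, child in self.goto[state].items():
                queue.append(child)
                f = self.fail[state]
                while f and word not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(word, 0)
                # Inherit shorter phrases that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def search(self, text):
        """Yield (index of last matching word, phrase) for each whole-word match."""
        state = 0
        for i, word in enumerate(tokenize(text)):
            while state and word not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(word, 0)
            for phrase in self.out[state]:
                yield i, phrase


matcher = WordAhoCorasick(["he", "hello world", "world"])
print(list(matcher.search("Hello world")))  # "he" is not reported
```

Because indices count words rather than characters, a match position here is the index of the last word of the phrase; mapping back to character offsets would require keeping the token spans (for example via `re.finditer`).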

String matching techniques

倾然丶 夕夏残阳落幕 submitted on 2019-12-24 00:37:07
Question: The following pairs of strings should be considered equal. How can I match strings like these?
"Hazard Const. Company" and "hazard construction company"
"PETERSON-CHASE GENERAL ENGINEERING CONSTRUCTION INC" and "peterson-chase general engineering construction inc"
"TRAFFIC DEVELOPMENT SERVICES " and "traffic development services"
My environment is Ruby, but I'm wondering about general principles for matching strings. The examples above don't work with a rudimentary "a" == "b" because of whitespace issues and abbreviations. I can
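A common general principle is to compare canonical forms instead of raw strings: lower-case everything, drop punctuation, collapse whitespace, and expand known abbreviations. The sketch is in Python rather than Ruby; the `ABBREVIATIONS` table and the function names are hypothetical and would have to be built from your own data.

```python
import re

# Hypothetical abbreviation table; a real one would be derived from the data.
ABBREVIATIONS = {"const": "construction", "inc": "incorporated", "co": "company"}


def normalize(name: str) -> str:
    """Canonical form: lower-case, keep only word characters and hyphens,
    expand known abbreviations, and join the words with single spaces."""
    words = re.findall(r"[\w-]+", name.lower())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def same_entity(a: str, b: str) -> bool:
    """Treat two names as equal when their canonical forms match."""
    return normalize(a) == normalize(b)


print(same_entity("Hazard Const. Company", "hazard construction company"))
```

Exact equality on the normalized form covers the three pairs above; for typos, a similarity score such as an edit distance on the normalized strings is the usual next step.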

Uncaught TypeError: Cannot read property 'toUpperCase' of undefined

南楼画角 submitted on 2019-12-23 22:01:26
Question: I am trying to convert two strings to the same case with toUpperCase/toLowerCase so that I can compare them case-insensitively in JavaScript. Below is my function. function submitForm() { var usernames=['one','two','Test']; var cpusername = "test"; var flag = 0; if (cpusername !== "") { for (var k = 0; k < usernames.length; k++) { var upperCasecpusername=cpusername.toUpperCase(); var getusername= usernames[k]; var upperCaseusername=getusername.toUpperCase(); if (upperCasecpusername ===
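The error means `toUpperCase` was called on `undefined` rather than on a string (reading an array element past the end of the array is one way to get `undefined` in JavaScript). A defensive pattern is to check the type before normalizing case. It is sketched in Python for consistency with the other examples here: `None` plays the role of `undefined`, and calling `.upper()` on it would raise `AttributeError`. The function name is hypothetical, and `casefold()` is used because it is Python's recommended form for case-insensitive comparison.

```python
def contains_ignoring_case(names, target):
    """True if `target` equals some string in `names`, ignoring case.

    Entries that are not strings (such as None) are skipped instead of
    raising AttributeError, and an empty target never matches.
    """
    if not isinstance(target, str) or target == "":
        return False
    wanted = target.casefold()
    return any(isinstance(name, str) and name.casefold() == wanted for name in names)


usernames = ["one", "two", "Test"]
print(contains_ignoring_case(usernames, "test"))
```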

String manipulation: How to replace a string with a specific pattern

白昼怎懂夜的黑 submitted on 2019-12-23 14:04:31
Question: I have a question about string manipulation based on a specific pattern. I am trying to replace a specific pattern with a predefined pattern using C#. For example:
Scenario #1
Input: substringof('xxxx', [Property2])
Output: [Property2].Contains('xxxx')
where the output string could be used within LINQ's Where clause. My solution: var key= myString.Substring(myString.Split(',')[0].Length + 1, myString.Length - myString.Split(',')[0].Length - 2); var value = myString.Replace("," + key, "").Replace(
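Instead of splitting on commas and slicing substrings, a regular expression with two capture groups can rewrite every `substringof(...)` call in one pass. The sketch uses Python's `re`; .NET's `Regex.Replace` should accept the same pattern with `$1`/`$2` in the replacement string. It assumes the quoted literal contains no single quotes and the property is a single `[Name]` token; `rewrite_substringof` is a hypothetical name.

```python
import re

# Group 1: the quoted literal; group 2: the bracketed property name.
SUBSTRINGOF = re.compile(r"substringof\('([^']*)',\s*(\[[^\]]+\])\)")


def rewrite_substringof(expr: str) -> str:
    """Rewrite every substringof('x', [P]) call as [P].Contains('x')."""
    return SUBSTRINGOF.sub(r"\2.Contains('\1')", expr)


print(rewrite_substringof("substringof('xxxx', [Property2])"))
```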

String manipulation: How to replace a string with a specific pattern

纵然是瞬间 submitted on 2019-12-23 14:03:59
Question: I have a question about string manipulation based on a specific pattern. I am trying to replace a specific pattern with a predefined pattern using C#. For example:
Scenario #1
Input: substringof('xxxx', [Property2])
Output: [Property2].Contains('xxxx')
where the output string could be used within LINQ's Where clause. My solution: var key= myString.Substring(myString.Split(',')[0].Length + 1, myString.Length - myString.Split(',')[0].Length - 2); var value = myString.Replace("," + key, "").Replace(

Remove part of a string in a JSON document using str_replace for many records

只愿长相守 submitted on 2019-12-23 12:16:39
Question: I would like to replace a string in this file that is causing the invalid JSON arguments. I can manually delete the first string "_id" : ObjectId( "539163d7bd350003" ), and then convert this JSON to a data frame. Is there a way to replace all such instances in the JSON file with a function like str_replace? I tried the following but couldn't make it work. Any suggestions? library(RJSONIO) library(stringr) json_file<- '{ "_id" : ObjectId( "539163d7bd350003" ), "login" : "vui", "id" : 369607,
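Rather than deleting each entry by hand, one regular expression can match every `"_id" : ObjectId( "..." ),` field and remove it before parsing. The sketch uses Python's `re` and `json`; in R the same pattern could be passed to `stringr::str_replace_all()` (plain `str_replace()` replaces only the first match in each string), with the backslashes doubled inside the R string literal. It assumes the ObjectId is hexadecimal and that `_id` is followed by a comma, as in the sample; a trailing `_id` field would need a different pattern. The closing part of the sample record below is made up for the example, since the original is truncated.

```python
import json
import re

# A Mongo-shell style field such as "_id" : ObjectId( "539163d7bd350003" ),
# including the trailing comma and any spaces around the tokens.
OBJECT_ID_FIELD = re.compile(r'"_id"\s*:\s*ObjectId\(\s*"[0-9a-fA-F]+"\s*\)\s*,\s*')


def strip_object_ids(text: str) -> str:
    """Remove every "_id" : ObjectId(...) field so the text parses as JSON."""
    return OBJECT_ID_FIELD.sub("", text)


raw = '{ "_id" : ObjectId( "539163d7bd350003" ), "login" : "vui", "id" : 369607 }'
record = json.loads(strip_object_ids(raw))
print(record)
```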

How to differentiate two very long strings in C++?

北城余情 submitted on 2019-12-23 05:00:22
Question: I would like to solve this Levenshtein distance problem where the strings are very long. Edit 2: As Bobah said, the title was misleading, so I have updated the title of the question. The initial title was "How to declare a 100000x100000 2-D integer array in C++?" and the content was: Is there any way to declare int x[100000][100000] in C++? When I declare it globally, the compiler produces the error: size of array ‘x’ is too large. One method could be using map< pair< int , int > , int > mymap. But allocating and
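A full 100000 × 100000 table of 4-byte ints needs 10^10 × 4 bytes, about 40 GB, which is why the declaration fails. The Levenshtein recurrence only reads the previous row of the table, so two rows of length min(m, n) + 1 are enough. This is sketched in Python for consistency with the other examples; the same two-vector idea carries over directly to C++ with `std::vector<int>`. Note that time is still O(m·n): 10^10 cell updates remain slow in any language and very slow in Python.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with two DP rows: O(len(a)*len(b)) time,
    O(min(len(a), len(b))) extra memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in b so the rows stay small
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)    # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # delete ca
                          curr[j - 1] + 1,     # insert cb
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[len(b)]


print(levenshtein("kitten", "sitting"))
```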

PHP preg_replace replace text unless inside brackets

只谈情不闲聊 submitted on 2019-12-23 02:04:37
Question: I would like to use PHP's preg_replace() to search a text for occurrences of a certain word and enclose that word in brackets, unless brackets are already present. The challenge is that I want to test for brackets that may or may not be directly adjacent to the text I am looking for. Random example: I want to replace warfarin with [[warfarin]] in this string: Use warfarin for the prevention of strokes but not in this string: Use [[warfarin]] for the prevention of strokes (brackets
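One way to handle both adjacent and non-adjacent brackets is an alternation: the first branch matches a whole existing `[[...]]` span, the second matches the bare word, and a callback returns bracketed spans unchanged while wrapping bare hits. Because the regex engine scans left to right, a word inside an existing span is consumed by the first branch and never reaches the second. The sketch uses Python's `re`; in PHP the same pattern could be used with `preg_replace_callback()`. It assumes the brackets are literal `[[`/`]]` pairs that are not nested and do not span lines; `bracket_term` is a hypothetical name.

```python
import re


def bracket_term(text: str, term: str) -> str:
    """Wrap whole-word occurrences of `term` in [[...]], leaving any
    occurrence that already sits inside a [[...]] span untouched."""
    # Branch 1 consumes an existing [[...]] span (lazily, up to the first "]]");
    # branch 2 matches a bare whole-word occurrence of the term.
    pattern = re.compile(r"\[\[.*?\]\]|\b" + re.escape(term) + r"\b")

    def replace(match):
        found = match.group(0)
        # Existing bracketed spans come back unchanged; bare hits get wrapped.
        return found if found.startswith("[[") else "[[" + found + "]]"

    return pattern.sub(replace, text)


print(bracket_term("Use warfarin for the prevention of strokes", "warfarin"))
print(bracket_term("Use [[warfarin]] for the prevention of strokes", "warfarin"))
```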

“Partial match” table (aka “failure function”) in KMP (on Wikipedia)

我怕爱的太早我们不能终老 submitted on 2019-12-22 18:41:19
Question: I'm reading about the KMP algorithm on Wikipedia. One line of code in the "Description of pseudocode for the table-building algorithm" section confuses me: let cnd ← T[cnd]. It has the comment (second case: it doesn't, but we can fall back). I know we can fall back, but why T[cnd]? Is there a reason? It really confuses me. Here is the complete pseudocode for the table-building algorithm: algorithm kmp_table: input: an array of characters, W (the word to be analyzed) an array of
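The fallback works because T[cnd] is itself a border length. When the comparison for position pos runs, cnd is the length of a border (a proper prefix that is also a suffix) of the first pos - 1 characters of W. Any shorter border of that text is a prefix of W[0..cnd-1] and also a suffix of it, so it is a border of W[0..cnd-1]; the longest such border has length T[cnd] by definition. Setting cnd ← T[cnd] therefore tries the next-longest candidate without skipping any. The Python sketch below follows the three-case structure that the quoted comment belongs to (with T[0] = -1 and T[1] = 0); it is a reconstruction for illustration, not a copy of the Wikipedia text.

```python
def kmp_table(w: str) -> list:
    """Partial-match table: t[i] is the length of the longest proper prefix
    of w[:i] that is also a suffix of w[:i]; t[0] is the sentinel -1."""
    n = len(w)
    t = [0] * n
    if n == 0:
        return t
    t[0] = -1           # fixed first value (t[1] stays 0)
    pos, cnd = 2, 0     # pos: entry being computed; cnd: current border length
    while pos < n:
        if w[pos - 1] == w[cnd]:
            # First case: the border of w[:pos-1] extends by one character.
            cnd += 1
            t[pos] = cnd
            pos += 1
        elif cnd > 0:
            # Second case: fall back to the next-shorter border, which is
            # the longest border of w[:cnd], i.e. of length t[cnd].
            cnd = t[cnd]
        else:
            # Third case: no candidate left (cnd == 0).
            t[pos] = 0
            pos += 1
    return t


print(kmp_table("ABCDABD"))
```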