string-matching

Pandas Compare two dataframes and determine the matched values

╄→гoц情女王★ submitted on 2019-12-08 03:47:33
Question: I have the following dataframes: print(dfa) ID Value AA12 101 BB101 CC01 DE06 1 AA11 102 BB101 CC01 234 EE07 2 AA10 202 BB101 CC01 345 EE09 3 AA13 103 BB101 CC02 123 4 AA14 203 BB101 CC02 456 5 AA15 204 BB102 CC03 567 6 print(dfb) ID Value AA10 202 BB101 CC01 EE09 345 3 AA11 102 BB101 CC01 EE07 234 2 AA12 101 BB101 CC01 DE06 1 AA13 103 BB101 CC02 123 4 AA18 203 BB103 CC01 456 5 AA15 204 BB201 CC11 678 7 I would like to compare the string in (dfa.ID, dfa.Value) to the one in (dfb.ID, dfb.Value

Regex Counting By 3s

ぐ巨炮叔叔 submitted on 2019-12-07 16:55:47
Question: I'm teaching myself regular expressions, and found a quizzing site that has been helping me find more applications for them and expand my knowledge of how they work. I found a question asking me to form a regex to match 10-digit numbers that are multiples of 3. The only way I can think of doing this is by having the regex recognise numbers' values and be able to manipulate them mathematically. How is this possible? In other words, what regex would match 0003 0006 0351
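One way to approach the question above, offered only as a sketch: a decimal number is a multiple of 3 exactly when its digit sum is, so a regex can track the digit sum modulo 3 like a three-state automaton, with no arithmetic on the value itself. The helper name `multiple_of_three_pattern` and the `n_digits` parameter are illustrative assumptions, not part of the original question.

```python
import re

# Digit classes grouped by their value modulo 3.
Z = "[0369]"  # adds 0 to the running digit sum (mod 3)
A = "[147]"   # adds 1
B = "[258]"   # adds 2

# Regex derived from a 3-state automaton (states = digit sum mod 3),
# describing every non-empty digit string that returns to state 0.
MULT3 = (
    f"(?:{Z}|{B}{Z}*{A}"
    f"|(?:{A}|{B}{Z}*{B})(?:{Z}|{A}{Z}*{B})*(?:{B}|{A}{Z}*{A}))+"
)

def multiple_of_three_pattern(n_digits):
    """Compile a pattern for exactly n_digits digits that form a multiple of 3."""
    # The lookahead pins the length; MULT3 enforces divisibility by 3.
    return re.compile(rf"(?=\d{{{n_digits}}}\Z){MULT3}\Z")

four_digits = multiple_of_three_pattern(4)
for candidate in ["0003", "0006", "0351", "0004"]:
    print(candidate, bool(four_digits.match(candidate)))
```

Passing `10` instead of `4` gives the 10-digit variant the question asks about; the 4-digit form is used here only because the sample values in the excerpt have four digits.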

Fastest way to find Strings in String collection that begin with certain chars

强颜欢笑 submitted on 2019-12-07 15:55:24
Question: I have a large collection of Strings. I want to be able to find the Strings that begin with "Foo" or the Strings that end with "Bar". What would be the best Collection type to get the fastest results? (I am using Java.) I know that a HashSet is very fast for complete matches, but I would think not for partial matches. So, what could I use instead of just looping through a List? Should I look into LinkedLists or similar types? Are there any Collection types that are optimized for this kind of
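A possible direction for the question above, sketched in Python rather than Java: keep the strings in sorted order, so that all strings sharing a prefix sit next to each other and a binary search finds the start of that run. Suffix queries reuse the same idea on a second sorted list of reversed strings. The class name `PrefixSuffixIndex` is a hypothetical helper introduced for illustration.

```python
import bisect

class PrefixSuffixIndex:
    """Sorted-list index for 'starts with' and 'ends with' lookups."""

    def __init__(self, strings):
        strings = list(strings)
        self._forward = sorted(strings)
        # Reversed copies turn a suffix query into a prefix query.
        self._backward = sorted(s[::-1] for s in strings)

    @staticmethod
    def _scan(sorted_items, prefix):
        # Binary search for the first item >= prefix, then walk the
        # contiguous run of items that actually start with it.
        i = bisect.bisect_left(sorted_items, prefix)
        matches = []
        while i < len(sorted_items) and sorted_items[i].startswith(prefix):
            matches.append(sorted_items[i])
            i += 1
        return matches

    def starts_with(self, prefix):
        return self._scan(self._forward, prefix)

    def ends_with(self, suffix):
        reversed_hits = self._scan(self._backward, suffix[::-1])
        return [s[::-1] for s in reversed_hits]

index = PrefixSuffixIndex(["Foobar", "FooBar", "Bar", "Food", "xBar", "foo"])
print(index.starts_with("Foo"))
print(index.ends_with("Bar"))
```

Each lookup costs a binary search plus the number of matches, at the price of storing the collection twice. A trie would be another option if memory allows.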

How to search for a part of a dictionary key?

妖精的绣舞 submitted on 2019-12-07 12:02:38
Question: Could someone please tell me how I can search for only a part of a key in a dictionary (in VB.NET)? I use the following sample code: Dim PriceList As New Dictionary(Of String, Double)(System.StringComparer.OrdinalIgnoreCase) PriceList.Add("Spaghetti alla carbonara", 21.65) PriceList.Add("Spaghetti aglio e olio", 22.65) PriceList.Add("Spaghetti alla napoletana", 23.65) PriceList.Add("Spaghetti alla puttanesca ", 24.65) PriceList.Add("Spaghetti alla gricia ", 25.65) PriceList.Add("Spaghetti

Longest Common Substring with wrong character tolerance

纵然是瞬间 submitted on 2019-12-07 11:38:55
Question: I have a script I found on here that works well when looking for the Longest Common Substring. However, I need it to tolerate some incorrect/missing characters. I would like to be able to either input a percentage of similarity required, or perhaps specify the number of missing/wrong characters allowable. For example, I want to find this string: big yellow school bus inside of this string: they rode the bigyellow schook bus that afternoon This is the code I'm currently using: function longest

How can I generate a list of words from a group of letters using Perl?

百般思念 submitted on 2019-12-07 10:54:19
Question: I was looking for a module, regex, or anything else that might apply to this problem. How can I programmatically parse the string and create known English and/or Spanish words given that I have a dictionary table against which I can check each permutation of the algorithm's randomization for a match? Given a group of characters: EBLAIDL KDIOIDSI ADHFWB The program should return: BLADE AID KID KIDS FIDDLE HOLA etc.... I also want to be able to define the minimum & maximum word length as well as the
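The problem above can be approached without generating permutations at all. The following is a minimal sketch in Python (the question itself is about Perl): it checks each dictionary word against a letter count of the available characters. It assumes all letter groups are pooled together, which the sample output (FIDDLE, HOLA) seems to imply; `words_from_letters` and its parameters are hypothetical names.

```python
from collections import Counter

def words_from_letters(letters, dictionary, min_len=1, max_len=None):
    """Return dictionary words that can be spelled from the given letters."""
    # Pool every alphabetic character, ignoring spaces and case.
    available = Counter(ch for ch in letters.upper() if ch.isalpha())
    found = []
    for word in dictionary:
        if len(word) < min_len:
            continue
        if max_len is not None and len(word) > max_len:
            continue
        # Counter subtraction keeps only letters the word needs more of
        # than the pool provides; an empty result means the word fits.
        if not (Counter(word.upper()) - available):
            found.append(word)
    return found

pool = "EBLAIDL KDIOIDSI ADHFWB"
candidates = ["BLADE", "AID", "KID", "KIDS", "FIDDLE", "HOLA", "ZEBRA"]
print(words_from_letters(pool, candidates, min_len=3))
```

Checking every dictionary word once scales with the dictionary size, whereas enumerating permutations of the letters grows factorially.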

String regex two mismatches Python

丶灬走出姿态 submitted on 2019-12-07 06:47:44
Question: How can I extend the code below to allow me to explore all instances where I have 2 mismatches or fewer between my substring and the parent string? Substring: SSQP String-to-match-to: SSPQQQQPSSSSQQQSSQPSPSQSSQPSSQPPSSSSQPSPSQSSQPSSSSQPSPSQSSQPSSSSQPSPSQ Here is an example where only one possible mismatch is incorporated: >>> s = 'SSPQQQQPSSSSQQQSSQPSPSQSSQPSSQPPSSSSQPSPSQSSQPSSSSQPSPSQSSQPSSSSQPSPSQ' >>> re.findall(r'(?=(SSQP|[A-Z]SQP|S[A-Z]QP|SS[A-Z]P|SSQ[A-Z]))', s) ['SSQQ', 'SSQP', 'SSQP',
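Enumerating every mismatch position in a regex alternation grows quickly as the tolerance increases. As an alternative sketch (not the approach in the excerpt), a sliding window can count mismatches directly, which also reports overlapping hits. `find_with_mismatches` is a hypothetical helper name, and only substitutions are considered, not insertions or deletions.

```python
def find_with_mismatches(text, pattern, max_mismatches=2):
    """Return (index, window) pairs where window differs from pattern
    in at most max_mismatches positions (Hamming distance)."""
    width = len(pattern)
    hits = []
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        # Count positions where the window and the pattern disagree.
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches:
            hits.append((start, window))
    return hits

s = "SSPQQQQPSSSSQQQSSQPSPSQSSQPSSQPPSSSSQPSPSQSSQPSSSSQPSPSQSSQPSSSSQPSPSQ"
for start, window in find_with_mismatches(s, "SSQP", max_mismatches=2):
    print(start, window)
```

Setting `max_mismatches=1` covers the same single-substitution cases as the regex in the question.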

Extract last word in a string after comma if there are multiple words else the first word

狂风中的少年 submitted on 2019-12-07 02:27:02
Question: I have data with words as follows: location<- c("xyz, sss, New Zealand", "USA", "Pris,France") id<- c(1,2,3) df<-data.frame(location,id) I would like to extract the country name from the data. The tricky part is that if I extract just the last word then I will have only one record (France). library(stringr) df$country<- word(df$location,-1) Any ideas on how to extract country data from this data? id location country 1 xyz, sss, New Zealand New Zealand 2 USA USA 3 Pris,France France Answer 1: You
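The desired output suggests splitting on the last comma rather than on whitespace, so that a multi-word country such as "New Zealand" stays intact. A small sketch of that rule in Python follows (the question itself uses R); `last_field` is an illustrative name, and this is not the truncated answer from the listing.

```python
def last_field(location):
    """Return the text after the last comma, or the whole string if
    there is no comma, with surrounding whitespace removed."""
    return location.rsplit(",", 1)[-1].strip()

locations = ["xyz, sss, New Zealand", "USA", "Pris,France"]
for loc in locations:
    print(loc, "->", last_field(loc))
```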

R - Merging two data files based on partial matching of inconsistent full name formats

落爺英雄遲暮 submitted on 2019-12-07 01:58:29
Here is my previous question reposted with R format. I'm looking for a way to merge two data files based on partial matching of participants' full names, which are sometimes entered in different formats and sometimes misspelled. I know there are some different function options for partial matches (e.g. agrep and pmatch) and for merging data files, but I need help with a) combining the two; b) doing partial matching that can ignore middle names; c) storing both original name formats in the merged data file; and d) retaining unique values even if they don't have a match. For example, I have the following

Compare files line by line to see if they are the same, if so output them

一世执手 submitted on 2019-12-06 15:58:30
How would I go about this? I have files in which I have sorted the information, and I want to compare a certain index in one file with an index in another. One problem is that the files are enormously large: millions of lines. I want to compare the files line by line, and if they match I want to output both those values along with other values using an index method. ======================= Let me clarify: I want to take, say, line[x] (the x will remain the same, as the file is formatted uniformly) and run line[x] against line[y] in another file. I want to do this for the whole file and output every