string-matching | 易学教程 (Easy Learning Tutorial)

Pandas Compare two dataframes and determine the matched values

Question: I have the following dataframes:

print(dfa)
ID                              Value
AA12 101 BB101 CC01 DE06        1
AA11 102 BB101 CC01 234 EE07    2
AA10 202 BB101 CC01 345 EE09    3
AA13 103 BB101 CC02 123         4
AA14 203 BB101 CC02 456         5
AA15 204 BB102 CC03 567         6

print(dfb)
ID                              Value
AA10 202 BB101 CC01 EE09 345    3
AA11 102 BB101 CC01 EE07 234    2
AA12 101 BB101 CC01 DE06        1
AA13 103 BB101 CC02 123         4
AA18 203 BB103 CC01 456         5
AA15 204 BB201 CC11 678         7

I would like to compare the string in (dfa.ID, dfa.Value) to the one in (dfb.ID, dfb.Value
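The sketch below uses pandas and is only one possible approach. It assumes ID holds the space-separated string and Value the trailing integer in the printed frames. It also assumes that two IDs count as the same when they contain the same tokens in any order, as in the AA11 and AA10 rows. The helper `normalize_id` and the `matched` column are hypothetical names.

```python
import pandas as pd

# Data as printed in the question (assumption: ID is the string, Value the integer).
dfa = pd.DataFrame({
    "ID": ["AA12 101 BB101 CC01 DE06", "AA11 102 BB101 CC01 234 EE07",
           "AA10 202 BB101 CC01 345 EE09", "AA13 103 BB101 CC02 123",
           "AA14 203 BB101 CC02 456", "AA15 204 BB102 CC03 567"],
    "Value": [1, 2, 3, 4, 5, 6],
})
dfb = pd.DataFrame({
    "ID": ["AA10 202 BB101 CC01 EE09 345", "AA11 102 BB101 CC01 EE07 234",
           "AA12 101 BB101 CC01 DE06", "AA13 103 BB101 CC02 123",
           "AA18 203 BB103 CC01 456", "AA15 204 BB201 CC11 678"],
    "Value": [3, 2, 1, 4, 5, 7],
})

def normalize_id(s: str) -> str:
    """Hypothetical helper: sort the tokens so their order does not matter."""
    return " ".join(sorted(s.split()))

# Set of (normalized ID, Value) pairs present in dfb, for fast membership tests.
b_keys = set(zip(dfb["ID"].map(normalize_id), dfb["Value"]))

# Flag each dfa row whose (normalized ID, Value) pair also appears in dfb.
dfa["matched"] = [
    (key, value) in b_keys
    for key, value in zip(dfa["ID"].map(normalize_id), dfa["Value"])
]
print(dfa)
```

If exact string equality is wanted instead, drop `normalize_id`. In that case only the AA12 and AA13 rows match.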

Regex Counting By 3s

Question: I'm teaching myself regular expressions and found a quiz site that has been helping me find more applications for them and expand my knowledge of how they work. I found a question asking me to form a regex that matches 10-digit numbers that are multiples of 3. The only way I can think of doing this is by having the regex recognise the numbers' values and manipulate them mathematically. How is this possible? In other words, what regex would match 0003 0006 0351
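A regex cannot do arithmetic, but it can encode a finite automaton that tracks the running digit sum modulo 3:

- `[0369]` leaves the remainder unchanged.
- `[147]` adds 1.
- `[258]` adds 2.

Eliminating the two non-zero states of that three-state automaton gives the pattern below, written in Python's `re` syntax. The `(?=\d{10}\Z)` lookahead that enforces exactly 10 digits is an addition for the quiz's length requirement. The nested quantifiers are fine for short inputs like these.

```python
import re

# Digit classes by their value mod 3.
A, B, C = "[0369]", "[147]", "[258]"

# State elimination on the mod-3 automaton (start/accept state = remainder 0):
#   loops that stay at 0:       A | C A* B
#   loops that pass through 1:  (B | C A* C) (A | B A* C)* (C | B A* B)
MULT3 = re.compile(
    f"(?:{A}|{C}{A}*{B}|(?:{B}|{C}{A}*{C})(?:{A}|{B}{A}*{C})*(?:{C}|{B}{A}*{B}))+"
)

# Same language, restricted to exactly 10 digits by a lookahead.
TEN_DIGIT_MULT3 = re.compile(r"(?=\d{10}\Z)" + MULT3.pattern)

def is_multiple_of_3(digits: str) -> bool:
    """True if the whole digit string is a multiple of 3."""
    return MULT3.fullmatch(digits) is not None

for s in ["0003", "0006", "0351", "0352"]:
    print(s, is_multiple_of_3(s))
```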

Fastest way to find Strings in String collection that begin with certain chars

Question: I have a large collection of Strings. I want to be able to find the Strings that begin with "Foo" or the Strings that end with "Bar". What would be the best Collection type to get the fastest results? (I am using Java.) I know that a HashSet is very fast for exact matches, but I would think not for partial matches? So, what could I use instead of just looping through a List? Should I look into LinkedLists or similar types? Are there any Collection types that are optimized for this kind of
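Hashing cannot help with partial keys, but sorted order can. All strings sharing a prefix form one contiguous run, so a binary search finds the run in O(log n) plus the size of the result. Suffix queries become prefix queries on the reversed strings.

Below is a Python sketch of that idea; the `AffixIndex` class is hypothetical. In Java the same ordered-range lookup is available from a `TreeSet<String>` through `subSet` and `tailSet`.

```python
from bisect import bisect_left

class AffixIndex:
    """Hypothetical helper: sorted copies of the strings for prefix/suffix queries."""

    def __init__(self, words):
        self._forward = sorted(words)
        # Reversing each string turns "ends with" into "starts with".
        self._reversed = sorted(w[::-1] for w in words)

    @staticmethod
    def _with_prefix(sorted_words, prefix):
        # Every string starting with `prefix` sorts at or after `prefix`, and
        # they are contiguous, so scan forward until the first non-match.
        result = []
        for i in range(bisect_left(sorted_words, prefix), len(sorted_words)):
            if not sorted_words[i].startswith(prefix):
                break
            result.append(sorted_words[i])
        return result

    def starts_with(self, prefix):
        return self._with_prefix(self._forward, prefix)

    def ends_with(self, suffix):
        return [w[::-1] for w in self._with_prefix(self._reversed, suffix[::-1])]

index = AffixIndex(["Foobar", "Food", "Bar", "CrowBar", "Fo", "xFoo", "FooBar"])
print(index.starts_with("Foo"))
print(index.ends_with("Bar"))
```

The comparison is case-sensitive, so "Foobar" does not end with "Bar".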

How to search for a part of a dictionary key?

Question: Could someone please tell me how I can search for only part of a key in a dictionary (in VB.NET)? I use the following sample code:
Dim PriceList As New Dictionary(Of String, Double)(System.StringComparer.OrdinalIgnoreCase)
PriceList.Add("Spaghetti alla carbonara", 21.65)
PriceList.Add("Spaghetti aglio e olio", 22.65)
PriceList.Add("Spaghetti alla napoletana", 23.65)
PriceList.Add("Spaghetti alla puttanesca ", 24.65)
PriceList.Add("Spaghetti alla gricia ", 25.65)
PriceList.Add("Spaghetti
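A dictionary hashes whole keys, so a partial-key search has to scan the keys. The case-insensitive comparer only affects exact lookups.

Below is a minimal Python sketch of that scan, using the prices from the question; the function name and search terms are hypothetical. In VB.NET the equivalent filter would be a LINQ `Where` over `PriceList` using `Key.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0`.

```python
price_list = {
    "Spaghetti alla carbonara": 21.65,
    "Spaghetti aglio e olio": 22.65,
    "Spaghetti alla napoletana": 23.65,
    "Spaghetti alla puttanesca ": 24.65,
    "Spaghetti alla gricia ": 25.65,
}

def find_by_partial_key(prices, fragment):
    """Return all entries whose key contains `fragment`, ignoring case."""
    needle = fragment.casefold()
    return {key: price for key, price in prices.items() if needle in key.casefold()}

print(find_by_partial_key(price_list, "ALLA"))
```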

Longest Common Substring with wrong character tolerance

Question: I have a script I found on here that works well when looking for the Longest Common Substring. However, I need it to tolerate some incorrect or missing characters. I would like to be able to either input a required percentage of similarity or specify the number of missing/wrong characters allowed. For example, I want to find this string:
big yellow school bus
inside of this string:
they rode the bigyellow schook bus that afternoon
This is the code I'm currently using: function longest
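The tolerance asked for here is usually expressed as an edit (Levenshtein) distance. Below is a minimal Python sketch of approximate substring search using Sellers' variant of the edit-distance DP, in which the match may start and end anywhere in the text.

This is a swapped-in technique, not the script from the question, and `best_approximate_match` is a hypothetical name. A similarity percentage can be derived as `1 - distance / len(pattern)`.

```python
def best_approximate_match(pattern: str, text: str) -> tuple[int, int, int]:
    """Return (distance, start, end) of the substring text[start:end] with the
    smallest Levenshtein distance to `pattern` (Sellers' algorithm)."""
    m = len(pattern)
    # Column for text position 0: pattern[:i] against an empty substring
    # costs i deletions, and every such match starts at position 0.
    dist = list(range(m + 1))
    start = [0] * (m + 1)
    best = (dist[m], 0, 0)
    for j in range(1, len(text) + 1):
        # Row 0: the empty pattern matches the empty substring starting at j.
        new_dist = [0] * (m + 1)
        new_start = [j] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            new_dist[i], new_start[i] = min(
                (dist[i - 1] + cost, start[i - 1]),       # match or wrong character
                (dist[i] + 1, start[i]),                  # extra character in text
                (new_dist[i - 1] + 1, new_start[i - 1]),  # character missing from text
            )
        dist, start = new_dist, new_start
        if dist[m] < best[0]:
            best = (dist[m], start[m], j)
    return best

pattern = "big yellow school bus"
text = "they rode the bigyellow schook bus that afternoon"
distance, s, e = best_approximate_match(pattern, text)
print(distance, repr(text[s:e]), f"{1 - distance / len(pattern):.0%} similar")
```

The table costs O(len(pattern) × len(text)) time. Only two columns are kept, so memory is O(len(pattern)).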

How can I generate a list of words from a group of letters using Perl?

Question: I was looking for a module, regex, or anything else that might apply to this problem. How can I programmatically parse the string and create known English and/or Spanish words, given that I have a dictionary table against which I can check each permutation of the algorithm's randomization for a match? Given a group of characters:
EBLAIDL KDIOIDSI ADHFWB
The program should return: BLADE AID KID KIDS FIDDLE HOLA etc. I also want to be able to define the minimum & maximum word length as well as the
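Generating every permutation grows factorially. Checking each dictionary word against the letter counts is instead linear in the size of the dictionary.

Below is a Python sketch of the counting approach. It assumes the three groups are pooled into one set of available letters, as the FIDDLE and HOLA examples suggest. A tiny hypothetical word list stands in for the real dictionary table, and the function name is also hypothetical.

```python
from collections import Counter

def words_from_letters(letters, dictionary, min_len=1, max_len=None):
    """Return dictionary words that can be spelled with the available letters.

    Each letter may be used at most as often as it occurs in `letters`.
    """
    pool = Counter(c for c in letters.upper() if c.isalpha())
    if max_len is None:
        max_len = sum(pool.values())
    found = []
    for word in dictionary:
        if not min_len <= len(word) <= max_len:
            continue
        need = Counter(word.upper())
        # A missing letter reads as 0 in the pool Counter.
        if all(pool[ch] >= n for ch, n in need.items()):
            found.append(word)
    return found

# Hypothetical stand-in for the English/Spanish dictionary table.
words = ["BLADE", "AID", "KID", "KIDS", "FIDDLE", "HOLA", "ZEBRA", "BOOK"]
print(words_from_letters("EBLAIDL KDIOIDSI ADHFWB", words, min_len=3, max_len=6))
```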

String regex two mismatches Python

Question: How can I extend the code below to find all instances where there are 2 or fewer mismatches between my substring and the parent string?
Substring: SSQP
String to match against: SSPQQQQPSSSSQQQSSQPSPSQSSQPSSQPPSSSSQPSPSQSSQPSSSSQPSPSQSSQPSSSSQPSPSQ
Here is an example where only one possible mismatch is incorporated:
>>> s = 'SSPQQQQPSSSSQQQSSQPSPSQSSQPSSQPPSSSSQPSPSQSSQPSSSSQPSPSQSSQPSSSSQPSPSQ'
>>> re.findall(r'(?=(SSQP|[A-Z]SQP|S[A-Z]QP|SS[A-Z]P|SSQ[A-Z]))', s)
['SSQQ', 'SSQP', 'SSQP',
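The pattern in the question spells out one substitution per alternative. The same idea extends to k mismatches by generating every way of replacing up to k positions with `[A-Z]`. Below is a Python sketch; `mismatch_pattern` is a hypothetical helper.

The number of alternatives grows as the sum of C(n, i) for i ≤ k. For long substrings, a plain sliding-window count of differing characters (Hamming distance) scales better.

```python
import itertools
import re

def mismatch_pattern(sub: str, max_mismatches: int) -> str:
    """Lookahead regex matching `sub` with up to `max_mismatches` substitutions."""
    alternatives = []
    for k in range(max_mismatches + 1):
        for positions in itertools.combinations(range(len(sub)), k):
            alternatives.append("".join(
                "[A-Z]" if i in positions else re.escape(ch)
                for i, ch in enumerate(sub)
            ))
    # The lookahead makes each match zero-width, so overlapping hits are all found.
    return "(?=(" + "|".join(alternatives) + "))"

s = "SSPQQQQPSSSSQQQSSQPSPSQSSQPSSQPPSSSSQPSPSQSSQPSSSSQPSPSQSSQPSSSSQPSPSQ"
hits = re.findall(mismatch_pattern("SSQP", 2), s)
print(hits)
```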

Extract last word in a string after comma if there are multiple words else the first word

Question: I have data where the words are as follows:
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")
id <- c(1,2,3)
df <- data.frame(location,id)
I would like to extract the country name from the data. The tricky part is that if I extract just the last word, then I will have only one record (France).
library(stringr)
df$country <- word(df$location,-1)
Any ideas on how to extract the country from this data? Desired output:
id  location               country
1   xyz, sss, New Zealand  New Zealand
2   USA                    USA
3   Pris,France            France
Answer 1: You
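The rule here is "the text after the last comma, or the whole value if there is no comma". Splitting on commas rather than on words therefore gives the desired column.

Below is a sketch in pandas using the question's data. In R, `trimws(sub(".*,", "", df$location))` expresses the same rule, because the greedy `.*,` removes everything up to the last comma.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["xyz, sss, New Zealand", "USA", "Pris,France"],
    "id": [1, 2, 3],
})

# Split on commas, keep the last piece and trim surrounding spaces;
# a value without a comma comes back unchanged.
df["country"] = df["location"].str.split(",").str[-1].str.strip()
print(df)
```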

R - Merging two data files based on partial matching of inconsistent full name formats

Here is my previous question, reposted in R format. I'm looking for a way to merge two data files based on partial matching of participants' full names, which are sometimes entered in different formats and sometimes misspelled. I know there are several functions for partial matching (e.g. agrep and pmatch) and for merging data files, but I need help with:
a) combining the two;
b) partial matching that can ignore middle names;
c) storing both original name formats in the merged data file;
d) retaining unique values even if they don't have a match.
For example, I have the following
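Below is a sketch of the matching step in Python, using the standard library's `difflib` as a stand-in for `agrep`. It works in three steps:

1. Each name is reduced to "first last", dropping middle names. This assumes the only two layouts are "Last, First Middle" and "First Middle Last".
2. Names are then matched against a similarity cutoff.
3. Unmatched names from either file are kept with an empty partner.

The names, the 0.85 cutoff and the helper names are hypothetical. The sketch also lets one name from the second file match several names from the first, and a real merge would carry the other columns along.

```python
import difflib

def name_key(full_name: str) -> str:
    """Reduce a name to 'first last', dropping middle names (hypothetical rule)."""
    name = full_name.strip().lower()
    if "," in name:                      # "Last, First Middle"
        last, rest = (part.strip() for part in name.split(",", 1))
        first = rest.split()[0] if rest else ""
    else:                                # "First Middle Last"
        parts = name.split()
        if not parts:
            return ""
        first, last = parts[0], parts[-1]
    return f"{first} {last}"

def fuzzy_merge(names_a, names_b, cutoff=0.85):
    """Pair each name in names_a with its closest name in names_b (or None),
    then append the names_b entries that were never matched."""
    by_key = {name_key(b): b for b in names_b}
    rows, used = [], set()
    for a in names_a:
        best = difflib.get_close_matches(name_key(a), list(by_key), n=1, cutoff=cutoff)
        b = by_key[best[0]] if best else None
        if b is not None:
            used.add(b)
        rows.append((a, b))              # both original spellings are kept
    rows.extend((None, b) for b in names_b if b not in used)
    return rows

file_a = ["Smith, John Robert", "Mary Ann Jones", "Peter Brown"]  # hypothetical
file_b = ["John Smith", "Mary Jonse", "Alice Green"]              # hypothetical
for row in fuzzy_merge(file_a, file_b):
    print(row)
```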

Compare files line by line to see if they are the same, if so output them

How would I go about this? I have files whose information I have already sorted, and I want to compare a certain index in one file with an index in another. One problem is that the files are enormously large (millions of lines). I want to compare the files line by line, and if they match, I want to input both those values along with other values using an index method.

Let me clarify: I want to take, say, line[x] (the x will remain the same, as the files are formatted uniformly), run line[x] against line[y] in the other file, do this for the whole file, and output every
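Because both files are already sorted, they can be compared in a single streaming pass (a merge join). This holds only one line of each file in memory, which suits files with millions of lines.

Below is a Python sketch built on several assumptions:

- Fields are comma-separated.
- `field` is the zero-based index of the column to compare.
- Both files are sorted on that column in plain string order.
- Each key appears at most once per file.

The function name is hypothetical.

```python
def merge_join(lines_a, lines_b, field=0, sep=","):
    """Yield (line_a, line_b) pairs whose `field`-th columns are equal.

    Both inputs must be sorted on that column (string order), keys unique.
    """
    it_a, it_b = iter(lines_a), iter(lines_b)
    a, b = next(it_a, None), next(it_b, None)
    while a is not None and b is not None:
        a_line, b_line = a.rstrip("\n"), b.rstrip("\n")
        ka, kb = a_line.split(sep)[field], b_line.split(sep)[field]
        if ka == kb:
            yield a_line, b_line
            a, b = next(it_a, None), next(it_b, None)
        elif ka < kb:
            a = next(it_a, None)   # advance the file that is behind
        else:
            b = next(it_b, None)

file_a = ["1,x\n", "3,y\n", "5,z\n"]
file_b = ["2,p\n", "3,q\n", "5,r\n", "6,s\n"]
for pair in merge_join(file_a, file_b):
    print(pair)
```

With real files, pass the open file objects, for example `merge_join(open(path_a), open(path_b), field=2)` with hypothetical paths. Iterating a file object reads it line by line.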