string-matching

How to get domain from a string using javascript regular expression

强颜欢笑 posted on 2019-11-29 23:25:22
Question: As the title suggests, I'm trying to retrieve the domain from a string using a JavaScript regular expression. Take the following strings:
    String ==> Return
    "google" ==> null
    "google.com" ==> "google.com"
    "www.google.com" ==> "www.google.com"
    "ftp://ftp.google.com" ==> "ftp.google.com"
    "http://www.google.com" ==> "www.google.com"
    "http://www.google.com/" ==> "www.google.com"
    "https://www.google.com/" ==> "www.google.com"
    "https://www.google.com.sg/" ==> "www.google.com.sg"
    "https://www.google
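One possible approach to the cases listed above, sketched in Python rather than JavaScript: allow an optional scheme, then require a host with at least one dot. The name `extract_domain` and the pattern itself are assumptions made for illustration, not something from the original thread, and the pattern deliberately ignores userinfo, IPv6 literals, and internationalized hosts.

```python
import re

# Hypothetical pattern: optional scheme, then a dotted host name.
DOMAIN_RE = re.compile(
    r"^(?:[a-z][a-z0-9+.-]*://)?"     # optional scheme such as http:// or ftp://
    r"((?:[a-z0-9-]+\.)+[a-z0-9-]+)"  # host: one or more "label." parts plus a final label
    r"(?:[/:?#]|$)",                  # host ends at a path, port, query, fragment, or the end
    re.IGNORECASE,
)


def extract_domain(text):
    """Return the host part of text, or None if it has no dotted host."""
    match = DOMAIN_RE.match(text)
    return match.group(1) if match else None
```

Because the host must contain a dot, a bare word such as "google" yields None, which matches the first row of the table.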

strstr faster than algorithms?

限于喜欢 posted on 2019-11-29 22:30:42
I have a file that's 21056 bytes. I've written a program in C that reads the entire file into a buffer and then uses multiple search algorithms to search it for a token that's 82 chars long. I've used all the implementations of the algorithms from the “Exact String Matching Algorithms” page. Specifically, I've used KMP, BM, TBM, and Horspool. I then used strstr and benchmarked each one. What puzzles me is that strstr outperforms all the other algorithms every time; the only one that is sometimes faster is BM. Shouldn't strstr be the slowest? Here's my benchmark code with an example of benchmarking
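For readers unfamiliar with the algorithms being compared, here is a minimal sketch of Horspool, one of the four named above. It is written in Python for brevity and is not the poster's C benchmark code; the function name `horspool_find` is an assumption. It says nothing about relative speed, it only shows the bad-character shift idea.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For every byte of the needle except the last, record how far the
    # window may move when that byte is aligned with the window's end.
    # Later positions overwrite earlier ones, so the smallest shift wins.
    shift = {needle[i]: m - 1 - i for i in range(m - 1)}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Bytes that do not occur in the needle allow a full-length jump.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```

The result can be checked against the built-in search, for example `horspool_find(data, token) == data.find(token)`.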

String searching algorithms in Java

a 夏天 posted on 2019-11-29 15:45:59
Question: I am doing string matching with a large amount of data. EDIT: I am matching words contained in a big list against some ontology text files. I take each file from the ontology and search for a match between the third String of each file line and any word from the list. I made the mistake of overlooking the fact that what I need is not pure matching (the results are poor); I need a looser matching function that will also return results when the string is contained inside another string. I did this

how to check if a word appears as a whole word in a string in Lua

谁说胖子不能爱 posted on 2019-11-29 11:16:31
I'm not sure how to check whether a word appears in a string as a whole word rather than as part of another word (case-sensitive). For example, Play is in the string
    Info Playlist Play pause
but not in the strings
    Info Playlist pause
    Info NowPlay pause
Since Lua has no \b word boundary, you can use a frontier pattern, %f. %f[%a] matches a transition to a letter and %f[%A] matches the opposite transition.
%f[set], a frontier pattern; such item matches an empty string at any position such that the next character belongs to set and the previous character does not belong to set. The set set is
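The frontier idea carries over to other regex flavours as a pair of lookarounds. The sketch below is in Python, not Lua, and is only an analogue of `%f[%a]Play%f[%A]`: the helper name `contains_whole_word` is an assumption, and "letter" is taken to mean ASCII letters.

```python
import re


def contains_whole_word(text, word):
    """True if word occurs in text with no ASCII letter directly before or after it."""
    pattern = r"(?<![A-Za-z])" + re.escape(word) + r"(?![A-Za-z])"
    # re.search is case-sensitive by default, as the question requires.
    return re.search(pattern, text) is not None
```

With the strings from the question, `contains_whole_word("Info Playlist Play pause", "Play")` is true, while the two strings without a standalone Play give false.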

Returning the lowest index for the first non whitespace character in a string in Python

删除回忆录丶 posted on 2019-11-29 09:52:45
What's the shortest way to do this in Python? string = "   xyz" must return index = 3
    >>> s = "   xyz"
    >>> len(s) - len(s.lstrip())
    3
    >>> next(i for i, j in enumerate('   xyz') if j.strip())
    3
or
    >>> next(i for i, j in enumerate('   xyz') if j not in string.whitespace)
    3
In versions of Python < 2.5 you'll have to do: (...).next()
Looks like the "regexes can do anything" brigade have taken the day off, so I'll fill in:
    >>> tests = [u'foo', u' foo', u'\xA0foo']
    >>> import re
    >>> for test in tests:
    ...     print len(re.match(r"\s*", test, re.UNICODE).group(0))
    ...
    0
    1
    1
    >>>
FWIW: time taken is O(the_answer),
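The snippets above are Python 2 era. A small Python 3 sketch of the same idea follows; the function name and the choice to return None for an all-whitespace string are assumptions, not part of the original answers.

```python
def first_non_whitespace_index(s):
    """Return the index of the first non-whitespace character, or None."""
    for i, ch in enumerate(s):
        # str.isspace() also treats Unicode spaces such as U+00A0 as whitespace.
        if not ch.isspace():
            return i
    return None
```

Unlike the `len(s) - len(s.lstrip())` form, this version distinguishes an empty or all-whitespace string (None) from a string whose first character is already non-whitespace (0).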

Remove ends of string entries in pandas DataFrame column

这一生的挚爱 posted on 2019-11-29 07:50:39
I have a pandas DataFrame in which one column is a list of files:
    import pandas as pd
    df = pd.read_csv('fname.csv')
    df.head()
    filename  A  B  C
    fn1.txt   2  4  5
    fn2.txt   1  2  1
    fn3.txt   .... ....
I would like to delete the file extension .txt from each entry in filename. How do I accomplish this? I tried:
    df['filename'] = df['filename'].map(lambda x: str(x)[:-4])
but when I look at the column entries afterwards with df.head(), nothing has changed. How does one do this?
I think you can use str.replace with the regex .txt$ ($ matches the end of the string):
    import pandas as pd
    df = pd.DataFrame({'A': {0: 2, 1
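Since the answer's code is cut off, here is a self-contained sketch of the `str.replace` suggestion. The sample rows are made up for illustration, and the dot is escaped (`\.txt$`) so that it matches a literal period, which is a small tightening of the regex quoted above.

```python
import pandas as pd

# Made-up sample data standing in for fname.csv.
df = pd.DataFrame({
    "filename": ["fn1.txt", "fn2.txt", "notes.txt.bak"],
    "A": [2, 1, 0],
})

# Remove a trailing ".txt" only; regex=True makes the pattern a regular expression.
df["filename"] = df["filename"].str.replace(r"\.txt$", "", regex=True)
```

Anchoring with `$` leaves names such as notes.txt.bak untouched, which a fixed `[:-4]` slice would not.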

How to search for a string in one column in other columns of a data frame

半城伤御伤魂 posted on 2019-11-29 04:32:37
I have a table, call it df, with 3 columns: the first is the title of a product, the second is the description of the product, and the third is a one-word string. I need to run an operation on the entire table that creates 2 new columns (call them 'exists_in_title' and 'exists_in_description') holding a 1 or 0 to indicate whether the third column's value occurs in the first or second column. It needs to be a simple row-wise, 1:1 operation. For example, calling row 1 'A', I need to check whether cell A3 occurs in A1 and use that to create column exists_in_title, and then check whether A3 occurs in A2,
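A minimal sketch of the row-wise check described above, assuming pandas and plain substring containment. The column names `title`, `description`, and `word`, and the sample rows, are hypothetical; only the two output column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real table's column names are not given.
df = pd.DataFrame({
    "title": ["Red running shoes", "Steel water bottle"],
    "description": ["Lightweight shoes for road running", "Keeps drinks cold"],
    "word": ["running", "cold"],
})

# Pair each row's word with that same row's title or description.
df["exists_in_title"] = [
    int(word in title) for word, title in zip(df["word"], df["title"])
]
df["exists_in_description"] = [
    int(word in desc) for word, desc in zip(df["word"], df["description"])
]
```

The `zip` keeps the comparison strictly within each row, which is the 1:1 behaviour asked for; the `in` test is case-sensitive.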

Algorithm to find out whether the matches for two Glob patterns (or Regular Expressions) intersect

流过昼夜 posted on 2019-11-29 00:29:58
Question: I'm looking at matching glob-style patterns similar to what the Redis KEYS command accepts. Quoting:
    h?llo matches hello, hallo and hxllo
    h*llo matches hllo and heeeello
    h[ae]llo matches hello and hallo, but not hillo
However, I am not matching a pattern against a text string; I am matching one pattern against another pattern, with all operators being meaningful on both sides. For example, the two patterns in each of these rows should match each other:
    prefix*    prefix:extended*
    *suffix    *:extended:suffix
    left
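One way to reason about this is a memoized recursion over both patterns at once. The sketch below is a simplification and an assumption on my part, not the thread's accepted answer: it handles only `?` and `*`, with no `[...]` classes and no escapes, and the name `globs_intersect` is made up.

```python
from functools import lru_cache


def globs_intersect(a: str, b: str) -> bool:
    """True if some string is matched by both glob patterns (only ? and * supported)."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> bool:
        if i == len(a) and j == len(b):
            return True
        if i < len(a) and a[i] == "*":
            # a's star matches nothing, or absorbs the next item of b.
            return go(i + 1, j) or (j < len(b) and go(i, j + 1))
        if j < len(b) and b[j] == "*":
            # Symmetric case for b's star.
            return go(i, j + 1) or (i < len(a) and go(i + 1, j))
        if i == len(a) or j == len(b):
            return False  # one pattern is exhausted, the other still needs a character
        if a[i] == "?" or b[j] == "?" or a[i] == b[j]:
            return go(i + 1, j + 1)
        return False

    return go(0, 0)
```

Each call advances at least one index, so the memo table has at most (len(a) + 1) × (len(b) + 1) entries. For the rows quoted in the question, `globs_intersect("prefix*", "prefix:extended*")` and `globs_intersect("*suffix", "*:extended:suffix")` both come out true.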

Fast partial string matching in R

醉酒当歌 posted on 2019-11-29 00:29:22
Question: Given a vector of strings texts and a vector of patterns patterns, I want to find any matching pattern for each text. For small datasets, this can be easily done in R with grepl:
    patterns = c("some","pattern","a","horse")
    texts = c("this is a text with some pattern", "this is another text with a pattern")
    # for each x in patterns
    lapply( patterns, function(x){
      # match all texts against pattern x
      res = grepl( x, texts, fixed=TRUE )
      print(res)
      # do something with the matches
      # ...
    })
This
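For comparison, the same fixed-string matching can be written per text instead of per pattern. This is a Python sketch of the logic in the R snippet above, not a faster replacement for it; the function name `matching_patterns` is an assumption.

```python
def matching_patterns(texts, patterns):
    """For each text, list the patterns that occur in it as plain substrings."""
    # The "in" operator does literal substring search, like grepl(..., fixed=TRUE).
    return [[p for p in patterns if p in text] for text in texts]


patterns = ["some", "pattern", "a", "horse"]
texts = [
    "this is a text with some pattern",
    "this is another text with a pattern",
]
result = matching_patterns(texts, patterns)
```

This still performs one substring search per text and pattern pair, so it illustrates the baseline rather than addressing large datasets.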

Python Fuzzy Matching (FuzzyWuzzy) - Keep only Best Match

房东的猫 posted on 2019-11-29 00:06:50
I'm trying to fuzzy match two csv files, each containing one column of names, that are similar but not the same. My code so far is as follows:
    import pandas as pd
    from pandas import DataFrame
    from fuzzywuzzy import process
    import csv
    save_file = open('fuzzy_match_results.csv', 'w')
    writer = csv.writer(save_file, lineterminator = '\n')
    def parse_csv(path):
        with open(path,'r') as f:
            reader = csv.reader(f, delimiter=',')
            for row in reader:
                yield row
    if __name__ == "__main__":
        ## Create lookup dictionary by parsing the products csv
        data = {}
        for row in parse_csv('names_1.csv'):
            data[row[0]] = row
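To illustrate the "keep only the best match" idea from the title without depending on the truncated code, here is a sketch that swaps in the standard library's `difflib` for FuzzyWuzzy. The helper name `best_match`, the sample names, and the 0.6 cutoff are assumptions; scores from `difflib` are not the same as FuzzyWuzzy scores.

```python
import difflib


def best_match(name, candidates, cutoff=0.6):
    """Return the single closest candidate to name, or None if nothing clears cutoff."""
    # n=1 asks for at most one result, already sorted by similarity.
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


names_2 = ["John Smith", "Jane Doe"]  # made-up stand-in for the second csv
pairs = {name: best_match(name, names_2) for name in ["Jon Smith", "zzz"]}
```

With FuzzyWuzzy itself, `process.extractOne(query, choices)` plays the same role and returns the best choice together with its score.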