Question:
What's the best way to count the number of occurrences of a given string, including overlaps, in Python? Is it the most obvious way:

    def function(string, str_to_search_for):
        count = 0
        for x in xrange(len(string) - len(str_to_search_for) + 1):
            if string[x:x+len(str_to_search_for)] == str_to_search_for:
                count += 1
        return count

    function('1011101111','11')  # returns 5

Or is there a better way in Python?
Answer 1:
Well, this might be faster since it does the comparing in C:

    def occurrences(string, sub):
        count = start = 0
        while True:
            # find() returns -1 when there is no further match, so start becomes 0
            start = string.find(sub, start) + 1
            if start > 0:
                count += 1
            else:
                return count
Answer 2:

    >>> import re
    >>> text = '1011101111'
    >>> len(re.findall('(?=11)', text))
    5

If you didn't want to load the whole list of matches into memory (which would never be a problem!), you could do this if you really wanted:

    >>> sum(1 for _ in re.finditer('(?=11)', text))
    5

As a function (re.escape makes sure the substring doesn't interfere with the regex):

    >>> def occurrences(text, sub):
    ...     return len(re.findall('(?={0})'.format(re.escape(sub)), text))
    ...
    >>> occurrences(text, '11')
    5
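To see why the escaping matters, here is a small sketch (Python 3 assumed) that compares the escaped function with an unescaped variant on a substring containing a regex metacharacter. The name `occurrences_unescaped` is made up for this illustration and is not part of the answer.

```python
import re

def occurrences(text, sub):
    # Escaped: `sub` is matched literally.
    return len(re.findall('(?={0})'.format(re.escape(sub)), text))

def occurrences_unescaped(text, sub):
    # Hypothetical variant without re.escape: `sub` is interpreted as a regex.
    return len(re.findall('(?={0})'.format(sub), text))

print(occurrences('a.a.aba', 'a.a'))            # 2: only the literal 'a.a' at 0 and 2
print(occurrences_unescaped('a.a.aba', 'a.a'))  # 3: '.' also matches the 'b' in 'aba'
```

Without the escape, the dot acts as a wildcard, so the count includes a position that does not contain the literal substring.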
Answer 3:
You can also try the newer Python regex module (a third-party package), which supports overlapping matches.

    import regex as re

    def count_overlapping(text, search_for):
        return len(re.findall(search_for, text, overlapped=True))

    count_overlapping('1011101111','11')  # 5
Answer 4:
Python's str.count counts non-overlapping substrings:

    In [3]: "ababa".count("aba")
    Out[3]: 1

Here are a few ways to count overlapping sequences; I'm sure there are many more :)

Look-ahead regular expressions
How to find overlapping matches with a regexp?

    In [10]: re.findall("a(?=ba)", "ababa")
    Out[10]: ['a', 'a']

Generate all substrings

    In [11]: data = "ababa"
    In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
    Out[17]: 2
Answer 5:

    s = "bobobob"
    sub = "bob"
    ln = len(sub)
    # True counts as 1, so summing the comparisons counts the matching windows
    print(sum(sub == s[i:i+ln] for i in xrange(len(s)-(ln-1))))
Answer 6:
My answer to the "bob" question on the course:

    s = 'azcbobobegghaklbob'
    total = 0
    for i in range(len(s)-2):
        if s[i:i+3] == 'bob':
            total += 1
    print 'number of times bob occurs is: ', total
Answer 7:
How to find a pattern in another string, with overlapping
This function (another solution!) receives a pattern and a text. It returns a list of all the matching substrings located in the text, together with their positions.

    import re

    def occurrences(pattern, text):
        """
        input: search a pattern (regular expression) in a text
        returns: a list of substrings and their positions
        """
        p = re.compile('(?=({0}))'.format(pattern))
        matches = re.finditer(p, text)
        return [(match.group(1), match.start()) for match in matches]

    print (occurrences('ana', 'banana'))
    print (occurrences('.ana', 'Banana-fana fo-fana'))

    [('ana', 1), ('ana', 3)]
    [('Bana', 0), ('nana', 2), ('fana', 7), ('fana', 15)]
Answer 8:
Here is my edX MIT "find bob"* solution (*find the number of "bob" occurrences in a string named s), which basically counts overlapping occurrences of a given substring:

    s = 'azcbobobegghakl'
    count = 0
    while 'bob' in s:
        count += 1
        # Skip to the last 'b' of the match so an overlapping 'bob' is still found
        s = s[(s.find('bob') + 2):]
    print "Number of times bob occurs is: {}".format(count)
Answer 9:
That can be solved using regex.

    import re

    def function(string, sub_string):
        # re.escape keeps regex metacharacters in sub_string from being interpreted
        match = re.findall('(?=' + re.escape(sub_string) + ')', string)
        return len(match)
Answer 10:

    def count_overlaps (string, look_for):
        start = 0
        matches = 0
        while True:
            start = string.find (look_for, start)
            if start
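The snippet above is cut off after `if start`. A plausible completion is sketched below, not as the original author's code: it assumes the loop stops when `str.find` returns -1 and otherwise advances one character past each match (Python 3 assumed).

```python
def count_overlaps(string, look_for):
    start = 0
    matches = 0
    while True:
        start = string.find(look_for, start)
        if start < 0:      # assumed termination test: find() returns -1 when nothing is left
            return matches
        matches += 1
        start += 1         # assumed step: move one character on so overlaps are seen

print(count_overlaps('1011101111', '11'))  # 5
```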
Answer 11:
A function that takes two strings as input and counts how many times sub occurs in string, including overlaps. To check whether sub is a substring, I used the in operator.

    def count_Occurrences(string, sub):
        count = 0
        for i in range(0, len(string)-len(sub)+1):
            # The slice has the same length as sub, so "in" acts as an equality test here
            if sub in string[i:i+len(sub)]:
                count = count + 1
        print 'Number of times sub occurs in string (including overlaps): ', count
Answer 12:
For a duplicate of this question, I decided to compare the string three characters at a time, e.g.:

    counted = 0
    for i in range(len(string) - 2):
        # Compare the 3-character window starting at position i
        if string[i:i+3] == 'xox':
            counted = counted + 1
    print counted
Answer 13:
An alternative very close to the accepted answer, but using while as the if test instead of including if inside the loop:

    def countSubstr(string, sub):
        count = 0
        while sub in string:
            count += 1
            # Drop everything up to and including the first character of the match
            string = string[string.find(sub) + 1:]
        return count

This avoids while True: and is a little cleaner in my opinion.
Answer 14:
If the strings are large, you want to use Rabin-Karp. In summary:
- a rolling window of substring size, moving over a string
- a hash with O(1) overhead for adding and removing (i.e. move by 1 char)
- implemented in C or relying on pypy
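The points above can be sketched in pure Python as follows. This is an illustration under assumptions, not a tuned implementation: the function name, the `base` and `mod` parameters, and the polynomial hash are choices made for this example (Python 3 assumed).

```python
def rabin_karp_count(text, pattern, base=256, mod=1_000_000_007):
    """Count overlapping occurrences of `pattern` in `text` with a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return 0

    high = pow(base, m - 1, mod)  # weight of the character leaving the window

    # Hash the pattern and the first window of the text.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    count = 0
    for i in range(n - m + 1):
        # Compare the actual characters only when the hashes agree,
        # which guards against hash collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            count += 1
        if i < n - m:
            # Slide the window by one character in O(1).
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return count

print(rabin_karp_count('1011101111', '11'))  # 5
```

In CPython this loop will likely be slower than the `str.find` based answers, which is why the answer suggests a C implementation or PyPy.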
Answer 15:

    def count_substring(string, sub_string):
        counter = 0
        for i in range(len(string)):
            if string[i:].startswith(sub_string):
                counter = counter + 1
        return counter

The code above simply loops through the string once and, at each position, checks whether the remainder of the string starts with the substring being counted.
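Answer 16:
If you want to count the occurrences of every substring of length 5 (adjust for different lengths if wanted):

    from collections import defaultdict

    def MerCount(s):
        d = defaultdict(int)  # maps each 5-character substring to its count
        for i in xrange(len(s)-4):
            d[s[i:i+5]] += 1
        return d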
Answer 17:

    sum([1 for i in range(len(string)-len(str_to_search_for)+1) if string[i:i+len(str_to_search_for)] == str_to_search_for])

In a list comprehension, we slide along the bigger string one position at a time, using a sliding window the length of the smaller string. The number of window positions is the length of the bigger string minus the length of the smaller string, plus one. For each position, we compare that part of the bigger string with the smaller string and generate a 1 in the list if they match. The sum of all these 1's gives the total number of matches found.
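As a worked example of the sliding window described here, the sketch below unpacks the one-liner into a function and also prints the matching start positions for the string from the question. The name `count_sliding` is made up for this illustration (Python 3 assumed).

```python
def count_sliding(string, sub):
    window = len(sub)
    # Number of window positions: len(string) - len(sub) + 1
    positions = len(string) - window + 1
    return sum(1 for i in range(positions) if string[i:i + window] == sub)

string, sub = '1011101111', '11'
# Start indices where the window equals `sub`
print([i for i in range(len(string) - len(sub) + 1) if string[i:i + len(sub)] == sub])  # [2, 3, 6, 7, 8]
print(count_sliding(string, sub))  # 5
```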