string count with overlapping occurrences

Anonymous (unverified), submitted 2019-12-03 01:31:01

Question:

What's the best way to count the number of occurrences of a given string, including overlapping ones, in Python? Is it the most obvious way:

    def function(string, str_to_search_for):
        count = 0
        for x in range(len(string) - len(str_to_search_for) + 1):
            if string[x:x+len(str_to_search_for)] == str_to_search_for:
                count += 1
        return count

    function('1011101111', '11')  # returns 5

Or is there a better way in Python?

Answer 1:

Well, this might be faster, since it does the comparing in C:

    def occurrences(string, sub):
        count = start = 0
        while True:
            # str.find returns -1 when nothing more is found, which makes start 0
            start = string.find(sub, start) + 1
            if start > 0:
                count += 1
            else:
                return count


Answer 2:

    >>> import re
    >>> text = '1011101111'
    >>> len(re.findall('(?=11)', text))
    5

If you don't want to load the whole list of matches into memory (which is rarely a problem in practice), you could do this instead:

    >>> sum(1 for _ in re.finditer('(?=11)', text))
    5

As a function (re.escape makes sure the substring doesn't interfere with the regex):

    >>> def occurrences(text, sub):
    ...     return len(re.findall('(?={0})'.format(re.escape(sub)), text))
    ...
    >>> occurrences(text, '11')
    5
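To illustrate why the re.escape call matters, here is a small sketch comparing an escaped and an unescaped lookahead on a substring that contains a regex metacharacter. The function names and the sample text are made up for this illustration.

```python
import re

def occurrences_unescaped(text, sub):
    # sub is used as a regex as-is, so '.' matches any character
    return len(re.findall('(?={0})'.format(sub), text))

def occurrences_escaped(text, sub):
    # re.escape makes every character in sub match literally
    return len(re.findall('(?={0})'.format(re.escape(sub)), text))

text = 'a.a.axa'
print(occurrences_unescaped(text, 'a.a'))  # 3 ('axa' is counted too)
print(occurrences_escaped(text, 'a.a'))    # 2 (only the literal 'a.a')
```

Without escaping, the count silently includes windows that merely fit the pattern, which is easy to miss with inputs such as '1+1' or 'a.b'.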


Answer 3:

You can also try the third-party regex module (available from PyPI), which supports overlapping matches.

    import regex as re

    def count_overlapping(text, search_for):
        return len(re.findall(search_for, text, overlapped=True))

    count_overlapping('1011101111', '11')  # 5


Answer 4:

Python's str.count counts non-overlapping substrings:

    In [3]: "ababa".count("aba")
    Out[3]: 1

Here are a few ways to count overlapping sequences; I'm sure there are many more :)

Look-ahead regular expressions

How to find overlapping matches with a regexp?

    In [10]: re.findall("a(?=ba)", "ababa")
    Out[10]: ['a', 'a']

Generate all substrings

    In [11]: data = "ababa"
    In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
    Out[17]: 2


Answer 5:

    s = "bobobob"
    sub = "bob"
    ln = len(sub)
    print(sum(sub == s[i:i+ln] for i in range(len(s) - (ln - 1))))


Answer 6:

My answer to the bob question on the course:

    s = 'azcbobobegghaklbob'
    total = 0
    for i in range(len(s) - 2):
        if s[i:i+3] == 'bob':
            total += 1
    print('number of times bob occurs is:', total)


Answer 7:

How to find a pattern in another string with overlapping

This function (another solution!) receives a pattern and a text, and returns a list of all the matching substrings found in the text together with their positions.

    import re

    def occurrences(pattern, text):
        """
        input: search a pattern (regular expression) in a text
        returns: a list of substrings and their positions
        """
        # The capturing group inside the lookahead lets overlapping matches be reported.
        p = re.compile('(?=({0}))'.format(pattern))
        matches = re.finditer(p, text)
        return [(match.group(1), match.start()) for match in matches]

    print(occurrences('ana', 'banana'))
    print(occurrences('.ana', 'Banana-fana fo-fana'))

[('ana', 1), ('ana', 3)]
[('Bana', 0), ('nana', 2), ('fana', 7), ('fana', 15)]



Answer 8:

Here is my edX MIT "find bob"* solution (*find the number of "bob" occurrences in a string named s), which basically counts overlapping occurrences of a given substring:

    s = 'azcbobobegghakl'
    count = 0

    while 'bob' in s:
        count += 1
        # Resume at the last 'b' of this match so an overlapping 'bob' is still found.
        s = s[(s.find('bob') + 2):]

    print("Number of times bob occurs is: {}".format(count))
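The step of 2 works for 'bob' because a second 'bob' cannot begin at the 'o'. As a rough sketch of how that choice behaves for other substrings, the hypothetical helper below (not part of the answer) takes the step as a parameter and uses str.find with a start offset instead of slicing.

```python
def count_with_skip(s, sub, skip):
    """Count matches, resuming the search `skip` characters after each hit."""
    count = 0
    start = s.find(sub)
    while start != -1:
        count += 1
        start = s.find(sub, start + skip)
    return count

print(count_with_skip('azcbobobegghakl', 'bob', 2))  # 2
print(count_with_skip('aaaa', 'aa', 2))              # 2, misses the match at index 1
print(count_with_skip('aaaa', 'aa', 1))              # 3, every overlapping match
```

Advancing by one character is the safe general choice; a larger step is only valid when the substring cannot overlap itself at the skipped positions.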


Answer 9:

That can be solved using regex.

    import re

    def function(string, sub_string):
        # re.escape keeps regex metacharacters in sub_string from being interpreted
        match = re.findall('(?=' + re.escape(sub_string) + ')', string)
        return len(match)


Answer 10:

    def count_overlaps(string, look_for):
        start = 0
        matches = 0

        while True:
            start = string.find(look_for, start)
            if start
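The snippet above is cut off in the source after `if start`. One plausible completion, offered only as an assumption about what the author intended, stops when str.find reports no match (a negative index) and otherwise advances by one character:

```python
def count_overlaps(string, look_for):
    start = 0
    matches = 0

    while True:
        start = string.find(look_for, start)
        if start < 0:      # assumed condition: the source is truncated here
            break
        start += 1         # assumed: step one character so overlaps are found
        matches += 1

    return matches

print(count_overlaps('1011101111', '11'))  # 5
```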


Answer 11:

A function that takes two strings as input and counts how many times sub occurs in string, including overlaps. To check whether sub is a substring, I used the in operator on a slice of the same length as sub.

    def count_Occurrences(string, sub):
        count = 0
        for i in range(0, len(string) - len(sub) + 1):
            if sub in string[i:i+len(sub)]:
                count = count + 1
        print('Number of times sub occurs in string (including overlaps):', count)


Answer 12:

For a duplicate of this question, I decided to look at three characters at a time and compare them with the target string, e.g.

    counted = 0
    for i in range(len(string) - 2):
        # compare the three-character window starting at i
        if string[i:i+3] == 'xox':
            counted = counted + 1
    print(counted)


Answer 13:

An alternative very close to the accepted answer, but using the while condition as the test instead of an if inside the loop:

    def countSubstr(string, sub):
        count = 0
        while sub in string:
            count += 1
            string = string[string.find(sub) + 1:]
        return count

This avoids while True: and is a little cleaner, in my opinion.



Answer 14:

If the strings are large, you want to use Rabin-Karp. In summary:

  • a rolling window of substring size, moving over a string
  • a hash with O(1) overhead for adding and removing (i.e. move by 1 char)
  • implemented in C or relying on pypy
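The points above can be sketched in pure Python as follows. This is a minimal illustration rather than a tuned implementation: the function name, the base of 256, and the modulus are assumptions chosen for the example, and in CPython the built-in str.find loop will usually be faster because it runs in C.

```python
def rabin_karp_count(text, pattern, base=256, mod=1_000_000_007):
    """Count overlapping occurrences of pattern in text with a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return 0  # design choice for this sketch: an empty pattern counts as 0

    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    count = 0
    for i in range(n - m + 1):
        # Compare the actual characters on a hash hit to rule out collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            count += 1
        if i + m < n:
            # O(1) update: drop text[i], shift, append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return count

print(rabin_karp_count('1011101111', '11'))  # 5
```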


Answer 15:

    def count_substring(string, sub_string):
        counter = 0
        for i in range(len(string)):
            if string[i:].startswith(sub_string):
                counter = counter + 1
        return counter

The code above simply loops through the string once and checks, at each position, whether the rest of the string starts with the substring being counted.



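Answer 16:

If you want to count the occurrences of every substring of length 5 (adjust for different lengths if wanted):

    from collections import defaultdict

    def MerCount(s):
        d = defaultdict(int)  # maps each 5-character substring to its count
        for i in range(len(s) - 4):
            d[s[i:i+5]] += 1
        return d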
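As a sketch of the "adjust for different lengths" remark, the window length can be made a parameter. The name kmer_counts and the use of collections.Counter are choices made for this illustration, not part of the answer.

```python
from collections import Counter

def kmer_counts(s, k=5):
    """Count every length-k substring of s, overlaps included."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

counts = kmer_counts('abababab', k=3)
print(counts['aba'])  # 3
print(counts['bab'])  # 3
```

This counts all substrings of the given length in one pass, which is convenient when several different substrings of the same length need to be looked up afterwards.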


Answer 17:

    sum([1 for i in range(len(string) - len(str_to_search_for) + 1) if string[i:i+len(str_to_search_for)] == str_to_search_for])

In the list comprehension, we slide along the bigger string one position at a time, using a sliding window the length of the smaller string. The number of window positions is the length of the bigger string minus the length of the smaller string, plus one. For each position, we compare that part of the bigger string with the smaller string and put a 1 in the list if they match. The sum of all these 1s gives the total number of matches found.
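One way to gain confidence in the sliding-window count is to cross-check it against the str.find loop on random inputs. The sketch below does that; the function names, the two-letter alphabet, and the fixed seed are arbitrary choices for the example.

```python
import random

def count_sliding(string, sub):
    # Sliding window: compare every slice of len(sub) characters with sub.
    n = len(sub)
    return sum(1 for i in range(len(string) - n + 1) if string[i:i + n] == sub)

def count_find(string, sub):
    # str.find loop: restart the search one character after each match.
    count = start = 0
    while True:
        start = string.find(sub, start) + 1
        if start > 0:
            count += 1
        else:
            return count

rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    s = ''.join(rng.choice('ab') for _i in range(rng.randint(0, 20)))
    sub = ''.join(rng.choice('ab') for _j in range(rng.randint(1, 4)))
    assert count_sliding(s, sub) == count_find(s, sub)
```

A small alphabet makes overlapping matches frequent, so disagreements between two implementations tend to show up quickly.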


