Python finding patterns within large group of numbers? [duplicate]

Question

I'm working with a list of lists, each holding the periodic part of the continued fraction for the square root of a non-perfect square.

What I want to do is find the length of the largest repeating pattern in each list.

Some example lists:

[
 [1,1,1,1,1,1....],
 [4,1,4,1,4,1....],
 [1,2,10,1,2,10....],
 [1,1,1,1,1,4,1,4,1,20,9,8,1,1,1,1,1,4,1,4,1,20,9,8....],
 [2,2,2,4,2,2,2,4....],
 [1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15,1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15....],
 [1,1,1,3,28,1,1,1,3,28,67,25,1,1,1,3,28,1,1,1,3,28,67,25....]
]

The two similar methods that I've been working with are:

def lengths(seq):
    for i in range(len(seq),1,-1):
        if seq[0:i] == seq[i:i*2]:
            return i


def lengths(seq):
    for i in range(1,len(seq)-1):
        if seq[0:i] == seq[i:i*2]:
            return i

Both take a slice from the start of the list and compare it with the slice of the same size that immediately follows it. The problem with the first is that it returns the wrong length for a single repeating digit, because it starts big and sees just the one large pattern. The problem with the second is that there are nested patterns, as in the sixth and seventh example lists; it is satisfied with the nested pattern and overlooks the rest of the pattern.
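To make the two failure modes concrete, here is a small sketch that runs both methods on finite versions of the first and seventh sample lists. The names `lengths_desc` and `lengths_asc` are made up here only so that both versions can coexist; the bodies are the two methods above, unchanged.

```python
def lengths_desc(seq):
    # First method: try the longest candidate length first.
    for i in range(len(seq), 1, -1):
        if seq[0:i] == seq[i:i*2]:
            return i


def lengths_asc(seq):
    # Second method: try the shortest candidate length first.
    for i in range(1, len(seq) - 1):
        if seq[0:i] == seq[i:i*2]:
            return i


ones = [1, 1, 1, 1, 1, 1]
nested = [1,1,1,3,28,1,1,1,3,28,67,25,1,1,1,3,28,1,1,1,3,28,67,25]

print(lengths_desc(ones))    # 3: sees [1,1,1] twice instead of [1] six times
print(lengths_asc(nested))   # 1: stops at the first match, the leading 1,1
```

On these inputs the descending version reports 3 where 1 is wanted, and the ascending version reports 1 where 12 is wanted.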

Answer 1:

This works (I caught a typo in the 4th element of your sample):

>>> seq_l = [
...  [1,1,1,1,1,1],
...  [4,1,4,1,4,1],
...  [1,2,10,1,2,10],
...  [1,1,1,1,1,4,1,4,1,20,9,8,1,1,1,1,1,4,1,4,1,20,9,8],
...  [2,2,2,4,2,2,2,4,2,2,2,4,2,2,2,4],
...  [1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15,1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15],
...  [1,1,1,3,28,1,1,1,3,28,67,25,1,1,1,3,28,1,1,1,3,28,67,25]
... ]
>>> 
>>> def rep_len(seq):
...     s_len = len(seq)
...     for i in range(1,s_len-1):
...         if s_len%i == 0:
...             j = s_len // i  # integer division, so j*seq[:i] also works on Python 3
...             if seq == j*seq[:i]:
...                 return i
...                 
... 
>>> [rep_len(seq) for seq in seq_l]
[1, 2, 3, 12, 4, 18, 12]
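Note that `rep_len` only succeeds when the list holds a whole number of repetitions. If the lists may be cut off in the middle of a repetition (the "...." in the question suggests they could be), a variant along the following lines might help. This is only a sketch: `smallest_period` is a hypothetical name, and the requirement of at least two full repetitions is an assumption, not something the question states.

```python
def smallest_period(seq):
    """Return the smallest p such that seq[k] == seq[k - p] for every k >= p.

    Only p <= len(seq) // 2 is considered, so the pattern has to appear
    at least twice in full. Returns None if no such p exists.
    """
    n = len(seq)
    for p in range(1, n // 2 + 1):
        # Every element must equal the element one period earlier.
        if all(seq[k] == seq[k - p] for k in range(p, n)):
            return p
    return None


print(smallest_period([4, 1, 4, 1, 4, 1]))   # 2
print(smallest_period([4, 1, 4, 1, 4]))      # 2, even though the last repetition is cut off
print(smallest_period([1, 2, 3]))            # None
```

Because it tries candidate lengths from the shortest up and checks the whole list each time, it is not fooled by nested patterns such as the 1,1,1,3,28 run in the seventh list.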

Answer 2:

If converting your lists to strings is feasible, regular expressions make this a trivial task.

import re

lists = [
    [1,1,1,1,1,1],
    [4,1,4,1,4,1],
    [1,2,10,1,2,10],
    [1,1,1,1,1,4,1,4,1,20,9,8,1,1,1,1,1,4,1,4,1,20,9,8], #I think you had a typo in this one...
    [2,2,2,4,2,2,2,4],
    [1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15,1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15],
    [1,1,1,3,28,1,1,1,3,28,67,25,1,1,1,3,28,1,1,1,3,28,67,25]
]

for l in lists:
    # Join with a separator so multi-digit numbers stay distinct
    s = "x".join(str(i) for i in l)
    print(s)
    match = re.match(r"^(?P<foo>.*)x?(?P=foo)", s)
    if match:
        print(match.group('foo'))
    else:
        print("****")
    print()

(?P<foo>.*) creates a group named "foo", and (?P=foo) matches the same text that the group captured. Since quantifiers in regular expressions are greedy by default, you get the longest match. The "x?" just allows for a single x in the middle to handle even/odd lengths.
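Because the greedy pattern above returns the longest repeated prefix, it reports half of the string for a list like [1,1,1,1,1,1]. If the shortest repeating unit is wanted instead, one possible variation (not the pattern from this answer) is to make the group lazy and require its repetitions to cover the whole string. The sketch below assumes the same "x" separator and non-negative integers; `PERIOD_RE` and `regex_period` are names invented for this example.

```python
import re

# Group 1: one or more numbers joined by "x", matched lazily (shortest first).
# It must then repeat, separated by "x", all the way to the end of the string.
PERIOD_RE = re.compile(r"^(\d+(?:x\d+)*?)(?:x\1)+$")


def regex_period(seq):
    s = "x".join(str(n) for n in seq)
    match = PERIOD_RE.match(s)
    if match is None:
        return None
    # Number of elements in the unit = number of separators + 1.
    return match.group(1).count("x") + 1


print(regex_period([1, 1, 1, 1, 1, 1]))    # 1
print(regex_period([1, 2, 10, 1, 2, 10]))  # 3
print(regex_period([1, 2, 3]))             # None
```

Like the original, this relies on backtracking, so it may be slow on very long lists.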

Answer 3:

You could probably use a collections.defaultdict(int) to keep counts of all the sublists, unless you know there are some sublists you don't care about. Convert the sublists to tuples before making them dictionary keys.
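A minimal sketch of that counting idea follows. The helper names `count_sublists` and `longest_repeated_sublist` are hypothetical, and the sketch counts every contiguous sublist, so time and memory grow quickly with the list length.

```python
from collections import defaultdict


def count_sublists(seq):
    """Count every contiguous sublist of seq, keyed by its tuple form."""
    counts = defaultdict(int)
    n = len(seq)
    for start in range(n):
        for stop in range(start + 1, n + 1):
            # Lists are unhashable, so convert the slice to a tuple.
            counts[tuple(seq[start:stop])] += 1
    return counts


def longest_repeated_sublist(seq):
    """Return the longest sublist that occurs more than once, or None."""
    repeated = [sub for sub, c in count_sublists(seq).items() if c > 1]
    return max(repeated, key=len) if repeated else None


print(longest_repeated_sublist([1, 2, 10, 1, 2, 10]))  # (1, 2, 10)
```

Occurrences are allowed to overlap here, so a list of six 1s reports a repeated run of five 1s; some extra filtering would be needed to turn these counts into a period length.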

You might be able to get somewhere using a series of Bloom filters though, if space is tight. You'd have one Bloom filter for subsequences of length 1, another for subsequences of length 2, etc. Then the largest Bloom filter that gets a collision has your maximum-length sublist.

http://stromberg.dnsalias.org/~strombrg/drs-bloom-filter/

Answer 4:

I think you just have to check two levels of sequences at once: 0..i == i..i*2 and 0..i/2 != i/2..i.

def lengths(seq):
    for i in range(len(seq),1,-1):
        if seq[0:i] == seq[i:i*2] and seq[0:i//2] != seq[i//2:i]:  # // keeps the slice indices integers
            return i

If the two halves of 0..i are equal then it means that you are actually comparing two concatenated patterns with each other.

Answer 5:

Starting with the first example method, you could recursively search for a sub-pattern.

def lengths(seq):
    for i in range(len(seq)-1,1,-1):
        if seq[0:i] == seq[i:i*2]:
            j = lengths(seq[0:i]) # Search pattern for sub pattern (None if there is none)
            if j is not None and j < i and i % j == 0: # Found a smaller pattern; further, a longer repeated
                # pattern length must be a multiple of the shorter pattern length
                n = i//j # Number of pattern repetitions
                for k in range(1, n): # Check that all the smaller patterns are the same
                    if seq[0:j] != seq[j*k:j*(k+1)]: # Stop when we find a mismatch
                        return i # Not a repetition of smaller pattern
                else: return j # All the sub-patterns are the same, return the smaller length
            else: return i # No smaller pattern

I get the feeling this solution isn't quite correct, but I'll do some testing and edit it as necessary. (Quick note: Shouldn't the initial for loop start at len(seq)-1? If not, you compare seq[0:len] to seq[len:len], which seems silly, and would cause the recursion to loop infinitely.)

Edit: Seems sorta similar to the top answer in the related question senderle posted, so you'd best just go read that. ;)

Source: https://stackoverflow.com/questions/11403905/python-finding-patterns-within-large-group-of-numbers

Tags

python

pattern-matching