Fastest way to find an m x n submatrix in an M x N matrix

囚心锁ツ 2020-12-10 02:02

I am looking for a fast method to find a submatrix m in a bigger matrix M. I also need to identify partial matches.

A couple of approaches I could think of are:

4 answers
  • 2020-12-10 02:34

    I don't think you can simply guess where the submatrix is, but you can optimize the search.

    For example, given an M x N matrix A and an m x n submatrix B, you can do something like:

    SearchSubMatrix (Matrix A, Matrix B)
    
    answer = (-1, -1)
    
    Loop1:
    for i = 0 ... (M-m)
    |
    |   for j = 0 ... (N-n)
    |   | 
    |   |   bool found = (A[i][j] == B[0][0])
    |   |
    |   |   if found then
    |   |   |
    |   |   |   Loop2:
    |   |   |   for r = 0 ... (m-1)
    |   |   |   |   for s = 0 ... (n-1)
    |   |   |   |   |   if B[r][s] != A[r+i][s+j] then
    |   |   |   |   |   |   found = false
    |   |   |   |   |   |   break Loop2
    |   |
    |   |   if found then
    |   |   |   answer = (i, j)
    |   |   |   break Loop1
    |
    return answer
    

    Doing this reduces the number of positions you have to check in proportion to the size of the submatrix.

    Matrix         Submatrix         Worst Case:
    1 2 3 4           2 4            [1][2][3] 4
    4 3 2 1           3 2            [4][3][2] 1
    1 3 2 4                          [1][3]{2  4}
    4 1 3 2                           4  1 {3  2}
    
                                     (M-m+1)(N-n+1) = (4-2+1)(4-2+1) = 9
    

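    Although the number of candidate positions is O(M*N), the search never examines M*N positions unless the submatrix is 1x1: there are only (M-m+1)(N-n+1) of them. Each candidate can still cost up to m*n comparisons in the worst case.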
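
    The pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, assuming both matrices are non-empty lists of equal-length rows; the name `find_submatrix` is made up for illustration.

```python
def find_submatrix(A, B):
    """Return the top-left (row, col) of the first exact occurrence of B in A,
    or (-1, -1) if B does not occur. Assumes non-empty rectangular inputs."""
    M, N = len(A), len(A[0])
    m, n = len(B), len(B[0])
    # Only (M-m+1) * (N-n+1) top-left corners can hold a full m x n block.
    for i in range(M - m + 1):
        for j in range(N - n + 1):
            # Compare row by row; all() stops at the first mismatching row.
            if all(list(A[i + r][j:j + n]) == list(B[r]) for r in range(m)):
                return (i, j)
    return (-1, -1)


A = [[1, 2, 3, 4],
     [4, 3, 2, 1],
     [1, 3, 2, 4],
     [4, 1, 3, 2]]
B = [[2, 4],
     [3, 2]]
print(find_submatrix(A, B))  # → (2, 2), the last of the 9 candidate positions
```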

  • 2020-12-10 02:38

    There is no way to do this fast if you only ever need to match one small matrix against one big matrix. But if you need to match many small matrices against the same big matrix, then preprocess the big matrix.

    A simple example: exact matching of many 3x3 matrices against one giant matrix.

    Make a new "match matrix" the same size as the big matrix. For each location (x, y) in the big matrix, compute a hash of the 3x3 block whose top-left corner is (x, y) and store it at (x, y) in the match matrix. Now you just scan the match matrix for matching hashes.

    You can achieve partial matches with specialized hash functions that give the same hash to things that have the same partial matching properties. Tricky.

    If you want to speed this up further and have the memory for it, create a hash table from the match matrix and look up the hashes in the hash table.

    The 3x3 solution will work for any test matrix that is 3x3 or larger. You don't need a perfect hash method. You just need something that rejects the majority of bad matches, and then you do a full comparison for the potential matches found in the hash table.
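
    A rough sketch of this preprocessing idea, under some assumptions: the hash table is a Python dict keyed by the built-in `hash` of each k x k window, and the names `build_index` and `find_all` are invented for illustration. Every hash hit is verified with a full comparison, so hash collisions cannot produce false matches.

```python
from collections import defaultdict


def build_index(big, k=3):
    """Map the hash of every k x k window of `big` to its top-left positions."""
    rows, cols = len(big), len(big[0])
    index = defaultdict(list)
    for i in range(rows - k + 1):
        for j in range(cols - k + 1):
            window = tuple(tuple(big[i + r][j:j + k]) for r in range(k))
            index[hash(window)].append((i, j))
    return index


def find_all(big, index, pattern, k=3):
    """Return all exact occurrences of `pattern` (at least k x k) in `big`."""
    m, n = len(pattern), len(pattern[0])
    if m < k or n < k:
        raise ValueError("pattern must be at least k x k")
    # Hash only the top-left k x k corner of the pattern to get candidates.
    key = hash(tuple(tuple(row[:k]) for row in pattern[:k]))
    matches = []
    for i, j in index.get(key, []):
        # Skip candidates where the full pattern would not fit.
        if i + m > len(big) or j + n > len(big[0]):
            continue
        # Full comparison: rejects collisions and checks cells beyond the corner.
        if all(list(big[i + r][j:j + n]) == list(pattern[r]) for r in range(m)):
            matches.append((i, j))
    return matches
```

    The index is built once per big matrix. After that, each query costs one hash of the pattern's corner plus one full comparison per candidate position.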

  • 2020-12-10 02:43

    I recommend doing an internet search on "2d pattern matching algorithms". You'll get plenty of results. I'll just link the first hit on Google, a paper that presents an algorithm for your problem.

    You can also take a look at the citations at the end of the paper to get an idea of other existing algorithms.

    The abstract:

    An algorithm for searching for a two dimensional m x m pattern in a two dimensional n x n text is presented. It performs on the average less comparisons than the size of the text: n^2/m using m^2 extra space. Basically, it uses multiple string matching on only n/m rows of the text. It runs in at most 2n^2 time and is close to the optimal n^2 time for many patterns. It steadily extends to an alphabet-independent algorithm with a similar worst case. Experimental results are included for a practical version.

  • 2020-12-10 02:47

    There are very fast algorithms for this if you are willing to preprocess the matrix and if you have many queries for the same matrix.

    Have a look at the papers on Algebraic Databases by the Research group on Multimedia Databases (Prof. Clausen, University of Bonn). Have a look at this paper for example: http://www-mmdb.iai.uni-bonn.de/download/publications/sigir-03.pdf

    The basic idea is to generalize inverted lists so that they support any kind of algebraic transformation, instead of just shifts in one direction as with ordinary inverted lists.

    This means that the approach works whenever the modifications you need to apply to the input data can be modelled algebraically. Specifically, queries that are translated in any number of dimensions, rotated, flipped etc. can all be retrieved.

    The paper mainly demonstrates this for musical data, since that is the group's main research interest, but you might be able to find other papers that show how to adapt it to image data as well. You can also try to adapt it yourself: once you understand the principle, it is quite simple.

    Edit:

    This idea also works with partial matches, if you define them correctly.
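
    As a rough illustration only, here is the simplest case of that idea: ordinary inverted lists with 2D translations, not the algebraic generalization from the paper. Each pattern cell votes for the translation that would align it with an equal cell in the big matrix, so a translation with m*n votes is an exact match and one with fewer votes is a partial match. The function names and the `min_cells` threshold are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def build_inverted_lists(big):
    """Map each value to the list of (row, col) positions where it occurs."""
    lists = defaultdict(list)
    for i, row in enumerate(big):
        for j, value in enumerate(row):
            lists[value].append((i, j))
    return lists


def vote_offsets(lists, pattern):
    """Count, for each translation (di, dj), how many pattern cells agree."""
    votes = Counter()
    for r, row in enumerate(pattern):
        for s, value in enumerate(row):
            for i, j in lists.get(value, []):
                # Pattern cell (r, s) lands on (i, j) when shifted by (i-r, j-s).
                votes[(i - r, j - s)] += 1
    return votes


def matches_with_at_least(lists, pattern, min_cells):
    """Translations where at least `min_cells` pattern cells match.
    Partial matches may hang over the edge of the big matrix."""
    votes = vote_offsets(lists, pattern)
    return sorted(off for off, count in votes.items() if count >= min_cells)
```

    With `min_cells` equal to the number of cells in the pattern this returns exact matches only. Lowering it is one possible way to define partial matches.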
