The number of matches between two rows mysql

Posted by 纵然是瞬间 on 2019-12-23 06:47:23

Question


So, this is the challenge:

I have two tables:

Etalon:

+-----+-----+-----+-----+----+
|  e1 |  e2 |  e3 |  e4 | e5 |
+-----+-----+-----+-----+----+
|  01 |  02 |  03 |  04 | 05 |
+-----+-----+-----+-----+----+

And Candidates:

+-----+----+-----+-----+-----+----+----+
| ID  | c1 | c2  | c3  | c4  | c5 | nn |
+-----+----+-----+-----+-----+----+----+
| 00  | 03 | 08  | 02  | 01  | 06 | ** |
+-----+----+-----+-----+-----+----+----+
| 01  | 05 | 04  | 03  | 02  | 01 | ** |
+-----+----+-----+-----+-----+----+----+
| 02  | 06 | 07  | 08  | 09  | 10 | ** |
+-----+----+-----+-----+-----+----+----+
| 03  | 08 | 06  | 09  | 02  | 07 | ** |
+-----+----+-----+-----+-----+----+----+

What query should I use to find and save (in the nn column) the number of matches between the two rows (e1, e2, e3, e4, e5 and c1, c2, c3, c4, c5) for each row in the Candidates table?

The result should be as follows:

Candidates:

|-----|----|-----|-----|-----|-----|----|
| ID  | c1 | c2  | c3  | c4  | c5  | nn |
|-----|----|-----|-----|-----|-----|----|
| 00  | 03 | 08  | 02  | 01  | 06  | 03 |
|-----|----|-----|-----|-----|-----|----|
| 01  | 05 | 04  | 03  | 02  | 01  | 05 |
|-----|----|-----|-----|-----|-----|----|
| 02  | 06 | 07  | 08  | 09  | 10  | 00 |
|-----|----|-----|-----|-----|-----|----|
| 03  | 08 | 06  | 09  | 02  | 07  | 01 |
|-----|----|-----|-----|-----|-----|----|

The result for nn is:

0 - no matches
1,2,3,4,5 - numbers of matches 

How can I achieve that?


Answer 1:


The objective is to establish a maximal partial matching between the master row (etalon) and each row of the client table (candidates), without regard to which columns hold the matching values.

The idea is to abstract away from the column ids by representing the column contents in another way. As you indicated that the value domain is {1, ..., 10}, one may choose the first 10 prime numbers {p_1, ..., p_10} = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}, mapping i to p_i. The comparisons are then based on the product of the mapped column values of each row. This approach exploits the uniqueness of prime factorization, i.e. every integer greater than 1 factorizes into a unique multiset of prime numbers, so a prime divides a row's product exactly when the corresponding value occurs in that row.
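As a worked illustration of this mapping, the following sketch evaluates candidate row 00 (values 3, 8, 2, 1, 6) against the etalon row (values 1, 2, 3, 4, 5) using literal numbers only, so it touches no table. The derived-table alias t is made up for the example, and the query relies on MySQL evaluating a comparison to 1 or 0.

```sql
-- Candidate row 00: 3, 8, 2, 1, 6  ->  primes 5, 19, 3, 2, 13
-- Etalon row:       1, 2, 3, 4, 5  ->  primes 2, 3, 5, 7, 11
SELECT mp_candidates
     , mp_etalon
     ,   (mp_candidates MOD  2 = 0 AND mp_etalon MOD  2 = 0)   -- value 1
       + (mp_candidates MOD  3 = 0 AND mp_etalon MOD  3 = 0)   -- value 2
       + (mp_candidates MOD  5 = 0 AND mp_etalon MOD  5 = 0)   -- value 3
       + (mp_candidates MOD  7 = 0 AND mp_etalon MOD  7 = 0)   -- value 4
       + (mp_candidates MOD 11 = 0 AND mp_etalon MOD 11 = 0)   -- value 5
       + (mp_candidates MOD 13 = 0 AND mp_etalon MOD 13 = 0)   -- value 6
       + (mp_candidates MOD 17 = 0 AND mp_etalon MOD 17 = 0)   -- value 7
       + (mp_candidates MOD 19 = 0 AND mp_etalon MOD 19 = 0)   -- value 8
       + (mp_candidates MOD 23 = 0 AND mp_etalon MOD 23 = 0)   -- value 9
       + (mp_candidates MOD 29 = 0 AND mp_etalon MOD 29 = 0)   -- value 10
         AS nn
  FROM ( SELECT 5 * 19 * 3 * 2 * 13 AS mp_candidates
              , 2 * 3 * 5 * 7 * 11  AS mp_etalon ) AS t;
-- mp_candidates = 7410, mp_etalon = 2310, nn = 3
```

Only the primes 2, 3 and 5 divide both products, which corresponds to the shared values 1, 2 and 3 and agrees with nn = 03 expected for row 00 in the question.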

A standalone one-pass SQL UPDATE statement is rather cumbersome to write down, so we first create a temporary table that contains the products of the mapped values:

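CREATE TEMPORARY TABLE t_pp (
      id            INT
    , mp_candidates INT     -- product of the primes mapped from c1..c5
    , mp_etalon     INT     -- product of the primes mapped from e1..e5
    , nn            INT     -- number of matches, filled in pass #2
);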
INSERT INTO t_pp ( id, mp_candidates, mp_etalon, nn )
     SELECT id
          ,   CASE c1
                  WHEN  1 THEN  2
                  WHEN  2 THEN  3
                  WHEN  3 THEN  5
                  WHEN  4 THEN  7
                  WHEN  5 THEN 11
                  WHEN  6 THEN 13
                  WHEN  7 THEN 17
                  WHEN  8 THEN 19
                  WHEN  9 THEN 23
                  WHEN 10 THEN 29
                  ELSE         31
              END
            * CASE c2 WHEN  1 THEN  2 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE c3 WHEN  1 THEN  2 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE c4 WHEN  1 THEN  2 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE c5 WHEN  1 THEN  2 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
                mp_candidates

          ,   CASE e1
                  WHEN  1 THEN  2
                  WHEN  2 THEN  3
                  WHEN  3 THEN  5
                  WHEN  4 THEN  7
                  WHEN  5 THEN 11
                  WHEN  6 THEN 13
                  WHEN  7 THEN 17
                  WHEN  8 THEN 19
                  WHEN  9 THEN 23
                  WHEN 10 THEN 29
                  ELSE         31
              END
            * CASE e2 WHEN  1 THEN  2 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE e3 WHEN  1 THEN  2 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE e4 WHEN  1 THEN  2 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
            * CASE e5 WHEN  1 THEN  2 WHEN  2 THEN  3 WHEN  3 THEN  5 WHEN  4 THEN  7 WHEN  5 THEN 11 WHEN  6 THEN 13 WHEN  7 THEN 17 WHEN  8 THEN 19 WHEN  9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
                mp_etalon
          , 0   nn
       FROM candidates
 CROSS JOIN etalon     
          ;

Now for pass #2 - counting matches:

UPDATE t_pp
   SET nn =
             CASE WHEN mp_candidates MOD  2 = 0 AND mp_etalon MOD  2 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD  3 = 0 AND mp_etalon MOD  3 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD  5 = 0 AND mp_etalon MOD  5 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD  7 = 0 AND mp_etalon MOD  7 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 11 = 0 AND mp_etalon MOD 11 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 13 = 0 AND mp_etalon MOD 13 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 17 = 0 AND mp_etalon MOD 17 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 19 = 0 AND mp_etalon MOD 19 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 23 = 0 AND mp_etalon MOD 23 = 0  THEN 1 ELSE 0 END
           + CASE WHEN mp_candidates MOD 29 = 0 AND mp_etalon MOD 29 = 0  THEN 1 ELSE 0 END
     ;

Finally, transferring the results to the original table and cleaning up:

UPDATE candidates c
   set nn = ( SELECT p.nn FROM t_pp p WHERE p.id = c.id )
     ;
DROP TEMPORARY TABLE t_pp;

Some more notes:

  • The scheme as shown assumes that cell values are unique within each row. However, it can easily be extended to allow for multiple occurrences of values.
  • In principle, this can be wrapped in a single sql statement - for obvious reasons this is not recommended.
  • MySQL before version 8.0 lacks the SQL-standard WITH clause; an RDBMS that provides it (including MySQL 8.0 and later) does not need the temporary table.
  • The value 31 in the ELSE branch of the above CASE expressions is a dummy value.
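
The first note does not spell out how the extension to repeated values would work. One possible reading, offered here as an assumption rather than as the answer's own method, is to compare how often each prime divides the two products and to count the smaller of the two multiplicities. The sketch below does this for the prime 2 (value 1) only, with two made-up products: 120 = 2·2·2·3·5 and 924 = 2·2·3·7·11. With five columns a prime can occur at most five times, so powers up to 2^5 = 32 are tested.

```sql
-- Hypothetical products, not taken from the tables above:
--   120 = 2*2*2*3*5   (value 1 occurs three times in the candidate row)
--   924 = 2*2*3*7*11  (value 1 occurs twice in the etalon row)
-- Each comparison evaluates to 1 or 0 in MySQL, so each sum is the
-- multiplicity of the prime 2 in the respective product.
SELECT LEAST(
           (120 MOD 2 = 0) + (120 MOD 4 = 0) + (120 MOD 8 = 0)
         + (120 MOD 16 = 0) + (120 MOD 32 = 0)
         , (924 MOD 2 = 0) + (924 MOD 4 = 0) + (924 MOD 8 = 0)
         + (924 MOD 16 = 0) + (924 MOD 32 = 0)
       ) AS matches_on_value_1;
-- matches_on_value_1 = 2
```

Summing the same LEAST(...) term over all ten primes would give the match count for rows with repeated values. Note that the products grow accordingly (at most 29^5 = 20511149, or 31^5 = 28629151 with the dummy value), which still fits in a signed 32-bit INT.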


Source: https://stackoverflow.com/questions/28380180/the-number-of-matches-between-two-rows-mysql
