Question
So, this is the challenge:
I have two tables:
Etalon:
+-----+-----+-----+-----+----+
| e1 | e2 | e3 | e4 | e5 |
+-----+-----+-----+-----+----+
| 01 | 02 | 03 | 04 | 05 |
+-----+-----+-----+-----+----+
And Candidates:
+-----+----+-----+-----+-----+----+----+
| ID | c1 | c2 | c3 | c4 | c5 | nn |
+-----+----+-----+-----+-----+----+----+
| 00 | 03 | 08 | 02 | 01 | 06 | ** |
+-----+----+-----+-----+-----+----+----+
| 01 | 05 | 04 | 03 | 02 | 01 | ** |
+-----+----+-----+-----+-----+----+----+
| 02 | 06 | 07 | 08 | 09 | 10 | ** |
+-----+----+-----+-----+-----+----+----+
| 03 | 08 | 06 | 09 | 02 | 07 | ** |
+-----+----+-----+-----+-----+----+----+
What query should I use to find and save (in the nn column) the number of matches between the two rows (e1, e2, e3, e4, e5 and c1, c2, c3, c4, c5) for each row in the Candidates table?
The result should be as follows:
Candidates:
|-----|----|-----|-----|-----|-----|----|
| ID | c1 | c2 | c3 | c4 | c5 | nn |
|-----|----|-----|-----|-----|-----|----|
| 00 | 03 | 08 | 02 | 01 | 06 | 03 |
|-----|----|-----|-----|-----|-----|----|
| 01 | 05 | 04 | 03 | 02 | 01 | 05 |
|-----|----|-----|-----|-----|-----|----|
| 02 | 06 | 07 | 08 | 09 | 10 | 00 |
|-----|----|-----|-----|-----|-----|----|
| 03 | 08 | 06 | 09 | 02 | 07 | 01 |
|-----|----|-----|-----|-----|-----|----|
The result for nn is:
0 - no matches
1,2,3,4,5 - numbers of matches
How can I achieve that?
Answer 1:
The objective is to count how many values the etalon row shares with each row of the candidates table, without regard to the columns in which those values appear.
The idea is to abstract away from the column ids by representing the column contents in another way. As you indicated that the value domain is {1, ..., 10}, one may choose the first 10 prime numbers {p_1, ..., p_10} = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}, mapping i to p_i. The comparisons will be based on the product of the mapped column values. This approach exploits the uniqueness of prime factorization, i.e. every positive integer factorizes into a unique multiset of prime numbers. Consequently, a value i occurs in a row exactly when p_i divides that row's product.
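As a quick sanity check of the encoding outside the database, here is a minimal Python sketch. The names `PRIMES` and `encode` are illustrative and not part of the original answer. It suggests that the product does not depend on column order and that a value is present exactly when its prime divides the product.

```python
# First ten primes; value i (1..10) maps to PRIMES[i - 1].
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def encode(row):
    """Return the product of the primes assigned to the values in `row`."""
    product = 1
    for value in row:
        product *= PRIMES[value - 1]
    return product


etalon = [1, 2, 3, 4, 5]
etalon_product = encode(etalon)  # 2 * 3 * 5 * 7 * 11 = 2310

# The same values in a different column order give the same product.
shuffled_product = encode([3, 2, 1, 5, 4])

# A value occurs in the row exactly when its prime divides the product.
present = [v for v in range(1, 11) if etalon_product % PRIMES[v - 1] == 0]
```

Here `present` recovers the etalon values from the product alone, which is the property the SQL relies on.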
A standalone one-pass SQL UPDATE statement is rather cumbersome to write down, so we first create a temporary table that holds the products of the mapped values:
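CREATE TEMPORARY TABLE t_pp (
    id            INT
  , mp_candidates INT   -- product of the primes mapped from c1..c5
  , mp_etalon     INT   -- product of the primes mapped from e1..e5
  , nn            INT   -- number of matches, filled in by pass #2
);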
INSERT INTO t_pp ( id, mp_candidates, mp_etalon, nn )
SELECT id
, CASE c1
WHEN 1 THEN 2
WHEN 2 THEN 3
WHEN 3 THEN 5
WHEN 4 THEN 7
WHEN 5 THEN 11
WHEN 6 THEN 13
WHEN 7 THEN 17
WHEN 8 THEN 19
WHEN 9 THEN 23
WHEN 10 THEN 29
ELSE 31
END
* CASE c2 WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE c3 WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE c4 WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE c5 WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
mp_candidates
, CASE e1
WHEN 1 THEN 2
WHEN 2 THEN 3
WHEN 3 THEN 5
WHEN 4 THEN 7
WHEN 5 THEN 11
WHEN 6 THEN 13
WHEN 7 THEN 17
WHEN 8 THEN 19
WHEN 9 THEN 23
WHEN 10 THEN 29
ELSE 31
END
* CASE e2 WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE e3 WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE e4 WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
* CASE e5 WHEN 1 THEN 2 WHEN 2 THEN 3 WHEN 3 THEN 5 WHEN 4 THEN 7 WHEN 5 THEN 11 WHEN 6 THEN 13 WHEN 7 THEN 17 WHEN 8 THEN 19 WHEN 9 THEN 23 WHEN 10 THEN 29 ELSE 31 END
mp_etalon
, 0 nn
FROM candidates
CROSS JOIN etalon
;
Now for pass #2 - counting matches:
UPDATE t_pp
SET nn =
CASE WHEN mp_candidates MOD 2 = 0 AND mp_etalon MOD 2 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 3 = 0 AND mp_etalon MOD 3 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 5 = 0 AND mp_etalon MOD 5 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 7 = 0 AND mp_etalon MOD 7 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 11 = 0 AND mp_etalon MOD 11 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 13 = 0 AND mp_etalon MOD 13 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 17 = 0 AND mp_etalon MOD 17 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 19 = 0 AND mp_etalon MOD 19 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 23 = 0 AND mp_etalon MOD 23 = 0 THEN 1 ELSE 0 END
+ CASE WHEN mp_candidates MOD 29 = 0 AND mp_etalon MOD 29 = 0 THEN 1 ELSE 0 END
;
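The same divisibility test can be replayed in Python against the sample data from the question. This is only an illustrative sketch: `product_of_primes` and `count_matches` are hypothetical helper names, and values outside 1..10 are assumed to fall back to the dummy prime 31, as in the SQL.

```python
# Value -> prime mapping used by the CASE expressions.
PRIME_FOR = {1: 2, 2: 3, 3: 5, 4: 7, 5: 11, 6: 13, 7: 17, 8: 19, 9: 23, 10: 29}
DUMMY_PRIME = 31  # mirrors the ELSE branch; never tested below


def product_of_primes(row):
    """Product of the mapped primes, i.e. what pass #1 stores per row."""
    product = 1
    for value in row:
        product *= PRIME_FOR.get(value, DUMMY_PRIME)
    return product


def count_matches(mp_candidates, mp_etalon):
    """Count the primes dividing both products, i.e. what pass #2 computes."""
    return sum(
        1
        for p in PRIME_FOR.values()
        if mp_candidates % p == 0 and mp_etalon % p == 0
    )


etalon = [1, 2, 3, 4, 5]
candidates = {
    0: [3, 8, 2, 1, 6],
    1: [5, 4, 3, 2, 1],
    2: [6, 7, 8, 9, 10],
    3: [8, 6, 9, 2, 7],
}

mp_etalon = product_of_primes(etalon)
nn = {
    cid: count_matches(product_of_primes(row), mp_etalon)
    for cid, row in candidates.items()
}
print(nn)  # {0: 3, 1: 5, 2: 0, 3: 1}
```

The counts agree with the nn column expected in the question.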
Finally, transferring the results to the original table and cleaning up:
UPDATE candidates c
set nn = ( SELECT p.nn FROM t_pp p WHERE p.id = c.id )
;
DROP TEMPORARY TABLE t_pp;
Some more notes:
- The scheme as shown assumes that cell values are unique within each row. However, it can easily be extended to allow for multiple occurrences of values.
- In principle, this can be wrapped in a single SQL statement; for obvious reasons this is not recommended.
- RDBMSs other than MySQL follow the SQL standard and provide the WITH clause, which obviates the need for a temporary table (MySQL itself added it in version 8.0).
- The value 31 in the ELSE branch of the above CASE expressions is a dummy value. It is a prime outside the mapping, so a cell outside the value domain never counts as a match.
Source: https://stackoverflow.com/questions/28380180/the-number-of-matches-between-two-rows-mysql