Question
Order by highest matching percentage across 30 columns in MySQL
I would like to build a profile-matching project. The goal is to return, say, 100 results, with the best matching percentage first. The scenario is:
A user has answered 30 yes/no questions (all of them are answered).
The user wants to see the 100 people who match him best, ordered by highest percentage.
I need suggestions on how to design the table and the query so that the processing load is as low as possible:
Should I store the answers in separate columns (a yes/no value in each column) or in a single comma-separated column (only the yes answers, e.g. educated,tall,rich,single,caring)?
What should the query be for Table A and Table B to return the highest matches ordered by percentage?
Here is the table (answers to 30 fixed questions, each a yes/no answer):
id | name | q01 | q02 | q03 | q04 | q05 | q06 | ... | q30
11 | tom  |  1  |  0  |  0  |  1  |  0  |  1  | ... |  1
12 | mik  |  0  |  0  |  1  |  1  |  0  |  0  | ... |  0
13 | jim  |  1  |  1  |  1  |  1  |  0  |  1  | ... |  1
14 | don  |  0  |  1  |  1  |  0  |  0  |  0  | ... |  1
15 | ric  |  1  |  0  |  0  |  1  |  0  |  1  | ... |  0
16 | jam  |  0  |  1  |  0  |  0  |  0  |  0  | ... |  1
17 | joe  |  1  |  1  |  1  |  1  |  0  |  0  | ... |  1
18 | ima  |  1  |  0  |  0  |  1  |  0  |  1  | ... |  1
19 | sun  |  1  |  0  |  0  |  1  |  0  |  1  | ... |  0
20 | dim  |  0  |  0  |  1  |  1  |  0  |  0  | ... |  0
21 | dic  |  1  |  0  |  0  |  1  |  0  |  1  | ... |  1
.. | ...  | (up to fifty thousand rows)
The user (for example, id 15) would like to get 100 results ordered by how well they match him, comparing columns q01 to q30. The highest match percentage should be returned first.
Please help me write the query:
SELECT * FROM table WHERE condition ORDER BY matching condition LIMIT 0,100
What conditions do I need?
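For the 30-column layout as it stands, the "matching condition" amounts to counting, per row, how many of q01..q30 equal the searching user's answers. A minimal Python sketch of that scoring, assuming rows are dicts keyed by the column names in the table above; the helpers row, match_percentage and top_matches are hypothetical, and the usage only fills in the six visible columns q01..q06.

```python
# Hypothetical sketch: rank rows of the 30-column table by the number of
# questions (q01..q30) answered the same way as the searching user.
QUESTIONS = [f"q{i:02d}" for i in range(1, 31)]  # "q01" .. "q30"


def row(id_, name, answers):
    """Build a row dict; answers are assigned to q01, q02, ... in order."""
    return {"id": id_, "name": name, **dict(zip(QUESTIONS, answers))}


def match_percentage(a, b, questions=QUESTIONS):
    """Percentage of the given questions on which rows a and b agree."""
    same = sum(1 for q in questions if a[q] == b[q])
    return same * 100 / len(questions)


def top_matches(user, rows, limit=100, questions=QUESTIONS):
    """Other rows, best match first, at most `limit` of them."""
    others = [r for r in rows if r["id"] != user["id"]]
    # sort() is stable, so rows with equal scores keep their input order.
    others.sort(key=lambda r: match_percentage(user, r, questions), reverse=True)
    return others[:limit]


# Usage on the six visible columns of the sample table (q01..q06 only).
six = QUESTIONS[:6]
ric = row(15, "ric", [1, 0, 0, 1, 0, 1])
rows = [
    row(11, "tom", [1, 0, 0, 1, 0, 1]),
    row(12, "mik", [0, 0, 1, 1, 0, 0]),
    row(13, "jim", [1, 1, 1, 1, 0, 1]),
    row(14, "don", [0, 1, 1, 0, 0, 0]),
    ric,
]
for r in top_matches(ric, rows, questions=six):
    print(r["name"], round(match_percentage(ric, r, six), 1))
```

In MySQL the same count can be written directly in the query, because a comparison such as q01 = 1 evaluates to 1 or 0 (NULL if either side is NULL), so the 30 comparisons can be added together in the ORDER BY.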
Answer 1:
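If, instead of 30 columns, you had a single INT UNSIGNED column in which each of 30 bits holds a 0/1 value meaning no/yes, then
BIT_COUNT(col ^ to_match_against)
says how many of the bits disagree. (MySQL's bitwise XOR is the ^ operator; XOR on its own is a logical operator, so there is no XOR() function to call.)
From that, subtract from 30, divide by 30 and multiply by 100 to get the percentage agreement, (30 - BIT_COUNT(col ^ to_match_against)) / 30 * 100. Then ORDER BY that value, descending.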
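A minimal Python sketch of that arithmetic, assuming each user's answers are already packed into one non-negative integer (bit set = yes). Python's ^ is bitwise XOR like MySQL's ^, bin(x).count("1") plays the role of BIT_COUNT, and heapq.nlargest stands in for ORDER BY ... DESC LIMIT 100; the function names and sample values are hypothetical.

```python
import heapq

N_QUESTIONS = 30


def bit_count(x: int) -> int:
    """Number of 1 bits in a non-negative integer (what MySQL's BIT_COUNT returns)."""
    return bin(x).count("1")


def agreement_percentage(a: int, b: int, n: int = N_QUESTIONS) -> float:
    """Percentage of the n answer bits on which a and b agree.

    Same value as (n - BIT_COUNT(a ^ b)) / n * 100; multiplying before
    dividing keeps results such as 90.0 exact in floating point.
    """
    return (n - bit_count(a ^ b)) * 100 / n


def best_matches(me: int, others: dict, limit: int = 100, n: int = N_QUESTIONS):
    """(user_id, percentage) pairs, highest percentage first, at most `limit`."""
    scored = ((uid, agreement_percentage(me, bits, n)) for uid, bits in others.items())
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


# Usage with three hypothetical users.
print(best_matches(0b1011, {1: 0b1011, 2: 0b0000, 3: 0b1010}, limit=2))
```

On Python 3.10 and later, int.bit_count() gives the same result as bin(x).count("1").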
Answer 2:
- Perfect matches:
In this case, next to your per-answer columns, create a column holding a bitmap that you build yourself (1 bit per question), and create an index on that column.
The table should look like:
user_id | q1  | q2 | ... | qn  | accumulator (>n bits)
1       | red | no | ... | yes | 100110101
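The accumulator column can be filled from the per-question answers. A minimal sketch, assuming each question's answer can be reduced to yes/no and that q1 maps to the most significant bit, so the bit string reads q1..qn from left to right (that ordering is an assumption; any fixed order works as long as every row uses the same one). pack_answers and to_bit_string are hypothetical helpers.

```python
def pack_answers(answers):
    """Pack yes/no answers (truthy = yes) into one integer, q1 in the highest bit."""
    acc = 0
    for answer in answers:
        acc = (acc << 1) | (1 if answer else 0)  # shift left, append this answer's bit
    return acc


def to_bit_string(acc, n):
    """Show the accumulator as n characters of 0/1, q1 first."""
    return format(acc, f"0{n}b")


# Nine hypothetical yes/no answers that produce the accumulator in the table.
acc = pack_answers([True, False, False, True, True, False, True, False, True])
print(acc, to_bit_string(acc, 9))
```

With 30 questions the packed value stays below 2**30, so it fits in an INT UNSIGNED (32-bit) column.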
- Approximate matches:
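If you use a bitmap index, you have to search all variations of the key that differ from it in up to x bits, where x / number_of_questions * 100 is the largest percentage of disagreement you accept (so (number_of_questions - x) / number_of_questions * 100 is the minimum match percentage).
Example, keys varying by 1 bit: from 101 you would get 001, 111 and 100.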
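Enumerating those keys amounts to flipping every combination of up to x bits of the n-bit key. A minimal Python sketch; keys_within is a hypothetical name, and the resulting list would presumably be used as an IN (...) lookup against the indexed accumulator column.

```python
from itertools import combinations


def keys_within(key: int, n: int, x: int) -> list:
    """All n-bit keys differing from key in at most x bit positions (key itself included)."""
    result = []
    for k in range(x + 1):                       # number of bits to flip
        for positions in combinations(range(n), k):
            flipped = key
            for p in positions:
                flipped ^= 1 << p                # flip bit p
            result.append(flipped)
    return result


# 1-bit variations of 101, as in the example: 100, 111 and 001 (plus 101 itself).
print([format(k, "03b") for k in keys_within(0b101, 3, 1)])
```

The number of keys is the sum of C(n, k) for k from 0 to x, which grows quickly: for n = 30 and x = 3 it is already 1 + 30 + 435 + 4060 = 4526 keys.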
If different questions have different weights, you can't factor this in at the application level.
I would suggest normalizing your answer table in case the number of questions is not static (for example, if you might want to add or remove questions later). This depends on the storage engine (it shouldn't be a problem for MongoDB).
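With a normalized answer table, agreement can be counted with a self-join on the question. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the table user_answers(user_id, question_id, answer), the function best_matches_normalized and the data are all hypothetical, and the query is plain SQL that should need little change for MySQL.

```python
import sqlite3


def best_matches_normalized(conn, user_id, limit=100):
    """(other_user_id, matching_answers) pairs for user_id, most matches first."""
    sql = """
        SELECT theirs.user_id, SUM(theirs.answer = mine.answer) AS matches
        FROM user_answers AS mine
        JOIN user_answers AS theirs
          ON theirs.question_id = mine.question_id
         AND theirs.user_id <> mine.user_id
        WHERE mine.user_id = ?
        GROUP BY theirs.user_id
        ORDER BY matches DESC, theirs.user_id
        LIMIT ?
    """
    return conn.execute(sql, (user_id, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_answers ("
    " user_id INTEGER, question_id INTEGER, answer INTEGER,"
    " PRIMARY KEY (user_id, question_id))"
)
# Hypothetical data: 4 users, 3 questions, answer 1 = yes, 0 = no.
answers = {1: [1, 0, 1], 2: [1, 0, 1], 3: [0, 0, 1], 4: [0, 1, 0]}
conn.executemany(
    "INSERT INTO user_answers VALUES (?, ?, ?)",
    [(uid, q + 1, a) for uid, row in answers.items() for q, a in enumerate(row)],
)
print(best_matches_normalized(conn, 1))
```

Dividing matches by the number of questions and multiplying by 100 gives the percentage; adding or removing a question only adds or removes rows, with no schema change.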
Again, using the accumulator, the table should look like:
user_id | answer_id | accumulator (>n bits)
1       | 1         | 100110101
Now, when you search, XOR your own answer bitmap with each stored accumulator and sort by the number of differing bits:
SELECT * FROM answers ORDER BY BIT_COUNT(@myAnswer ^ accumulator) ASC LIMIT 100;
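To try this query shape without a MySQL server, here is a sketch using Python's sqlite3 module. SQLite has no ^ operator and no BIT_COUNT, so both are registered as user-defined functions; bitxor and bit_count are hypothetical names standing in for MySQL's ^ operator and BIT_COUNT(), and the answers rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has neither MySQL's ^ operator nor BIT_COUNT(), so register stand-ins.
conn.create_function("bitxor", 2, lambda a, b: a ^ b)
conn.create_function("bit_count", 1, lambda x: bin(x).count("1"))

conn.execute("CREATE TABLE answers (user_id INTEGER PRIMARY KEY, accumulator INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?)",
    [(1, 0b100110101), (2, 0b100110100), (3, 0b011001010), (4, 0b100111101)],
)


def ranked(my_answer, limit=100):
    """(user_id, differing_bits) rows, fewest differing bits first."""
    return conn.execute(
        "SELECT user_id, bit_count(bitxor(?, accumulator)) AS diff "
        "FROM answers ORDER BY diff ASC, user_id LIMIT ?",
        (my_answer, limit),
    ).fetchall()


print(ranked(0b100110101))
```

In MySQL itself the registered functions are unnecessary, since ^ and BIT_COUNT are built in. Because the sort key is computed from every row, an index on accumulator cannot be used for this ordering, so each row's value is evaluated.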
Source: https://stackoverflow.com/questions/34379837/select-highest-matching-results-from-n-columns