Select highest matching results from n columns

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-13 08:05:21

Question


Order by higher percentage matching checking 30 columns mysql

I would like to build a profile-matching project. The goal is to return, say, 100 results, with the highest match percentage first. The scenario is:

Each user has answered 30 yes/no questions (all of them are answered).
A user wants to see the 100 people who match him best, ordered by match percentage, highest first.

I need suggestions on how to design the table and the query so that the processing load is minimal:

Should I store the answers in separate columns (a yes/no value in each column), or in a single column as a comma-separated list (only the yes answers, e.g. educated,tall,rich,single,caring)?

What should the query be for Table A and Table B to return the best matches, ordered by percentage?

Here is the table (answers to 30 fixed questions, yes/no type):

id | name | q01 | q02 | q03 | q04 | q05 | q06 | ... | q30
11 | tom  |  1  |  0  |  0  |  1  |  0  |  1  | ... |  1
12 | mik  |  0  |  0  |  1  |  1  |  0  |  0  | ... |  0
13 | jim  |  1  |  1  |  1  |  1  |  0  |  1  | ... |  1
14 | don  |  0  |  1  |  1  |  0  |  0  |  0  | ... |  1
15 | ric  |  1  |  0  |  0  |  1  |  0  |  1  | ... |  0
16 | jam  |  0  |  1  |  0  |  0  |  0  |  0  | ... |  1
17 | joe  |  1  |  1  |  1  |  1  |  0  |  0  | ... |  1
18 | ima  |  1  |  0  |  0  |  1  |  0  |  1  | ... |  1
19 | sun  |  1  |  0  |  0  |  1  |  0  |  1  | ... |  0
20 | dim  |  0  |  0  |  1  |  1  |  0  |  0  | ... |  0
21 | dic  |  1  |  0  |  0  |  1  |  0  |  1  | ... |  1
xx | yyy  | ... up to fifty thousand rows ...

User x (for example, id 15) would like to get 100 results ordered by how well they match him (columns q01 to q30 are compared). The highest match percentage should come first.

Please help me make the query

SELECT * FROM table WHERE condition ORDER BY matching condition LIMIT 0,100

What conditions do I need?


Answer 1:


If, instead of 30 columns, you had a single INT UNSIGNED column holding the 30 answers as 0/1 bits (meaning no/yes), then

BIT_COUNT(col ^ to_match_against)

says how many of the bits disagree (^ is MySQL's bitwise XOR operator).

Subtract that count from 30, divide by 30, and multiply by 100 to get the percentage agreement. Then ORDER BY that value.
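As a rough illustration of the steps above, the migration and query might look like the following in MySQL. This is only a sketch under assumptions that are not in the original post: the question's table is called `profiles`, its columns q01 to q30 hold 0 or 1, and the new packed column is named `answers` (all three names are hypothetical).

```sql
-- Assumption: the 30-column table from the question is named `profiles`
-- (hypothetical name) and q01..q30 each hold 0 (no) or 1 (yes).
ALTER TABLE profiles ADD COLUMN answers INT UNSIGNED NOT NULL DEFAULT 0;

-- Pack each answer into its own bit: q01 -> bit 0, q02 -> bit 1, ..., q30 -> bit 29.
UPDATE profiles
SET answers =
      (q01 << 0)  | (q02 << 1)  | (q03 << 2)  | (q04 << 3)  | (q05 << 4)
    | (q06 << 5)  | (q07 << 6)  | (q08 << 7)  | (q09 << 8)  | (q10 << 9)
    | (q11 << 10) | (q12 << 11) | (q13 << 12) | (q14 << 13) | (q15 << 14)
    | (q16 << 15) | (q17 << 16) | (q18 << 17) | (q19 << 18) | (q20 << 19)
    | (q21 << 20) | (q22 << 21) | (q23 << 22) | (q24 << 23) | (q25 << 24)
    | (q26 << 25) | (q27 << 26) | (q28 << 27) | (q29 << 28) | (q30 << 29);

-- The 100 best matches for user 15, highest percentage first.
SELECT other.id,
       other.name,
       -- XOR leaves a 1 bit wherever the two users disagree;
       -- BIT_COUNT counts those disagreements.
       (30 - BIT_COUNT(other.answers ^ me.answers)) / 30 * 100 AS match_pct
FROM profiles AS me
JOIN profiles AS other
  ON other.id <> me.id      -- do not match the user against himself
WHERE me.id = 15            -- the user doing the search
ORDER BY match_pct DESC
LIMIT 100;
```

Note that the ordering depends on the searching user's answers, so this query still has to compute one XOR and one bit count for every row; with around fifty thousand rows that is a full scan, but each row costs only a couple of integer operations.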




Answer 2:


  • Perfect matches:

In this case, add a column alongside the per-answer columns and build the bitmap in it yourself (1 bit for each question). Create an index on this column.

Table should look like:

user_id  q1   q2  ...  qn   accumulator (>n bits)
1        red  no       yes  100110101
  • Approximate matches:

If using a bitmap index, you have to search for every key that differs from yours in up to x bits, where x / Number_of_questions * 100 is the percentage of answers that are allowed to differ.

EX: 1 bit varying keys: From 101 you would have 001, 111, 100.
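The 1-bit example above can be reproduced directly in MySQL by XOR-ing the key with a single set bit. This is only an illustrative sketch: the literal 5 stands for the 3-question key 101, and the column aliases are made up for the example.

```sql
-- 5 is binary 101, the example key for 3 questions.
-- XOR with a single set bit (1 << n) flips exactly bit n.
SELECT LPAD(BIN(5 ^ (1 << 0)), 3, '0') AS flip_bit_0,  -- 100
       LPAD(BIN(5 ^ (1 << 1)), 3, '0') AS flip_bit_1,  -- 111
       LPAD(BIN(5 ^ (1 << 2)), 3, '0') AS flip_bit_2;  -- 001
```

The number of variants grows quickly with x: for 30 questions, allowing up to 3 differing bits already means 30 + 435 + 4060 = 4525 keys to look up, in addition to the exact key.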

If different questions have different weights, you can't factor this in at the application level.

I would suggest normalizing your answer table in case the number of questions is not fixed (for example, if you might want to add or remove questions later). This depends on the storage engine (it shouldn't be a problem for MongoDB).

Again, using the accumulator, the table should look like:

user_id answer_id  accumulator (>n bits)
1       1          100110101 

Now, when you search, XOR your own answers with each stored accumulator and sort by the number of bits that differ.

SELECT * FROM answers ORDER BY BIT_COUNT(myAnswer ^ accumulator) ASC;


Source: https://stackoverflow.com/questions/34379837/select-highest-matching-results-from-n-columns