Getting most similar rows in MySQL table and order them by similarity

Question

I have a database table that holds users' vehicles (cars, motorcycles). I want to get the most similar vehicles out of that table. Let's say the table has the following columns (with example values in parentheses to give the idea):

table: vehicles


vehicle_id (pk, auto-increment)
model_id (BMW 3er, Honda Accord)
fuel_type (gasoline, diesel)
body_style (sedan, coupe)
year
engine_size (2.0L)
engine_power (150hp)

In short, I want to select N (usually 3) rows that have at least the same model_id as the seed vehicle and rank them by how many attributes they share with it. For example, a matching fuel_type would be worth +3 rank points, while a matching body_style would be worth +1. Ideally I would get the N vehicles with the maximum points, but I still want to get something back when no vehicle scores that high.

Answer 1:

My table currently has only around 5k rows and is growing slowly, so I decided to use the following simple approach (it came to me just after I wrote the question).

Let's say the seed is a Honda Accord (model_id 456), 2004, gasoline, 2.0L, 155hp, sedan, with auto-increment ID 123.

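SELECT vehicles.*,
    -- Each IF() adds its weight when the row matches the seed's attribute
    (IF(`fuel_type`='gasoline', 3, 0) +
     IF(`body_style`='sedan', 1, 0) +
     -- Within two model years of the seed (2004)
     IF(`year` > 2001 AND `year` < 2007, 2, 0) +
     -- Within 0.2L of the seed's 2.0L engine
     IF(`engine_size` >= 1.8 AND `engine_size` <= 2.2, 1, 0) +
     -- Exact power match scores 3; within roughly 20% of 155hp scores 1
     IF(`engine_power`=155, 3, IF(`engine_power`>124 AND `engine_power`<186, 1, 0))) AS `rank`
FROM vehicles
-- Exclude the seed itself and keep only vehicles of the same model
WHERE vehicle_id!=123 AND model_id=456
ORDER BY `rank` DESC
LIMIT 3;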
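
The query above hardcodes the seed's values. As a variation, the same scoring could read those values from the seed row through a self-join, so that only the seed's vehicle_id has to be supplied. This is a minimal sketch, not the author's query: it assumes engine_size and engine_power are stored as numeric columns, and the aliases s (seed) and v (candidate) as well as the 10% and 20% tolerances are illustrative choices that reproduce the ranges used above.

```sql
-- Sketch: take the comparison values from the seed row (vehicle_id 123)
-- instead of hardcoding them. Assumes numeric engine_size / engine_power.
SELECT v.*,
       (IF(v.fuel_type = s.fuel_type, 3, 0) +
        IF(v.body_style = s.body_style, 1, 0) +
        -- Within two model years of the seed
        IF(ABS(v.year - s.year) <= 2, 2, 0) +
        -- Within 10% of the seed's engine size (1.8 to 2.2 for a 2.0L seed)
        IF(v.engine_size BETWEEN s.engine_size * 0.9 AND s.engine_size * 1.1, 1, 0) +
        -- Exact power match scores 3; within 20% (exclusive) scores 1
        IF(v.engine_power = s.engine_power, 3,
           IF(v.engine_power > s.engine_power * 0.8
              AND v.engine_power < s.engine_power * 1.2, 1, 0))) AS `rank`
FROM vehicles AS s
JOIN vehicles AS v
  ON v.model_id = s.model_id          -- same model as the seed
 AND v.vehicle_id <> s.vehicle_id     -- exclude the seed itself
WHERE s.vehicle_id = 123
ORDER BY `rank` DESC
LIMIT 3;
```

Only the literal 123 changes from one seed to the next, which makes the statement easier to run as a prepared statement with a single parameter.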

It will work as long as I don't have too many rows. If the table grows to 50-100k rows, I will probably have to switch to something like Lucene?
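
Because the query filters on model_id before scoring, only vehicles of the same model are ever ranked. A minimal sketch of an index that supports that filter, assuming none exists on model_id yet (the index name idx_vehicles_model_id is hypothetical):

```sql
-- Hypothetical index name; lets MySQL locate same-model rows
-- without scanning the whole vehicles table.
CREATE INDEX idx_vehicles_model_id ON vehicles (model_id);
```

With such an index, the IF() scoring should run only over the rows for that model, so the cost would depend mainly on how many vehicles share a model rather than on the total table size.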

Source: https://stackoverflow.com/questions/17632113/getting-most-similar-rows-in-mysql-table-and-order-them-by-similarity

Tags

mysql

similarity