How to find similar content using SPARQL

不问归期 posted on 2019-11-28 00:19:08

Matching some specific properties

It sounds like you're asking for something along the lines of

select ?similarMovie ?genre ?director ?location ?actor where {
  values ?movie { <http://.../TheMatrix> }
  ?genre    ^:hasGenre    ?movie, ?similarMovie .
  ?director ^:hasDirector ?movie, ?similarMovie .
  ?location ^:hasLocation ?movie, ?similarMovie .
  optional { ?actor ^:hasActor ?movie, ?similarMovie . }
}

That uses the inverse property path operator ^ and comma-separated object lists to make it much shorter than the equivalent:

select ?similarMovie ?genre ?director ?location ?actor where { 
  values ?movie { <http://.../TheMatrix> }
  ?movie        :hasGenre    ?genre .
  ?movie        :hasDirector ?director .
  ?movie        :hasLocation ?location .
  ?similarMovie :hasGenre    ?genre .
  ?similarMovie :hasDirector ?director .
  ?similarMovie :hasLocation ?location .
  optional { 
    ?movie        :hasActor ?actor .
    ?similarMovie :hasActor ?actor .
  }
}

For instance, using DBpedia, we can get other films that have the same distributor and cinematographer as The Matrix:

prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpprop: <http://dbpedia.org/property/>

select ?similar ?cinematographer ?distributor where {
  values ?movie { dbpedia:The_Matrix }
  ?cinematographer ^dbpprop:cinematography ?movie, ?similar .
  ?distributor ^dbpprop:distributor ?movie, ?similar .
}
limit 10

The results are all within that same franchise; you get: The Matrix, The Matrix Reloaded, The Matrix Revolutions, The Matrix (franchise), and The Ultimate Matrix Collection.
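
Because ?similar can also bind to the starting film, The Matrix shows up in its own results. A minimal sketch of one way to drop it is to add a filter. The prefixes are declared explicitly so the query does not rely on the endpoint's predefined ones, and this variant has not been run against the live endpoint:

```sparql
prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpprop: <http://dbpedia.org/property/>

select ?similar ?cinematographer ?distributor where {
  values ?movie { dbpedia:The_Matrix }
  ?cinematographer ^dbpprop:cinematography ?movie, ?similar .
  ?distributor ^dbpprop:distributor ?movie, ?similar .
  # keep only films other than the starting one
  filter(?similar != ?movie)
}
limit 10
```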

Matching at least some number of properties

It's also possible to ask for things that have at least some number of properties in common. How many properties two things need to share before they count as similar is subjective, depends on the particular data, and will take some experimentation. For instance, we can ask for films on DBpedia that have more than 35 property values in common with The Matrix (each shared property-value pair counts once) with a query like this:

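prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?similar where {
  values ?movie { dbpedia:The_Matrix }
  ?similar ?p ?o ; a dbpedia-owl:Film .
  ?movie   ?p ?o .
}
group by ?similar ?movie
having (count(?p) > 35)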

This gives 13 movies (including The Matrix itself and the other movies in the franchise):

  • V for Vendetta (film)
  • The Matrix
  • The Postman (film)
  • Executive Decision
  • The Invasion (film)
  • Demolition Man (film)
  • The Matrix (franchise)
  • The Matrix Reloaded
  • Freejack
  • Exit Wounds
  • The Matrix Revolutions
  • Outbreak (film)
  • Speed Racer (film)
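
Note that count(?p) counts every shared property-value pair, so a property with several shared values (several actors in common, say) is counted several times. If you would rather count each property once, a sketch of that variant follows. The threshold of 10 is an arbitrary placeholder rather than a tuned value, and the query has not been run against the live endpoint, so expect to adjust it:

```sparql
prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?similar where {
  values ?movie { dbpedia:The_Matrix }
  ?similar ?p ?o ; a dbpedia-owl:Film .
  ?movie   ?p ?o .
}
group by ?similar ?movie
# distinct makes each shared property count once, however many values match
having (count(distinct ?p) > 10)
```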

Using this kind of approach, you could even use the number of common properties as a measure of similarity. For instance:

prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?similar (count(?p) as ?similarity) where {
  values ?movie { dbpedia:The_Matrix }
  ?similar ?p ?o ; a dbpedia-owl:Film .
  ?movie   ?p ?o .
}
group by ?similar ?movie
having (count(?p) > 35)
order by desc(?similarity)

SPARQL results

The Matrix             206
The Matrix Revolutions  63
The Matrix Reloaded     60
The Matrix (franchise)  55
Demolition Man (film)   41
Speed Racer (film)      40
V for Vendetta (film)   38
The Invasion (film)     38
The Postman (film)      36
Executive Decision      36
Freejack                36
Exit Wounds             36
Outbreak (film)         36
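
If picking a threshold is awkward, another option is to drop the having clause and keep only the top-ranked films. A sketch under the same assumptions about the data; the limit of 10 is arbitrary and the query has not been run against the live endpoint. The starting film will still rank first, since it shares every one of its own triples with itself:

```sparql
prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?similar (count(?p) as ?similarity) where {
  values ?movie { dbpedia:The_Matrix }
  ?similar ?p ?o ; a dbpedia-owl:Film .
  ?movie   ?p ?o .
}
group by ?similar ?movie
order by desc(?similarity)
# keep the ten highest-scoring films instead of applying a fixed threshold
limit 10
```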