Make filtering people by birthYear and deathYear criteria more performative in SPARQL query

£可爱£侵袭症+ submitted on 2020-01-05 06:17:56

Question


I want to find instances of scientists who both were born and died in prime-numbered years. Building on a previous discussion at this URL, I devised the following query, which is unwieldy and times out:

SELECT ?birthYear ?deathYear ?scientist ?scientistLabel 
WHERE 
{ 
 {
  select ?value1
    { 
      ?number wdt:P31 wd:Q49008. 
      ?number wdt:P1181 ?value1 
      filter(?value1 < year(now())) 
    }
  }
  {
  select ?value2
    { 
      ?number wdt:P31 wd:Q49008. 
      ?number wdt:P1181 ?value2 
      filter(?value2 < ?value1)
    }
  }
  ?scientist wdt:P106 wd:Q901. 
  ?scientist wdt:P570 ?deathDate.
  ?scientist wdt:P569 ?birthDate
  BIND(year(?deathDate) as ?deathYear) 
  BIND(year(?birthDate) as ?birthYear)
  filter(?deathYear = ?value1)
  filter(?birthYear = ?value2)
  SERVICE wikibase:label { bd:serviceParam wikibase:language " [AUTO_LANGUAGE],en". } 
} 
order by asc(?deathYear) asc(?scientistLabel) 
limit 100

I'm a SPARQL novice, but as far as I can tell, this takes pairs of prime numbers, checks whether anyone whose occupation is 'scientist' died in the year corresponding to the first, and then checks whether that person was born in the year corresponding to the second. Is there a way to improve the performance of this query?


Answer 1:


There are 2 problems with your query:

  1. You can't pass values into a subquery this way: subqueries are evaluated first and independently, so ?value1 is not in scope inside the second subquery (see Bottom Up Semantics).
  2. Values of wdt:P1181 are xsd:decimal, whereas year() returns xsd:integer. The query is therefore forced to compare them with FILTER(?birthYear = ?value), which is slower and harder to optimize than a plain join. It seems that Blazegraph has to materialize solutions prematurely.
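
To see why the inner FILTER(?value2 < ?value1) can never match, here is a minimal Python sketch of bottom-up evaluation. It is a toy model only, not how Blazegraph is implemented: solutions are plain dicts, the prime list is shortened, the function names are hypothetical, and a comparison against an unbound variable is modelled as an error that drops the solution, as a SPARQL FILTER does.

```python
# Toy model of SPARQL bottom-up evaluation (illustrative assumption, not Blazegraph code).
PRIMES = [2, 3, 5, 7]  # stand-in for the values of wdt:P1181


def eval_first_subquery():
    # SELECT ?value1 { ... }: one solution per prime.
    return [{"value1": p} for p in PRIMES]


def eval_second_subquery():
    # SELECT ?value2 { ... FILTER(?value2 < ?value1) }
    # The subquery is evaluated on its own, so ?value1 is never bound here.
    solutions = []
    for p in PRIMES:
        row = {"value2": p}
        value1 = row.get("value1")  # always None: not in scope
        if value1 is None:
            continue  # the filter errors on the unbound variable, solution dropped
        if row["value2"] < value1:
            solutions.append(row)
    return solutions


def join(left, right):
    # Keep pairs of solutions that agree on every variable they share.
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]


result = join(eval_first_subquery(), eval_second_subquery())
print(len(result))
```

In this model the second subquery yields no solutions at all, so the join with the rest of the pattern is empty no matter how many scientists match.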

Hence, your query should be:

SELECT DISTINCT ?scientist ?scientistLabel ?birthYear ?deathYear {
  {
    SELECT (xsd:integer(?value1) as ?birthYear) { 
      ?number wdt:P31 wd:Q49008. 
      ?number wdt:P1181 ?value1 
      FILTER(?value1 < year(now())) 
    }
  }
  {
    SELECT (xsd:integer(?value2) AS ?deathYear) { 
      ?number wdt:P31 wd:Q49008. 
      ?number wdt:P1181 ?value2 
      FILTER(?value2 < year(now()))
    }
  }
  ?scientist wdt:P106 wd:Q901.
  ?scientist wdt:P570 ?deathDate.
  ?scientist wdt:P569 ?birthDate.
  FILTER(isLiteral(?birthDate) && isLiteral(?deathDate))
  BIND(year(?deathDate) AS ?deathYear) 
  BIND(year(?birthDate) AS ?birthYear)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } 
} ORDER BY ASC(?birthDate) ASC(?deathDate)


I've also added FILTER(isLiteral(?birthDate) && isLiteral(?deathDate)), because values of wdt:P569 and wdt:P570 might be unknown values (i.e. RDF blank nodes). In that case year() raises an error, so BIND(year(?birthDate) AS ?birthYear) leaves ?birthYear unbound, and an unbound variable is compatible with every value in a join.
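
The effect of an unbound ?birthYear can be sketched with a toy join in Python. This is an illustration under simplifying assumptions, not Blazegraph code: solutions are dicts, a missing key stands for an unbound variable, the scientists "A", "B" and "C" are made-up placeholders, and compatible and join are hypothetical helper names.

```python
def compatible(a, b):
    # Two solutions are compatible when every variable bound in BOTH has the same value.
    # A variable that is unbound on one side imposes no constraint.
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def join(left, right):
    # Merge every compatible pair of solutions.
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


# Solutions of the prime subquery (two sample prime years).
primes = [{"birthYear": y} for y in (1877, 1879)]

# Solutions of the scientist pattern after BIND(year(?birthDate) AS ?birthYear).
people = [
    {"scientist": "A", "birthYear": 1879},  # prime year: should match once
    {"scientist": "B", "birthYear": 1880},  # not prime: should not match
    {"scientist": "C"},                     # blank-node date: ?birthYear unbound
]

unfiltered = join(primes, people)

# Rough equivalent of FILTER(isLiteral(?birthDate)): drop rows whose year could not be computed.
filtered = join(primes, [p for p in people if "birthYear" in p])

for row in unfiltered:
    print(row)
```

Without the filter, placeholder "C" is paired with every prime year; with it, only "A" survives.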


You could speed up the query even more using Blazegraph-specific tricks:

  • reusable named subqueries
  • query hints

SELECT DISTINCT ?scientist ?scientistLabel ?birthYear ?deathYear
   WITH {
    SELECT (xsd:integer(?value) AS ?year) {
      [] wdt:P31 wd:Q49008 ; wdt:P1181 ?value. 
      hint:Prior hint:rangeSafe true.
      FILTER(?value  <= year(now()))
    }
  } AS %primes {
    # hint:Query hint:maxParallel 50 . 
    # hint:Query hint:chunkSize 250 . 
  { SELECT (?year AS ?birthYear)  { include %primes } }
  { SELECT (?year AS ?deathYear)  { include %primes } } 
  ?scientist wdt:P106 wd:Q901.
  ?scientist wdt:P570 ?deathDate.
  ?scientist wdt:P569 ?birthDate.
  FILTER (isLiteral(?deathDate) && isLiteral(?birthDate))
  BIND (year(?birthDate) AS ?birthYear)
  BIND (year(?deathDate) AS ?deathYear)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} ORDER BY ASC(?birthYear) ASC (?deathYear)




Source: https://stackoverflow.com/questions/53102725/make-filtering-people-by-birthyear-and-deathyear-criteria-more-performative-in-s
