Question
I want to find scientists who were both born and died in prime-numbered years. Building on a previous discussion at this URL, I devised the following query, which is unwieldy and times out:
SELECT ?birthYear ?deathYear ?scientist ?scientistLabel
WHERE
{
  {
    select ?value1
    {
      ?number wdt:P31 wd:Q49008.
      ?number wdt:P1181 ?value1
      filter(?value1 < year(now()))
    }
  }
  {
    select ?value2
    {
      ?number wdt:P31 wd:Q49008.
      ?number wdt:P1181 ?value2
      filter(?value2 < ?value1)
    }
  }
  ?scientist wdt:P106 wd:Q901.
  ?scientist wdt:P570 ?deathDate.
  ?scientist wdt:P569 ?birthDate
  BIND(year(?deathDate) as ?deathYear)
  BIND(year(?birthDate) as ?birthYear)
  filter(?deathYear = ?value1)
  filter(?birthYear = ?value2)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
order by asc(?deathYear) asc(?scientistLabel)
limit 100
I'm a SPARQL novice, but as far as I can tell, this takes pairs of prime numbers and checks whether anyone whose occupation is 'scientist' died in the year corresponding to the first and was born in the year corresponding to the second. Is there a way to improve the performance of this query?
Answer 1:
There are two problems with your query:
- One can't pass values into a subquery in such a way (see Bottom Up Semantics).
- Values of wdt:P1181 are xsd:decimals, whereas year() returns xsd:integers. One is forced to use FILTER(?birthYear = ?value), which is less performant and less optimizable than simple joins. It seems that Blazegraph has to materialize solutions prematurely.
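The two problems above can be illustrated with a toy Python model. This is only a sketch under stated assumptions: solutions are plain dicts, literals are (lexical form, datatype) tuples, and every helper name is hypothetical. It says nothing about how Blazegraph is implemented internally.

```python
from decimal import Decimal

# Toy model (an assumption for illustration, not a real RDF library):
# a solution is a dict of variable -> value, a literal is (lexical form, datatype).

def filter_rows(rows, predicate):
    """Keep rows where predicate holds; touching an unbound variable
    (a missing key) counts as a FILTER error and drops the row."""
    kept = []
    for row in rows:
        try:
            if predicate(row):
                kept.append(row)
        except KeyError:
            pass
    return kept

def join_on(left, right, var):
    """Join two solution lists on var by term identity (exact tuple match)."""
    return [{**a, **b} for a in left for b in right if a[var] == b[var]]

def numeric(term):
    """Numeric value of a literal, ignoring its datatype."""
    return Decimal(term[0])

def to_integer(term):
    """Rough analogue of xsd:integer(?value) for whole-number literals."""
    return (str(int(numeric(term))), "xsd:integer")

# Problem 1: the second subquery is evaluated on its own, so ?value1 is
# unbound inside it and the filter ?value2 < ?value1 rejects every row.
prime_rows = [{"value2": 1871}, {"value2": 1873}, {"value2": 1877}]
inner = filter_rows(prime_rows, lambda row: row["value2"] < row["value1"])

# Problem 2: the same number typed as xsd:decimal and as xsd:integer is two
# different terms, so a plain join on the shared variable finds nothing.
primes = [{"year": ("1879", "xsd:decimal")}]
people = [{"year": ("1879", "xsd:integer"), "scientist": "hypothetical person"}]
identity_join = join_on(primes, people, "year")

# Comparing by value works, but only as a filter over every candidate pair.
pairs = [{"prime": p["year"], **s} for p in primes for s in people]
value_matches = filter_rows(
    pairs, lambda row: numeric(row["prime"]) == numeric(row["year"])
)

# Casting the primes to xsd:integer first makes the plain join succeed.
cast_join = join_on([{"year": to_integer(p["year"])} for p in primes], people, "year")

print(inner, identity_join, len(value_matches), len(cast_join))
```

In this model the filter route has to inspect every prime/person pair, whereas the cast route lets the join match terms directly, which is the motivation for casting with xsd:integer() inside the subqueries.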
Hence, your query should be:
SELECT DISTINCT ?scientist ?scientistLabel ?birthYear ?deathYear {
  # Candidate birth years: primes cast to xsd:integer so they join with year()
  {
    SELECT (xsd:integer(?value1) AS ?birthYear) {
      ?number wdt:P31 wd:Q49008.
      ?number wdt:P1181 ?value1
      FILTER(?value1 < year(now()))
    }
  }
  # Candidate death years: the same primes, selected independently
  {
    SELECT (xsd:integer(?value2) AS ?deathYear) {
      ?number wdt:P31 wd:Q49008.
      ?number wdt:P1181 ?value2
      FILTER(?value2 < year(now()))
    }
  }
  ?scientist wdt:P106 wd:Q901.
  ?scientist wdt:P570 ?deathDate.
  ?scientist wdt:P569 ?birthDate.
  # Skip "unknown value" dates (blank nodes)
  FILTER(isLiteral(?birthDate) && isLiteral(?deathDate))
  BIND(year(?deathDate) AS ?deathYear)
  BIND(year(?birthDate) AS ?birthYear)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ASC(?birthDate) ASC(?deathDate)
I've also added FILTER(isLiteral(?birthDate)) (and likewise for ?deathDate), because values of wdt:P569 might be unknown values (i.e. RDF blank nodes), in which case BIND(year(?birthDate) AS ?birthYear) leaves the variable unbound, and an unbound variable matches everything in joins.
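Why an unbound variable "matches everything" can be seen in a small Python sketch of join compatibility. It assumes a toy model in which a solution is a dict and an unbound variable is simply a missing key; the scientists "A", "B" and "C" are hypothetical.

```python
def compatible(a, b):
    """Two solutions are compatible when every variable bound in both agrees."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def join(left, right):
    """Merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

# Prime years produced by the subquery.
primes = [{"birthYear": 1877}, {"birthYear": 1879}]

# Hypothetical scientists. "C" has an unknown birth date, so the BIND of
# year(?birthDate) fails and ?birthYear stays unbound (no key at all).
people = [
    {"scientist": "A", "birthYear": 1879},
    {"scientist": "B", "birthYear": 1880},
    {"scientist": "C"},
]

# Without the isLiteral filter, "C" pairs up with every prime year.
unfiltered = sorted((r["scientist"], r["birthYear"]) for r in join(primes, people))

# Dropping rows without a usable birth date first (the role played by
# FILTER(isLiteral(?birthDate))) leaves only the genuine match.
with_dates = [p for p in people if "birthYear" in p]
filtered = sorted((r["scientist"], r["birthYear"]) for r in join(primes, with_dates))

print(unfiltered)
print(filtered)
```

In this model "C" shares no bound variable with any prime row, so nothing can disagree and the pair always joins. Filtering on isLiteral before the BIND avoids those spurious rows.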
You could speed up the query even more using Blazegraph-specific tricks:
- reusable named subqueries
- query hints
SELECT DISTINCT ?scientist ?scientistLabel ?birthYear ?deathYear
WITH {
  # Named subquery: the list of prime years is defined once and included twice below
  SELECT (xsd:integer(?value) AS ?year) {
    [] wdt:P31 wd:Q49008 ; wdt:P1181 ?value.
    hint:Prior hint:rangeSafe true.
    FILTER(?value <= year(now()))
  }
} AS %primes {
  # Optional query hints, left disabled
  # hint:Query hint:maxParallel 50 .
  # hint:Query hint:chunkSize 250 .
  { SELECT (?year AS ?birthYear) { include %primes } }
  { SELECT (?year AS ?deathYear) { include %primes } }
  ?scientist wdt:P106 wd:Q901.
  ?scientist wdt:P570 ?deathDate.
  ?scientist wdt:P569 ?birthDate.
  FILTER (isLiteral(?deathDate) && isLiteral(?birthDate))
  BIND (year(?birthDate) AS ?birthYear)
  BIND (year(?deathDate) AS ?deathYear)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} ORDER BY ASC(?birthYear) ASC (?deathYear)
Source: https://stackoverflow.com/questions/53102725/make-filtering-people-by-birthyear-and-deathyear-criteria-more-performative-in-s