SPARQL join query explanation: how does it work?

Submitted by 帅比萌擦擦* on 2019-12-10 12:35:11

Question


My query:

select ?x ?z
where
{
  ?x <http://purl.uniprot.org/core/name> ?y .
  ?x <http://purl.uniprot.org/core/volume> ?z .
  ?x <http://purl.uniprot.org/core/pages> "176-186" .
}

I need to write a custom parser for this query.

When I run this query on a Jena model, it returns one record. Can anyone explain how this query is evaluated?

I split this query into three parts:

select ?x ?y where { ?x <http://purl.uniprot.org/core/name> ?y . }

Total Records Found : 3034

select ?x ?z where { ?x <http://purl.uniprot.org/core/name> ?y . ?x <http://purl.uniprot.org/core/volume> ?z . }

Total Records Found : 2679

select ?x ?z where { ?x <http://purl.uniprot.org/core/name> ?y . ?x <http://purl.uniprot.org/core/volume> ?z . ?x <http://purl.uniprot.org/core/pages> "176-186" . }

Total Records Found : 1

Please help me write a custom query parser.


Answer 1:


You are trying to calculate the join of the three triple patterns. Papers on join implementation over Apache Hadoop will be useful background.
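As a rough illustration of what joining the three triple patterns on ?x could look like in a map-reduce style, here is a minimal in-process sketch of a reduce-side join. It is not how ARQ or Hadoop implement it: the toy triples, the shortened predicate names, and the helper names `map_phase` and `reduce_phase` are all assumptions made up for this example.

```python
from collections import defaultdict
from itertools import product

# Toy data: (subject, predicate, object) triples. All values are made up,
# and predicates are shortened from the full uniprot URIs.
TRIPLES = [
    ("c1", "name", "Journal A"),
    ("c1", "volume", "12"),
    ("c1", "pages", "176-186"),
    ("c2", "name", "Journal B"),
    ("c2", "volume", "7"),
    ("c2", "pages", "1-9"),
    ("c3", "name", "Journal C"),
]

# One entry per triple pattern: (predicate, required object or None for a variable).
PATTERNS = [("name", None), ("volume", None), ("pages", "176-186")]


def map_phase(triples):
    """Emit (join key ?x, (pattern index, object)) for each triple matching a pattern."""
    for s, p, o in triples:
        for i, (pred, obj) in enumerate(PATTERNS):
            if p == pred and (obj is None or o == obj):
                yield s, (i, o)


def reduce_phase(pairs):
    """Group by ?x and keep only subjects that matched every pattern."""
    groups = defaultdict(lambda: defaultdict(list))
    for key, (i, o) in pairs:
        groups[key][i].append(o)
    for x, by_pattern in groups.items():
        if len(by_pattern) == len(PATTERNS):
            # Cross product of the matches per pattern, as a join would produce.
            for objs in product(*(by_pattern[i] for i in range(len(PATTERNS)))):
                yield {"x": x, "y": objs[0], "z": objs[1]}


results = list(reduce_phase(map_phase(TRIPLES)))
print(results)
```

On a real cluster the grouping by key is done by the shuffle step, so each reducer only sees the bindings for its own values of ?x.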

It may be helpful to look at Apache Spark and the Resilient Distributed Dataset (RDD) concept.

It is also important to consider the likely selectivity of each pattern. As Joshua says, the "pages" pattern may well yield a unique solution, and using that solution to look up the matching "name" and "volume" values is not a demanding task.
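The selectivity idea can be sketched as follows: evaluate the pattern with the fewest matches first, then look up the remaining patterns only for the subjects it produced. This is a toy illustration under assumptions, not ARQ's optimizer; the data, the shortened predicate names, and the helper `matches` are hypothetical.

```python
# Toy data: (subject, predicate, object). Values are made up for illustration.
TRIPLES = [
    ("c1", "name", "Journal A"),
    ("c1", "volume", "12"),
    ("c1", "pages", "176-186"),
    ("c2", "name", "Journal B"),
    ("c2", "volume", "7"),
    ("c2", "pages", "1-9"),
    ("c3", "name", "Journal C"),
]


def matches(pred, obj=None, subj=None):
    """Return (subject, object) pairs for one triple pattern, with optional fixed parts."""
    return [
        (s, o)
        for s, p, o in TRIPLES
        if p == pred and (obj is None or o == obj) and (subj is None or s == subj)
    ]


# Cardinality of each pattern when evaluated on its own.
cardinality = {
    "name": len(matches("name")),
    "volume": len(matches("volume")),
    "pages": len(matches("pages", obj="176-186")),
}

# Seed the evaluation with the most selective pattern (here: "pages").
seeds = matches("pages", obj="176-186")

# For each seed subject, look up the other two patterns with ?x already bound.
solutions = []
for x, _ in seeds:
    for _, y in matches("name", subj=x):
        for _, z in matches("volume", subj=x):
            solutions.append((x, z))

print(cardinality, solutions)
```

With ?x bound by the seed pattern, the "name" and "volume" lookups touch only one subject each instead of scanning every candidate.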

ARQ's in-memory algorithm does not aim for maximum independent parallelism, which is what you want on Hadoop. Merge joins (or sort-merge joins) make two parallelizable accesses to the data.
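For reference, a sort-merge join over two lists of (?x, value) bindings might look like the sketch below. The two inputs can be fetched and sorted independently, which is the part that parallelizes; the merge then makes a single pass over both. The function name `sort_merge_join` and the sample bindings are assumptions for illustration only.

```python
def sort_merge_join(left, right):
    """Join two lists of (key, value) pairs on key; returns (key, left_value, right_value)."""
    # The two sorts are independent of each other and could run in parallel.
    left = sorted(left)
    right = sorted(right)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit every combination within the two runs.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out


# Hypothetical bindings for the "name" and "volume" patterns.
names = [("c2", "Journal B"), ("c1", "Journal A"), ("c3", "Journal C")]
volumes = [("c1", "12"), ("c2", "7")]
joined = sort_merge_join(names, volumes)
print(joined)
```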

You can extend ARQ at the basic pattern level, at the whole algebra execution level, or at any point in between, by extending the class OpExecutor.




Answer 2:


It sounds like you're asking why

select ?x ?z where {
  ?x <http://purl.uniprot.org/core/name> ?y .           # (a)
  ?x <http://purl.uniprot.org/core/volume> ?z .         # (b)
  ?x <http://purl.uniprot.org/core/pages> "176-186" .   # (c)
}

returns just one result, while each line alone returns more. Triple patterns in SPARQL are conjunctive: non-optional patterns must be matched by the data in order for results to be returned. Thus, you're asking for the values of ?x and ?z where ALL of the following hold:

  • ?x has the name ?y, AND
  • ?x has some value for volume, AND
  • ?x has the specific value "176-186" for pages.

Based on the names of the properties, it sounds like you're querying some bibliographic information. It's not surprising that in a given bibliographic database, there might be only one article whose pages are exactly "176-186", as that's a very specific value.
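The conjunctive behaviour described above can be mimicked with plain set intersection: each additional pattern can only shrink the set of subjects that still qualify, which matches the shrinking record counts in the question. This is a simplified sketch with made-up data and shortened predicate names. It tracks subjects only, so it equals the number of query solutions only under the assumption that each subject has at most one value per property.

```python
# Toy data: (subject, predicate, object). Values are made up for illustration.
TRIPLES = [
    ("c1", "name", "Journal A"),
    ("c1", "volume", "12"),
    ("c1", "pages", "176-186"),
    ("c2", "name", "Journal B"),
    ("c2", "volume", "7"),
    ("c2", "pages", "1-9"),
    ("c3", "name", "Journal C"),
]


def subjects_with(pred, obj=None):
    """Subjects that have the given predicate, optionally with a fixed object."""
    return {s for s, p, o in TRIPLES if p == pred and (obj is None or o == obj)}


a = subjects_with("name")              # pattern (a)
b = subjects_with("volume")            # pattern (b)
c = subjects_with("pages", "176-186")  # pattern (c)

# Every pattern must hold for the same ?x, so the candidates are intersected.
after_a = a
after_ab = a & b
after_abc = a & b & c

print(len(after_a), len(after_ab), len(after_abc))
```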




Answer 3:


Edited to include the correct algebra link

The best advice that I can offer is to look at the Jena documentation for ARQ's SPARQL Algebra and derive your custom evaluation engine at that level. Another reference that may be informative is the W3C SPARQL Algebra.

It seems (from the tags that you have selected) that you intend to perform query operations distributed throughout a map-reduce job, and you are looking at a specific example of the application of the algebra as a proof-of-concept. If your intent is to integrate this into Jena's query evaluation, then you will need to manually explore Jena's existing system in order to understand why it behaves the way it does.



Source: https://stackoverflow.com/questions/23500140/sparql-join-query-explanation-hows-its-working
