Retrieving data from geonames using SPARQL

Submitted by 我怕爱的太早我们不能终老 on 2019-12-05 08:08:50

Question


I am trying to get linked data from geonames with the following SPARQL query, but obviously I'm doing something wrong.

prefix oxprop: <http://ophileon.com/ox/property#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>

select ?poi ?poiname ?geonames ?latitude


from  <http://www.ophileon.com/ox/poi.rdf>
# from  <http://sws.geonames.org/ >

where
{

   ?poi rdfs:label ?poiname.
   ?poi owl:sameAs ?geonames.
#   ?geonames wgs84_pos:lat ?latitude.


  FILTER(langMatches(lang(?poiname), "EN")).

}

which, using sparql.org's JSON output, returns:

{
  "head": {
    "vars": [ "poi" , "poiname" , "geonames" , "latitude" ]
  } ,
  "results": {
    "bindings": [
      {
        "poi": { "type": "uri" , "value": "http://ophileon.com/ox/poi/2" } ,
        "poiname": { "type": "literal" , "xml:lang": "en" , "value": "Wageningen" } ,
        "geonames": { "type": "uri" , "value": "http://sws.geonames.org/2745088" }
      } ,
      {
        "poi": { "type": "uri" , "value": "http://ophileon.com/ox/poi/3" } ,
        "poiname": { "type": "literal" , "xml:lang": "en" , "value": "Netherlands" } ,
        "geonames": { "type": "uri" , "value": "http://sws.geonames.org/2750405" }
      } ,
      {
        "poi": { "type": "uri" , "value": "http://ophileon.com/ox/poi/1" } ,
        "poiname": { "type": "literal" , "xml:lang": "en" , "value": "Amsterdam" } ,
        "geonames": { "type": "uri" , "value": "http://sws.geonames.org/2759794" }
      }
    ]
  }
}

What I want to achieve is to retrieve the latitude of each node using the geonames RDF service, with addresses like "http://sws.geonames.org/2745088/about.rdf".

The lines starting with "#" are the ones I suspect are incorrect.

Next iteration

After adding a trailing "/" to each geonames ID in my data and running this:

prefix oxprop: <http://ophileon.com/ox/property#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>

select *

from <http://www.ophileon.com/ox/poi.rdf>
from <http://sws.geonames.org/2745088/about.rdf>    
from <http://sws.geonames.org/2750405/about.rdf>    
from <http://sws.geonames.org/2759794/about.rdf>
where
{
   ?poi rdfs:label ?poiname.
   ?poi owl:sameAs ?geonames.
   ?geonames wgs84_pos:lat ?latitude.
   FILTER(langMatches(lang(?poiname), "EN")).
}

Returns this:

-------------------------------------------------------------------------------------------------------
| poi                            | poiname          | geonames                           | latitude   |
=======================================================================================================
| <http://ophileon.com/ox/poi/2> | "Wageningen"@en  | <http://sws.geonames.org/2745088/> | "51.97"    |
| <http://ophileon.com/ox/poi/3> | "Netherlands"@en | <http://sws.geonames.org/2750405/> | "52.5"     |
| <http://ophileon.com/ox/poi/1> | "Amsterdam"@en   | <http://sws.geonames.org/2759794/> | "52.37403" |
-------------------------------------------------------------------------------------------------------

Next iteration: using the "SERVICE" keyword

prefix oxprop: <http://ophileon.com/ox/property#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>


select ?poi ?poiname ?geonameuri ?latitude

from <http://www.ophileon.com/ox/poi.rdf>

where
{
   ?poi rdfs:label ?poiname.
   ?poi owl:sameAs ?geonameuri.
   SERVICE <http://factforge.net/sparql>{
   ?geonameuri wgs84_pos:lat ?latitude.
   }
   FILTER(langMatches(lang(?poiname), "EN")).
}

This returns what I wanted, except that factforge returns multiple values for each latitude, in various datatypes.
The resource at http://wifo5-03.informatik.uni-mannheim.de/latc/www2012/Session%201.html proved to be very useful.
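
One way to cope with the duplicate latitudes is to collapse them on the client side. The following is only a sketch in Python: it assumes the results arrive in the SPARQL JSON format shown earlier, the function name `distinct_latitudes` and the sample rows are made up for illustration, and the values are compared as numbers so that the same latitude in two datatypes counts once.

```python
def distinct_latitudes(bindings):
    """Map each ?geonameuri to its distinct latitudes, compared numerically."""
    by_place = {}
    for row in bindings:
        place = row["geonameuri"]["value"]
        try:
            lat = float(row["latitude"]["value"])
        except ValueError:
            continue  # skip values that do not parse as a number
        seen = by_place.setdefault(place, [])
        if lat not in seen:
            seen.append(lat)
    return by_place


# Hypothetical rows: the same latitude once as a plain literal, once typed.
rows = [
    {"geonameuri": {"type": "uri", "value": "http://sws.geonames.org/2745088/"},
     "latitude": {"type": "literal", "value": "51.97"}},
    {"geonameuri": {"type": "uri", "value": "http://sws.geonames.org/2745088/"},
     "latitude": {"type": "literal",
                  "datatype": "http://www.w3.org/2001/XMLSchema#float",
                  "value": "51.97"}},
]
print(distinct_latitudes(rows))
```

If the endpoint returns genuinely different coordinates for one place, they all stay in the list, so nothing is silently dropped.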


Answer 1:


Typos and Inability to Retrieve Data

I think there are two issues here. The first is a minor typo. When I run your query, with the commented lines uncommented, I get a parse error because of the line

from  <http://sws.geonames.org/ >

because there should not be a space in the IRI. That's easy to fix though. When fixed, the service at sparql.org replies that

Error 400: Failed to load URL (parse error) http://sws.geonames.org/ : Failed to determine the triples content type: (URI=http://sws.geonames.org/ : stream=null : hint=null)

Fuseki - version 1.0.0 (Build date: 2013-09-12T10:49:49+0100)

which, I believe, means that Jena was able to pull down the content of that IRI, but wasn't able to figure out how to read it as RDF. While a quick Google search shows plenty of queries where that IRI is used as a namespace prefix, I don't see any where it's used as a graph from which triples can be selected. I think this matches what geonames.org says in its documentation:

Entry Points into the GeoNames Semantic Web

There are several ways how you can enter the GeoNames Semantic Web :

  • start from mother earth and follow the Linked Data links.
  • use the geonames search webservice with the type=rdf parameter option.
  • download the database dump and construct the url for the features using the pattern "http://sws.geonames.org/geonameId/"
  • RDF dump with 8514201 features and about 125 mio rdf triples (2013 08 27). The dump has one rdf document per toponym on every line of the file. Note: The file is pretty large. Make sure the tool you use to uncompress is able to deal with the size and does not stop after 2GB, an issue that happens with some old (windows) tool versions.

I'm a bit surprised not to see a SPARQL endpoint in that list, but I expect that if there were one, it would be listed among these options.

Modifying the query to get some data

Now, the successful query (without the commented lines) returns these results:

poi                            poiname          geonames                          latitude
<http://ophileon.com/ox/poi/2> "Wageningen"@en  <http://sws.geonames.org/2745088>   
<http://ophileon.com/ox/poi/3> "Netherlands"@en <http://sws.geonames.org/2750405>   
<http://ophileon.com/ox/poi/1> "Amsterdam"@en   <http://sws.geonames.org/2759794>

Note: These were the results at the time that I started writing this answer. However, this is based on data in http://www.ophileon.com/ox/poi.rdf, which may have changed. On later runs of this query, I get values of geonames that have a final /, e.g., http://sws.geonames.org/2745088/.

The same documentation also says:

For the town Embrun in France we have these two URIs:

  1. http://sws.geonames.org/3020251/
  2. http://sws.geonames.org/3020251/about.rdf

The first URI [1] stands for the town in France. You use this URI if you want to refer to the town. The second URI [2] is the document with the information geonames has about Embrun.
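
To make the two kinds of URI concrete, here is a small Python sketch that derives the document URL from the resource URI. The function name `geonames_document_url` is my own, and the only thing it assumes is the pattern quoted above, namely that the document lives at the resource URI plus "about.rdf".

```python
def geonames_document_url(resource_iri: str) -> str:
    """Return the about.rdf document URL for a GeoNames resource IRI.

    Assumes the pattern quoted above: the resource IRI ends with "/",
    and the document is found at "<resource IRI>about.rdf".
    """
    # Tolerate a missing trailing slash so both spellings map to one document.
    base = resource_iri.rstrip("/") + "/"
    return base + "about.rdf"


print(geonames_document_url("http://sws.geonames.org/3020251/"))
```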

The distinction between the resource URI and the document URI suggests that a query which also names the about.rdf documents for those particular geonames IRIs as graphs might work:

prefix oxprop: <http://ophileon.com/ox/property#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>

select ?poi ?poiname ?geonames ?latitude
from <http://www.ophileon.com/ox/poi.rdf>
from <http://sws.geonames.org/2745088/about.rdf>    
from <http://sws.geonames.org/2750405/about.rdf>    
from <http://sws.geonames.org/2759794/about.rdf>
where
{
   ?poi rdfs:label ?poiname.
   ?poi owl:sameAs ?geonames.
   ?geonames wgs84_pos:lat ?latitude.
   FILTER(langMatches(lang(?poiname), "EN")).
}

Now this still doesn't return any results, but it seems like all the data should be there. Let's try a simpler query. If you use a query like this:

select * 
from <http://sws.geonames.org/2759794/about.rdf>
where { ?s ?p ?o }

you'll get a bunch of triples about that place. This does work with multiple from clauses, too. For instance, if you use that data and your data with the following query, you get the combined results.

select * 
from <http://www.ophileon.com/ox/poi.rdf>
from <http://sws.geonames.org/2745088/about.rdf>  
where { ?s ?p ?o }

In looking at the results from that dataset, we can finally see where the problem is: the IRIs for the geonames resources end with / in their actual form, but don't have / in your data. You'll need to change your data accordingly.

Note: it seems that the data in http://www.ophileon.com/ox/poi.rdf has since been corrected.
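
If you can't change the data at its source, the trailing-slash mismatch described above could also be repaired when the IRIs are read. This is a rough Python sketch, not anything geonames provides: the helper name `normalize_geonames_iri` is hypothetical, and it assumes resource IRIs have the form http://sws.geonames.org/ followed by a numeric ID.

```python
from urllib.parse import urlsplit


def normalize_geonames_iri(iri: str) -> str:
    """Add the trailing "/" that GeoNames resource IRIs carry, if missing."""
    iri = iri.strip()
    parts = urlsplit(iri)
    # Only touch IRIs of the form http://sws.geonames.org/<numeric id>[/].
    path_id = parts.path.strip("/")
    if parts.netloc == "sws.geonames.org" and path_id.isdigit():
        return f"{parts.scheme}://{parts.netloc}/{path_id}/"
    # Anything else (other hosts, about.rdf documents) is returned unchanged.
    return iri


print(normalize_geonames_iri("http://sws.geonames.org/2745088"))
```

IRIs that already end in "/" come back unchanged, so the function is safe to apply to data that has since been corrected.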

It looks like you may end up needing to run your first query to determine which data you want from geonames, retrieve that information, and then run a second query over it. Alternatively, you could download the big data dump provided by Geonames and use it locally (possibly the easiest solution).
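
The two-step approach could be scripted roughly as follows. This is a sketch under several assumptions: the first query's output is available in the SPARQL JSON results format shown in the question, the function names `geonames_iris` and `second_query` are made up, and the sample input is abridged from the question's output. It only builds the text of the second query and does not contact any endpoint.

```python
import json
from typing import List

PREFIXES = """\
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl:  <http://www.w3.org/2002/07/owl#>
prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
"""


def geonames_iris(results_json: str) -> List[str]:
    """Collect the distinct ?geonames IRIs from SPARQL JSON results."""
    bindings = json.loads(results_json)["results"]["bindings"]
    iris = []
    for row in bindings:
        cell = row.get("geonames")
        if cell and cell["type"] == "uri" and cell["value"] not in iris:
            iris.append(cell["value"])
    return iris


def second_query(poi_graph: str, iris: List[str]) -> str:
    """Build the follow-up query with one FROM clause per GeoNames document."""
    froms = [f"from <{poi_graph}>"]
    # Each resource IRI maps to its about.rdf document, with or without "/".
    froms += [f"from <{iri.rstrip('/')}/about.rdf>" for iri in iris]
    return (
        PREFIXES
        + "\nselect ?poi ?poiname ?geonames ?latitude\n"
        + "\n".join(froms)
        + "\nwhere\n{\n"
        + "   ?poi rdfs:label ?poiname.\n"
        + "   ?poi owl:sameAs ?geonames.\n"
        + "   ?geonames wgs84_pos:lat ?latitude.\n"
        + '   FILTER(langMatches(lang(?poiname), "EN")).\n'
        + "}\n"
    )


# First-query output, abridged from the JSON shown in the question.
first_results = """
{"head": {"vars": ["poi", "poiname", "geonames", "latitude"]},
 "results": {"bindings": [
   {"poi": {"type": "uri", "value": "http://ophileon.com/ox/poi/2"},
    "geonames": {"type": "uri", "value": "http://sws.geonames.org/2745088"}},
   {"poi": {"type": "uri", "value": "http://ophileon.com/ox/poi/1"},
    "geonames": {"type": "uri", "value": "http://sws.geonames.org/2759794"}}
 ]}}
"""

print(second_query("http://www.ophileon.com/ox/poi.rdf",
                   geonames_iris(first_results)))
```

The generated query will only match if the owl:sameAs IRIs in your data end with "/", as the geonames documents use that form.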



Source: https://stackoverflow.com/questions/19393908/retrieving-data-from-geonames-using-sparql
