问题
I am trying to extract labels from DBpedia for some persons. I am partially successful now, but I got stuck in the following problem. The following code works.
public class DbPediaQueryExtractor {
public static void main(String [] args) {
String entity = "Aharon_Barak";
String queryString ="PREFIX dbres: <http://dbpedia.org/resource/> SELECT * WHERE {dbres:"+ entity+ "<http://www.w3.org/2000/01/rdf-schema#label> ?o FILTER (langMatches(lang(?o),\"en\"))}";
//String queryString="select * where { ?instance <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>; <http://www.w3.org/2000/01/rdf-schema#label> ?o FILTER (langMatches(lang(?o),\"en\")) } LIMIT 5000000";
QueryExecution qexec = getResult(queryString);
try {
ResultSet results = qexec.execSelect();
for ( ; results.hasNext(); )
{
QuerySolution soln = results.nextSolution();
System.out.print(soln.get("?o") + "\n");
}
}
finally {
qexec.close();
}
}
public static QueryExecution getResult(String queryString){
Query query = QueryFactory.create(queryString);
//VirtuosoQueryExecution vqe = VirtuosoQueryExecutionFactory.create (sparql, graph);
QueryExecution qexec = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", query);
return qexec;
}
}
However, when the entity contains brackets, it does not work. For example,
String entity = "William_H._Miller_(writer)";
leads to this exception:
Exception in thread "main" com.hp.hpl.jena.query.QueryParseException: Encountered " "(" "( "" at line 1, column 86.`
What is the problem?
回答1:
It took some copying and pasting to see what exactly was going on. I'd suggest that you put newlines in your query for easier readability. The query you're using is:
PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
dbres:??? <http://www.w3.org/2000/01/rdf-schema#label> ?o
FILTER (langMatches(lang(?o),"en"))
}
where ???
is being replaced by the contents of the string entity
. You're doing absolutely no input validation here to ensure that the value of entity
will be legal to paste in. Based on your question, it sounds like entity
contains William_H._Miller_(writer)
, so you're getting the query:
PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
dbres:William_H._Miller_(writer) <http://www.w3.org/2000/01/rdf-schema#label> ?o
FILTER (langMatches(lang(?o),"en"))
}
You can paste that into the public DBpedia endpoint, and you'll get a similar parse error message:
Virtuoso 37000 Error SP030: SPARQL compiler, line 6: syntax error at 'writer' before ')'
SPARQL query:
define sql:big-data-const 0
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri <http://dbpedia.org> PREFIX dbres: <http://dbpedia.org/resource/>
SELECT * WHERE
{
dbres:William_H._Miller_(writer) <http://www.w3.org/2000/01/rdf-schema#label> ?o
FILTER (langMatches(lang(?o),"en"))
}
Better than hitting DBpedia's endpoint with bad queries, you can also use the SPARQL query validator, which reports for that query:
Syntax error: Lexical error at line 4, column 34. Encountered: ")" (41), after : "writer"
In Jena, you can use the ParameterizedSparqlString to avoid these sorts of issues. Here's your example, reworked to use a parameterized string:
import com.hp.hpl.jena.query.ParameterizedSparqlString;
public class PSSExample {
public static void main( String[] args ) {
// Create a parameterized SPARQL string for the particular query, and add the
// dbres prefix to it, for later use.
final ParameterizedSparqlString queryString = new ParameterizedSparqlString(
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
"SELECT * WHERE\n" +
"{\n" +
" ?entity rdfs:label ?o\n" +
" FILTER (langMatches(lang(?o),\"en\"))\n" +
"}\n"
) {{
setNsPrefix( "dbres", "http://dbpedia.org/resource/" );
}};
// Entity is the same.
final String entity = "William_H._Miller_(writer)";
// Now retrieve the URI for dbres, concatentate it with entity, and use
// it as the value of ?entity in the query.
queryString.setIri( "?entity", queryString.getNsPrefixURI( "dbres" )+entity );
// Show the query.
System.out.println( queryString.toString() );
}
}
The output is:
PREFIX dbres: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE
{
<http://dbpedia.org/resource/William_H._Miller_(writer)> rdfs:label ?o
FILTER (langMatches(lang(?o),"en"))
}
You can run this query at the public endpoint and get the expected results. Notice that if you use an entity
that doesn't need special escaping, e.g.,
final String entity = "George_Washington";
then the query output will use the prefixed form:
PREFIX dbres: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE
{
dbres:George_Washington rdfs:label ?o
FILTER (langMatches(lang(?o),"en"))
}
This is very convenient, because you don't have to do any checking about whether your suffix, i.e., entity
, has any characters that need to be escaped; Jena takes care of that for you.
来源:https://stackoverflow.com/questions/18232010/sparql-queries-with-round-brackets-throw-exception