How can I get the Fuseki API via SPARQLWrapper to properly report a detailed error message?

南楼画角 posted on 2021-01-29 05:40:49

Question


As outlined in

https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/storage/sparql.py

and

https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testSPARQL.py

I tried to allow for a "round trip" operation between a Python list of dicts and Jena/SPARQL-based storage.

The approach performs very well for my use case, and after trying it out for a while I am getting into more details that need to be addressed.

The Stack Overflow question "listOfDict to RDF conversion in python targeting Apache Jena Fuseki" addresses the initial issues, and issues 2-5 at https://github.com/WolfgangFahl/DgraphAndWeaviateTest/issues?q=is%3Aissue+is%3Aclosed show some detail problems that have already been fixed.

Now I am working with some 180000 records I'd like to import from 6 different data sources, and each data source seems to have new exotic records that make the approach fail.

E.g. one batch of records gives me the following log:

read 45601 events in   0.6 s
storing 45601 events to sparql
  batch for         1 -      2000 of     45601 cr:Event in    0.6 s ->    0.6 s
  batch for      2001 -      4000 of     45601 cr:Event in    0.5 s ->    1.1 s
  batch for      4001 -      6000 of     45601 cr:Event in    0.5 s ->    1.6 s
  batch for      6001 -      8000 of     45601 cr:Event in    0.5 s ->    2.1 s
  batch for      8001 -     10000 of     45601 cr:Event in    0.5 s ->    2.6 s
  batch for     10001 -     12000 of     45601 cr:Event in    0.7 s ->    3.2 s
======================================================================
ERROR: testCrossref (tests.test_Crossref.TestCrossref)
test loading crossref data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/wf/Library/Python/3.8/lib/python/site-packages/SPARQLWrapper/Wrapper.py", line 1073, in _query
    response = urlopener(request)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

SPARQLWrapper.SPARQLExceptions.QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.

Response:
b'Error 400: Bad Request\n'

Since I don't get any details on what the problem is, I am working with a binary search. With the error above I only know that the problem is with a record with a batchIndex between 12000 and 14000, so I am setting the limit to 14000 and batchSize to 100 to get closer.
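This kind of narrowing-down can also be automated. The following is only a minimal sketch, not the code from the linked repository: `find_failing_record` and the `try_store` callback are hypothetical names, and `try_store` is assumed to send a batch to the endpoint and return True on success and False when the endpoint rejects it (for example by catching QueryBadFormed). It also assumes the failure is caused by individual records rather than by a combination of them.

```python
from typing import Callable, Optional, Sequence


def find_failing_record(
    records: Sequence[dict],
    try_store: Callable[[Sequence[dict]], bool],
) -> Optional[int]:
    """Return the 0-based index of one record that makes try_store fail.

    try_store is a hypothetical callback: it stores a batch and returns
    True on success, False if the endpoint rejected the batch.
    Returns None if the whole list can be stored.
    """
    if try_store(records):
        return None  # nothing to search for
    lo, hi = 0, len(records)  # invariant: records[lo:hi] holds a bad record
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if try_store(records[lo:mid]):
            lo = mid  # first half is clean, so the problem is in the second
        else:
            hi = mid  # first half already fails, keep searching there
    return lo
```

For a batch of 2000 records this needs about a dozen requests instead of a manual series of runs. Note that every sub-batch that succeeds is really stored, so a sketch like this is better pointed at a scratch dataset.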

 batch for     13301 -     13400 of     14000 cr:Event in    0.0 s ->    4.3 s

is now the last successful batch. So I am using a binary search: 13450 fail, 13425 fail, 13412 ok, 13418 ok, 13422 fail, 13420 ok, 13421 ok. So record 13422 is the culprit, and I switch on debug mode to see the INSERT DATA created for the record:

  cr:Event__102140gtm20003 cr:Event_name "Higher local fields".
  cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany".
  cr:Event__102140gtm20003 cr:Event_source "crossref".
  cr:Event__102140gtm20003 cr:Event_eventId "10.2140/gtm.2000.3".
  cr:Event__102140gtm20003 cr:Event_title "Invitation to higher local fields".
  cr:Event__102140gtm20003 cr:Event_startDate "1999-08-29"^^<http://www.w3.org/2001/XMLSchema#date>.
  cr:Event__102140gtm20003 cr:Event_year 1999.
  cr:Event__102140gtm20003 cr:Event_month 9.
  cr:Event__102140gtm20003 cr:Event_endDate "1999-09-05"^^<http://www.w3.org/2001/XMLSchema#date>.
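
Reading the debug output, the location literal appears to end early: in SPARQL, "M\\" is an M, an escaped backslash and then the closing quote, which leaves unster, Germany". dangling after the string. The sketch below shows one way a literal could be escaped before it is written into INSERT DATA. It assumes the source value is M\"unster, Germany (a backslash followed by a double quote), and `escape_sparql_literal` is a hypothetical helper name, not a function from SPARQLWrapper or from the linked repository.

```python
def escape_sparql_literal(value: str) -> str:
    """Return value as a double-quoted SPARQL string literal.

    The backslash has to be replaced first; otherwise the backslashes
    added for the other escapes would be doubled again.
    """
    replacements = [
        ("\\", "\\\\"),  # backslash -> \\
        ('"', '\\"'),    # double quote -> \"
        ("\n", "\\n"),   # newline -> \n
        ("\r", "\\r"),   # carriage return -> \r
        ("\t", "\\t"),   # tab -> \t
    ]
    for raw, escaped in replacements:
        value = value.replace(raw, escaped)
    return '"' + value + '"'
```

Under that assumption the location would be serialized as "M\\\"unster, Germany", which stays a single string literal.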

So the Umlaut-encoding "\u" in the location "Münster" is the culprit here. I will work around this issue. The real question is:

How can I get the Fuseki API via SPARQLWrapper to properly report a detailed error message?

e.g. with something like

error in line # cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany". is  not a valid triple?
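
Until the endpoint itself reports such details, a message of this kind could be approximated on the client side before the request is sent. The following is a rough sketch under stated assumptions: it is not a SPARQL parser and says nothing about what Fuseki can report, it only checks that every double-quoted string literal on a line is properly terminated and uses known escapes, and it does not handle single-quoted or multi-line literals. `find_bad_literals` is a hypothetical name.

```python
import re
from typing import List, Tuple

# One double-quoted string literal: ordinary characters, one of the
# escapes \t \b \n \r \f \\ \" \', or a \uXXXX / \UXXXXXXXX codepoint escape.
LITERAL = re.compile(
    r'"(?:[^"\\\n\r]|\\[tbnrf\\"\']|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*"'
)


def find_bad_literals(insert_body: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for lines with a malformed string literal."""
    problems = []
    for number, line in enumerate(insert_body.splitlines(), start=1):
        pos = line.find('"')
        while pos != -1:
            match = LITERAL.match(line, pos)
            if match is None:
                # a quote opens here but no well-formed literal follows
                problems.append((number, line.strip()))
                break
            pos = line.find('"', match.end())
    return problems
```

Running this over the body of a batch before sending it would point at the offending line directly, so the failing record can be logged together with its line number instead of being found by bisection.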

Source: https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
