Building a full-text search index for Jena and Lucene

Submitted by 六月ゝ 毕业季﹏ on 2019-12-13 03:24:47

Question


I would like to perform a full-text search on a subset of DBpedia (which I have in a TDB store) with Lucene and Jena.

String TDBDirectory = "path" ;
Dataset dataset = TDBFactory.createDataset(TDBDirectory) ;

I do not want to search over all resources, only over titles. I think that by building indices over just the needed triples I can search faster. For example:

<http://de.dbpedia.org/resource/Gurke> <http://www.w3.org/2000/01/rdf-schema#label> "Gurke"@de .

Here I would like to search for "Gurke", but only in triples with the #label property. So my question is: how do I build indices over, and search, only the triples with the #label property? I have already looked at http://jena.sourceforge.net/ARQ/lucene-arq.html but it is either not detailed enough or too difficult for me.


Answer 1:


http://jena.sourceforge.net/ is the old home for Jena -- the project is now at http://jena.apache.org/ (how did you manage to find that old page?)

The project recently introduced a replacement for LARQ.

http://jena.apache.org/documentation/query/text-query.html

and this is now part of the main codebase. It will be released with the 2.10.2 release; for the moment you must use the development build from https://repository.apache.org/content/repositories/snapshots/org/apache/jena/. You need to either use Fuseki or add it as a dependency of your project.

This new text search subsystem works much better with TDB and Fuseki.
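
To make the label-only search concrete, here is a minimal sketch of the kind of SPARQL query the text-query documentation linked above describes. It is plain Java that only builds the query string, so it does not depend on any particular Jena version. The class name LabelTextQuery, its methods, and the result limit are hypothetical and chosen for illustration. The sketch also assumes the dataset has already been set up with a text index over rdfs:label as described on that page.

```java
public class LabelTextQuery {
    // The text: namespace is the one used by Jena's text-query documentation.
    static final String PREFIXES =
        "PREFIX text: <http://jena.apache.org/text#>\n" +
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n";

    // Escape backslashes and single quotes so the term can sit inside a
    // single-quoted SPARQL string literal. Lucene query-syntax characters
    // are deliberately left untouched in this sketch.
    public static String escape(String term) {
        return term.replace("\\", "\\\\").replace("'", "\\'");
    }

    // Build a query that asks the text index for subjects whose rdfs:label
    // matches the term, then fetches the matching label from the store.
    public static String build(String term, int limit) {
        return PREFIXES +
            "SELECT ?s ?label WHERE {\n" +
            "  ?s text:query (rdfs:label '" + escape(term) + "' " + limit + ") ;\n" +
            "     rdfs:label ?label .\n" +
            "}";
    }

    public static void main(String[] args) {
        // Hypothetical usage: search labels for "Gurke", at most 10 hits.
        System.out.println(build("Gurke", 10));
    }
}
```

The resulting string would then be passed to Jena's query API (for example QueryExecutionFactory.create) together with the text-indexed dataset. Because the index is defined over rdfs:label only, other triples are never consulted by the text search.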



Source: https://stackoverflow.com/questions/17111903/building-fulltext-search-index-for-jena-and-lucene
