Creating a Lucene index for an existing Apache Jena TDB to implement text search

Question

I have a large Apache Jena TDB and want to build a Lucene index for it, using Apache Jena 2.10.2, for use with the new text search feature. I find the documentation hard to follow.

I first tried to configure this in code, but had trouble with the dependencies. Every combination of lucene-core and solr-solrj that I tried resulted in either 'classNotFound' errors or a 'StandardAnalyzer overrides final method tokenStream' error. Example code:

Dataset ds1 = DatasetFactory.createMem() ;

EntityDefinition entDef = new EntityDefinition("uri", "text", RDFS.label) ;

// In-memory index
Directory dir = new RAMDirectory();

// Have also tried creating the index in a file (used instead of the RAMDirectory above)
// File indexDir = new File("luceneIndexes");
// Directory dir = FSDirectory.open(indexDir);

// Fails on this line
Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef) ;

I think the only solution may be to create a Text Dataset Assembler, but if anyone has advice on setting this up in code, I would prefer to do it that way.

Answer 1:

The example is exactly the one from Jena, which does work.

It looks like you have a mix of jar versions. Have you tried using Maven to resolve the dependencies? Running "mvn dependency:tree" shows you which versions are used.
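One way to check for mixed jar versions from inside the application is to ask the JVM where each class was loaded from. The sketch below is only a diagnostic aid and not part of Jena: `JarLocator` and `locate` are hypothetical names, and the class names listed in `main` are assumptions about which classes are worth checking.

```java
import java.security.CodeSource;

// Diagnostic sketch: report which jar (or directory) a class is loaded from.
class JarLocator {

    // Returns the location the class was loaded from, or a short marker
    // when the class cannot be loaded or belongs to the JDK itself.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "JDK / bootstrap class loader";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError and VerifyError
            return "not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumed class names; adjust to whatever the stack trace mentions.
        String[] names = {
            "org.apache.lucene.analysis.Analyzer",
            "org.apache.lucene.analysis.standard.StandardAnalyzer",
            "org.apache.jena.query.text.TextDatasetFactory"
        };
        for (String n : names) {
            System.out.println(n + " -> " + locate(n));
        }
    }
}
```

If `Analyzer` and `StandardAnalyzer` turn out to come from jars with different Lucene versions, that mismatch would be a plausible cause of the 'overrides final method' error.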

jena-text is built for Lucene 4.3.1 or Solr 4.3.1.

See the POM from: https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-text/1.0.0-SNAPSHOT/
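As a rough sketch of what letting Maven resolve the versions could look like, the POM fragment below declares only jena-text and leaves the Lucene and Solr jars to be pulled in transitively. The coordinates are read off the snapshot URL above; the repository `id` is an arbitrary label, and whether this snapshot is still published is an assumption.

```xml
<repositories>
  <repository>
    <!-- arbitrary id; URL taken from the snapshot location above -->
    <id>apache-snapshots</id>
    <url>https://repository.apache.org/content/groups/snapshots/</url>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>

<dependencies>
  <!-- No explicit lucene-core or solr-solrj entries: let jena-text choose them -->
  <dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>jena-text</artifactId>
    <version>1.0.0-SNAPSHOT</version>
  </dependency>
</dependencies>
```

With a setup along these lines, "mvn dependency:tree" should list a single Lucene version under jena-text.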

Source: https://stackoverflow.com/questions/17954399/creating-a-lucene-index-for-an-existing-apache-jena-tdb-to-implement-text-search

Tags

jena

lucene

text-search

tdb
