Indexing crashes on custom tokenizer

Submitted on 2019-12-13 20:43:29

Question


We are building a Solr plug-in to integrate our proprietary engine; the intent is to replace the standard tokenizer altogether. (Background: Hybrid search and indexing: words and token metadata in Solr)

When trying to index a test document in the Solr Admin:

id,title
12345,A test title

I get an exception at the point where, I suppose, my tokenizer kicks in.
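For reference, the stack trace below goes through the CSV loader, so the test document can be reproduced by posting the CSV directly to the update handler. This is a sketch; the host, port, and core name (`collection1`) are assumptions about the local setup:

```shell
# Post the two-line CSV test document to Solr's CSV update handler
# (adjust host/port/core to match your installation).
curl 'http://localhost:8983/solr/collection1/update?commit=true' \
     -H 'Content-Type: application/csv' \
     --data-binary $'id,title\n12345,A test title'
```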

The configuration changes (schema.xml) are:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
<!--
     <analyzer type="query">
        <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer> 
     <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
-->
    </fieldType>
    <fieldType name="family_id_space_delimited_list" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
            <!--
            <filter class="com.linguasys.carabao.FamilyIDFilterFactory" />
            -->
          </analyzer>
    </fieldType>

    <fieldType name="role_space_delimited_list" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="com.linguasys.carabao.ViaWebTokenizerFactory" url="http://blahblah/carabao/?wsdl"/>
            <!--
            <filter class="com.linguasys.carabao.RoleFilterFactory" />
            -->
          </analyzer>
    </fieldType>

The web service itself works. (The filters are commented out because they were crashing with some kind of type mismatch error, but that's for later.)

The exception is below. The question is not just "what am I doing wrong?", it's "where do I get more info?"

org.apache.solr.common.SolrException: Exception writing document id 12345 to the index; possible analysis error.
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)
  at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
  at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:870)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1024)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:693)
  at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
  at org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
  at org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
  at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
  at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
  at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:136)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:655)
  at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:222)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1566)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1523)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: input AttributeSource must not be null
  at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:94)
  at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:106)
  at org.apache.lucene.analysis.TokenFilter.<init>(TokenFilter.java:33)
  at org.apache.lucene.analysis.util.FilteringTokenFilter.<init>(FilteringTokenFilter.java:70)
  at org.apache.lucene.analysis.core.StopFilter.<init>(StopFilter.java:60)
  at org.apache.lucene.analysis.core.StopFilterFactory.create(StopFilterFactory.java:127)
  at org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:67)
  at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:102)
  at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:180)
  at org.apache.lucene.document.Field.tokenStream(Field.java:554)
  at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:597)
  at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
  at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
  at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:222)
  at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
  ... 35 more

Answer 1:


You need to verify what happens when yourTokenizer.create(java.io.Reader reader) is invoked. From the stack trace it looks like this method is returning null, and that null is propagated all the way up to AttributeSource.&lt;init&gt; (AttributeSource.java:94). Returning null at that point is illegal, hence the exception.
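The failure mode can be illustrated with a self-contained sketch. The classes below are simplified stand-ins, not the real Lucene API (the real `AttributeSource`, `Tokenizer`, and `StopFilter` have different signatures); the point is only that a factory `create()` returning null makes the first filter's constructor throw exactly the message seen in the trace:

```java
import java.io.Reader;
import java.io.StringReader;

public class NullTokenizerDemo {
    // Simplified stand-ins for the Lucene classes in the stack trace.
    static class AttributeSource {
        AttributeSource() {}
        AttributeSource(AttributeSource input) {
            // Mirrors the null check at AttributeSource.java:94.
            if (input == null) {
                throw new IllegalArgumentException(
                        "input AttributeSource must not be null");
            }
        }
    }
    static class Tokenizer extends AttributeSource {}
    static class StopFilter extends AttributeSource {
        // A TokenFilter chains to its input tokenizer via this constructor.
        StopFilter(Tokenizer input) { super(input); }
    }

    // A factory whose create() silently returns null, e.g. because a
    // remote web-service call failed. Solr passes this null straight
    // into the first filter of the analysis chain.
    static Tokenizer brokenCreate(Reader reader) {
        return null; // the bug
    }

    public static void main(String[] args) {
        Tokenizer t = brokenCreate(new StringReader("A test title"));
        try {
            new StopFilter(t); // chain construction, as in TokenizerChain
            System.out.println("no exception");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The fix in the real factory is the inverse of `brokenCreate`: check the result of the web-service call and fail loudly (or fall back) instead of returning null.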

The best way to find out what's going on is to attach a debugger and set a breakpoint at the line mentioned above.
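Since this Solr instance runs under Tomcat (per the stack trace), one way to attach a debugger is to enable JDWP in Tomcat's startup environment and then connect the IDE's remote debugger. The file location and port are assumptions about the install:

```shell
# Add to $CATALINA_BASE/bin/setenv.sh (create it if missing), then restart
# Tomcat. An IDE can then attach a remote debugger on port 5005 and break
# at org.apache.lucene.util.AttributeSource.<init> or in the custom
# TokenizerFactory's create() method.
export CATALINA_OPTS="$CATALINA_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
```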



Source: https://stackoverflow.com/questions/24947122/indexing-crashes-on-custom-tokenizer
