How to remove a Lookup from DefaultGazetteer programmatically

别来无恙 submitted on 2019-12-13 01:44:06

Question


I need to teach the Gazetteer by adding and removing words.

I know how to add a new Lookup, but when I tried to remove one, the Lookup was not removed.

gazetter.remove("string to be found"); // returns false

Any help, please!


Answer 1:


There are two separate things inside the (Default)Gazetteer:

  1. A finite state machine used for searching the source text.

  2. A linear definition of the gazetteer, which represents all the lists of words in the dictionary. It is not used directly for searching the text.

At startup, they are used as follows:

  1. The linear definition is read from input files.
  2. The finite state machine is constructed from the definition.

Methods like gazetter.add() or gazetter.remove(), called directly on the gazetteer instance, modify only the finite state machine. The changes will be visible in the gazetteer's behaviour but not in its linear definition.

Methods of the linear definition modify only the linear definition. You have to use store() and reInit() to update the finite state machine inside the gazetteer according to the linear definition. After that they will be in sync and the gazetteer will look for the new phrases. This also means that changes made only on the state machine will be lost after reInit().
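To make the split between the two structures concrete, here is a minimal toy model in plain Java. It is only a sketch: ToyGazetteer and all of its methods are hypothetical names invented for illustration, not GATE classes or GATE's API, and a HashSet stands in for the finite state machine. Persisting the definition to disk (the store() step) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, NOT a GATE class: it only mirrors the two-structure design.
class ToyGazetteer {
    // Stand-in for the linear definition (the word lists read from input files).
    private final List<String> definition = new ArrayList<>();
    // Stand-in for the finite state machine that is actually used for matching.
    private final Set<String> searchIndex = new HashSet<>();

    ToyGazetteer(List<String> initialEntries) {
        definition.addAll(initialEntries);
        reInit(); // startup: build the search structure from the definition
    }

    // Like add()/remove() called on the gazetteer itself: only the search structure changes.
    boolean add(String phrase) { return searchIndex.add(phrase); }
    boolean remove(String phrase) { return searchIndex.remove(phrase); }

    // Like editing the linear definition: matching is unaffected until reInit().
    void addToDefinition(String phrase) { definition.add(phrase); }
    boolean removeFromDefinition(String phrase) { return definition.remove(phrase); }

    // Rebuilds the search structure from the definition, discarding direct changes.
    void reInit() {
        searchIndex.clear();
        searchIndex.addAll(definition);
    }

    boolean matches(String phrase) { return searchIndex.contains(phrase); }
    List<String> definitionEntries() { return new ArrayList<>(definition); }

    public static void main(String[] args) {
        ToyGazetteer g = new ToyGazetteer(Arrays.asList("London", "Paris"));
        System.out.println(g.remove("Berlin")); // false: phrase was never in the search structure
        System.out.println(g.remove("Paris"));  // true: removed from the search structure only
        System.out.println(g.matches("Paris")); // false
        g.reInit();                             // the definition still lists "Paris"
        System.out.println(g.matches("Paris")); // true: the direct removal was lost
    }
}
```

In this model, a removal that should survive a rebuild has to go through removeFromDefinition() followed by reInit(), which is the toy counterpart of editing the linear definition and then calling store() and reInit().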

To answer your question:

If your code returned false, then the phrase was not present in the finite state machine, and the gazetteer would not match such a phrase in text anyway.

If you want to remove the phrase from the linear definition, you have to use the respective methods (briefly described in my previous answer).




Answer 2:


You could probably use the approach described in the answer to your previous question, "Question about gazetteer update", but with removing nodes. I guess you can find additional information in the Javadoc.

Another option (brute force, and reasonable only if keywords are updated rarely) is to:

  a) remove the Gazetteer from your pipeline (and from scope using Factory.deleteResource)
  b) read the .lst file as plain text, one entry per line
  c) remove the entries
  d) save the data back to the same file
  e) re-init the gazetteer and add the new PR at the same place in your pipeline.
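Steps b) to d) of this brute-force option need nothing from GATE, so they can be sketched with the standard library alone. The following is only an illustration under some assumptions: LstFileEditor and removeEntry are hypothetical names, the file is assumed to be UTF-8, and each line is assumed to hold just the phrase (if your lists carry features after a separator, you would compare only the part before it). Steps a) and e) are not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of GATE: edits a .lst file as plain text.
class LstFileEditor {

    /**
     * Removes every line that equals the phrase (ignoring surrounding whitespace)
     * and writes the remaining lines back to the same file.
     * Returns the number of removed lines.
     */
    static int removeEntry(Path lstFile, String phrase) throws IOException {
        // Assumption: the list file is UTF-8 encoded.
        List<String> lines = Files.readAllLines(lstFile, StandardCharsets.UTF_8);
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().equals(phrase)) {
                kept.add(line);
            }
        }
        int removed = lines.size() - kept.size();
        if (removed > 0) {
            // Overwrite the file only when something actually changed.
            Files.write(lstFile, kept, StandardCharsets.UTF_8);
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used purely for demonstration.
        Path tmp = Files.createTempFile("cities", ".lst");
        Files.write(tmp, Arrays.asList("London", "Paris", "Berlin"), StandardCharsets.UTF_8);
        int removed = removeEntry(tmp, "Paris");
        System.out.println(removed + " " + Files.readAllLines(tmp, StandardCharsets.UTF_8));
        // prints "1 [London, Berlin]"
        Files.delete(tmp);
    }
}
```

After the file has been rewritten, the gazetteer still has to be re-initialised (step e) before the change affects matching.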

I think the first option is more suitable for updating a gazetteer.



Source: https://stackoverflow.com/questions/30594237/how-to-remove-a-lookup-from-defaultgazetteer-programatically
