Creating and updating Zend_Search_Lucene indexes

Submitted by 我与影子孤独终老i on 2020-01-02 02:24:06

Question


I'm using Zend_Search_Lucene to create an index of articles so that they can be searched on my website. Whenever an administrator creates, updates, or deletes an article in the admin area, the index is rebuilt:

$config = Zend_Registry::get("config");
$cache = $config->lucene->cache;
$path = $cache . "/articles";

try
{
    $index = Zend_Search_Lucene::open($path);
}
catch (Zend_Search_Lucene_Exception $e)
{
    $index = Zend_Search_Lucene::create($path);
}

$model = new Default_Model_Articles();
$select = $model->select();
$articles = $model->fetchAll($select);

foreach ($articles as $article)
{
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Text("title", $article->title));
    $index->addDocument($doc);
}

$index->commit();

My question is this. Since I am reindexing the articles and handling deleted articles as well, why would I not just use "create" every time (instead of "open" and update)? Using the above method, I think the articles would be added with addDocument every time (so there would be duplicates). How would I prevent that? Is there a way to check if a Document exists already in the index?

Also, I don't think I fully understand how the indexing works when you "open" and update it. It seems to create a new #.cfs file in the index folder every time (so I have _0.cfs, _1.cfs, _2.cfs), but when I use "create", it overwrites that file with a new #.cfs file with the # incremented (so, for example, just _2.cfs). Can you please explain what these segmented files are?


Answer 1:


Yes, you can check whether a Document is already in the index; have a look at this Manual Page. The termDocs method returns the ids of the documents that contain a given term, and you can delete each of them from the index via $index->delete($id);. After that you can simply add the new version of the Document.
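A minimal sketch of that delete-then-add step, assuming Zend Framework 1 is on the include path. The question's code stores only a title, so this sketch adds a hypothetical Keyword field named article_id and assumes each article row exposes a unique id property. Neither appears in the original code, and reindexArticle is an illustrative helper, not part of the library.

```php
<?php
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';
require_once 'Zend/Search/Lucene/Field.php';
require_once 'Zend/Search/Lucene/Index/Term.php';

// Hypothetical helper: replace whatever is indexed for one article.
// Assumes $article->id is unique and $article->title is a string.
function reindexArticle($index, $article)
{
    $articleId = (string) $article->id;

    // Look up every document already stored under this article id
    // and mark it as deleted.
    $term = new Zend_Search_Lucene_Index_Term($articleId, 'article_id');
    foreach ($index->termDocs($term) as $docId) {
        $index->delete($docId);
    }

    // Add the current version. The Keyword field is indexed as a single,
    // untokenized term, so the lookup above can match it exactly.
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Keyword('article_id', $articleId));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $article->title));
    $index->addDocument($doc);

    // Flush the change to the index. For a large batch, committing once
    // after the loop is a cheaper alternative.
    $index->commit();
}
```

Calling reindexArticle($index, $article) inside the question's foreach loop, in place of the bare addDocument call, should leave one document per article however often the loop runs. termDocs matches the stored term exactly and does not go through the query parser, which is why the identifier goes in a Keyword field rather than a Text field.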

About the multiple index files that Lucene creates: every time you modify an existing index, Lucene does not really change the existing files; it adds a partial index (a segment) for the changes you make. Searching across many of these partial indexes is bad for performance, but there is a simple way around this. After you have made your changes to the index, do this: $index->optimize(); - this merges all the partial files into a single index, improving search times dramatically.
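As a rough illustration of where the optimize() call fits, here is a sketch that adds a batch of documents, commits once, and then merges the segments. It assumes Zend Framework 1 is on the include path; addTitlesAndOptimize and its parameters are hypothetical names chosen for this example.

```php
<?php
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';
require_once 'Zend/Search/Lucene/Field.php';

// Hypothetical helper: index a batch of titles, then merge segments once.
function addTitlesAndOptimize($path, array $titles)
{
    // Same open-or-create pattern as in the question.
    try {
        $index = Zend_Search_Lucene::open($path);
    } catch (Zend_Search_Lucene_Exception $e) {
        $index = Zend_Search_Lucene::create($path);
    }

    foreach ($titles as $title) {
        $doc = new Zend_Search_Lucene_Document();
        $doc->addField(Zend_Search_Lucene_Field::Text('title', $title));
        $index->addDocument($doc);
    }

    // Write the pending documents out, then merge the segments.
    $index->commit();
    $index->optimize();

    return $index;
}
```

Optimizing rewrites the index data, so running it once at the end of a batch, as here, is likely to be cheaper than running it after each individual document.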



Source: https://stackoverflow.com/questions/1484876/creating-and-updating-zend-search-lucene-indexes
