Excluding items selectively from Sitecore's Lucene search index - works when rebuilding with IndexViewer, but not when using Sitecore's built-in tools

Posted by 一曲冷凌霜 on 2019-12-05 05:17:45

I spoke with Alex Shyba yesterday, and we were able to figure out what was going on. There were a couple of problems with my configuration that were preventing everything from working correctly:

  • As Seth noted, there are two distinct search APIs in Sitecore, and my configuration file was using both of them. To use the newer API, only the sitecore/search/configuration section needs to be set up. (In addition to what I posted in my OP, I was also adding indexes in sitecore/indexes and sitecore/databases/database/indexes, which is not correct.)

  • Instead of overriding IsMatch(), I should have been overriding AddItem(). Because of the way Lucene works, you can't update a document in place; instead, you have to first delete it and then add the updated version.

    When Sitecore.Search.Crawlers.DatabaseCrawler.UpdateItem() runs, it checks IsMatch() to see if it should delete and re-add the item. If IsMatch() returns false, the item won't be removed from the index even if it shouldn't be there in the first place.

    By overriding AddItem(), I was able to instruct the crawler whether the item should be added to the index after its existing documents had already been removed. Here is what the updated class looks like:

    ~\Lib\Search\Indexing\CustomCrawler.cs:

    using Sitecore.Data.Items;
    using Sitecore.Search;
    using Sitecore.Search.Crawlers;
    
    namespace MyProject.Lib.Search.Indexing
    {
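      public class CustomCrawler : DatabaseCrawler
      {
        // Called after the item's existing documents have been removed from
        // the index, so skipping the base call keeps the item out entirely.
        protected override void AddItem(Item item, IndexUpdateContext context)
        {
          // Only index items whose "include in search results" checkbox is ticked.
          if (item["include in search results"] == "1")
          {
            base.AddItem(item, context);
          }
        }
      }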
    }
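
For context, a crawler like this is normally wired in under sitecore/search/configuration. The fragment below is only a sketch of what that section tends to look like in Sitecore 6.x; the index id, folder, database, and root path are placeholder assumptions and are not taken from my actual configuration:

```xml
<search>
  <configuration type="Sitecore.Search.SearchConfiguration, Sitecore.Kernel" singleInstance="true">
    <indexes hint="list:AddIndex">
      <!-- "website" is an assumed index id and folder name -->
      <index id="website" type="Sitecore.Search.Index, Sitecore.Kernel">
        <param desc="name">$(id)</param>
        <param desc="folder">website</param>
        <Analyzer ref="search/analyzer" />
        <locations hint="list:AddCrawler">
          <!-- Points at the custom crawler instead of the stock DatabaseCrawler -->
          <web type="MyProject.Lib.Search.Indexing.CustomCrawler, MyProject">
            <Database>web</Database>
            <Root>/sitecore/content</Root>
          </web>
        </locations>
      </index>
    </indexes>
  </configuration>
</search>
```

With this in place, there should be no matching entries left in sitecore/indexes or sitecore/databases/database/indexes.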
    

Alex also pointed out that some of my scalability settings were incorrect. Specifically:

  • The InstanceName setting was empty, which can cause problems on ephemeral (cloud) instances where the machine name might change between executions. We changed this setting on each instance to have a constant and distinct value (e.g., CMS and CD).

  • The Indexing.ServerSpecificProperties setting needs to be true so that each instance maintains its own record of when it last updated its search index.

  • The EnableEventQueues setting needs to be true to prevent race conditions between the search indexing and cache flush processes.

  • When in development, the Indexing.UpdateInterval should be set to a relatively small value (e.g., 00:00:15). This is not great for production environments, but it cuts down on the amount of waiting you have to do when troubleshooting search indexing problems.

  • Make sure the history engine is turned on for each web database, including remote publishing targets:

    <database id="production">
      <Engines.HistoryEngine.Storage>
        <obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.Kernel">
          <param connectionStringName="$(id)" />
          <EntryLifeTime>30.00:00:00</EntryLifeTime>
        </obj>
      </Engines.HistoryEngine.Storage>
      <Engines.HistoryEngine.SaveDotNetCallStack>false</Engines.HistoryEngine.SaveDotNetCallStack>
    </database>
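
Taken together, the scalability settings listed above might look roughly like this on the CMS instance. This is a sketch that assumes the settings live in the `<settings>` section of web.config (or an include file); the values are the examples mentioned above, and the CD instance would use its own InstanceName:

```xml
<settings>
  <!-- Constant and distinct per instance, e.g. "CMS" here and "CD" on delivery -->
  <setting name="InstanceName" value="CMS" />
  <!-- Each instance keeps its own record of when it last updated its index -->
  <setting name="Indexing.ServerSpecificProperties" value="true" />
  <!-- Avoids race conditions between search indexing and cache flushes -->
  <setting name="EnableEventQueues" value="true" />
  <!-- Short interval for development only; use a larger value in production -->
  <setting name="Indexing.UpdateInterval" value="00:00:15" />
</settings>
```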
    

Since CD instances have no access to the Sitecore backend, I also installed RebuildDatabaseCrawlers.aspx (from this article) so that their search indexes can be rebuilt manually.

I think I've figured out a halfway solution.

Here's an interesting snippet from Sitecore.Shell.Applications.Search.RebuildSearchIndex.RebuildSearchIndexForm.Builder.Build(), which is invoked by the search index rebuilder in the Control Panel application:

for (int i = 0; i < database.Indexes.Count; i++)
{
  database.Indexes[i].Rebuild(database);
  ...
}

database.Indexes contains a set of Sitecore.Data.Indexing.Index, which do not use a database crawler to rebuild the index!

In other words, the built-in search indexer uses a completely different class when rebuilding the search index that ignores the search configuration settings in web.config entirely.

To work around this, I changed the following files:

~\App_Config\Include\Search Indexes\Website.config:

<indexes>
  <index id="website" ... type="MyProject.Lib.Search.Indexing.CustomIndex, MyProject">
    ...
  </index>

  ...
</indexes>

~\Lib\Search\Indexing\CustomIndex.cs:

using Sitecore.Data;
using Sitecore.Data.Indexing;
using Sitecore.Diagnostics;

namespace MyProject.Lib.Search.Indexing
{
  public class CustomIndex : Index
  {
    public CustomIndex(string name)
      : base(name)
    {
    }

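    public override void Rebuild(Database database)
    {
      // Delegate to the newer Sitecore.Search index with the same name, so the
      // rebuild goes through the crawlers configured for that index.
      Sitecore.Search.Index index = Sitecore.Search.SearchManager.GetIndex(Name);
      if (index != null)
      {
        index.Rebuild();
      }
    }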
  }
}

The only caveat to this method is that it will rebuild the index for every database, not just the selected one (which I'm guessing is why Sitecore has two completely separate methods for rebuilding indexes).

Sitecore 6.2 uses both the old and the newer search API, which I believe explains the differences in how the index gets built. CMS 6.5 (soon to be released) uses only the newer API, i.e., Sitecore.Search.
