Sitecore 7 PDF indexing

时光总嘲笑我的痴心妄想 submitted on 2019-12-10 13:08:43

Question


I am trying to index PDF files with Sitecore 7. I installed the IFilter, but the crawler log shows the following error:

ManagedPoolThread #17 09:24:20 WARN  LuceneIndexOperations : Update : Could not build document data 4433434-3443-3223-91c4-233232. Skipping.
Exception: System.Runtime.InteropServices.COMException
Message: Error HRESULT E_FAIL has been returned from a call to a COM component.
Source: mscorlib
   at System.Runtime.InteropServices.ComTypes.IPersistFile.Load(String pszFileName, Int32 dwMode)
   at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
   at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterReader..ctor(String fileName)
   at Sitecore.ContentSearch.ComputedFields.MediaItemIFilterTextExtractor.ComputeFieldValue(IIndexable indexable)
   at Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor.ComputeFieldValue(IIndexable indexable)
   at Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder.AddComputedIndexFields()
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.GetIndexData(IIndexable indexable, IIndexable latestVersion, IProviderUpdateContext context)
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.BuildDataToIndex(IProviderUpdateContext context, IIndexable version, IIndexable latestVersion)
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.<>c__DisplayClass7.<Update>b__0(Item version)

What do I have to do to make this work? The Sitecore documentation says it should work out of the box.


Answer 1:


I had the same issue and received the following response from Sitecore support (everything worked fine after applying it):

1) Copy all the Adobe iFilter .dll files into the "\System32\Inetsrv" folder. This is the working directory for IIS on Windows Server. By default, the Adobe iFilter .dll files are stored in the "C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin" folder. You can also use the "IFilter Explorer" tool to find the folder where the .dll files are stored: http://www.citeknet.com/Products/IFilters/IFilterExplorer/tabid/62/Default.aspx For more details, please see the screenshot: http://screencast.com/t/xmWukanM+

2) Delete all the files under the "Website/App_Data/MediaCache" folder;

3) Rebuild the Sitecore Search Indexes (Sitecore -> Control Panel -> Indexing -> Indexing Manager);

4) Clear the Sitecore cache (the http://{hostname}/sitecore/admin/cache.aspx tool);

5) Restart IIS.
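If you need to repeat steps 1 and 2 on several servers, they can be scripted. The following is a minimal Python sketch, not part of Sitecore support's instructions: it assumes the default Adobe iFilter 9 location quoted above and an elevated (administrator) prompt, since writing into System32 requires it. The function names are hypothetical. Steps 3 to 5 (index rebuild, cache clear, IIS restart) are still done through Sitecore and IIS.

```python
import os
import shutil

# Default locations quoted in step 1; adjust them for your machine (assumption).
IFILTER_BIN = r"C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin"
INETSRV = r"C:\Windows\System32\inetsrv"


def copy_ifilter_dlls(src_dir, dst_dir):
    """Step 1: copy every .dll file (case-insensitive match) from src_dir into dst_dir.

    Returns the sorted list of copied file names.
    """
    copied = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if os.path.isfile(path) and name.lower().endswith(".dll"):
            shutil.copy2(path, dst_dir)  # copy2 also preserves timestamps
            copied.append(name)
    return copied


def clear_media_cache(cache_dir):
    """Step 2: delete everything under cache_dir but keep the folder itself.

    Returns the number of top-level entries removed.
    """
    with os.scandir(cache_dir) as it:
        entries = list(it)  # snapshot first so we don't delete while iterating
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.remove(entry.path)
    return len(entries)
```

For example, `copy_ifilter_dlls(IFILTER_BIN, INETSRV)` followed by `clear_media_cache(...)` pointed at the `Website\App_Data\MediaCache` folder of your site.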




Answer 2:


Here is the solution I went with, since I didn't like the idea of copying iFilter-related DLLs into the system path.

  • install Adobe IFilter 9 (I used this link). Note that version 9 is essential: starting with version X, Adobe abandoned the file-based interface.
  • add the filter location to the PATH environment variable. In my case it was %ProgramFiles%\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin\.
  • run iisreset
  • go back to the Sitecore application and rebuild the necessary indexes.
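To confirm that the PATH change actually took effect before running iisreset and rebuilding, a small check like the following can help. This is a minimal Python sketch with a hypothetical function name; it assumes you run it from a process started after the PATH change (an already-running process, like an IIS worker, keeps its old environment, which is presumably why the iisreset step is needed). On Windows, `os.path.expandvars` expands `%ProgramFiles%`-style references, and `os.path.normcase` makes the comparison case-insensitive.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if `directory` appears in a PATH-style string.

    path_value defaults to the current process's PATH. Entries are compared
    after expanding environment variables, stripping quotes, and normalizing
    case and trailing separators.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(p):
        return os.path.normcase(os.path.normpath(os.path.expandvars(p)))

    target = norm(directory)
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')
        if entry and norm(entry) == target:
            return True
    return False
```

For example: `dir_on_path(r"%ProgramFiles%\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin")`.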

For your consideration:

  • while trying to resolve the issue, I granted the app pool account full access to the IFilter folder. I don't think this is necessary, because I removed it at the end and everything still worked fine.

After these steps PDF indexing started working fine on my instance of Sitecore 7 running on Windows 8.1.



Source: https://stackoverflow.com/questions/17998725/sitecore-7-pdf-indexing
