Plone full text indexing Excel files

Submitted by China☆狼群 on 2019-12-10 15:46:14

Question


How can I customize the Plone search engine to activate full-text indexing of Excel files? I have already installed pdftotext and wv for full-text indexing of PDF and Word files.


Answer 1:


If you add Products.OpenXml to your instance eggs and install it in Plone, you can index modern Office formats, at least .docx and .xlsx. This does not work for plain old Excel (.xls) files.

I tried it in a Plone 4.3.2 buildout config a few weeks ago:

[instance]
eggs =
    ...
    Products.OpenXml

[versions]
# You need a more recent lxml than default Plone, some 3.x version
lxml = 3.3.3
Products.OpenXml = 1.1.1
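
To illustrate what "indexing an .xlsx file" means in practice, here is a minimal Python sketch using only the standard library. It relies on the fact that an .xlsx file is a ZIP archive whose cell text is normally stored in xl/sharedStrings.xml. This is not how Products.OpenXml is necessarily implemented; the function names are hypothetical, and the demo helper builds only a stripped-down archive, not a complete workbook.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used inside xl/sharedStrings.xml.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def xlsx_shared_strings(data):
    """Return the text pieces stored in an .xlsx shared-strings table."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if "xl/sharedStrings.xml" not in archive.namelist():
            return []  # no shared-strings part, so no string cells to index
        root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # Every <t> element holds a piece of cell text.
    return [t.text or "" for t in root.iter(NS + "t")]


def make_demo_xlsx(words):
    """Hypothetical demo helper: an archive with only a shared-strings part.

    The words are inserted without XML escaping, so use plain text only.
    """
    items = "".join("<si><t>%s</t></si>" % word for word in words)
    xml = (
        '<sst xmlns="http://schemas.openxmlformats.org/'
        'spreadsheetml/2006/main">%s</sst>' % items
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("xl/sharedStrings.xml", xml)
    return buf.getvalue()


if __name__ == "__main__":
    print(xlsx_shared_strings(make_demo_xlsx(["invoice", "total"])))
```

The extracted strings are the kind of plain text that ends up in the full-text index. Old binary .xls files are not ZIP archives, which is why they need a different converter.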

Alternatively or additionally, use Products.AROfficeTransforms. I have only tried it in combination with Products.OpenXml, but Products.AROfficeTransforms on its own is sufficient if you are only interested in old-style Excel sheets (.xls). In a buildout config:

[instance]
eggs =
    ...
    Products.AROfficeTransforms

[versions]
Products.AROfficeTransforms = 0.11.0

It requires the xlhtml binary to be installed on your system. This is an ancient binary, last changed in 2002. I did not try to install it myself.
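
Since these transforms depend on external command-line tools, a quick check of which ones are on the PATH can save some debugging. The following is a small sketch, assuming Python 3 and that the script runs as the same user and with the same PATH as the Plone instance; the function name missing_binaries is made up for this example.

```python
import shutil


def missing_binaries(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Converter binaries mentioned in this thread; adjust to your setup.
    required = ["pdftotext", "xlhtml"]
    missing = missing_binaries(required)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All converter binaries found.")
```

If a binary is reported missing here but is installed, the PATH seen by the Zope process is the next thing to check.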




Answer 2:


Try ftw.tika.

Supported formats:

  • Microsoft Office formats (Office Open XML)
      • *.docx Word Documents
      • *.dotx Word Templates
      • *.xlsx Excel Sheets
      • *.xltx Excel Templates
      • *.pptx Powerpoint Presentations
      • *.potx Powerpoint Templates
      • *.ppsx Powerpoint Slideshows
  • Legacy Microsoft Office (97) formats
  • Rich Text Format
  • OpenOffice ODF formats
  • OpenOffice 1.x formats
  • Common Adobe formats (InDesign, Illustrator, Photoshop)
  • PDF documents
  • WordPerfect documents
  • E-Mail messages

It is based on Apache Tika and runs as a service managed by supervisor (you have to extend your buildout).

It is integrated with portal_transforms and is well tested and documented.

More information:

  • Release on PyPI


Source: https://stackoverflow.com/questions/23151319/plone-full-text-indexing-excel-files
