How to upload docx, xlsx & txt files to MarkLogic Server?

Posted by 和自甴很熟 on 2019-12-25 05:01:36

Question


I have a folder that contains doc, docx, xlsx, pdf, and txt files. I am uploading all of these files into MarkLogic with this XQuery:

(: Load each entry of the upload folder as a binary document under /documents/. :)
for $d in xdmp:filesystem-directory("C:\uploads")//dir:entry
return
  xdmp:document-load($d//dir:pathname,
    <options xmlns="xdmp:document-load">
      <uri>{concat("/documents/", string($d//dir:filename))}</uri>
      <permissions>{xdmp:default-permissions()}</permissions>
      <collections>{xdmp:default-collections()}</collections>
      <format>binary</format>
    </options>)

I have also installed content processing on my database. When I upload doc and pdf files, they are converted to XML and XHTML files, but docx, xlsx, and txt files are not. Can somebody tell me why these files are not being converted?


Answer 1:


Enable the Office OpenXML Extract pipeline to convert the .docx, .xlsx, and .pptx files.

Files with these extensions are already XML, packaged in a ZIP archive. If you change the extension to .zip and extract the archive, you will see that the file is composed of interrelated XML parts.
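To see the "ZIP of XML parts" structure for yourself outside MarkLogic, a minimal Python sketch like the one below may help. It relies only on the standard library. The helper names (list_parts, read_part_root_tag, build_stand_in_package) are made up for this illustration, and the in-memory package is a hypothetical stand-in that only mimics the layout of a .docx; it is not a valid Word document. With a real file you would pass its path to list_parts instead.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def list_parts(package):
    """Return the names of all parts stored in an Open XML package.

    `package` is a path or a binary file-like object. A .docx, .xlsx, or
    .pptx file is an ordinary ZIP archive, so zipfile can read it directly.
    """
    with zipfile.ZipFile(package) as archive:
        return archive.namelist()


def read_part_root_tag(package, part_name):
    """Parse one part as XML and return the tag of its root element."""
    with zipfile.ZipFile(package) as archive:
        root = ET.fromstring(archive.read(part_name))
    return root.tag


def build_stand_in_package():
    """Build a tiny in-memory ZIP that mimics the layout of a .docx.

    Hypothetical stand-in for demonstration only; not a valid Word document.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document><body/></document>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = build_stand_in_package()
    # Every entry in the archive is one part of the package.
    print(list_parts(demo))
    # Each part is itself a well-formed XML document.
    print(read_part_root_tag(demo, "word/document.xml"))
```

Running list_parts against a real .docx should show many more parts (styles, relationships, and so on), which are the pieces the pipeline extracts.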

The Office OpenXML Extract pipeline unzips Office 2007/2010 files and stores their constituent parts in a directory that is a sibling of the main file, similar to the other conversion pipelines. This pipeline lets you store the raw Open XML. There is no further conversion to XHTML or DocBook at this time.

There is no conversion for .txt that I'm aware of. Those are plain text files and will be inserted as text in MarkLogic. You could convert one to XML by wrapping its text in a parent element and changing the file extension to .xml.
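The wrapping idea can be sketched before the file ever reaches MarkLogic. The Python snippet below is one possible approach, not a MarkLogic feature: the function name wrap_text_as_xml and the element name "text" are assumptions chosen for illustration. It uses the standard library's ElementTree so that characters such as < and & in the text are escaped and the result stays well-formed.

```python
import xml.etree.ElementTree as ET


def wrap_text_as_xml(text, element_name="text"):
    """Wrap plain text in a single parent element and return XML as a string.

    The element name is a hypothetical choice; use whatever fits your schema.
    ElementTree escapes <, >, and & so the output is well-formed XML.
    """
    root = ET.Element(element_name)
    root.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # In practice the text would come from a .txt file read from disk.
    print(wrap_text_as_xml("a < b & c"))
```

You would then save the returned string with an .xml extension and load it like any other XML document.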

Hope this helps.



Source: https://stackoverflow.com/questions/11244942/how-to-upload-docx-xlsx-txt-files-to-marklogic-server
