How to associate search catalog file (.pdx) with PDF document

Submitted by 半腔热情 on 2019-12-25 03:22:55

Question


Using a .NET application, I am trying to create a PDF "table of contents" that references other files, like one would distribute on a DVD etc.

For this purpose, I need a search index and catalog, so full-text search will work across documents. I have been able to automate the construction of the index by copying an "old" .pdx file (the directory structure is always the same) and then calling JavaScript from C#:

    var js = $@"catalog.getIndex(""{pdxFilePath}"").build('alert(""Hello"")', true)";

    formFields.ExecuteThisJavascript(js);

But how can I associate the .pdx file with my .pdf document, so it gets loaded automatically?

In Acrobat, this is set in the "Advanced" document properties.

However, this is not accessible via the info or metadata properties of the document. Apparently it is stored somewhere else, but I don't know enough about the PDF format to figure out how to access this data.

Any help would be highly appreciated. I could use both the Adobe SDK/JavaScript API or some other library (for instance, I know we already have an Aspose license).


Answer 1:


The /Search entry is not documented in the PDF specification; it is probably an Adobe extension.
You can use any library that supports low-level COS objects (dictionaries, strings, numbers, streams, etc.), but since the entry is not documented, you can only infer its structure from sample PDF files.




Answer 2:


Answering my own question here... I was able to solve this using PdfSharp.

The following code is compatible with PdfSharp 1.50.4845-RC2a.

pdxFile should be the name of the .pdx file, including the file extension (e.g. "catalog.pdx"). I have only tested this with .pdx files located in the same folder as the PDF document, but I would assume that relative paths in general should work.

No guarantees that this is a perfect solution, as I lack a deeper understanding of the PDF format, but it seems to work at least.

    private void SetSearchCatalog(PdfDocument doc, string pdxFile)
    {
        // File specification dictionary that points at the .pdx file.
        var indexDict = new PdfDictionary(doc);
        indexDict.Elements["/F"] = new PdfString(pdxFile, PdfStringEncoding.RawEncoding);
        indexDict.Elements["/Type"] = new PdfName("/Filespec");

        // One entry of the /Indexes array: index type (PDX) plus the file specification.
        var indexArrayItemDict = new PdfDictionary(doc);
        indexArrayItemDict.Elements["/Index"] = indexDict;
        indexArrayItemDict.Elements["/Name"] = new PdfName("/PDX");

        var indexArray = new PdfArray(doc, indexArrayItemDict);

        // /Search dictionary holding the list of indexes.
        var searchDict = new PdfDictionary(doc);
        searchDict.Elements["/Indexes"] = indexArray;

        // Attach the /Search dictionary to the document catalog.
        doc.Internals.Catalog.Elements["/Search"] = searchDict;
    }
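
For reference, the objects built by this method should end up in the document catalog looking roughly like the fragment below. This is only a sketch inferred from the code above, not taken from any specification: the file name catalog.pdx is an example value, the other catalog entries are elided, and the library may write the nested dictionaries as separate indirect objects rather than inline.

    << /Type /Catalog
       % ... other catalog entries ...
       /Search <<
          /Indexes [
             << /Name /PDX
                /Index << /Type /Filespec /F (catalog.pdx) >>
             >>
          ]
       >>
    >>

Comparing this against a PDF whose catalog was attached through Acrobat's document properties is probably the easiest way to check that the structure matches.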


Source: https://stackoverflow.com/questions/51127552/how-to-associate-search-catalog-file-pdx-with-pdf-document
