How to access OpenXML content by page number?

悲&欢浪女 2020-12-03 22:57

Using OpenXML, can I read the document content by page number?

wordDocument.MainDocumentPart.Document.Body gives the content of the full document, with no indication of where each page begins.



        
4 Answers
  •  借酒劲吻你
    2020-12-03 23:14

    You cannot reference OOXML content via page numbering at the OOXML data level alone.

    • Hard page breaks are not the problem; hard page breaks can be counted.
    • Soft page breaks are the problem. Their positions are calculated by line-breaking and pagination algorithms that are implementation-dependent; they are not intrinsic to the OOXML data. There is nothing to count.
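
    To illustrate the first bullet, here is a minimal sketch of counting hard page breaks. It uses only the Python standard library rather than the Open XML SDK, and it assumes the main document part lives at word/document.xml (the usual location, though the package relationships are what formally define it). The function name is hypothetical. It counts only explicit w:br elements with w:type="page"; breaks caused by w:pageBreakBefore or by section breaks are not counted.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def count_hard_page_breaks(docx_file):
    """Count explicit <w:br w:type="page"/> elements in the main document part.

    docx_file may be a path or a binary file-like object.
    Assumption: the main part is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    # A <w:br> without w:type="page" is a line or column break, so skip it.
    return sum(
        1
        for br in root.iter(W + "br")
        if br.get(W + "type") == "page"
    )
```

    Calling count_hard_page_breaks("report.docx") (a hypothetical file name) returns the number of explicit page breaks, which is a lower bound on the page count minus one, not the page count itself.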

    What about w:lastRenderedPageBreak, which is a record of the position of a soft page break at the time the document was last rendered? No, w:lastRenderedPageBreak does not help in general either because:

    • By definition, the w:lastRenderedPageBreak position is stale whenever the content has changed since the document was last opened by a program that paginates it.
    • In MS Word's implementation, w:lastRenderedPageBreak is known to be unreliable in various circumstances, including:
      1. when a table spans two pages
      2. when the next page starts with an empty paragraph
      3. for multi-column layouts with text boxes starting a new column
      4. for large images or long sequences of blank lines
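
    If approximate boundaries are acceptable despite the caveats above, the recorded markers can still be read back. The following is a sketch only, again using the Python standard library and assuming the main part is at word/document.xml; the function name is hypothetical. It splits the text wherever a w:lastRenderedPageBreak element appears, so its output reflects whatever the last paginating application recorded, not the current layout.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def split_on_last_rendered_breaks(docx_file):
    """Split body text at each recorded <w:lastRenderedPageBreak/>.

    Returns a list of strings, one per chunk between markers.
    Text runs are concatenated without paragraph separators.
    Assumption: the main part is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    chunks = [[]]
    # root.iter() walks elements in document order.
    for node in root.iter():
        if node.tag == W + "lastRenderedPageBreak":
            chunks.append([])  # marker found: start a new chunk
        elif node.tag == W + "t" and node.text:
            chunks[-1].append(node.text)
    return ["".join(parts) for parts in chunks]
```

    A document with no markers at all (for example, one produced by a tool that never paginates) yields a single chunk, which is one more reason not to treat the result as real page content.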

    If you're willing to accept a dependence on Word Automation, with all of its inherent licensing and server operation limitations, then you have a chance of determining page boundaries, page numberings, page counts, etc.

    Otherwise, the only real answer is to move beyond page-based referencing frameworks that are dependent upon proprietary, implementation-specific pagination algorithms.
