Conditional new Break for multi-column docx file, C#

Question

This is a follow-up to "Creating Word file from ObservableCollection with C#".
I have a .docx file whose Body has 2 columns set in its SectionProperties. I have a dictionary of foreign words with their translations. Each line should read [Word] = [Translation], and whenever a new letter starts, that letter should be on its own line, with 2 or 3 line breaks before and after it, like this:

A

A-word = translation
A-word = translation

B

B-word = translation
B-word = translation
...

I structured this in a for loop, so that in every iteration I'm creating a new paragraph with a possible Run for the letter (if a new one starts), a Run for the word and a Run for the translation. So the Run with the first letter is in the same Paragraph as the word and translation Run and it appends 2 or 3 Break objects before and after the Text.
As a result, the second column can sometimes start with 1 or 2 empty lines, and so can the first column on the next page.
This is what I want to avoid.

So my question is, can I somehow check if the end of the page is reached, or the text is at the top of the column, so I don't have to add a Break? Or, can I format the Column itself so that it doesn't start with an empty line?

I have tried putting the letter Run in a separate, optional, Paragraph, but again, I find myself having to input line breaks and the problem remains.

Answer 1:

In the spirit of my other answer, you can extend the template capability. Use the Open XML SDK Productivity Tool to generate a single page break object, something like:

private readonly Paragraph PageBreakPara = new Paragraph(new Run(new Break() { Type = BreakValues.Page}));

Make a helper method that finds containers of a text tag:

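using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public IEnumerable<T> FindElements<T>(OpenXmlCompositeElement searchParent, string tagRegex)
  where T: OpenXmlElement
{
  var regex = new Regex(tagRegex);

  // take the leaf elements whose text matches the tag, then return
  // their ancestors of type T (plus the leaf itself when it is a T)
  return searchParent.Descendants()
      .Where(e => !(e is OpenXmlCompositeElement)
                && regex.IsMatch(e.InnerText))
      .SelectMany(e =>
          e.Ancestors()
              .OfType<T>()
              .Union(e is T ? new T[] { (T)e } : new T[] {} ))
      .ToList(); // can skip, prevents reevaluations
}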

And another one that duplicates a range of the document (a DeleteRange variant would delete the range instead), plus one that replaces a tag:

public IEnumerable<OpenXmlElement> DuplicateRange<T>(OpenXmlCompositeElement root, string tagRegex)
  where T: OpenXmlElement
{
// tagRegex must describe exactly two tags, such as [pageStart] and [pageEnd]
// or [page] [/page] - or whatever pattern you choose
// T is the type of the element that contains each tag (e.g. Paragraph)

  var tagElements = FindElements<T>(root, tagRegex);
  var fromEl = tagElements.First();
  var toEl = tagElements.Skip(1).First(); // throws exception if less than 2 el

// you may want to find a common parent here
// I'll assume you've prepared the template so the elements are siblings.

  var result = new List<OpenXmlElement>();
  OpenXmlElement insertAfter = toEl; // keeps the copies in their original order

  var step = fromEl.NextSibling();
  while (step != null && step != toEl){
   // another method called DeleteRange will instead delete elements in that range within this loop
    var copy = step.CloneNode(true); // deep clone; CloneNode needs the bool argument
    insertAfter = insertAfter.InsertAfterSelf(copy);
    result.Add(copy);
    step = step.NextSibling();
  }

  return result;
}
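
The comment in the loop above mentions a DeleteRange variant without showing it. A minimal sketch of what it might look like follows; the class name `RangeHelpers`, the `includeTags` parameter and the choice to pass the two tag containers directly (instead of a regex) are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml;

public static class RangeHelpers
{
    // Hypothetical counterpart to DuplicateRange: removes the siblings strictly
    // between fromEl and toEl, plus the two tag containers when includeTags is true.
    // Returns the number of removed elements.
    public static int DeleteRange(OpenXmlElement fromEl, OpenXmlElement toEl, bool includeTags = true)
    {
        var doomed = new List<OpenXmlElement>();
        var step = fromEl.NextSibling();
        while (step != null && step != toEl)
        {
            doomed.Add(step);
            step = step.NextSibling();
        }

        // Reaching the end without meeting toEl means the tags are not siblings
        // in document order; refuse rather than delete the rest of the parent.
        if (step == null)
            throw new InvalidOperationException("toEl is not a later sibling of fromEl.");

        if (includeTags) { doomed.Add(fromEl); doomed.Add(toEl); }

        // Elements are collected first so that removal does not disturb the walk.
        foreach (var el in doomed) el.Remove();
        return doomed.Count;
    }
}
```

With the helpers from this answer, the two arguments would be the first two elements returned by FindElements for the [page] tags.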


public IEnumerable<OpenXmlElement> ReplaceTag(OpenXmlCompositeElement parent, string tagRegex, string replacement){
  // InnerText is read-only, so edit the Text leaves that contain the tag
  var replaceElements = FindElements<Text>(parent, tagRegex);
  var regex = new Regex(tagRegex);
  foreach(var el in replaceElements){
     el.Text = regex.Replace(el.Text, replacement);
  }

  return replaceElements;
}
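
Since ReplaceTag treats its tag argument as a regular expression, a tag such as [TitleLetter] has to be escaped, otherwise the square brackets are read as a character class. The sketch below shows one way to do that with `Regex.Escape`; the class name `TagPatterns` and its method are hypothetical and only illustrate the idea.

```csharp
using System.Text.RegularExpressions;

public static class TagPatterns
{
    // Replaces every literal occurrence of a tag such as "[TitleLetter]".
    // Regex.Escape keeps "[" from opening a character class, and doubling "$"
    // keeps the replacement text from being read as a substitution pattern.
    public static string ReplaceLiteral(string input, string tag, string replacement)
    {
        return Regex.Replace(input, Regex.Escape(tag), replacement.Replace("$", "$$"));
    }
}
```

For example, `TagPatterns.ReplaceLiteral("[TitleLetter] words", "[TitleLetter]", "A")` leaves the surrounding text alone and swaps only the tag.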

Now you can have a document that looks like this:

[page] [TitleLetter]

[WordTemplate][Word]: [Translation] [/WordTemplate]

[pageBreak] [/page]

With that document you can duplicate the [page]..[/page] range, process it per letter and once you're out of letters - delete the template range:

// letter -> (word -> translation), filled from your data
var vocabulary = new Dictionary<char, Dictionary<string, string>>();

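foreach (var letter in vocabulary.Keys.OrderByDescending(c=>c)){
  // in reverse order because the copy range comes after the template range
  // wordDocument must be the composite element that holds the template (e.g. the Body);
  // the [page] and [/page] tags are assumed to sit in sibling paragraphs
  var pageTemplate = DuplicateRange<Paragraph>(wordDocument,"\\[/?page\\]");

  foreach (var p in pageTemplate.OfType<OpenXmlCompositeElement>()){

    ReplaceTag(p, "\\[TitleLetter\\]",""+letter); // brackets escaped: the tag is a regex
    var pageBr = ReplaceTag(p, "\\[pageBreak\\]","");
    // the break goes after the paragraph that held the tag, not inside its run
    foreach(var para in pageBr.Select(e => e as Paragraph ?? e.Ancestors<Paragraph>().FirstOrDefault())
                              .Where(x => x != null).Distinct()){
      para.InsertAfterSelf(PageBreakPara.CloneNode(true));
    }
    var wordTemplateFound = FindElements<Text>(p, "\\[/?WordTemplate\\]");
    if (wordTemplateFound.Any()){
       // descending again: each copy is inserted right after the template paragraph
       foreach (var word in vocabulary[letter].Keys.OrderByDescending(w=>w)){
          // since it's a single paragraph template, clone p itself
          var wordTemplate = (OpenXmlCompositeElement)p.InsertAfterSelf(p.CloneNode(true));
          ReplaceTag(wordTemplate, "\\[/?WordTemplate\\]","");
          ReplaceTag(wordTemplate, "\\[Word\\]",word);
          ReplaceTag(wordTemplate, "\\[Translation\\]",vocabulary[letter][word]);
       }
       p.Remove(); // drop the unfilled template paragraph
    }
  }
}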

...Or something like it.
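
The loop above indexes vocabulary[letter][word], so it expects the words grouped by letter. If your data is a flat word-to-translation map, a sketch of that grouping could look like the following; the class name `VocabularyBuilder`, the method name and the choice to upper-case the first character are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class VocabularyBuilder
{
    // Groups a flat word -> translation map by the upper-cased first character
    // of each word, giving letter -> (word -> translation).
    public static Dictionary<char, Dictionary<string, string>> GroupByFirstLetter(
        IDictionary<string, string> words)
    {
        return words
            .Where(kv => !string.IsNullOrEmpty(kv.Key)) // skip entries without a first letter
            .GroupBy(kv => char.ToUpperInvariant(kv.Key[0]))
            .ToDictionary(
                g => g.Key,
                g => g.ToDictionary(kv => kv.Key, kv => kv.Value));
    }
}
```

The result can be passed to the loop as `vocabulary`; ordering is applied there, so the dictionary itself does not need to be sorted.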

Look into SdtElements if things start getting too complicated.
Don't use AltChunk despite the popularity of that answer: it requires Word to open and process the file, so you can't use a library to make a PDF out of it.
Word documents are messy. The solution above should work (I haven't tested it), but the template must be carefully crafted, so back up your template often.
Making a robust document engine isn't easy (since Word is messy); do the minimum you need and rely on the template being under your control (not user-editable).
The code above is far from optimized or streamlined: I've tried to condense it into the smallest footprint possible at the cost of presentability. There are probably bugs too :)

Source: https://stackoverflow.com/questions/34614286/conditional-new-break-for-multi-column-docx-file-c-sharp

Tags

multiple-columns

docx