Conditional new Break for multi-column docx file, C#

Submitted by 浪子不回头ぞ on 2019-12-12 04:16:10

Question


This is a follow-up to "Creating Word file from ObservableCollection with C#".
I have a .docx file with a Body that has 2 columns in its SectionProperties. I have a dictionary of foreign words with their translations. Each line should read [Word] = [Translation], and whenever a new letter starts, that letter should be on its own line, with 2 or 3 line breaks before and after it, like this:

A



A-word = translation
A-word = translation



B



B-word = translation
B-word = translation
...

I build this in a for loop: each iteration creates a new Paragraph containing an optional Run for the letter (when a new letter starts), a Run for the word and a Run for the translation. The letter Run therefore sits in the same Paragraph as the word and translation Runs, with 2 or 3 Break objects appended before and after its Text.
As a result, the second column sometimes starts with 1 or 2 empty lines, and so can the first column on the next page.
This is what I want to avoid.

So my question is, can I somehow check if the end of the page is reached, or the text is at the top of the column, so I don't have to add a Break? Or, can I format the Column itself so that it doesn't start with an empty line?

I have tried putting the letter Run in a separate, optional, Paragraph, but again, I find myself having to input line breaks and the problem remains.


Answer 1:


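In the spirit of my other answer, you can extend the template capability. Use the Open XML SDK Productivity Tool to generate a single page break object, something like:

// needed at the top of the file for the snippets in this answer
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

private readonly Paragraph PageBreakPara = new Paragraph(new Run(new Break() { Type = BreakValues.Page }));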

Make a helper method that finds containers of a text tag:

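public IEnumerable<T> FindElements<T>(OpenXmlCompositeElement searchParent, string tagRegex)
  where T: OpenXmlElement
{
  var regex = new Regex(tagRegex);

  // take the leaf elements whose text matches the tag, then map each one
  // to itself (if it is a T) and to every ancestor of type T
  return searchParent.Descendants()
    .Where(e => !(e is OpenXmlCompositeElement)
              && regex.IsMatch(e.InnerText))
    .SelectMany(e =>
        e.Ancestors()
            .OfType<T>()
            .Union(e is T ? new T[] { (T)e } : new T[] {} ))
    .Distinct() // one container can hold several matching leaves
    .ToList(); // can skip, prevents reevaluations
}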

And another one that duplicates a range from the document (a DeleteRange counterpart would delete the range instead):

public IEnumerable<OpenXmlElement> DuplicateRange<T>(OpenXmlCompositeElement root, string tagRegex)
  where T: OpenXmlElement
{
// tagRegex must describe exactly two tags, such as [pageStart] and [pageEnd]
// or [page] [/page] - or whatever pattern you choose

  var tagElements = FindElements<T>(root, tagRegex);
  var fromEl = tagElements.First();
  var toEl = tagElements.Skip(1).First(); // throws exception if less than 2 el

// you may want to find a common parent here
// I'll assume you've prepared the template so the elements are siblings.

  var result = new List<OpenXmlElement>();

  OpenXmlElement insertAfter = toEl; // moves forward so the copies keep their original order
  var step = fromEl.NextSibling();
  while (step != null && step != toEl){
   // another method called DeleteRange will instead delete elements in that range within this loop
    var copy = step.CloneNode(true); // deep copy, children included
    insertAfter = insertAfter.InsertAfterSelf(copy);
    result.Add(copy);
    step = step.NextSibling();
  }

  return result;
}
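
The comment inside the loop mentions a DeleteRange method without showing it. A minimal sketch of what it could look like follows; the class name RangeHelpers, the includeTags parameter and the returned count are assumptions made for illustration, not part of the original answer. It collects the elements first, because an element removed with Remove() no longer has a next sibling to continue from.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml;

public static class RangeHelpers
{
    // Hypothetical helper: removes every sibling strictly between fromEl and toEl
    // and, when includeTags is true, the two tag elements as well.
    // Returns the number of elements removed.
    public static int DeleteRange(OpenXmlElement fromEl, OpenXmlElement toEl, bool includeTags)
    {
        var toRemove = new List<OpenXmlElement>();
        if (includeTags) toRemove.Add(fromEl);

        // Walk forward from fromEl until toEl is reached.
        var step = fromEl.NextSibling();
        while (step != null && step != toEl)
        {
            toRemove.Add(step);
            step = step.NextSibling();
        }

        // Refuse to delete anything if toEl is not a later sibling of fromEl.
        if (step == null)
            throw new ArgumentException("toEl must be a later sibling of fromEl.");

        if (includeTags) toRemove.Add(toEl);

        // Remove only after the walk is complete.
        foreach (var el in toRemove) el.Remove();
        return toRemove.Count;
    }
}
```

Once every letter has been processed, calling RangeHelpers.DeleteRange(fromEl, toEl, true) with the two template tag elements would drop the template range, tags included.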


public IEnumerable<OpenXmlElement> ReplaceTag(OpenXmlCompositeElement parent, string tagRegex, string replacement){
  var regex = new Regex(tagRegex);
  // InnerText has no public setter, so rewrite the Text leaves that contain the tag
  var replaceElements = parent.Descendants<Text>()
      .Where(t => regex.IsMatch(t.Text))
      .ToList();
  foreach(var el in replaceElements){
     el.Text = regex.Replace(el.Text, replacement);
  }

  return replaceElements;
}
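
Because the tag arguments are compiled as regular expressions, an unescaped pattern such as "[TitleLetter]" is a character class that matches any single one of those letters, not the literal tag. The sketch below shows one way to build safe patterns with Regex.Escape; the TagPatterns class and its method names are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagPatterns
{
    // Regex that matches the literal tag text, e.g. "[TitleLetter]".
    public static Regex Literal(string tag)
    {
        return new Regex(Regex.Escape(tag));
    }

    // Regex that matches both the opening "[name]" and the closing "[/name]" tag.
    public static Regex OpenOrClose(string name)
    {
        return new Regex(@"\[/?" + Regex.Escape(name) + @"\]");
    }
}
```

For example, TagPatterns.Literal("[TitleLetter]").Replace("[TitleLetter] words", "A") replaces only the whole tag, whereas new Regex("[TitleLetter]") would also replace the "r" in "words".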

Now you can have a document that looks like this:

[page] [TitleLetter]

[WordTemplate][Word]: [Translation] [/WordTemplate]

[pageBreak] [/page]

With that document you can duplicate the [page]..[/page] range, process it per letter and once you're out of letters - delete the template range:

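// letter -> (word -> translation); types inferred from how the dictionary is used below
var vocabulary = new Dictionary<char, Dictionary<string, string>>();
// ... fill vocabulary; wordDocument is the element that holds the template, e.g. the document Body

foreach (var letter in vocabulary.Keys.OrderByDescending(c=>c)){
  // in reverse order because the copy range comes after the template range
  var pageTemplate = DuplicateRange<Paragraph>(wordDocument, "\\[/?page\\]");

  foreach (var p in pageTemplate.OfType<OpenXmlCompositeElement>()){

    // the patterns are regexes, so the square brackets must be escaped
    ReplaceTag(p, "\\[TitleLetter\\]", "" + letter);
    var pageBr = ReplaceTag(p, "\\[pageBreak\\]", "");
    if (pageBr.Any()){
      // the page break goes after the element that held the tag
      p.InsertAfterSelf(PageBreakPara.CloneNode(true));
    }
    var wordTemplateFound = FindElements<Paragraph>(p, "\\[/?WordTemplate\\]");
    if (wordTemplateFound.Any()){
       foreach (var word in vocabulary[letter].Keys){
          // assumes the opening and closing tags sit in sibling elements under p
          var wordTemplate = DuplicateRange<Paragraph>(p, "\\[/?WordTemplate\\]")
              .OfType<OpenXmlCompositeElement>()
              .First(); // since it's a single paragraph template
          ReplaceTag(wordTemplate, "\\[/?WordTemplate\\]", "");
          ReplaceTag(wordTemplate, "\\[Word\\]", word);
          ReplaceTag(wordTemplate, "\\[Translation\\]", vocabulary[letter][word]);
       }
    }
  }
}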

...Or something like it.

  • Look into SdtElements (content controls) if things start getting too complicated.
  • Don't use AltChunk, despite the popularity of that answer: it requires Word to open and process the file, so you can't use some library to make a PDF out of it.
  • Word documents are messy. The solution above should work (I haven't tested it), but the template must be carefully crafted; make backups of your template often.
  • Making a robust document engine isn't easy (since Word is messy); do the minimum you need and rely on the template being in your control (not user-editable).
  • The code above is far from optimized or streamlined; I've tried to condense it into the smallest footprint possible at the cost of presentability. There are probably bugs too :)


Source: https://stackoverflow.com/questions/34614286/conditional-new-break-for-multi-column-docx-file-c-sharp