问题
This is a follow-up question for Creating Word file from ObservableCollection with C#.
I have a .docx file with a Body
that has 2 columns for its SectionProperties
. I have a dictionary of foreign words with their translation. On each line I need [Word] = [Translation] and whenever a new letter starts it should be in its own line, with 2 or 3 line breaks before and after that letter, like this:
A
A-word = translation
A-word = translationB
B-word = translation
B-word = translation
...
I structured this in a for loop, so that in every iteration I'm creating a new paragraph with a possible Run
for the letter (if a new one starts), a Run
for the word and a Run
for the translation. So the Run
with the first letter is in the same Paragraph
as the word and translation Run
and it appends 2 or 3 Break
objects before and after the Text
.
In doing so the second column can sometimes start with 1 or 2 empty lines. Or the first column on the next page can start with empty lines.
This is what I want to avoid.
So my question is, can I somehow check if the end of the page is reached, or the text is at the top of the column, so I don't have to add a Break
? Or, can I format the Column
itself so that it doesn't start with an empty line?
I have tried putting the letter Run
in a separate, optional, Paragraph
, but again, I find myself having to input line breaks and the problem remains.
回答1:
In the spirit of my other answer you can extend the template capability. Use the Productivity tool to generate a single page break object, something like:
private readonly Paragraph PageBreakPara = new Paragraph(new Run(new Break() { Type = BreakValues.Page}));
Make a helper method that finds containers of a text tag:
public IEnumerable FindElements(OpenXmlCompositeElement searchParent, string tagRegex) where T: OpenXmlElement { var regex = new Regex(tagRegex);
return searchParent.Descendants()
.Where(e=>(!(e is OpenXmlCompositeElement)
&& regex.IsMatch(e.InnerText)))
.SelectMany(e =>
e.Ancestors()
.OfType<T>()
.Union(e is T ? new T[] { (T)e } : new T[] {} ))
.ToList(); // can skip, prevents reevaluations
}
And another one that duplicates a range from the document and deletes range:
public IEnumerable<T> DuplicateRange<T>(OpenXmlCompositeElement root, string tagRegex)
where T: OpenXmlElement
{
// tagRegex must describe exactly two tags, such as [pageStart] and [pageEnd]
// or [page] [/page] - or whatever pattern you choose
var tagElements = FindElements(root, tagRegex);
var fromEl = tagElements.First();
var toEl = tagElements.Skip(1).First(); // throws exception if less than 2 el
// you may want to find a common parent here
// I'll assume you've prepared the template so the elements are siblings.
var result = new List<OpenXmlElement>();
var step = fromEl.NextSibling();
while (step !=null && toEl!=null && step!=toEl){
// another method called DeleteRange will instead delete elements in that range within this loop
var copy = step.CloneNode();
toEl.InsertAfterSelf(copy);
result.Add(copy);
step = step.NextSibling();
}
return result;
}
public IEnumerable<OpenXmlElement> ReplaceTag(OpenXmlCompositeElement parent, string tagRegex, string replacement){
var replaceElements = FindElements<OpenXmlElement>(parent, tagRegex);
var regex = new Regex(tagRegex);
foreach(var el in replaceElements){
el.InnerText = regex.Replace(el.InnerText, replacement);
}
return replaceElements;
}
Now you can have a document that looks like this:
[page] [TitleLetter]
[WordTemplate][Word]: [Translation] [/WordTemplate]
[pageBreak] [/page]
With that document you can duplicate the [page]..[/page] range, process it per letter and once you're out of letters - delete the template range:
var vocabulary = Dictionary>;
foreach (var letter in vocabulary.Keys.OrderByDescending(c=>c)){
// in reverse order because the copy range comes after the template range
var pageTemplate = DuplicateRange(wordDocument,"\\[/?page\\]");
foreach (var p in pageTemplate.OfType<OpenXmlCompositeElement>()){
ReplaceTag(p, "[TitleLetter]",""+letter);
var pageBr = ReplaceTag(p, "[pageBreak]","");
if (pageBr.Any()){
foreach(var pbr in pageBr){
pbr.InsertAfterSelf(PageBreakPara.CloneNode());
}
}
var wordTemplateFound = FindElements(p, "\\[/?WordTemplate\\]");
if (wordTemplateFound .Any()){
foreach (var word in vocabulary[letter].Keys){
var wordTemplate = DuplicateRange(p, "\\[/?WordTemplate\\]")
.First(); // since it's a single paragraph template
ReplaceTag(wordTemplate, "\\[/?WordTemplate\\]","");
ReplaceTag(wordTemplate, "\\[Word]",word);
ReplaceTag(wordTemplate, "\\[Translation\\]",vocabulary[letter][word]);
}
}
}
}
...Or something like it.
- Look into SdtElements if things start getting too complicated
- Don't use AltChunk despite the popularity of that answer, it requires Word to open and process the file, so you can't use some library to make a PDF out of it
- Word documents are messy, the solution above should work (haven't tested) but the template must be carefully crafted, make backups of your template often
- making a robust document engine isn't easy (since Word is messy), do the minimum you need and rely on the template being in your control (not user-editable).
- the code above is far from optimized or streamlined, I've tried to condense it in the smallest footprint possible at the cost of presentability. There are probably bugs too :)
来源:https://stackoverflow.com/questions/34614286/conditional-new-break-for-multi-column-docx-file-c-sharp