Question
I had to solve a little problem today: trimming the trailing whitespace that a PDF converter had added to every cell of an MS Word document. I quickly found out that this isn't possible through the standard Word interface, so I wrote a small VBA script:
    Sub TrimCellSpaces()
        Dim itable As Table
        Dim C As Cell
        For Each itable In ThisDocument.Tables
            For Each C In itable.Range.Cells
                C.Range.Text = Trim(C.Range.Text)
            Next
        Next
    End Sub
I was surprised that not only did this fail to remove the trailing spaces, it even added paragraph markers at the end of each cell. So I tried a regex approach:
    Sub TrimCellSpaces()
        Dim myRE As New RegExp
        Dim itable As Table
        Dim C As Cell
        myRE.Pattern = "\s+$"
        For Each itable In ThisDocument.Tables
            For Each C In itable.Range.Cells
                With myRE
                    C.Range.Text = .Replace(C.Range.Text, "")
                End With
            Next
        Next
    End Sub
Same result. I added a breakpoint, copied the value of C.Range.Text (before replacement) into a hex editor and found that it ended in the hex sequence 0D 0D 07 (07 is the ASCII Bell character (!)).
I changed the regex to \s+(?!.*\w), and the script worked flawlessly. After the replace operation, the value of C.Range.Text ended only in 0D 07 (one 0D fewer).
I also tried this with a newly created table, not one generated by Word's PDF importer - same results.
What's going on here? Is Word using 0D 0D 07 as an "end of cell" marker? Or is it 0D 07? Why did \s+ remove only one 0D?
Answer 1:
All cells in Word end in ANSI 13 + ANSI 7 (hex 0D 07). This pair is the "end of cell" marker (shown as a little "sunshine" if you have the display of non-printing characters turned on in the UI). Word uses it for structuring the table and storing cell-related information.
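To see the marker for yourself, you can dump the character codes at the end of a cell's text. The following is a minimal sketch, assuming the active document contains at least one table; the procedure name DumpCellTail is made up for illustration, and it reads from ActiveDocument rather than the ThisDocument used in the question.

```vba
' Hypothetical helper: prints the character codes of the last few
' characters of the first cell in the first table of the active document.
Sub DumpCellTail()
    Dim s As String
    Dim i As Long
    Dim startPos As Long

    ' Assumption: ActiveDocument has at least one table.
    s = ActiveDocument.Tables(1).Cell(1, 1).Range.Text

    ' Look at up to the last three characters of the returned string.
    startPos = Len(s) - 2
    If startPos < 1 Then startPos = 1

    For i = startPos To Len(s)
        ' AscW returns the character code: 13 is CR, 7 is the Bell character.
        Debug.Print i, AscW(Mid$(s, i, 1))
    Next i
End Sub
```

The codes appear in the Immediate window of the VBA editor, which avoids the detour through a hex editor.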
It's not possible to remove this character combination from a table cell - Word requires it. If you could remove it, the table would break, so Word simply prevents you from deleting it.
If you need the content of a table cell as a text string, you basically need to check the character codes of the last two characters and remove them before you use the string. You need to check both characters because Microsoft changed the way text is returned from a cell a few versions back. Sometimes Word returns only one of the characters, sometimes both, depending on how you pick up the information and which version of Word is involved.
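The check described above could be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the names StripCellMarker and TrimCellSpacesSketch are hypothetical, the loop runs over ActiveDocument rather than ThisDocument, and writing the cleaned string back through C.Range.Text relies on Word keeping its own end-of-cell marker in place. Note that RTrim$ removes only ordinary trailing spaces (not tabs or non-breaking spaces), and that assigning Range.Text replaces any mixed formatting inside the cell.

```vba
' Hypothetical helper: returns the cell text without the trailing
' end-of-cell characters. It checks for Chr(7) and Chr(13) separately,
' because Word may return only one of them or both.
Function StripCellMarker(ByVal s As String) As String
    ' Drop a trailing Bell character (ANSI 7), if present.
    If Right$(s, 1) = Chr$(7) Then s = Left$(s, Len(s) - 1)
    ' Drop a trailing carriage return (ANSI 13), if present.
    If Right$(s, 1) = Chr$(13) Then s = Left$(s, Len(s) - 1)
    StripCellMarker = s
End Function

' Sketch: trims trailing spaces in every cell of every table.
Sub TrimCellSpacesSketch()
    Dim tbl As Table
    Dim c As Cell

    For Each tbl In ActiveDocument.Tables
        For Each c In tbl.Range.Cells
            ' Strip the marker first so that the trailing spaces
            ' are really the last characters of the string.
            c.Range.Text = RTrim$(StripCellMarker(c.Range.Text))
        Next c
    Next tbl
End Sub
```

Stripping the marker before trimming is the key step: in the question's attempts, the trailing spaces were followed by 0D 07, so neither Trim nor \s+$ saw them as being at the end of the string.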
Source: https://stackoverflow.com/questions/34879230/trim-doesnt-work-with-tables