Rich text format (with formatting tags) in Excel to unformatted text

天涯浪人 2020-12-10 20:32

I have approx. 12000 cells in Excel containing RTF (including formatting tags). I need to parse them to get the unformatted text.

This is the example of one of th

4 answers
  •  鱼传尺愫
    2020-12-10 21:20

    You can try to parse every cell with a regular expression and keep only the content you need.

    Every RTF control word starts with "\" and ends with a space (or another non-alphanumeric character), with no spaces in between. "{" and "}" are used for grouping. If your text doesn't contain any braces, you can simply remove them (the same goes for ";"). That leaves you with your text plus some unnecessary words such as "Arial", "Normal", etc. You can build a dictionary to remove those as well. After some tweaking, you will be left with only the text you need.
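
    As a rough illustration of those steps, here is a minimal sketch of a helper function. The name StripRtf and the dictionary words are hypothetical, and it assumes the real text contains no braces, semicolons, hex escapes (\'hh) or escaped backslashes, so treat it as a starting point rather than a complete RTF parser:

    ```vba
    ' Sketch only: strips RTF markup with a few regular-expression passes.
    ' StripRtf is a hypothetical helper name, not part of Excel or VBA.
    Function StripRtf(ByVal rtf As String) As String
        Dim re As Object
        Set re = CreateObject("VBScript.RegExp")   ' late-bound COM object
        re.Global = True                           ' replace every match

        ' Turn paragraph marks into line breaks (\b keeps \pard untouched).
        re.Pattern = "\\par\b ?"
        rtf = re.Replace(rtf, vbLf)

        ' Remove control words: backslash, letters, optional numeric
        ' parameter, and one optional delimiting space.
        re.Pattern = "\\[A-Za-z]+-?[0-9]* ?"
        rtf = re.Replace(rtf, "")

        ' Remove leftover words by dictionary (example list, adjust it).
        ' Done before the braces go, so word boundaries are still intact.
        re.Pattern = "\b(Arial|Normal)\b ?"
        rtf = re.Replace(rtf, "")

        ' Remove grouping braces and semicolons.
        re.Pattern = "[{};]"
        rtf = re.Replace(rtf, "")

        StripRtf = Trim$(rtf)
    End Function
    ```

    Once it sits in a standard module, it can also be called from a worksheet formula such as =StripRtf(A1). Anything the patterns miss (for example font or generator names) has to be added to the dictionary pattern.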

    See http://www.regular-expressions.info/ for more information and a great tool for writing regular expressions (RegexBuddy; unfortunately it isn't free, but it's worth the money. AFAIR there is also a trial version).

    UPDATE: Of course, I don't encourage you to do it manually for every cell. Just iterate through the active range. See this thread: SO: About iterating through cells in VBA

    Personally, I'd give this idea a try:

    Sub Iterate()
        Dim cell As Range
        ' Visit every cell in the used area of the active sheet
        For Each cell In ActiveSheet.UsedRange.Cells
            ' Do something with cell.Value
        Next cell
    End Sub
    

    And how do you use regular expressions in VBA (Excel)?

    See: Regex functions in Excel and Regex in VBA

    Basically, you have to use the VBScript.RegExp object through COM.
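
    For what it's worth, here is a minimal sketch of how that object could be created and applied to every cell. It uses late binding via CreateObject, so no project reference is needed. The macro name, the pattern, and the choice to write results one column to the right are all assumptions for illustration (it expects the RTF to sit in a single column with a free column next to it):

    ```vba
    ' Sketch only: removes RTF control words from each text cell and
    ' writes the result into the cell to the right (assumed to be free).
    Sub StripControlWords()
        Dim re As Object
        Set re = CreateObject("VBScript.RegExp")   ' created through COM
        re.Pattern = "\\[A-Za-z]+-?[0-9]* ?"       ' one RTF control word
        re.Global = True                           ' match all occurrences

        Dim cell As Range
        For Each cell In ActiveSheet.UsedRange.Cells
            ' Skip empty, numeric and error cells.
            If VarType(cell.Value) = vbString Then
                If re.Test(cell.Value) Then
                    cell.Offset(0, 1).Value = re.Replace(cell.Value, "")
                End If
            End If
        Next cell
    End Sub
    ```

    If you prefer early binding (for IntelliSense), add a reference to "Microsoft VBScript Regular Expressions 5.5" under Tools > References and declare the variable As RegExp instead of As Object.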
