I have approx. 12,000 cells in Excel containing RTF (including formatting tags). I need to parse them to extract the unformatted text.
This is the example of one of th
You can try to parse every cell with a regular expression and keep only the content you need.
Every RTF control word starts with "\", followed by letters and an optional numeric parameter, and is terminated by a space or by any other character that is not a letter or digit; there is no space inside the control word itself. "{" and "}" are used for grouping. If your text doesn't contain any braces, you can simply remove them (the same goes for ";"). That leaves you with your text plus some unwanted words such as "Arial", "Normal", etc. You can build a dictionary of those words and remove them as well. After some tweaking, you will be left with only the text you need.
Look at http://www.regular-expressions.info/ for more information and a great tool for writing regular expressions (RegexBuddy - unfortunately it isn't free, but it's worth the money. AFAIR there is also a trial version).
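The steps above can be sketched as a small VBA function. This is only a rough sketch, not a real RTF parser: the name StripRtf and the word list ("Arial", "Normal") are assumptions for illustration, it assumes the real text contains no braces or semicolons, and it does not decode escaped characters such as \'e9. The leftover words are removed before the braces so that the braces still act as word boundaries around them.

```vba
Option Explicit

' Hypothetical helper: strips RTF markup with regular expressions only.
Function StripRtf(ByVal rtf As String) As String
    Dim re As Object
    Dim noise As Variant
    Dim noiseWord As Variant
    Dim result As String

    ' Late binding, so no library reference has to be added
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True

    ' 1. Control words: backslash, letters, optional numeric parameter,
    '    plus the single space that may terminate the word
    re.Pattern = "\\[a-zA-Z]+-?\d* ?"
    result = re.Replace(rtf, "")

    ' 2. Leftover words such as font and style names
    '    (this list is an assumption; extend it for your data)
    noise = Array("Arial", "Normal")
    For Each noiseWord In noise
        re.Pattern = "\b" & noiseWord & "\b"
        result = re.Replace(result, "")
    Next noiseWord

    ' 3. Grouping braces and semicolons
    re.Pattern = "[{};]"
    result = re.Replace(result, "")

    StripRtf = Trim$(result)
End Function
```

For example, Debug.Print StripRtf("{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs20 Hello world\par}") should print Hello world. Note that any of the listed words that occur in the real text would be removed as well.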
UPDATE: Of course, I'm not suggesting that you do this manually for every cell. Just iterate over the cells of the sheet; see the SO thread about iterating through cells in VBA.
Personally, I would try something like this:
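Sub Iterate()
    Dim cell As Range
    ' Visit every cell in the used range of the active sheet
    For Each cell In ActiveSheet.UsedRange.Cells
        ' Do something with cell.Value, e.g. strip the RTF markup
    Next cell
End Sub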
And how do you use regular expressions in VBA (Excel)?
See: "Regex functions in Excel" and "Regex in VBA".
Basically, you have to use the VBScript.RegExp object through COM.
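A minimal sketch of what that looks like, assuming late binding via CreateObject (alternatively, you can add a reference to "Microsoft VBScript Regular Expressions 5.5" and declare the variable as RegExp). The sample string and the macro name RegExpDemo are made up for illustration.

```vba
Sub RegExpDemo()
    Dim re As Object
    Dim m As Object
    Dim sample As String

    ' Hypothetical sample input with two RTF control words
    sample = "\fs20 Hello\par"

    ' Create the COM object by its ProgID
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                      ' find every match, not just the first
    re.Pattern = "\\[a-zA-Z]+-?\d*"       ' backslash, letters, optional number

    ' Test reports whether the pattern occurs at all
    If re.Test(sample) Then
        ' Execute returns a collection of matches
        For Each m In re.Execute(sample)
            Debug.Print m.Value           ' prints \fs20, then \par
        Next m
    End If

    ' Replace returns a new string with every match substituted
    Debug.Print re.Replace(sample, "")
End Sub
```

The output goes to the Immediate window of the VBA editor (Ctrl+G).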