Question
I have an ASP.NET 2.0 app. I am trying to upload a file, read its lines, and display them in a textbox. This works fine for a .txt file. But if I upload a Word doc, I get all kinds of gibberish (it looks like XML-based formatting) surrounding the text. Here is my code:
Dim s As New StringBuilder()
Dim rdr As StreamReader
If FileUpload1.HasFile Then
    rdr = New StreamReader(FileUpload1.FileContent)
    Do Until rdr.EndOfStream
        s.Append(rdr.ReadLine() & ControlChars.NewLine)
    Loop
    TextBox1.Text = s.ToString()
End If
Answer 1:
StreamReader doesn't understand Word-formatted files. It just reads a stream of characters. You need a library that is specifically able to read Word documents. This isn't an easy problem at all: it's not always clear how a given portion of a Word document should be converted to plain text.
Answer 2:
"But if I upload a Word doc, I get all kinds of gibberish (it looks like XML-based formatting) surrounding the text."
That's because the Word document file contains that XML-based formatting. You will see the same thing if you use a dumb text reader (e.g. Notepad.exe, or the type command from the command line) to see what's in the file.
To extract the text from the surrounding formatting, you'll need to use software (e.g. Word itself, winword.exe) to save or get the document in plain-text format.
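If the upload happens to be a .docx file (the newer Office Open XML format), the formatting and the text live in an XML part named word/document.xml inside a ZIP container, so the text can be pulled out without Word. The following is only a rough sketch under several assumptions that are not from the question: it needs .NET 4.5 or later (System.IO.Compression.ZipArchive does not exist in .NET 2.0), it does not handle the legacy binary .doc format, and the DocxText class, its Extract method and the BuildSample helper are hypothetical names made up for illustration. It also ignores tables, headers, footnotes and nested text boxes.

```csharp
using System;
using System.IO;
using System.IO.Compression; // on .NET Framework, reference System.IO.Compression.dll
using System.Text;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class DocxText
{
    // Namespace used by WordprocessingML elements such as w:p and w:t.
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of a .docx stream, one line per paragraph (w:p).
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("No word/document.xml part; not a .docx file.");

            var xml = new XmlDocument();
            using (Stream part = entry.Open())
                xml.Load(part);

            var ns = new XmlNamespaceManager(xml.NameTable);
            ns.AddNamespace("w", W);

            var sb = new StringBuilder();
            foreach (XmlNode p in xml.SelectNodes("//w:p", ns))
            {
                // Concatenate every text run (w:t) inside this paragraph.
                foreach (XmlNode t in p.SelectNodes(".//w:t", ns))
                    sb.Append(t.InnerText);
                sb.AppendLine();
            }
            return sb.ToString();
        }
    }

    // Builds a tiny stand-in .docx in memory so the sketch can run without a real file.
    public static MemoryStream BuildSample()
    {
        var ms = new MemoryStream();
        using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, true))
        {
            ZipArchiveEntry entry = zip.CreateEntry("word/document.xml");
            using (var writer = new StreamWriter(entry.Open()))
                writer.Write(
                    "<w:document xmlns:w=\"" + W + "\"><w:body>" +
                    "<w:p><w:r><w:t>Hello</w:t></w:r>" +
                    "<w:r><w:t xml:space=\"preserve\"> world</w:t></w:r></w:p>" +
                    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>" +
                    "</w:body></w:document>");
        }
        ms.Position = 0;
        return ms;
    }

    public static void Main()
    {
        // Prints the two sample paragraphs, each on its own line.
        Console.Write(Extract(BuildSample()));
    }
}
```

In the upload page from the question, the equivalent call would presumably look like TextBox1.Text = DocxText.Extract(FileUpload1.FileContent), after checking that the uploaded file really is a .docx.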
Answer 3:
You can use the "Word.ApplicationClass" class.
However, you should first read "Considerations for server-side Automation of Office".
Liberated from another donor:
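// "Word" is the Word interop namespace (or an alias for it).
Word.ApplicationClass wordApp = new Word.ApplicationClass();
object file = path;
object nullobj = System.Reflection.Missing.Value;
Word.Document doc = wordApp.Documents.Open(
    ref file, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj,
    ref nullobj, ref nullobj, ref nullobj);
// Select the whole document and copy it to the clipboard.
doc.ActiveWindow.Selection.WholeStory();
doc.ActiveWindow.Selection.Copy();
// Read the plain-text version back from the clipboard.
IDataObject data = Clipboard.GetDataObject();
txtFileContent.Text = data.GetData(DataFormats.Text).ToString();
doc.Close(ref nullobj, ref nullobj, ref nullobj);
// Quit Word so the winword.exe process does not stay running.
wordApp.Quit(ref nullobj, ref nullobj, ref nullobj);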
As mentioned in my comment below, this may work for you as well: http://npoi.codeplex.com/
Source: https://stackoverflow.com/questions/1313247/how-do-i-read-a-word-doc-using-the-streamreader