How do I read a Word doc using the StreamReader?

Submitted by て烟熏妆下的殇ゞ on 2019-12-24 21:43:23

Question


I have an ASP.NET 2.0 app. I am trying to upload a file, read its lines, and display them in a textbox. This works fine for a .txt file. But if I upload a Word doc, I get all kinds of gibberish (it looks like XML-based formatting) surrounding the text. Here is my code...

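    ' Requires: Imports System.IO and Imports System.Text
    Dim s As New StringBuilder
    Dim rdr As StreamReader

    If FileUpload1.HasFile Then

        ' Treats the uploaded bytes as plain text, which only works for text files
        rdr = New StreamReader(FileUpload1.FileContent)

        Do Until rdr.EndOfStream
            s.Append(rdr.ReadLine() & ControlChars.NewLine)
        Loop

        TextBox1.Text = s.ToString()

    End If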

Answer 1:


StreamReader doesn't support Word-formatted files; it just reads a stream of characters. You need a library that actually understands the Word format. This isn't an easy problem at all: it's not always clear how a given portion of a Word document should be converted into plain text.
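
To make the point concrete, here is a minimal sketch of checking the first bytes of an upload before deciding whether StreamReader is appropriate at all. The class and method names (`UploadSniffer`, `Classify`) are hypothetical and not part of the original question. The two signatures are the standard ZIP header (which every .docx starts with) and the OLE compound file header (used by legacy .doc files).

```csharp
using System;

public static class UploadSniffer
{
    // Hypothetical helper: classifies a file by its leading bytes.
    public static string Classify(byte[] head)
    {
        // .docx files are ZIP archives, which start with "PK" 0x03 0x04.
        if (StartsWith(head, new byte[] { 0x50, 0x4B, 0x03, 0x04 }))
            return "zip-or-docx";

        // Legacy .doc files are OLE compound files with this 8-byte header.
        if (StartsWith(head, new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }))
            return "ole-or-doc";

        // Anything else might be plain text, but this check cannot prove it.
        return "unknown";
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In the upload handler, one could read the first eight bytes of FileUpload1.FileContent, pass them to `Classify`, and reset the stream position to 0 before handing the stream to a StreamReader. A ZIP signature only says the file might be a .docx, so this is a guard against obviously binary input rather than a full format check.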




Answer 2:


But if I do a word doc, I get all kinds of gibberish (looks like xml-based formatting) surrounding the text.

That's because the Word document file contains that XML-based formatting. You will see the same thing if you open the file with a dumb text reader (e.g. Notepad.exe, or the type command on the command line).

To extract the text from the surrounding formatting, you'll need to use software (e.g. Word itself, winword.exe) to save or get the document in plain-text format.
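
If running Word on the server is not an option, the text can sometimes be pulled out of the formatting directly. The following is a rough sketch only, under several assumptions: the upload is a .docx (Office Open XML) file, which is a ZIP archive whose main body lives in word/document.xml; the code runs on .NET 4.5 or later (not the .NET 2.0 of the question), since it relies on System.IO.Compression.ZipArchive and LINQ to XML. The `DocxText` class and `Extract` method are hypothetical names. It will not work for legacy binary .doc files, and it ignores tabs, line breaks within a paragraph, headers, footers, and footnotes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML namespace used by the elements in word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the body text, one line per paragraph.
    public static string Extract(Stream docx)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var archive = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                var sb = new StringBuilder();

                foreach (XElement paragraph in doc.Descendants(W + "p"))
                {
                    // Each <w:t> element holds one run of text; joining the
                    // runs of a paragraph gives that paragraph's text.
                    sb.AppendLine(string.Concat(
                        paragraph.Descendants(W + "t").Select(t => t.Value)));
                }

                return sb.ToString();
            }
        }
    }
}
```

With a helper like this, the upload handler could call `DocxText.Extract(FileUpload1.FileContent)` and assign the result to the textbox. Skipping Word itself avoids the server-side automation concerns, at the cost of handling only the simplest document structure.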




Answer 3:


You can use the "Word.ApplicationClass" class.

However, you should read "Considerations for server-side Automation of Office" first.

Liberated from another donor:

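    // Assumes: using Word = Microsoft.Office.Interop.Word; and an STA thread for Clipboard access
    Word.ApplicationClass wordApp = new Word.ApplicationClass();
    object file = path;
    object nullobj = System.Reflection.Missing.Value;

    Word.Document doc = wordApp.Documents.Open(
        ref file, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj);

    // Select the whole document and copy it to the clipboard
    doc.ActiveWindow.Selection.WholeStory();
    doc.ActiveWindow.Selection.Copy();

    // Read the copied content back as plain text
    IDataObject data = Clipboard.GetDataObject();
    txtFileContent.Text = data.GetData(DataFormats.Text).ToString();

    // Close the document and quit Word so no WINWORD.EXE process is left running
    doc.Close(ref nullobj, ref nullobj, ref nullobj);
    wordApp.Quit(ref nullobj, ref nullobj, ref nullobj);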

As mentioned in my comment below, this may work for you as well: http://npoi.codeplex.com/



Source: https://stackoverflow.com/questions/1313247/how-do-i-read-a-word-doc-using-the-streamreader
