Need to extract text messages out of an HTML document

ⅰ亾dé卋堺 submitted on 2019-12-25 01:18:19

Question


Hello, I have a long HTML document; this is the part that interests me:

<iframe class="goog-te-menu-frame skiptranslate" src="javascript:void(0)" frameborder="0" style="display: none; visibility: visible;"></iframe><div class="chatbox3"><div class="chatbox2"><div class="chatbox"><div class="logwrapper" style="top: 89px; margin-right: 168px;"><div class="logbox"><div style="position: relative; min-height: 100%;"><div class="logitem"><p class="statuslog">You're now chatting with a random stranger. Say hi!</p></div><div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>hii there</span></p></div><div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>nice to meet you</span></p></div><div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>this is a text</span></p></div><div class="logitem"><p class="youmsg"><strong class="msgsource">You:</strong> <span>this text should not be taken</span></p></div><div class="logitem"><p class="statuslog">Stranger has disconnected.</p></div><div class="logitem"><div class="statuslog">

In the browser, it renders as follows:

You're now chatting with a random stranger. Say hi!

Stranger: hii thereStranger: nice to meet youStranger: this is a textYou: this text should not be takenStranger has disconnected.

I want to extract every message sent by Stranger into strings (Visual Basic) and ignore both my own messages and system messages such as "You're now chatting with a random stranger. Say hi!" and "Stranger has disconnected." I have no idea how to approach this and would appreciate some help. Thank you.
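A minimal VB.NET sketch of the extraction described above, working directly on the HTML string. It assumes the markup always matches the snippet exactly, so each stranger message is the <span> inside a <p class="strangermsg">; your own messages (class "youmsg") and system lines (class "statuslog") use other classes and are therefore skipped. A regular expression like this is fragile if the page markup changes, and the names StrangerMessageExtractor and ExtractStrangerMessages are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module StrangerMessageExtractor
    ' Assumption: every stranger message looks exactly like the snippet:
    ' <p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>TEXT</span></p>
    ' Lines with class="youmsg" or class="statuslog" never match this pattern.
    Private ReadOnly StrangerPattern As New Regex(
        "<p class=""strangermsg"">.*?<span>(.*?)</span>",
        RegexOptions.Singleline Or RegexOptions.IgnoreCase)

    ' Hypothetical helper: returns the text of every stranger message, in order.
    Public Function ExtractStrangerMessages(html As String) As List(Of String)
        Dim messages As New List(Of String)()
        For Each m As Match In StrangerPattern.Matches(html)
            ' Group 1 holds the raw message; decode entities such as &amp; or &#39;
            messages.Add(WebUtility.HtmlDecode(m.Groups(1).Value))
        Next
        Return messages
    End Function
End Module
```

For example, passing the question's HTML to ExtractStrangerMessages should yield "hii there", "nice to meet you" and "this is a text", while "this text should not be taken" is left out.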


Answer 1:


If anyone else is interested in doing this: I've managed to simplify the process by copying the page's HTML (Document.Body.InnerHtml) into a second WebBrowser control and then reading that control's Document.Body.OuterText property into a RichTextBox. That way I can work with plain text instead of dealing with the HTML code.

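' Copy the body HTML of the Omegle page into the OmegleHTML text box
OmegleHTML.Text = Omegle.Document.Body.InnerHtml
' Load that HTML into a second WebBrowser control
WebBrowser1.Document.Body.InnerHtml = OmegleHTML.Text
' Read the rendered text (without tags) into the Log RichTextBox
Log.Text = WebBrowser1.Document.Body.OuterText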

I've also used the following code to get rid of any irrelevant text before the chat log:

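' Find where the chat log starts (IndexOf returns -1 if the marker is missing)
Dim SInd As Integer = 0
Dim EInd As Integer = Log.Text.IndexOf("You're now chatting with a random stranger. Say hi!")
' Remove the EInd characters before the marker; skip when it is not found,
' because Remove(0, -1) would throw an ArgumentOutOfRangeException
If EInd > 0 Then
    Log.Text = Log.Text.Remove(SInd, EInd)
End If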
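After trimming, the text still contains your own lines and the disconnect notice. A possible next step, sketched in VB.NET under the assumption that each message ends up on its own line in Log.Text: keep only lines that start with "Stranger: " and strip that prefix. If the messages come out run together on one line (as in the output shown in the question), this will not work and you would need to split on the HTML structure instead. A message that itself contains a line break would also be cut in two. The names LogTextFilter and StrangerLines are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module LogTextFilter
    ' Hypothetical helper: keeps only lines that start with "Stranger: "
    ' and returns them without that prefix.
    ' Assumption: every chat message is on its own line in logText.
    Public Function StrangerLines(logText As String) As List(Of String)
        Const Prefix As String = "Stranger: "
        Dim result As New List(Of String)()
        ' Split on Windows (CRLF) and Unix (LF) line breaks, dropping empty lines
        Dim lines() As String = logText.Split(New String() {vbCrLf, vbLf}, StringSplitOptions.RemoveEmptyEntries)
        For Each line As String In lines
            Dim trimmed As String = line.Trim()
            ' "Stranger has disconnected." has no colon after "Stranger", so it is skipped;
            ' "You: ..." lines and the welcome message do not start with the prefix either
            If trimmed.StartsWith(Prefix, StringComparison.Ordinal) Then
                result.Add(trimmed.Substring(Prefix.Length))
            End If
        Next
        Return result
    End Function
End Module
```

Usage would be something like Dim msgs = StrangerLines(Log.Text), after the trimming code has run.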

This is the closest I've got. If you have a better answer, please post it.



Source: https://stackoverflow.com/questions/28934060/need-to-extract-text-messages-out-of-an-html-document
