Scraping text from file within HTML tags

Posted by 泄露秘密 on 2019-11-29 08:51:36
Dick Kusleika

If you're using Excel VBA, set a reference (Tools > References) to the MSHTML library (listed as Microsoft HTML Object Library in the References dialog):

Sub ScrapeDateAbbr()

    Dim hDoc As MSHTML.HTMLDocument
    Dim hElem As MSHTML.HTMLGenericElement
    Dim sFile As String, lFile As Long
    Dim sHtml As String

    'read the whole file into a string, then release the file handle
    lFile = FreeFile
    sFile = "C:/Users/dick/Documents/My Dropbox/Excel/Testabbr.html"
    Open sFile For Input As lFile
    sHtml = Input$(LOF(lFile), lFile)
    Close lFile

    'put into an htmldocument object
    Set hDoc = New MSHTML.HTMLDocument
    hDoc.body.innerHTML = sHtml

    'loop through abbr tags
    For Each hElem In hDoc.getElementsByTagName("abbr")
        'only those that have a data-utime attribute
        If Len(hElem.getAttribute("data-utime")) > 0 Then
            'get the title attribute
            Debug.Print hElem.getAttribute("title")
        End If
    Next hElem

End Sub

I assumed the file was local, since you described it as a source file. If you need to download it first, you'd need another reference, to MSXML, and this code:

Sub ScrapeDateAbbrDownload()

    Dim xHttp As MSXML2.XMLHTTP
    Dim hDoc As MSHTML.HTMLDocument
    Dim hElem As MSHTML.HTMLGenericElement

    Set xHttp = New MSXML2.XMLHTTP
    xHttp.Open "GET", "file:///C:/Users/dick/Documents/My%20Dropbox/Excel/Testabbr.html"
    xHttp.send

    'the request is asynchronous by default, so wait until it completes (readyState 4)
    Do
        DoEvents
    Loop Until xHttp.readyState = 4

    'put into an htmldocument object
    Set hDoc = New MSHTML.HTMLDocument
    hDoc.body.innerHTML = xHttp.responseText

    'loop through abbr tags
    For Each hElem In hDoc.getElementsByTagName("abbr")
        'only those that have a data-utime attribute
        If Len(hElem.getAttribute("data-utime")) > 0 Then
            'get the title attribute
            Debug.Print hElem.getAttribute("title")
        End If
    Next hElem

End Sub

If you're using Java, you could use Jsoup. That isn't clear from your question, so please elaborate on what exactly you are trying to do.
