Extract data from a web page that may not be formatted as a table

一生所求 2021-01-06 19:44

For starters, I am by no means an expert in VBA; I just know enough to be dangerous 8).

I started out by doing a search on how to extract a table from a web page and sa

2 answers
  •  醉酒成梦
    2021-01-06 20:05

    I've written a small Sub which I think should solve your problem, if I understood you correctly. Of course you will have to invest some work yourself, since it only reads one stage right now. But it does read the data from every group:

    Option Explicit

    Private Sub CommandButton1_Click()

        ' Make sure you add references to "Microsoft Internet Controls" (shdocvw.dll)
        ' and "Microsoft HTML Object Library". The code will NOT run otherwise.

        Dim objIE As SHDocVw.InternetExplorer     ' Microsoft Internet Controls (shdocvw.dll)
        Dim htmlDoc As MSHTML.HTMLDocument        ' Microsoft HTML Object Library
        Dim htmlCurrentDoc As MSHTML.HTMLDocument ' Microsoft HTML Object Library

        Dim ButtonRoundData As Variant
        Dim ButtonData As Variant
        Dim button As HTMLLinkElement
        Dim RawData As HTMLTable
        Dim hRow As HTMLTableRow
        Dim hCell As HTMLTableCell

        ' Long instead of Integer so the counters cannot overflow on large sheets
        Dim RowNumber As Long
        Dim ColumnNumber As Long
        RowNumber = 1

        Set objIE = New SHDocVw.InternetExplorer

        With objIE
            .Navigate "http://worldoftanks.com/en/tournaments/1000000017/" ' Main page
            .Visible = False
            ' Wait until the page has loaded (4 = READYSTATE_COMPLETE)
            Do While .Busy Or .readyState <> 4: DoEvents: Loop
            Application.Wait (Now + TimeValue("0:00:01"))

            Set htmlDoc = .document

            ' Stage links: collected but not used yet, since only one stage is read
            Set ButtonRoundData = htmlDoc.getElementsByClassName("group-stage_link")

            ' Group links: one click per group
            Set ButtonData = htmlDoc.getElementsByClassName("groups_link")

            For Each button In ButtonData

                Debug.Print button.nodeName

                button.Click

                ' This is to prevent double entries, but it is not clean. You should
                ' definitely check whether the table is still the same and wait if so.
                Application.Wait (Now + TimeValue("0:00:02"))

                Set htmlCurrentDoc = .document
                Set RawData = htmlCurrentDoc.getElementsByClassName("tournament-table tournament-table__indent")(0)

                ' Copy the table cell by cell into the worksheet
                For Each hRow In RawData.Rows
                    ColumnNumber = 1
                    For Each hCell In hRow.Cells
                        Cells(RowNumber, ColumnNumber).Value = hCell.innerText
                        ColumnNumber = ColumnNumber + 1
                    Next hCell
                    RowNumber = RowNumber + 1
                Next hRow

                ' Leave a gap of three rows between groups
                RowNumber = RowNumber + 3
            Next button

            ' Close the hidden IE instance so it does not keep running in the background
            .Quit
        End With

    End Sub
    

    What it does: it starts an invisible IE instance, reads the data, clicks the next group button, reads the next table, and so on.
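
    To cover more than one stage, the same idea can be wrapped in an outer loop. The following is only a rough sketch, not tested against the site: it assumes that clicking a "group-stage_link" element switches the stage and that the group links are rebuilt afterwards. ReadAllStages and CopyTable are hypothetical names, and the fixed two-second waits are as crude as in the Sub above.

    ' Sketch: visit every stage, then every group within that stage.
    ' Assumption: clicking a "group-stage_link" element switches the stage.
    Private Sub ReadAllStages(ByVal ie As Object)
        Dim stageCount As Long, groupCount As Long
        Dim s As Long, g As Long
        Dim nextRow As Long
        Dim tables As Object

        nextRow = 1
        stageCount = ie.document.getElementsByClassName("group-stage_link").Length

        For s = 0 To stageCount - 1
            ' Re-query on every pass: the page may rebuild its links after a click
            ie.document.getElementsByClassName("group-stage_link").Item(s).Click
            Application.Wait Now + TimeValue("0:00:02")

            groupCount = ie.document.getElementsByClassName("groups_link").Length
            For g = 0 To groupCount - 1
                ie.document.getElementsByClassName("groups_link").Item(g).Click
                Application.Wait Now + TimeValue("0:00:02")

                Set tables = ie.document.getElementsByClassName("tournament-table tournament-table__indent")
                If tables.Length > 0 Then
                    ' Keep a gap of three rows between tables, as in the Sub above
                    nextRow = CopyTable(tables.Item(0), ActiveSheet, nextRow) + 3
                End If
            Next g
        Next s
    End Sub

    ' Hypothetical helper: writes one HTML table to ws starting at startRow
    ' and returns the first unused row.
    Private Function CopyTable(ByVal tbl As Object, ByVal ws As Worksheet, ByVal startRow As Long) As Long
        Dim r As Long, c As Long
        For r = 0 To tbl.Rows.Length - 1
            For c = 0 To tbl.Rows.Item(r).Cells.Length - 1
                ws.Cells(startRow + r, c + 1).Value = tbl.Rows.Item(r).Cells.Item(c).innerText
            Next c
        Next r
        CopyTable = startRow + tbl.Rows.Length
    End Function

    You would call it as ReadAllStages objIE once the main page has loaded. It uses late binding (As Object), so it does not depend on the exact element types.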

    For debugging I suggest setting .Visible to True, so you can see what happens.

    EDIT 1: If you get a debugging error, try to abort and run it again. It definitely needs some error handling for the case where the website hasn't loaded properly.
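
    One possible shape for that error handling is a wait with a timeout instead of an endless loop. This is a minimal sketch; WaitForPage and the 30-second limit are my own assumptions, not part of the code above.

    ' Hypothetical helper: waits until IE reports the page as loaded,
    ' or gives up after timeoutSeconds. Returns True on success.
    Private Function WaitForPage(ByVal ie As Object, ByVal timeoutSeconds As Double) As Boolean
        Dim deadline As Date
        deadline = Now + timeoutSeconds / 86400#    ' 86400 seconds per day

        Do While ie.Busy Or ie.readyState <> 4      ' 4 = READYSTATE_COMPLETE
            DoEvents
            If Now > deadline Then Exit Function    ' timed out, result stays False
        Loop

        WaitForPage = True
    End Function

    In the Sub it would replace the Do While line, for example:

    If Not WaitForPage(objIE, 30) Then
        objIE.Quit
        MsgBox "The page did not finish loading in time."
        Exit Sub
    End If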

    EDIT 2: Made it a bit more stable. You should really pay attention here: since the web page takes some time to load, you MUST check whether the data has changed before writing it. If it hasn't changed, wait a second or so and then try again.
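
    That check could look roughly like the sketch below. It compares the table's text with the text of the previously written table and polls once per second. WaitForTableChange is a hypothetical name and the timeout is an assumption. Note that it would time out if two groups happened to contain exactly the same text.

    ' Hypothetical helper: polls until the first table with the given class
    ' shows text different from previousText, or the timeout expires.
    Private Function WaitForTableChange(ByVal doc As Object, ByVal tableClass As String, _
                                        ByVal previousText As String, _
                                        ByVal timeoutSeconds As Double) As Boolean
        Dim deadline As Date
        Dim tables As Object
        deadline = Now + timeoutSeconds / 86400#

        Do
            Set tables = doc.getElementsByClassName(tableClass)
            If tables.Length > 0 Then
                If tables.Item(0).innerText <> previousText Then
                    WaitForTableChange = True
                    Exit Function
                End If
            End If
            Application.Wait Now + TimeValue("0:00:01")
        Loop While Now < deadline
    End Function

    Inside the For Each loop it would replace the fixed two-second wait. lastText starts out empty, so the first group passes straight away:

    Dim lastText As String
    ' ...
    button.Click
    If WaitForTableChange(.document, "tournament-table tournament-table__indent", lastText, 10) Then
        Set RawData = .document.getElementsByClassName("tournament-table tournament-table__indent")(0)
        lastText = RawData.innerText
        ' ... copy the rows as before ...
    End If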

    Here is some sample data I got in Excel:
