Webscrap VBA - List

余生长醉 提交于 2021-01-29 18:31:17

问题


I am trying to set up a webscrapping VBA code to import data into Excel from this website: https://www.thewindpower.net/windfarms_list_en.php

I wish to launch this webpage, select a country and then scrap the data from the table below (including url from the name column).

Yet, I am stuck with several points:

  • How can I select the country I wish in VBA code ?
  • How can I select the table as there is no id or class in the tag ?
  • How can I import the URL included in the name column ?

Here is the code I have already prepared (based on some research on the web:

Sub Grabdata()

'dimension (set aside memory for) our variables
Dim objIE As InternetExplorer
Dim ele As Object
Dim y As Integer

'start a new browser instance
Set objIE = New InternetExplorer
'make browser visible
objIE.Visible = True

'navigate to page with needed data
objIE.navigate "https://www.thewindpower.net/windfarms_list_en.php"
'wait for page to load
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

'we will output data to excel, starting on row 1
y = 1

'look at all the 'tr' elements in the 'table' with id 'myTable',
'and evaluate each, one at a time, using 'ele' variable
For Each ele In objIE.document.getElementById("myTable").getElementsByTagName("tr")
    'show the text content of 'tr' element being looked at
    Debug.Print ele.textContent
    'each 'tr' (table row) element contains 4 children ('td') elements
    'put text of 1st 'td' in col A
    Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
    'put text of 2nd 'td' in col B
    Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent
    'put text of 3rd 'td' in col C
    Sheets("Sheet1").Range("C" & y).Value = ele.Children(2).textContent
    'put text of 4th 'td' in col D
    Sheets("Sheet1").Range("D" & y).Value = ele.Children(3).textContent
    'increment row counter by 1
    y = y + 1
'repeat until last ele has been evaluated
Next

'save the Excel workbook
ActiveWorkbook.Save

End Sub


回答1:


Most scrapeable pages will almost always have a static page layout so it's fairly safe to select elements using their index.

The code below selects the container element with id bloc_texte and then selects the second table inside.

If you're planning on doing a lot of requests as your comment suggests, you should add some code to slow down your requests (Application.wait type deal). Firing off request after request is likely to annoy the host.

' Required References
' Microsoft HTML Object Library
' Microsoft XML, v6.0

Sub Main()
    GetData ("GB")
End Sub

Sub GetData(ByVal Location As String)

Dim Request As MSXML2.ServerXMLHTTP60: Set Request = New MSXML2.ServerXMLHTTP60

Dim Result As HTMLDocument: Set Result = New HTMLDocument

Request.Open "POST", "https://www.thewindpower.net/windfarms_list_en.php", False
Request.setRequestHeader "content-type", "application/x-www-form-urlencoded"
Request.send "action=submit&pays=" & Location

Result.body.innerHTML = Request.responseText

Dim oRows As MSHTML.IHTMLElementCollection
Dim oRow As MSHTML.IHTMLElement

Dim oCells As MSHTML.IHTMLElementCollection
Dim oCell As MSHTML.IHTMLElement

Dim oLinks As MSHTML.IHTMLElementCollection

Set oRows = Result.getElementById("bloc_texte").getElementsByTagName("table")(2).getElementsByTagName("tr")

Dim iRow As Integer 'output row counter
Dim iColumn As Integer 'output column counter
Dim Sheet As Worksheet 'output sheet

Set Sheet = ThisWorkbook.Worksheets("Sheet1")
iRow = 1
iColumn = 1

For Each oRow In oRows
    If Not oRow.className = "puce_texte" Then
        Set oCells = oRow.getElementsByTagName("td")
        For Each oCell In oCells
            Set oLinks = oCell.getElementsByTagName("a")
            If oLinks.Length = 0 Then
                Sheet.Cells(iRow, iColumn).Value = oCell.innerText
            Else
                Sheet.Cells(iRow, iColumn).Value = Replace(oLinks(0).getAttribute("href"), "about:", "")
            End If
            iColumn = iColumn + 1
        Next oCell
        iRow = iRow + 1
        iColumn = 1
    End If
Next oRow

End Sub


来源:https://stackoverflow.com/questions/62759517/webscrap-vba-list

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!