问题
I am trying to set up a webscrapping VBA code to import data into Excel from this website: https://www.thewindpower.net/windfarms_list_en.php
I wish to launch this webpage, select a country and then scrap the data from the table below (including url from the name column).
Yet, I am stuck with several points:
- How can I select the country I wish in VBA code ?
- How can I select the table as there is no id or class in the tag ?
- How can I import the URL included in the name column ?
Here is the code I have already prepared (based on some research on the web:
Sub Grabdata()
'dimension (set aside memory for) our variables
Dim objIE As InternetExplorer
Dim ele As Object
Dim y As Integer
'start a new browser instance
Set objIE = New InternetExplorer
'make browser visible
objIE.Visible = True
'navigate to page with needed data
objIE.navigate "https://www.thewindpower.net/windfarms_list_en.php"
'wait for page to load
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'we will output data to excel, starting on row 1
y = 1
'look at all the 'tr' elements in the 'table' with id 'myTable',
'and evaluate each, one at a time, using 'ele' variable
For Each ele In objIE.document.getElementById("myTable").getElementsByTagName("tr")
'show the text content of 'tr' element being looked at
Debug.Print ele.textContent
'each 'tr' (table row) element contains 4 children ('td') elements
'put text of 1st 'td' in col A
Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
'put text of 2nd 'td' in col B
Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent
'put text of 3rd 'td' in col C
Sheets("Sheet1").Range("C" & y).Value = ele.Children(2).textContent
'put text of 4th 'td' in col D
Sheets("Sheet1").Range("D" & y).Value = ele.Children(3).textContent
'increment row counter by 1
y = y + 1
'repeat until last ele has been evaluated
Next
'save the Excel workbook
ActiveWorkbook.Save
End Sub
回答1:
Most scrapeable pages will almost always have a static page layout so it's fairly safe to select elements using their index.
The code below selects the container element with id bloc_texte
and then selects the second table inside.
If you're planning on doing a lot of requests as your comment suggests, you should add some code to slow down your requests (Application.wait
type deal). Firing off request after request is likely to annoy the host.
' Required References
' Microsoft HTML Object Library
' Microsoft XML, v6.0
Sub Main()
GetData ("GB")
End Sub
Sub GetData(ByVal Location As String)
Dim Request As MSXML2.ServerXMLHTTP60: Set Request = New MSXML2.ServerXMLHTTP60
Dim Result As HTMLDocument: Set Result = New HTMLDocument
Request.Open "POST", "https://www.thewindpower.net/windfarms_list_en.php", False
Request.setRequestHeader "content-type", "application/x-www-form-urlencoded"
Request.send "action=submit&pays=" & Location
Result.body.innerHTML = Request.responseText
Dim oRows As MSHTML.IHTMLElementCollection
Dim oRow As MSHTML.IHTMLElement
Dim oCells As MSHTML.IHTMLElementCollection
Dim oCell As MSHTML.IHTMLElement
Dim oLinks As MSHTML.IHTMLElementCollection
Set oRows = Result.getElementById("bloc_texte").getElementsByTagName("table")(2).getElementsByTagName("tr")
Dim iRow As Integer 'output row counter
Dim iColumn As Integer 'output column counter
Dim Sheet As Worksheet 'output sheet
Set Sheet = ThisWorkbook.Worksheets("Sheet1")
iRow = 1
iColumn = 1
For Each oRow In oRows
If Not oRow.className = "puce_texte" Then
Set oCells = oRow.getElementsByTagName("td")
For Each oCell In oCells
Set oLinks = oCell.getElementsByTagName("a")
If oLinks.Length = 0 Then
Sheet.Cells(iRow, iColumn).Value = oCell.innerText
Else
Sheet.Cells(iRow, iColumn).Value = Replace(oLinks(0).getAttribute("href"), "about:", "")
End If
iColumn = iColumn + 1
Next oCell
iRow = iRow + 1
iColumn = 1
End If
Next oRow
End Sub
来源:https://stackoverflow.com/questions/62759517/webscrap-vba-list