Greyhound data import to Excel macro formula

无人久伴 submitted on 2020-01-16 05:47:06

Question


As part of a research project, I need to extract as much data as possible from a webpage. The problem is that to reach each table I have to follow lots of links, and I can't get this to work automatically.

It's from greyhound-data.com. For example, I want to extract all the racing stats for every dog that raced in Swindon between 1 Jan 2017 and 28 Feb 2018. When I enter this in the site's search, I get a table of 57236 races, and for each race I have to follow the link on the race name.

http://www.greyhound-data.com/d?racename=&country=13000&startmonth=3&endmonth=2&startdate=2017&enddate=2018&maxdist=unlimitied&class=any&order=dateD&x=2
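Every search filter lives in the query string of that URL, and the macro below assumes the trailing x parameter is the results page number. A minimal sketch (in Python, only to make the logic explicit; the macro itself stays VBA) of generating one URL per page by rewriting just that parameter. The helper name page_urls is hypothetical, and treating x as the page index is an assumption carried over from the macro.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Search URL from the question; only the trailing x parameter changes per page.
SEARCH_URL = ("http://www.greyhound-data.com/d?racename=&country=13000&startmonth=3"
              "&endmonth=2&startdate=2017&enddate=2018&maxdist=unlimitied"
              "&class=any&order=dateD&x=2")


def page_urls(url, pages):
    """Return one URL per page, setting x=1..pages (assumes x is the page index)."""
    parts = urlsplit(url)
    # keep_blank_values=True so the empty racename= parameter is preserved
    params = parse_qsl(parts.query, keep_blank_values=True)
    urls = []
    for page in range(1, pages + 1):
        # Copy every parameter unchanged except x, which gets the page number
        query = urlencode([(k, str(page) if k == "x" else v) for k, v in params])
        urls.append(urlunsplit(parts._replace(query=query)))
    return urls


for u in page_urls(SEARCH_URL, 3):
    print(u)
```

Parsing and re-encoding the query, rather than concatenating a string, keeps the parameter order and the blank racename= value intact.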

My biggest problem is that I don't know how to make the macro follow the various links, or how to loop multiple times, once for each race in the original list.

I have created this simple macro:

Sub GetData()

    Dim IE As Object
    Dim doc As Object
    Dim strURL As String
    Dim I As Integer

    For I = 1 To 9
        strURL = "http://www.greyhound-data.com/d?racename=&country=13000&startmonth=3&endmonth=2&startdate=2017&enddate=2018&maxdist=unlimitied&class=any&order=dateD&x=" & I

        Set IE = CreateObject("InternetExplorer.Application")
        With IE
            .navigate strURL
            Do Until .ReadyState = 4: DoEvents: Loop
            Do While .Busy: DoEvents: Loop
            Set doc = .Document
            GetAllTables doc
            .Quit
        End With
    Next I

End Sub

Sub GetAllTables(doc As Object)

    Dim ws As Worksheet
    Dim rng As Range
    Dim tbl As Object
    Dim rw As Object
    Dim cl As Object
    Dim tabno As Long
    Dim nextrow As Long
    Dim I As Long

    Set ws = Worksheets.Add

    For Each tbl In doc.getElementsByTagName("TABLE")
        tabno = tabno + 1
        nextrow = nextrow + 1
        Set rng = ws.Range("B" & nextrow)
        rng.Offset(, -1) = "Table " & tabno
        For Each rw In tbl.Rows
            For Each cl In rw.Cells
                rng.Value = cl.outerText
                Set rng = rng.Offset(, 1)
                I = I + 1
            Next cl
            nextrow = nextrow + 1
            Set rng = rng.Offset(1, -I)
            I = 0
        Next rw
    Next tbl

End Sub
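GetAllTables walks every table, then every row, then every cell, and writes the cell text across the sheet. The same traversal, sketched in Python with the standard-library HTMLParser purely to show the shape of the data it produces (tables, then rows, then cell texts). TableFlattener is a hypothetical name; the sketch assumes cells have explicit closing tags and do not contain nested tables.

```python
from html.parser import HTMLParser


class TableFlattener(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped as tables -> rows -> cells."""

    def __init__(self):
        super().__init__()
        self.tables = []   # every table found, in document order
        self._open = []    # stack of tables that are currently open
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
            self._open.append(self.tables[-1])
        elif tag == "tr" and self._open:
            self._open[-1].append([])  # start a new row in the innermost table
        elif tag in ("td", "th") and self._open and self._open[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace, roughly like the cell's displayed text
            text = " ".join("".join(self._cell).split())
            self._open[-1][-1].append(text)
            self._cell = None
        elif tag == "table" and self._open:
            self._open.pop()
```

Each inner list corresponds to one worksheet row in the macro's output.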

It retrieves the data for all races from the URL automatically, but I can't get the next step to work. On each page there is a "name of the race" column whose entries are links, and for each row I need to follow that link and get all the data on the race page, because I need the 1st, 2nd and 3rd place information.
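The missing step is a second loop: for every race link found on the results pages, open that race's page and keep its top three rows. A minimal sketch of that loop structure in Python, assuming a hypothetical fetch_result_rows callable that downloads a race page and returns its result rows (header excluded) in finishing order; both the callable and that ordering are assumptions, not something the site documents.

```python
from typing import Callable, Dict, List

Row = List[str]


def top_three(race_links: Dict[str, str],
              fetch_result_rows: Callable[[str], List[Row]]) -> Dict[str, List[Row]]:
    """For each race name, fetch its result rows and keep the first three.

    fetch_result_rows is hypothetical: given a race URL, it returns that
    race's result rows, assumed to be in finishing order.
    """
    placings = {}
    for name, url in race_links.items():  # outer loop: one iteration per race
        rows = fetch_result_rows(url)     # inner step: open the race page
        placings[name] = rows[:3]         # 1st, 2nd and 3rd place (fewer if short)
    return placings
```

Passing the fetcher in as a parameter keeps the looping logic separate from the browser or HTTP code, so it can be checked with canned data.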

Thanks for your time; I know it's a bit garbled!

My new code after the changes is below. GetData is unchanged from the version above; GetAllTables now looks like this:

Sub GetAllTables(doc As Object)

Dim ws As Worksheet
Dim rng As Range
Dim tbl As Object
Dim rw As Object
Dim cl As Object
Dim tabno As Long
Dim nextrow As Long
Dim I As Long
Dim ThisLink As Object 'variable for <a> tags
Set ws = Worksheets.Add

For Each tbl In doc.getElementsByTagName("TABLE")
    tabno = tabno + 1
    nextrow = nextrow + 1
    Set rng = ws.Range("B" & nextrow)

    rng.Offset(, -1) = "Table " & tabno
    For Each rw In tbl.Rows
        For Each cl In rw.Cells
            rng.Value = cl.outerText
            Set rng = rng.Offset(, 1)
            I = I + 1
        Next cl
    nextrow = nextrow + 1
    Set rng = rng.Offset(1, -I)
    I = 0
    Next rw
Next tbl

I = Range("B" & Rows.Count).End(xlUp).Row 'last row with data

Do While Cells(I, 1).Value = "" 'will loop until first not blank found in column A (starting from last row of data, from end to start)
    For Each ThisLink In doc.getElementsByTagName("a") 'we check all <a> tags
        If ThisLink.innerText = Cells(I, 2).Value Then Cells(I, 1).Value = ThisLink.href 'If the innertext is the name of the race, in column A we add link
    Next ThisLink
    I = I - 1 'we decrease row position
Loop
End Sub

But it returns an empty table, as shown in this screenshot: https://imageshack.us/i/poC4yhEZp


Answer 1:


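After all the data has been written, this code walks up from the last row of data until it reaches the first non-blank cell in column A (the "Table N" label of the last table), and for each race in between it writes the related race link into column A.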

Sub GetAllTables(doc As Object)

Dim ws As Worksheet
Dim rng As Range
Dim tbl As Object
Dim rw As Object
Dim cl As Object
Dim tabno As Long
Dim nextrow As Long
Dim I As Long
Dim ThisLink As Object 'variable for <a> tags
Set ws = Worksheets.Add

For Each tbl In doc.getElementsByTagName("TABLE")
    tabno = tabno + 1
    nextrow = nextrow + 1
    Set rng = ws.Range("B" & nextrow)

    rng.Offset(, -1) = "Table " & tabno
    For Each rw In tbl.Rows
        For Each cl In rw.Cells
            rng.Value = cl.outerText
            Set rng = rng.Offset(, 1)
            I = I + 1
        Next cl
    nextrow = nextrow + 1
    Set rng = rng.Offset(1, -I)
    I = 0
    Next rw
Next tbl

I = ws.Range("B" & ws.Rows.Count).End(xlUp).Row 'last row with data on the new sheet

Do While ws.Cells(I, 1).Value = "" 'walk upward until the first non-blank cell in column A (the "Table N" label of the last table)
    For Each ThisLink In doc.getElementsByTagName("a") 'check every <a> tag
        If ThisLink.innerText = ws.Cells(I, 2).Value Then
            ws.Cells(I, 1).Value = ThisLink.href 'the link text is the race name, so store its link in column A
            Exit For 'first match is enough
        End If
    Next ThisLink
    I = I - 1 'move up one row
Loop
End Sub

In the page's HTML, each race link is an <a> tag like this:

<a href="d?r=4269456&z=F0K9jn">Henlow 26 Feb 2018 HT 5</a>

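The href attribute holds the link for the text between <a> and </a>. In VBA you can read it with a.href; the href property typically returns the link already resolved to a full URL, even though the raw attribute in the page is relative.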
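The raw attribute value (d?r=4269456&z=F0K9jn) is relative to the page it appears on. A small Python illustration of that resolution using the standard urljoin, with one of the question's search pages assumed as the base URL; this mirrors what the browser does, it is not part of the VBA answer.

```python
from urllib.parse import urljoin

# The page the link was found on (one of the search-result pages from the question).
page_url = ("http://www.greyhound-data.com/d?racename=&country=13000&startmonth=3"
            "&endmonth=2&startdate=2017&enddate=2018&maxdist=unlimitied"
            "&class=any&order=dateD&x=2")
raw_href = "d?r=4269456&z=F0K9jn"  # attribute value from the example <a> tag

# Resolve the relative reference against the page URL (RFC 3986 rules)
absolute = urljoin(page_url, raw_href)
print(absolute)  # http://www.greyhound-data.com/d?r=4269456&z=F0K9jn
```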

To get the text between <a> and </a>, use a.innerText.

What I did is a simple loop that checks every <a> tag: if its innerText matches the value in the cell, I take its href attribute.
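A rough Python equivalent of that matching step, shown only to make the logic explicit: it collects every (text, href) pair once, then looks each race name up in a dictionary instead of rescanning all links for every row. LinkCollector and links_for are hypothetical names; the sketch assumes the link text equals the cell text after collapsing whitespace, and keeps the first link when several share the same text.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # (text, href) pairs in document order
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None if the tag has no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def links_for(names, html):
    """Map each race name to the href of the first <a> whose text equals it."""
    collector = LinkCollector()
    collector.feed(html)
    by_text = {}
    for text, href in collector.links:
        by_text.setdefault(text, href)  # keep the first link for each text
    # Compare whitespace-normalized names; unmatched names map to None
    return {name: by_text.get(" ".join(name.split())) for name in names}
```

Building the lookup once makes the cost roughly links plus rows, rather than links times rows, and normalizing whitespace guards against stray spaces or line breaks in either the cell or the link text.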

This code collects the race links you asked about. Adapt it to your needs: I wrote them to column A, but you may want to do something else with them.

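If you change the declarations to early-bound types such as InternetExplorer or HTMLDocument, you need these two references (Tools > References in the VBA editor):

  1. Microsoft HTML Object Library
  2. Microsoft Internet Controls

As written, the code uses late binding (CreateObject and As Object), so it should also run without them.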

And this is the final result: (screenshot not preserved)



Source: https://stackoverflow.com/questions/49028178/greyhound-data-import-to-excel-macro-formula
