Web scraping across multiple pages without knowing the last page number

試著忘記壹切 submitted on 2019-12-30 11:53:15

Question


I wrote a macro that crawls the titles of tutorials spread across several pages of a site, and it works flawlessly. Instead of depending on the last page number in the URL, the loop keeps requesting pages until the HTTP status is no longer 200. The code pasted below works well for this site. However, when I tried another URL to see whether the loop would also stop on its own, the code fetched all the results but never broke out of the loop. What is the workaround so that the code breaks when it is done and the macro stops? Here is the working version:

' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
Sub WiseOwl()
    Const mlink = "http://www.wiseowl.co.uk/videos/default"
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim post As Object
    Dim x As Long, y As Long    ' x = output row, y = page number

    Do While True
        y = y + 1
        With http
            .Open "GET", mlink & "-" & y & ".htm", False
            .send
            ' A non-200 status means we have run past the last page
            If .Status <> 200 Then
                MsgBox "It's done"
                Exit Sub
            End If
            html.body.innerHTML = .responseText
        End With

        ' Write the link text of each series title to column A
        For Each post In html.getElementsByClassName("woVideoListDefaultSeriesTitle")
            With post.getElementsByTagName("a")
                x = x + 1
                If .Length Then Cells(x, 1) = .item(0).innerText
            End With
        Next post
    Loop
End Sub

I found a way to make this work with Yellowpage. My updated script is able to parse Yellowpage, but it breaks before scraping the last page because that page has no "Next Page" button. This is what I tried: "https://www.dropbox.com/s/iptqm79b0byw3dz/Yellowpage.txt?dl=0"

However, when I tried to apply the same logic to a torrent site, it did not work:

"https://www.yify-torrent.org/genres/western/p-1/"


Answer 1:


You can always rely on whether the elements exist or not. Here, for example, if you try to use the object variable you assigned your element to when nothing was found, you will get:

Run-time error '91': Object variable or With block variable not set

This error is the signal you should look for to put an end to your loop. See the example below:

' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
Sub yify()
    Const mlink = "https://www.yify-torrent.org/genres/western/p-"
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim post As Object
    Dim posts As Object
    Dim x As Long, y As Long    ' x = output row, y = page number

    y = 1
    Do
        With http
            .Open "GET", mlink & y & "/", False
            .send
            html.body.innerHTML = .responseText
        End With

        Set posts = html.getElementsByClassName("mv")
        ' On any error, jump to the label just before the loop condition
        On Error GoTo Endofpage
        Debug.Print Len(posts) 'to force Error 91

        For Each post In posts
            With post.getElementsByTagName("div")
                x = x + 1
                If .Length Then Cells(x, 1) = .Item(0).innerText
            End With
        Next post
        y = y + 1
Endofpage:
    ' Err.Number is still set here if the statement above failed
    Loop Until Err.Number = 91
    Debug.Print "It's over"
End Sub
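
As an alternative to trapping a run-time error, the same stop condition can be written as an explicit check on how many elements the page returned. The following is only a sketch, not the answerer's method: it assumes the site keeps returning status 200 with no "mv" elements once you go past the last page (which is what the question suggests), and both the procedure name YifyUntilEmpty and the MAX_PAGES cap are hypothetical additions.

```vba
' Sketch only: stop when a page yields no matching elements.
' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
Sub YifyUntilEmpty()
    Const mlink As String = "https://www.yify-torrent.org/genres/western/p-"
    Const MAX_PAGES As Long = 500   ' hypothetical safety cap, not from the original
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim posts As Object, post As Object
    Dim x As Long, y As Long        ' x = output row, y = page number

    For y = 1 To MAX_PAGES
        With http
            .Open "GET", mlink & y & "/", False
            .send
            If .Status <> 200 Then Exit For     ' keep the status check as a first guard
            html.body.innerHTML = .responseText
        End With

        Set posts = html.getElementsByClassName("mv")
        If posts.Length = 0 Then Exit For       ' assumption: an empty page means we are past the end

        For Each post In posts
            With post.getElementsByTagName("div")
                If .Length Then
                    x = x + 1                   ' advance the row only when a div was found
                    Cells(x, 1) = .Item(0).innerText
                End If
            End With
        Next post
    Next y

    Debug.Print "It's over"
End Sub
```

Checking the collection length keeps the loop's exit condition visible in the code and leaves error handling free for genuine failures such as a dropped connection. The page cap is only there so that an unexpected site change cannot leave the macro running indefinitely.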


Source: https://stackoverflow.com/questions/45200247/web-scraping-across-multipages-without-even-knowing-the-last-page-number
