Webscrape loop on all URLs in Column A

徘徊边缘 submitted on 2021-01-28 05:39:21

Question


I'm trying to scrape Facebook video titles from a list of URLs.

My macro works for a single video whose URL is hard-coded. I'd like the script to loop through each URL in column A instead and output the video title into column B. Any help?

Current code:

Sub ScrapeVideoTitle()    
    Dim appIE As Object
    Set appIE = CreateObject("internetexplorer.application")

    With appIE
        .navigate "https://www.facebook.com/rankertotalnerd/videos/276505496352731/"
        .Visible = True

        Do While appIE.Busy        
            DoEvents
        Loop

        'Add Video Title to Column B
        Range("B2").Value = appIE.document.getElementsByClassName("_4ik6")(0).innerText

        appIE.Quit
        Set appIE = Nothing
    End With
End Sub

Answer 1:


Provided you can add a reference to Microsoft HTML Object Library (VBE > Tools > References), you can do the following:

Read all the URLs into an array. Loop over the array and use XMLHTTP to issue a GET request to each page. Load the response into an HTMLDocument variable, use a CSS selector to extract the title, and store it in a results array. When the loop finishes, write all the results to the sheet in one go.

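Option Explicit

Public Sub GetTitles()
    'Requires a reference to Microsoft HTML Object Library
    Dim ws As Worksheet, lastRow As Long, i As Long
    Dim urls As Variant, results() As Variant
    Dim html As HTMLDocument, titleNode As Object

    Set html = New HTMLDocument
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With ws
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        If lastRow < 2 Then Exit Sub                  'No URLs below the header row

        'Read the URLs as a 2D (n x 1) array; a single cell returns a scalar, so wrap it
        If lastRow = 2 Then
            ReDim urls(1 To 1, 1 To 1)
            urls(1, 1) = .Range("A2").Value
        Else
            urls = .Range("A2:A" & lastRow).Value
        End If
    End With

    ReDim results(1 To UBound(urls, 1), 1 To 1)

    With CreateObject("MSXML2.XMLHTTP")
        For i = 1 To UBound(urls, 1)
            If InStr(1, urls(i, 1), "http", vbTextCompare) > 0 Then
                .Open "GET", urls(i, 1), False
                .send
                If .Status = 200 Then
                    html.body.innerHTML = .responseText
                    Set titleNode = html.querySelector(".uiHeaderTitle span")
                    'Leave the cell blank rather than erroring if the selector finds nothing
                    If Not titleNode Is Nothing Then results(i, 1) = titleNode.innerText
                End If
            End If
        Next
    End With

    'Write all titles to column B in one go
    ws.Cells(2, 2).Resize(UBound(results, 1), 1).Value = results
End Sub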

[Screenshot: matching of the CSS selector to the page]

Answer 2:


If you have the "276505496352731" part of the URL, or indeed the whole URL, in column A, you can set a range to the top value and then loop until the range is empty, moving it down one row for each scrape.

Something like:

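Sub ScrapeVideoTitles()
    Dim appIE As Object, r As Range
    Set appIE = CreateObject("internetexplorer.application")

    With appIE
        .Visible = True

        Set r = Range("A2")   ' Assumes A2 is the top of the URL list (row 1 is a header)
        Do While Len(r.Value) > 0

            .navigate r.Value
            ' Wait until the page has finished loading
            Do While .Busy Or .readyState <> 4
                DoEvents
            Loop

            ' Write the title into column B, next to its URL
            r.Offset(0, 1).Value = .document.getElementsByClassName("_4ik6")(0).innerText

            Set r = r.Offset(1)   ' Move down to the next URL
        Loop

        .Quit
    End With
    Set appIE = Nothing
End Sub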

Hopefully that helps.



Source: https://stackoverflow.com/questions/56962497/webscrape-loop-on-all-urls-in-column-a
