Web scrape using XHR from siriusxm.com

五迷三道 提交于 2019-12-17 03:44:21

问题


I need to pull the currently playing artist and song from http://www.siriusxm.com/siriusxmhits1. I can get this to work navigating to the website with Internet Explorer but it takes too long so I have tried using WINHTTP.WinHTTPRequest.5.1 and MSXML2.serverXMLHTTP but neither pulls the specific data I'm looking for. I think I'm close but am missing something.

Below is the HTML snippet:

<div id="on-the-air-content" style="display: block;">
    <div class="module-content theme-color-content-bg clearfix">
        <div id="onair-pdt" style="display: block;">
            <img alt="" src="//www.siriusxm.com/albumart/Live/2000/chainsmokers_58C328AC_t.jpg">
            <p class="onair-pdt-artist">Chainsmokers/Coldplay</p>
            <p class="onair-pdt-song">Something Just Like This</p>
        </div>
        ...
    </div>
    ...
</div>

Here is my current code:

Sub GetData()

    Dim getArtist As Object
    Dim getSong As Object

    Set xmHtml = New HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.siriusxm.com/siriusxmhits1", False
        .send
        xmHtml.body.innerHTML = .responseText
    End With
    Set getArtist = xmHtml.getElementById("onair-pdt").getElementsByTagName("p")(0)
    MsgBox (getArtist.innerText)
    Set getSong = xmHtml.getElementById("onair-pdt").getElementsByTagName("p")(1)
    MsgBox (getSong.innerText)

End Sub

If I activate Internet Explorer it will work using the following code, but that takes too long for what I need to do:

Sub GetData()

    Dim DivID As HTMLObjectElement
    Dim getArtist As Variant
    Dim getSong As Variant

    URL = "http://www.siriusxm.com/siriusxmhits1"
    With IExplore
        .Navigate URL
        .Visible = False
        Do While .readyState <> 4: DoEvents: Loop
        Set doc = .document
        Set DivID = doc.getElementById("onair-pdt")
        getArtist = DivID.getElementsByClassName("onair-pdt-artist")(0).innerText
        getSong = doc.getElementsByClassName("onair-pdt-song")(0).innerText
    End With

End Sub

回答1:


The website http://www.siriusxm.com has a sort of an API available. I navigated a page by the link http://www.siriusxm.com/hits1 in Chrome, then opened Developer Tools window (F12), Network tab, and examined XHRs in the list. Current song info can be retrieved e. g. in the following steps:

  • Make XHR by URL http://www.siriusxm.com/sxm_date_feed.tzi to retrieve current timestamp.

  • Make XHR using current timestamp in last numbers of URL http://www.siriusxm.com/metadata/pdt/en-us/json/channels/siriushits1/timestamp/04-29-02:02:55

  • Parse received JSON response.

  • Get song name as JSON.channelMetadataResponse.metaData.currentEvent.song.name, artists as JSON.channelMetadataResponse.metaData.currentEvent.artists.name, etc.

Below is the sample showing JSON response structure, I use online tool http://jsonviewer.stack.hu:

You may use the below VBA code to retrieve info as described above. Import JSON.bas module into the VBA project for JSON processing.

Option Explicit

Sub Test_siriusxm_com()

    Dim s As String
    Dim d As Date
    Dim sUrl As String
    Dim vJSON As Variant
    Dim sState As String
    Dim sArtists As String
    Dim sComposer As String
    Dim sAlbum As String
    Dim sSong As String

    ' Retrieve timestamp
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://www.siriusxm.com/sxm_date_feed.tzi", False
        .send
        s = .responseText
    End With
    ' Parse timestamp to Date type
    d = CDate(DateSerial(Mid(s, 5, 4), Mid(s, 3, 2), Mid(s, 1, 2)) + TimeSerial(Mid(s, 9, 2), Mid(s, 11, 2), Mid(s, 13, 2)))
    ' Add 4 hours to get UTC from EDT timezone
    d = DateAdd("h", 4, d)
    ' Combine URL with timestamp
    sUrl = "http://www.siriusxm.com/metadata/pdt/en-us/json/channels/siriushits1/timestamp/" & _
            LZ(Month(d), 2) & "-" & _
            LZ(Day(d), 2) & "-" & _
            LZ(Hour(d), 2) & ":" & _
            LZ(Minute(d), 2) & ":" & _
            "00"
    ' Retrieve channelMetadataResponse JSON data
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", sUrl, False
        .send
        s = .responseText
    End With
    ' Parse JSON response
    JSON.Parse s, vJSON, sState
    ' Check if valid
    If sState <> "Object" Then
        MsgBox "Invalid JSON response"
        Exit Sub
    End If
    ' Check if available
    If vJSON("channelMetadataResponse")("messages")("code") <> "100" Then
        MsgBox "Unavailable content"
        Exit Sub
    End If
    ' Get necessary properties
    Set vJSON = vJSON("channelMetadataResponse")("metaData")("currentEvent")
    sArtists = vJSON("artists")("name")
    sComposer = vJSON("song")("composer")
    sAlbum = vJSON("song")("album")("name")
    sSong = vJSON("song")("name")
    ' Output results
    MsgBox "On the Air" & vbCrLf & _
        "Artists: " & sArtists & vbCrLf & _
        "Composer: " & sComposer & vbCrLf & _
        "Album: " & sAlbum & vbCrLf & _
        "Song: " & sSong

End Sub

Function LZ(n As String, q As Long) As String ' Add leading zeroes
    LZ = Right(String(q, "0") & n, q)
End Function

BTW, the same approach used in this, this and this answers.



来源:https://stackoverflow.com/questions/43311644/web-scrape-using-xhr-from-siriusxm-com

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!