How to pull the image and title of a product from the Amazon site?

杀马特。学长 韩版系。学妹 submitted on 2019-12-10 17:55:12

Question


I am trying to make a list of products based on Amazon's unique product codes, for example:

https://www.amazon.in/gp/product/B00F2GPN36

Here, B00F2GPN36 is the unique product code.

I want to fetch the image and the title of each product into an Excel list under the columns "Product image" and "Product name".

I have tried many times using html.getElementsById("productTitle") and html.getElementsByTagName, but the code always fails with an error when I run it and I cannot work out why, so please help.

I am also unsure what type of variable to declare for storing this information; I have tried declaring it as Object and as HTMLHtmlElement.

I load the HTML document and then search it for the data.

Code:

Enum READYSTATE
     READYSTATE_UNINITIALIZED = 0
     READYSTATE_LOADING = 1
     READYSTATE_LOADED = 2
     READYSTATE_INTERACTIVE = 3
     READYSTATE_COMPLETE = 4
End Enum

Sub parsehtml()

     Dim ie As InternetExplorer
     Dim topics As Object
     Dim html As HTMLDocument

     Set ie = New InternetExplorer
     ie.Visible = False
     ie.navigate "https://www.amazon.in/gp/product/B00F2GPN36"

     Do While ie.READYSTATE <> READYSTATE_COMPLETE
       Application.StatusBar = "Trying to go to Amazon.in...."
       DoEvents    
     Loop

     Application.StatusBar = ""
     Set html = ie.document
     Set topics = html.getElementsById("productTitle")
     Sheets(1).Cells(1, 1).Value = topics.innerText
     Set ie = Nothing

End Sub

I expect cell A1 to contain "Milton Thermosteel Carafe Flask, 2 litres, Silver" (without the quotation marks), and I want to pull the image in a similar way.

But I always get an error, for example:

1. Run-time error '13': Type mismatch, when I used "Dim topics As HTMLHtmlElement"
2. Run-time error '438': Object doesn't support this property or method

Note: I have already added the required library references via Tools > References.
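On the question of which variable type to declare: a minimal sketch, assuming a reference to Microsoft HTML Object Library, of types that match what each MSHTML method returns. getElementById returns a single element (an IHTMLElement, or Nothing if no element has that id), while getElementsByTagName returns a collection (an IHTMLElementCollection) that has to be indexed before you read innerText or call getAttribute. The sub name ShowDeclarations and the inline sample markup are hypothetical stand-ins for the downloaded page.

```vba
Option Explicit

' Hypothetical demo; requires Tools > References > Microsoft HTML Object Library
Public Sub ShowDeclarations()
    Dim html As HTMLDocument
    Dim titleEl As IHTMLElement            ' one element: result of getElementById
    Dim imgs As IHTMLElementCollection     ' many elements: result of getElementsByTagName

    Set html = New HTMLDocument
    ' Sample markup standing in for the real product page
    html.body.innerHTML = "<span id='productTitle'>Sample title</span>" & _
                          "<img id='landingImage' data-old-hires='big.jpg'>"

    ' Single element: check for Nothing before reading it
    Set titleEl = html.getElementById("productTitle")
    If Not titleEl Is Nothing Then Debug.Print titleEl.innerText

    ' Collection: index it (zero-based) to get an individual element
    Set imgs = html.getElementsByTagName("img")
    If imgs.Length > 0 Then Debug.Print imgs(0).getAttribute("data-old-hires")
End Sub
```

HTMLHtmlElement is the MSHTML type for the root <html> element, so assigning the product title element to a variable of that type is the most likely source of the Type mismatch error; IHTMLElement (or Object) accepts any element.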


Answer 1:


A faster approach is to use XHR (an MSXML2.XMLHTTP request) instead of a browser, and to write the results out to the sheet from an array:

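Option Explicit
' Requires a reference to Microsoft HTML Object Library (Tools > References)
Public Sub GetInfo()
    Dim html As HTMLDocument, results()
    Set html = New HTMLDocument
    ' Fetch the page with a plain HTTP request instead of driving a browser
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.amazon.in/gp/product/B00F2GPN36", False
        .send
        html.body.innerHTML = .responseText
        ' results(0) = product title, results(1) = URL of the large product image
        With html
            results = Array(.querySelector("#productTitle").innerText, .querySelector("#landingImage").getAttribute("data-old-hires"))
        End With
    End With
    With ThisWorkbook.Worksheets("Sheet1")
        .Cells(1, 1) = results(0)
        Dim file As String
        ' DownloadFile is a helper that is not included in this answer; it saves the
        ' image into the given folder and returns the full path of the saved file
        file = DownloadFile("C:\Users\User\Desktop\", results(1))  'your path to download file
        ' Embed a copy of the picture over cell B1 (SaveWithDocument:=msoTrue), so the
        ' file can be deleted afterwards; Pictures.Insert may only link to the file
        With .Shapes.AddPicture(file, msoFalse, msoTrue, .Cells(1, 2).Left, .Cells(1, 2).Top, 75, 100)
            .Placement = xlMoveAndSize
        End With
    End With
    Kill file   ' remove the temporary image file
End Sub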
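The answer above calls a DownloadFile helper that it does not define. A minimal sketch of one, assuming 64-bit-safe VBA7 (Office 2010 or later) and the Windows urlmon API URLDownloadToFile; the function name and argument order only mirror the call in the answer. Put it in its own standard module, because a Declare statement must come before all procedures in a module.

```vba
Option Explicit

' Assumption: VBA7 declaration of the Windows urlmon download API
Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" _
    (ByVal pCaller As LongPtr, ByVal szURL As String, ByVal szFileName As String, _
     ByVal dwReserved As LongPtr, ByVal lpfnCB As LongPtr) As Long

' Hypothetical helper matching the call in Answer 1:
' downloads url into folder and returns the full path of the saved file.
Public Function DownloadFile(ByVal folder As String, ByVal url As String) As String
    Dim fileName As String
    Dim target As String

    ' Use the last path segment of the URL as the local file name
    fileName = Mid$(url, InStrRev(url, "/") + 1)
    ' Drop any query string so the name is a valid file name
    If InStr(fileName, "?") > 0 Then fileName = Left$(fileName, InStr(fileName, "?") - 1)
    target = folder & fileName

    ' URLDownloadToFile returns 0 (S_OK) on success
    If URLDownloadToFile(0, url, target, 0, 0) <> 0 Then
        Err.Raise vbObjectError + 1, "DownloadFile", "Download failed: " & url
    End If
    DownloadFile = target
End Function
```

With this module present, the call in the answer saves the image under its original file name in the given folder; the folder path needs the trailing backslash shown there, because the sketch simply concatenates folder and file name.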



Answer 2:


There is no method html.getElementsById("productTitle") in VBA. Because IDs are unique, the method name is singular: html.getElementById("productTitle"). Run the following script to get the title and the image URL:

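Sub ParseHtml()
    ' Requires references to Microsoft Internet Controls and Microsoft HTML Object Library
    Dim IE As New InternetExplorer, elem As Object
    Dim Html As HTMLDocument, imgs As Object

    With IE
        .Visible = False
        .navigate "https://www.amazon.in/gp/product/B00F2GPN36"
        ' Wait until the page has finished loading (readyState 4 = complete)
        While .Busy Or .readyState < 4: DoEvents: Wend
        Set Html = .document
    End With

    ' getElementById (singular) returns one element, because IDs are unique
    Set elem = Html.getElementById("productTitle")
    Set imgs = Html.getElementById("landingImage")

    Sheets(1).Cells(1, 1) = elem.innerText
    ' Write the image URL (not the picture itself) next to the title
    Sheets(1).Cells(1, 1).Offset(0, 1) = imgs.getAttribute("data-old-hires")

    ' Close the hidden browser so no Internet Explorer process is left running
    IE.Quit
End Sub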
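Since the goal in the question is a list built from many product codes, here is a minimal sketch that extends the XHR approach from Answer 1 to a column of codes. It assumes a hypothetical layout (codes in Sheet1, column A, from row 2; titles written to column B and image URLs to column C), a reference to Microsoft HTML Object Library, and that product pages still carry the productTitle and landingImage IDs both answers rely on. It checks for Nothing before reading each element, because a page with a different layout would otherwise stop the loop with run-time error 91.

```vba
Option Explicit

' Hypothetical layout: product codes in Sheet1!A2 downwards;
' titles go to column B, image URLs to column C.
' Requires Tools > References > Microsoft HTML Object Library.
Public Sub GetProductList()
    Dim ws As Worksheet, lastRow As Long, r As Long
    Dim html As HTMLDocument
    Dim titleEl As IHTMLElement, imgEl As IHTMLElement
    Dim imgUrl As Variant

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow
        ' Fetch the product page for this code without opening a browser
        Set html = New HTMLDocument
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "https://www.amazon.in/gp/product/" & Trim$(ws.Cells(r, "A").Value), False
            .send
            html.body.innerHTML = .responseText
        End With

        Set titleEl = html.getElementById("productTitle")
        Set imgEl = html.getElementById("landingImage")

        ' Guard against pages that lack the expected elements
        If titleEl Is Nothing Then
            ws.Cells(r, "B").Value = "(title not found)"
        Else
            ws.Cells(r, "B").Value = Trim$(titleEl.innerText)
        End If

        If Not imgEl Is Nothing Then
            imgUrl = imgEl.getAttribute("data-old-hires")
            If Not IsNull(imgUrl) Then ws.Cells(r, "C").Value = imgUrl
        End If
    Next r
End Sub
```

This writes image URLs rather than pictures; to place the pictures themselves, each URL can be downloaded and inserted into its row as Answer 1 does for a single product.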


Source: https://stackoverflow.com/questions/56415063/how-to-pull-the-image-and-title-of-the-product-from-site-of-amazon
