问题
I am trying to make a list of products based on the unique product codes of amazon for example:
https://www.amazon.in/gp/product/B00F2GPN36
Where B00F2GPN36 is the unique code.
I want to fetch the image and the title of the product into an Excel list under the columns product image and product name.
I have tried many times using the html.getElementsById("productTitle")
and html.getElementsByTagName
but there is always a problem while running the code and I am unable to solve it, so please help.
I also have doubt on what kind of variable to describe for storing the above mentioned info as I have tried declaration of Object
type and HtmlHtmlElement
.
I have tried to pull the html doc and use it for the data search.
Code:
Enum READYSTATE
READYSTATE_UNINITIALIZED = 0
READYSTATE_LOADING = 1
READYSTATE_LOADED = 2
READYSTATE_INTERACTIVE = 3
READYSTATE_COMPLETE = 4
End Enum
Sub parsehtml()
Dim ie As InternetExplorer
Dim topics As Object
Dim html As HTMLDocument
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "https://www.amazon.in/gp/product/B00F2GPN36"
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to Amazon.in...."
DoEvents
Loop
Application.StatusBar = ""
Set html = ie.document
Set topics = html.getElementsById("productTitle")
Sheets(1).Cells(1, 1).Value = topics.innerText
Set ie = Nothing
End Sub
I expect the output to be that in cell A1: "Milton Thermosteel Carafe Flask, 2 litres, Silver" should reflect (without quotation marks. and similarly want to pull the image as well.
But there is always some error like: 1. Run-time error '13': Type mismatch when I used "Dim topics As HTMLHtmlElement" 2. Run-time error '438': Object doesn't support this property or method
Note : I have already added the relevant references from Tools > References i.e. the required libraries.
回答1:
Faster would be to use xhr and avoid browser and write out results from an array to sheet
Option Explicit
Public Sub GetInfo()
Dim html As HTMLDocument, results()
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.amazon.in/gp/product/B00F2GPN36", False
.send
html.body.innerHTML = .responseText
With html
results = Array(.querySelector("#productTitle").innerText, .querySelector("#landingImage").getAttribute("data-old-hires"))
End With
End With
With ThisWorkbook.Worksheets("Sheet1")
.Cells(1, 1) = results(0)
Dim file As String
file = DownloadFile("C:\Users\User\Desktop\", results(1)) 'your path to download file
With .Pictures.Insert(file)
.Left = ThisWorkbook.Worksheets("Sheet1").Cells(1, 2).Left
.Top = ThisWorkbook.Worksheets("Sheet1").Cells(1, 2).Top
.Width = 75
.Height = 100
.Placement = 1
End With
End With
Kill file
End Sub
回答2:
There is no such thing as html.getElementsById("productTitle")
in vba. ID's are always unique, so it should be html.getElementById("productTitle")
. Run the following script to get them:
Sub ParseHtml()
Dim IE As New InternetExplorer, elem As Object
Dim Html As HTMLDocument, imgs As Object
With IE
.Visible = False
.navigate "https://www.amazon.in/gp/product/B00F2GPN36"
While .Busy Or .readyState < 4: DoEvents: Wend
Set Html = .document
End With
Set elem = Html.getElementById("productTitle")
Set imgs = Html.getElementById("landingImage")
Sheets(1).Cells(1, 1) = elem.innerText
Sheets(1).Cells(1, 1).Offset(0, 1) = imgs.getAttribute("data-old-hires")
End Sub
来源:https://stackoverflow.com/questions/56415063/how-to-pull-the-image-and-title-of-the-product-from-site-of-amazon