Question
I am trying to extract the data behind the graph on the following page: https://www.prisjakt.nu/produkt.php?pu=5183925
I can extract the data from the table below the graph, but I cannot fetch the data from the graph itself, which is loaded dynamically with JavaScript. I know that BeautifulSoup alone is not sufficient here. I tried inspecting the page in the browser console to see the contents of the graph, but without success.
I also looked at view-source:https://www.prisjakt.nu/produkt.php?pu=5183925 to see how the graph is rendered, and found this element:
<div class="graph" data-testid="graph" data-test="PriceHistoryGraph">
I want to print the price history of an item from the website, for example something like the snippet below, which is in JSON format and which I found via "view source".
"nodes":[{"date":"2019-09-10","lowestPrice":13195},{"date":"2019-09-11","lowestPrice":12990},{"date":"2019-09-12","lowestPrice":12990},
I suspect that the data above can be found at
<rect class = "vx-bar" ...... where data="[Object Object][Object Object][Object Object]..."
is a list of arrays with two elements in each array, similar to the "nodes" snippet above. Is that right?
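For illustration, a fragment like the "nodes" snippet above can be pulled out of a page's source text with Python's standard json module. This is only a minimal sketch: the helper name extract_nodes and the sample string are hypothetical, and it assumes the array follows the "nodes": key directly, with no whitespace in between.

```python
import json


def extract_nodes(page_source: str) -> list:
    """Return the JSON array that follows the first '"nodes":' key, or []."""
    key = '"nodes":'
    start = page_source.find(key)
    if start == -1:
        return []
    # raw_decode parses one JSON value starting at the given index and
    # ignores any text that comes after that value.
    nodes, _end = json.JSONDecoder().raw_decode(page_source, start + len(key))
    return nodes


# Hypothetical page fragment built from the sample values quoted above.
sample = (
    '<script>window.state = {"nodes":['
    '{"date":"2019-09-10","lowestPrice":13195},'
    '{"date":"2019-09-11","lowestPrice":12990}]};</script>'
)

for node in extract_nodes(sample):
    print(node["date"], node["lowestPrice"])
```

Whether the real page embeds the full price history in its source this way is not confirmed here; the sketch only shows how such a fragment could be parsed if it does.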
Here is the simple code I am using at the moment, to give a brief idea. It prints the entire layout, including the graph and the table below.
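from time import sleep
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

my_url = 'https://www.prisjakt.nu/produkt.php?pu=5183925'
headers = {'User-Agent': 'Mozilla/5.0'}  # assumed; the original headers were not shown
driver = webdriver.Chrome()  # assumed browser; the original driver setup was not shown
driver.get(my_url)
sleep(3)  # give the page's JavaScript time to run
page = requests.get(my_url, headers=headers)  # separate fetch; does not use the driver
soup = BeautifulSoup(page.content, 'html.parser')  # avoid shadowing the parser name
data = soup.find_all(id="statistics")
print(data)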
Any suggestions with an example or a solution would help me. Thanks in advance!
Answer 1:
You're right, the graph is being constructed dynamically, but you can easily grab that data.
Here's how:
import requests
response = requests.get('https://www.prisjakt.nu/_internal/graphql?release=2020-11-20T07:33:45Z|db08e4bc&version=6f2bf5&main=product&variables={"id":5183925,"offset":0,"section":"statistics","statisticsTime":"1970-01-02","marketCode":"se","personalizationExcludeCategories":[],"userActions":true,"badges":true,"media":true,"campaign":true,"relatedProducts":true,"campaignDeals":true,"priceHistory":true,"recommendations":true,"campaignId":2,"personalizationClientId":"","pulseEnvironmentId":"sdrn:schibsted:environment:undefined"}').json()
for node in response["data"]["product"]["statistics"]["nodes"]:
    print(f"{node['date']} - {node['lowestPrice']}")
Output:
2019-09-10 - 13195
2019-09-11 - 12990
2019-09-12 - 12990
2019-09-13 - 12605
2019-09-14 - 12605
2019-09-15 - 12605
2019-09-16 - 12970
2019-09-17 - 12970
2019-09-18 - 12970
2019-09-19 - 12969
2019-09-20 - 12969
2019-09-21 - 12969
2019-09-22 - 12969
2019-09-23 - 9195
2019-09-24 - 12970
and so on...
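The long hard-coded URL in the answer can also be assembled from a dictionary, which makes the product id easier to change. The following is a minimal sketch using only the standard library. The function names build_url, price_history and fetch_price_history are hypothetical, the release and version values are copied from the answer and may no longer be accepted by the site, and the response shape is assumed to match the one the answer reads.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GRAPHQL_URL = "https://www.prisjakt.nu/_internal/graphql"


def build_url(product_id: int) -> str:
    """Build the request URL, with the same variables as the answer's URL."""
    variables = {
        "id": product_id, "offset": 0, "section": "statistics",
        "statisticsTime": "1970-01-02", "marketCode": "se",
        "personalizationExcludeCategories": [], "userActions": True,
        "badges": True, "media": True, "campaign": True,
        "relatedProducts": True, "campaignDeals": True,
        "priceHistory": True, "recommendations": True, "campaignId": 2,
        "personalizationClientId": "",
        "pulseEnvironmentId": "sdrn:schibsted:environment:undefined",
    }
    query = {
        # Copied from the answer; these values may be outdated.
        "release": "2020-11-20T07:33:45Z|db08e4bc",
        "version": "6f2bf5",
        "main": "product",
        "variables": json.dumps(variables, separators=(",", ":")),
    }
    return GRAPHQL_URL + "?" + urlencode(query)


def price_history(payload: dict) -> list:
    """Turn a decoded response into a list of (date, lowestPrice) tuples."""
    nodes = payload["data"]["product"]["statistics"]["nodes"]
    return [(node["date"], node["lowestPrice"]) for node in nodes]


def fetch_price_history(product_id: int) -> list:
    """Fetch and parse the price history (needs network access)."""
    with urlopen(build_url(product_id), timeout=10) as response:
        return price_history(json.load(response))


# Offline demonstration with a payload shaped like the output above.
sample_payload = {"data": {"product": {"statistics": {"nodes": [
    {"date": "2019-09-10", "lowestPrice": 13195},
    {"date": "2019-09-11", "lowestPrice": 12990},
]}}}}

for date, price in price_history(sample_payload):
    print(f"{date} - {price}")
```

Calling fetch_price_history(5183925) would perform the live request. The URL returned by build_url can equally be passed to requests.get, as in the answer.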
Source: https://stackoverflow.com/questions/64926393/webscraping-data-from-an-interactive-graph-from-a-website