web-scraping

Not able to scrape the images from the Flipkart.com website; the src attribute comes back empty

好久不见 · Submitted on 2021-02-08 11:18:21
Question: I am able to scrape all the data from the Flipkart website except the images, using the code below:

    jobs = soup.find_all('div', {"class": "IIdQZO _1R0K0g _1SSAGr"})
    for job in jobs:
        product_name = job.find('a', {'class': '_2mylT6'})
        product_name = product_name.text if product_name else "N/A"
        product_offer_price = job.find('div', {'class': '_1vC4OE'})
        product_offer_price = product_offer_price.text if product_offer_price else "N/A"
        product_mrp = job.find('div', {'class': '_3auQ3N'})
        product_mrp = product_mrp
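An empty src is typical of lazy-loaded images: the real URL sits in a fallback attribute until the page's JavaScript swaps it in. A minimal sketch of the fallback lookup, assuming the attribute is `data-src` — the sample markup and attribute names below are assumptions for illustration, not Flipkart's actual classes:

```python
from bs4 import BeautifulSoup

# Stand-in markup; the real site's class and attribute names may differ
html = """
<div class="product"><img src="" data-src="https://img.example.com/p1.jpg"></div>
<div class="product"><img src="https://img.example.com/p2.jpg"></div>
"""

soup = BeautifulSoup(html, "html.parser")
urls = []
for img in soup.find_all("img"):
    # An empty src string is falsy, so fall back to the lazy-load attribute
    url = img.get("src") or img.get("data-src") or "N/A"
    urls.append(url)
print(urls)
```

Checking the element in the browser's devtools shows which attribute (data-src, srcset, etc.) actually carries the URL on the live page.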

Web scraping with R - no HTML visible

大兔子大兔子 · Submitted on 2021-02-08 10:36:22
Question: I am trying to use R to scrape a website: http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/GO/90000609234 It has several fields with lots of information. I am only interested in the URL above the field "site do candidato"; in this example, the URL I want is "http://vanderlansenador111.com.br". The problem is that there is no (visible) HTML, so I don't think rvest is helpful (at least, I don't know how to use it here). Is there a way to scrape it without using Selenium (I
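Pages like this are rendered client-side, so the data usually arrives as JSON from a REST endpoint visible in the browser's Network tab, which can be fetched directly without Selenium. A sketch of that approach (the endpoint URL below is a truncated placeholder and the field names are assumptions — check devtools for the real ones):

```python
import json

# Hypothetical endpoint spotted in the Network tab (placeholder, not verified);
# in practice you would fetch it with requests.get(api_url).json()
api_url = "http://divulgacandcontas.tse.jus.br/divulga/rest/..."

# Simulated response body with an assumed shape:
body = '{"nomeUrna": "VANDERLAN", "sites": ["http://vanderlansenador111.com.br"]}'
data = json.loads(body)

# Pull the candidate's site out of the parsed JSON
site = data["sites"][0] if data.get("sites") else None
print(site)
```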

How to scrape many dynamic URLs in Python

狂风中的少年 · Submitted on 2021-02-08 10:30:39
Question: I want to scrape one dynamic URL at a time. What I did: I collect the URLs from all the hrefs, and then I want to scrape each of those URLs. What I am trying:

    from bs4 import BeautifulSoup
    import urllib.request
    import re

    r = urllib.request.urlopen('http://i.cantonfair.org.cn/en/ExpExhibitorList.aspx?k=glassware')
    soup = BeautifulSoup(r, "html.parser")
    links = soup.find_all("a", href=re.compile(r"expexhibitorlist\.aspx\?categoryno=[0-9]+"))
    linksfromcategories = ([link["href"] for
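Picking up where the excerpt cuts off: the matched hrefs are relative, so they need to be resolved against the listing URL before they can be opened. A sketch under that assumption — the sample anchors below are made up to mirror the pattern in the question:

```python
import re
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base = "http://i.cantonfair.org.cn/en/ExpExhibitorList.aspx?k=glassware"

# Made-up fragment of the listing page, mirroring the href pattern in the question:
html = '''<a href="expexhibitorlist.aspx?categoryno=123">Glass</a>
<a href="expexhibitorlist.aspx?categoryno=456">Ware</a>'''

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", href=re.compile(r"expexhibitorlist\.aspx\?categoryno=[0-9]+"))

# Relative hrefs must be resolved against the page URL before fetching
category_urls = [urljoin(base, a["href"]) for a in links]
print(category_urls)
```

Each entry in category_urls can then be opened with urllib.request.urlopen() in a loop, one dynamic URL at a time.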

Can't create a project in Scrapy: DLL load failed

前提是你 · Submitted on 2021-02-08 10:17:38
Question:

    from cryptography.hazmat.bindings._openssl import ffi, lib
    ImportError: DLL load failed: The operating system cannot run %1.

I installed Scrapy through conda with conda install scrapy -c conda-forge

Answer 1: I also met this problem, under Windows 10. After much searching on many websites, I found this solution: download https://github.com/python/cpython-bin-deps/tree/openssl-bin-1.0.2k, unzip the file, and copy the folder (amd or win) into your system path C:\Windows\SysWOW64, and voilà, every

Selecting and Clicking Elements based on class name with Nightmare.js

妖精的绣舞 · Submitted on 2021-02-08 10:16:33
Question: I'm trying to select an element that's an image within a div, and then click it using Nightmare.js. Below is the element I'm trying to click, and below that the code I'm using.

    <div class="custom-navigator-right"><img onload="this.__gwtLastUnhandledEvent="load";" src="http://iris.generali.gr/iris/webiris/clear.cache.gif" style="width:40px;height:43px;background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACgAAAArCAYAAAAKasrDAAAAAXNSR0IArs4c6QAAAAZiS0dEAP8A/wD

Cleaning Data Scraped from Web

橙三吉。 · Submitted on 2021-02-08 10:13:59
Question: I'm slightly new to R, and I've been working on a project (just for fun) to help me learn; I'm running into something I can't seem to find answers for online. I am trying to teach myself to scrape websites for data, and I've started with the code below, which retrieves some data from 247Sports.

    library(rvest)
    library(stringr)

    link <- "https://247sports.com/college/iowa-state/Season/2017-Football/Commits?sortby=rank"
    link.scrap <- read_html(link)
    data <- html_nodes(x = link.scrap, css = '
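Whatever selector finishes that html_nodes() call, scraped text usually needs the same two cleanups: normalizing non-breaking spaces and collapsing runs of whitespace (in R/stringr, str_squish() handles the latter). A generic sketch of that cleaning step in Python, with made-up sample strings:

```python
import re

# Made-up examples of the run-together strings that scraped text often yields:
raw = ["Joe Smith  Des Moines, IA", "  Jim\u00a0Jones Ames, IA "]

def clean(s):
    s = s.replace("\u00a0", " ")        # non-breaking spaces left over from HTML
    s = re.sub(r"\s+", " ", s).strip()  # collapse whitespace runs, trim ends
    return s

cleaned = [clean(s) for s in raw]
print(cleaned)
```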

How to Get Script Tag Variables From a Website using Python

谁说胖子不能爱 · Submitted on 2021-02-08 10:03:55
Question: I am trying to pull a variable called meta from a script tag using Python. I have used Selenium for this before, but Selenium is too slow for what I am trying to accomplish. Is there any other way of doing this? I have tried using BeautifulSoup, but I'm stuck... code is below. Here is the script tag I'm trying to get the meta variable from:

    <script>window.ShopifyAnalytics = window.ShopifyAnalytics || {};
    window.ShopifyAnalytics.meta = window.ShopifyAnalytics.meta || {};
    window.ShopifyAnalytics
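One common Selenium-free approach: locate the script whose text contains the assignment, then cut the object literal out with a regex and hand it to json.loads. A sketch with a trimmed, made-up payload — the real page's meta object will differ, and the regex assumes the literal is valid JSON terminated by `};`:

```python
import json
import re
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the script tag in the question (payload is made up):
html = """<script>window.ShopifyAnalytics = window.ShopifyAnalytics || {};
window.ShopifyAnalytics.meta = window.ShopifyAnalytics.meta || {};
var meta = {"product": {"id": 123, "vendor": "Acme"}};
for (var attr in meta) { window.ShopifyAnalytics.meta[attr] = meta[attr]; }</script>"""

soup = BeautifulSoup(html, "html.parser")
# Find the <script> whose contents include the meta assignment
script = soup.find("script", string=re.compile(r"var meta"))

# Capture everything between "var meta = " and the terminating ";"
match = re.search(r"var meta = (\{.*?\});", script.string, re.DOTALL)
meta = json.loads(match.group(1))
print(meta)
```

Because the object literal is plain JSON here, json.loads turns it straight into a dict; if the real literal contains JavaScript-only syntax, a tolerant parser would be needed instead.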

Use Pandas to Get Multiple Tables From Webpage

柔情痞子 · Submitted on 2021-02-08 09:57:32
Question: I am using pandas to parse the data from the following page: http://kenpom.com/index.php?y=2014 To get the data, I am writing:

    dfs = pd.read_html(url)

The data looks great and is perfectly parsed, except it only takes data from the first 40 rows. It seems to be a problem with the separation of the tables, which keeps pandas from getting all the information. How do you get pandas to get all the data from all the tables on that webpage?

Answer 1: The HTML of the page you have posted has
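read_html returns one DataFrame per table it finds, so when repeated headers split the page into several tables the usual fix is to concatenate the list. A sketch on a tiny stand-in page — the data below is made up; on the real page you would inspect len(dfs) first to see how many chunks came back:

```python
from io import StringIO

import pandas as pd

# Two-table stand-in for a page whose data is split by repeated headers:
html = """<table><tr><th>Team</th><th>W</th></tr>
<tr><td>A</td><td>10</td></tr><tr><td>B</td><td>9</td></tr></table>
<table><tr><th>Team</th><th>W</th></tr>
<tr><td>C</td><td>8</td></tr></table>"""

dfs = pd.read_html(StringIO(html))        # one DataFrame per <table>
full = pd.concat(dfs, ignore_index=True)  # stitch the chunks back together
print(full)
```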