screen-scraping

Scrapy run multiple spiders from a script

雨燕双飞 submitted on 2021-01-29 15:53:31
Question: I have a script from which I want to start Scrapy spiders. To do that, I used a solution from another Stack Overflow post to load the project settings so that I don't have to override them manually. So far I am able to start two crawlers from outside the Scrapy project: from scrapy_bots.update_Database.update_Database.spiders.m import M from scrapy_bots.update_Database.update_Database.spiders.p import P from scrapy.crawler import CrawlerProcess from scrapy.utils.project

Unable to locate an element using Python Selenium library

帅比萌擦擦* submitted on 2021-01-29 13:31:08
Question: I'm not able to find an element again. I've been learning... I've already looked for solutions in past questions, but the answers depend on the specific code. The button is named "Gestione", and clicking it should open a drop-down menu. Selectors from Selenium IDE: id=ext-gen76; css=#ext-gen76; xpath=//em[@id='ext-gen76']; xpath=//tr[@id='ext-gen43']/td[2]/em; xpath=//td[5]/table/tbody/tr/td[2]/em; xpath=//em[contains(.,'Gestione')]. HTML code: HTML page 1, HTML page 2

Web Scraping Videos

只谈情不闲聊 submitted on 2021-01-29 02:31:45
Question: As a proof of concept, I'm attempting to download a TV episode of Bob's Burgers from https://www.watchcartoononline.com/bobs-burgers-season-9-episode-3-tweentrepreneurs. I cannot figure out how to extract the video URL from this website. Using the Chrome and Firefox web developer tools, I found that the video is in an iframe, but searching for iframes with BeautifulSoup and extracting their src URLs returns links that have nothing to do with the video. Where are the references to mp4 or flv files (which I

How to web scrape a chart by using Python?

爱⌒轻易说出口 submitted on 2021-01-28 13:42:48
Question: Using Python 3, I am trying to scrape a chart from this website into a .csv file: 2016 NBA National TV Schedule. The chart starts out like this: Tuesday, October 25 8:00 PM Knicks/Cavaliers TNT 10:30 PM Spurs/Warriors TNT Wednesday, October 26 8:00 PM Thunder/Sixers ESPN 10:30 PM Rockets/Lakers ESPN. I am using these packages: from bs4 import BeautifulSoup; import requests; import pandas as pd; import numpy as np. The output I want in a .csv file looks like this: These are the first six lines
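Since the desired CSV layout is cut off in the excerpt above, the following is only a minimal sketch under stated assumptions: it assumes the chart text has already been extracted as one item per line (a date heading, followed by game lines in the form "time matchup network") and that each CSV row should repeat the date. The function names and column names are hypothetical, and only the standard library is used.

```python
import csv
import io
import re

# Assumed game-line shape: "8:00 PM Knicks/Cavaliers TNT"
GAME_LINE = re.compile(r"^(\d{1,2}:\d{2} [AP]M)\s+(\S+/\S+)\s+(.+)$")


def schedule_rows(lines):
    """Turn date headings and game lines into [date, time, matchup, network] rows."""
    rows = []
    current_date = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = GAME_LINE.match(line)
        if match:
            # A game line inherits the most recent date heading.
            rows.append([current_date, *match.groups()])
        else:
            # Anything that is not a game line is treated as a date heading.
            current_date = line
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a (hypothetical) header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "time", "matchup", "network"])
    writer.writerows(rows)
    return buffer.getvalue()


SAMPLE_LINES = [
    "Tuesday, October 25",
    "8:00 PM Knicks/Cavaliers TNT",
    "10:30 PM Spurs/Warriors TNT",
    "Wednesday, October 26",
    "8:00 PM Thunder/Sixers ESPN",
    "10:30 PM Rockets/Lakers ESPN",
]

rows = schedule_rows(SAMPLE_LINES)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Because the date headings contain a comma, the csv module quotes them automatically, so the file still loads cleanly with pandas.read_csv or a spreadsheet.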

HTML Agility Pack Screen Scraping XPATH isn't returning data

懵懂的女人 submitted on 2021-01-28 11:36:02
Question: I'm attempting to write a screen scraper for Digikey that will allow our company to keep accurate track of pricing, part availability, and product replacements when a part is discontinued. There seems to be a discrepancy between the XPath I see in Chrome DevTools and in Firebug on Firefox and what my C# program sees. The page I'm currently scraping is http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=296-12602-1-ND The code I'm currently using is pretty

How to scrape links from Wikipedia with Python

假如想象 submitted on 2021-01-28 07:23:39
Question: I am trying to scrape all the links to battles from the "List of naval battles" page on Wikipedia using Python. The trouble is that I cannot figure out how to export all of the links containing "/wiki/Battle" to my CSV file. I am used to C++, so Python is somewhat foreign to me. Any ideas? Here is what I have so far... from bs4 import BeautifulSoup import urllib2 rootUrl = "https://en.wikipedia.org/wiki/List_of_naval_battles" def get_soup(url,header): return BeautifulSoup( urllib2.urlopen
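The filtering and export step described in the question can be sketched as follows. This is not the asker's code (which is truncated above and uses the Python 2 urllib2 module); it is a hedged Python 3 illustration that uses only the standard library and a small inline HTML sample instead of fetching the live page. The class and function names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class BattleLinkParser(HTMLParser):
    """Collect href values of <a> tags that contain a given substring."""

    def __init__(self, needle="/wiki/Battle"):
        super().__init__()
        self.needle = needle
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep each matching link once, in page order.
        if href and self.needle in href and href not in self.links:
            self.links.append(href)


def extract_battle_links(html, needle="/wiki/Battle"):
    parser = BattleLinkParser(needle)
    parser.feed(html)
    return parser.links


def links_to_csv(links, base="https://en.wikipedia.org"):
    """Write one absolute URL per row and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url"])
    for link in links:
        writer.writerow([base + link])
    return buffer.getvalue()


# Stand-in for the downloaded page; the real page would be fetched separately.
SAMPLE_HTML = """
<ul>
  <li><a href="/wiki/Battle_of_Salamis">Battle of Salamis</a></li>
  <li><a href="/wiki/Trireme">Trireme</a></li>
  <li><a href="/wiki/Battle_of_Actium">Battle of Actium</a></li>
  <li><a href="/wiki/Battle_of_Salamis">Battle of Salamis (again)</a></li>
</ul>
"""

battle_links = extract_battle_links(SAMPLE_HTML)
csv_text = links_to_csv(battle_links)
print(csv_text)
```

To write a file instead of a string, the same csv.writer calls work on a file opened with open("battles.csv", "w", newline=""); the file name here is only an example.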

Getting base64 string on scraping image src

被刻印的时光 ゝ submitted on 2021-01-27 06:35:56
Question: I am scraping the image src, title, price, and so on from a website, but I get a base64 string in place of the image src. When I append all the scraped data to a URI, I get an error that the URI is too long. How can I solve this problem? Answer 1: If you're getting a base64 string as the img src, it sounds as though the image is encoded inline. data: URIs are a very useful way to embed small items of data into a URL: rather than linking to an external resource, the URL contains the actual encoded data. An HTML fragment embedding a
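If the src really is an inline data: URI as the answer suggests, one way to avoid the long-URI error is to decode the payload into bytes and store or send those separately instead of appending the whole string to a URL. The sketch below is an illustration only, using the standard library; decode_data_uri is a hypothetical helper name, and the sample src is built in the code rather than taken from the asker's site.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Split a data: URI into (media_type, raw_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URI has no comma between header and payload")
    is_base64 = header.endswith(";base64")
    media_type = header[: -len(";base64")] if is_base64 else header
    if is_base64:
        data = base64.b64decode(payload)
    else:
        # Non-base64 payloads are percent-encoded.
        data = unquote_to_bytes(payload)
    # An empty media type falls back to text/plain here.
    return media_type or "text/plain", data


# Build a tiny sample src so the example does not depend on any website.
sample_src = "data:image/gif;base64," + base64.b64encode(b"GIF89a").decode("ascii")

media_type, image_bytes = decode_data_uri(sample_src)
print(media_type, len(image_bytes))
```

The decoded bytes can then be written to a file or uploaded in a request body, and only a short reference such as a file name needs to travel in the URI.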

Scraping a table with Python based on dates

旧城冷巷雨未停 submitted on 2020-12-27 05:58:46
Question: For the past week I have been trying to scrape a table from this site, https://www.bi.go.id/id/moneter/informasi-kurs/transaksi-bi/Default.aspx, but I have no idea what to write and I am very confused. I am trying to scrape the table of exchange-rate (kurs) transactions for 2015-2020 (20 Nov 2015 to 20 Nov 2020), but the problem is that the link stays the same whether I use the default date or the date I chose. Please help me in any way. Thank you in advance! import requests from bs4 import BeautifulSoup import pandas as pd
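One possible explanation, which the question does not confirm, is that the page submits the chosen dates through a form POST rather than through the URL; the Default.aspx name suggests an ASP.NET page, and such pages commonly carry hidden state fields such as __VIEWSTATE that must be sent back with the request. The sketch below only shows how the hidden fields could be collected and merged with date values. The sample HTML and the startDate and endDate field names are hypothetical, and the real names would have to be read from the page's form.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_payload(html, extra_fields):
    """Merge the page's hidden fields with the form values to submit."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload.update(extra_fields)
    return payload


# Hypothetical stand-in for the form found on the real page.
SAMPLE_FORM = """
<form method="post" action="./Default.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="startDate" value="" />
</form>
"""

payload = build_postback_payload(
    SAMPLE_FORM,
    {"startDate": "20-Nov-2015", "endDate": "20-Nov-2020"},  # hypothetical names
)
encoded_body = urlencode(payload)
print(encoded_body)
```

With the requests package already imported in the question, such a payload would typically be sent with requests.Session().post(url, data=payload) after first fetching the page with the same session, and the response parsed for the table.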