web-scraping

Selenium's find_elements with two clauses for text?

本小妞迷上赌 submitted on 2021-01-28 12:14:04
Question: How can I use Selenium's find_elements_by_xpath() to match text that may or may not contain an extra word? For example, the text can be either #1 here or #1 is here, and I want both to end up in the same list, since that function returns a list. At the moment I have driver.find_elements_by_xpath("//*[contains(text(), '#1 here')]"), but that only finds the first case, not the ones with an "is". Basically, I want something like driver.find_elements_by_xpath("//*[contains(text(), '#1 here' or '#1 is here')]"). How could I do that? I
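
One approach, a minimal sketch of the usual XPath answer (the URL is a placeholder, and this is not tested against the asker's page), is to write two complete contains() clauses and join them with or inside the predicate:

    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://example.com")  # placeholder URL; the asker's page is not shown

    # Each contains() is a full clause; "or" joins the two predicates, so elements
    # containing either text variant come back in one list.
    elements = driver.find_elements_by_xpath(
        "//*[contains(text(), '#1 here') or contains(text(), '#1 is here')]"
    )
    print(len(elements))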

Scraping table with BeautifulSoup

岁酱吖の submitted on 2021-01-28 12:01:03
Question: In this first snippet, I can use BeautifulSoup to get all the info within the table of interest: from urllib import urlopen from bs4 import BeautifulSoup html = urlopen("http://www.pythonscraping.com/pages/page3.html") soup = BeautifulSoup(html) for child in soup.find("table",{"id":"giftList"}).children: print child That prints the product list. I want to print the rows of the tournamentTable here (the desired info is in class=deactivate , class=odd deactivate , and the date in class=center nob-border ): from urllib
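
A rough sketch of how the second table could be handled, using the class names quoted in the question (the URL is a placeholder, since the excerpt does not show the real page, and the selectors assume the markup matches those class names):

    from urllib.request import urlopen   # Python 3 counterpart of the Python 2 import above
    from bs4 import BeautifulSoup

    html = urlopen("http://example.com/odds-page")   # placeholder; the real URL is not in the excerpt
    soup = BeautifulSoup(html, "html.parser")

    table = soup.find("table", {"id": "tournamentTable"})
    # select() accepts comma-separated CSS selectors, so result rows (deactivate,
    # including "odd deactivate") and date rows (center nob-border) come in one pass.
    for row in table.select("tr.deactivate, tr.center.nob-border"):
        print(row.get_text(" ", strip=True))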

How to scrape a table and its links

蓝咒 submitted on 2021-01-28 12:00:32
Question: What I want to do is to take the following website https://www.tdcj.texas.gov/death_row/dr_executed_offenders.html view-source:https://www.tdcj.texas.gov/death_row/dr_executed_offenders.html and pick the year of execution, enter the Last Statement link, and retrieve the statement... perhaps I would create two dictionaries, both keyed by the execution number. Afterwards, I would classify the statements by length, besides "flagging" the refusals to give one or the cases where it simply was not given.
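
One possible sketch of the first half of that plan, fetching the index table and following each Last Statement link. The column position and the link text are assumptions about the page's markup, not something confirmed in the excerpt:

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    BASE = "https://www.tdcj.texas.gov/death_row/"
    index = BeautifulSoup(requests.get(BASE + "dr_executed_offenders.html").text, "html.parser")

    statements = {}
    for row in index.select("table tr")[1:]:            # skip the header row
        cells = row.find_all("td")
        if not cells:
            continue
        execution_no = cells[0].get_text(strip=True)    # first column assumed to be the execution number
        link = row.find("a", string="Last Statement")   # link text is an assumption
        if link is None:
            continue
        page = BeautifulSoup(requests.get(urljoin(BASE, link["href"])).text, "html.parser")
        statements[execution_no] = page.get_text(" ", strip=True)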

How to fix '$(…).click is not a function' in Node/Cheerio

人盡茶涼 submitted on 2021-01-28 11:40:34
Question: I am writing an application in Node.js that will navigate to a website, click a button on the website, and then extract certain pieces of data from it. All is going well except for the button-clicking aspect: I cannot seem to simulate a button click. I'm extremely new at this, so I'd appreciate any suggestions you have! Sadly, I've scoured the internet looking for a solution to this issue and have been unable to find one. I have used .click() and .bind('click', ...) in a .js file
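
Cheerio, like BeautifulSoup, only parses static markup; it has no event system, so a click cannot be simulated with it, and a real browser driver is needed instead. Keeping with the Python used elsewhere on this page, the equivalent pattern would look roughly like the Selenium sketch below (the URL and selector are placeholders, not taken from the question):

    from selenium import webdriver
    from bs4 import BeautifulSoup

    driver = webdriver.Chrome()
    driver.get("https://example.com")                           # placeholder URL
    driver.find_element_by_css_selector("#load-more").click()   # placeholder selector

    # After the click, hand the rendered DOM to a parser for extraction.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    print(soup.title.string)
    driver.quit()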

HTML Agility Pack Screen Scraping XPATH isn't returning data

懵懂的女人 submitted on 2021-01-28 11:36:02
Question: I'm attempting to write a screen scraper for Digikey that will allow our company to keep accurate track of pricing, part availability, and product replacements when a part is discontinued. There seems to be a discrepancy between the XPath that I'm seeing in Chrome DevTools (and Firebug on Firefox) and what my C# program is seeing. The page I'm currently scraping is http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=296-12602-1-ND The code I'm currently using is pretty
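
A common cause of this kind of mismatch is that DevTools shows the browser-rendered DOM (with injected tbody elements and any JavaScript changes), not the raw HTML the scraper downloads. Keeping with Python rather than the asker's C#, one quick way to check what the downloaded markup actually supports is to run the XPath against the raw response; the XPaths below are illustrative only:

    import requests
    from lxml import html

    raw = requests.get(
        "http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=296-12602-1-ND"
    ).content
    tree = html.fromstring(raw)

    # An XPath copied from DevTools often contains /tbody/ that the raw HTML lacks;
    # testing both variants against the downloaded source shows which one is real.
    print(len(tree.xpath("//table/tbody/tr")))   # illustrative XPath only
    print(len(tree.xpath("//table/tr")))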

Webscraping data from an interactive graph from a website

柔情痞子 submitted on 2021-01-28 11:17:22
Question: I am trying to access data from the graph on the website below: https://www.prisjakt.nu/produkt.php?pu=5183925 I am able to access and extract data from the table below the graph, but I am unable to fetch data from the graph itself, which is drawn dynamically with JavaScript. I know that the BeautifulSoup API alone is not sufficient here. I tried poking around in the page's console to see the contents of the graph, but I was not successful. I also tried to look into view-source
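
JavaScript-drawn charts usually pull their numbers from a JSON endpoint that can be spotted in the Network tab of the browser's developer tools while the graph loads, and that endpoint can then be requested directly. A minimal sketch of that pattern follows; the endpoint path and parameter names are purely hypothetical:

    import requests

    # Hypothetical endpoint: the real URL has to be read from the Network tab
    # while the graph is loading.
    resp = requests.get(
        "https://www.prisjakt.nu/some/json/endpoint",   # placeholder, not a real path
        params={"product": "5183925"},
        headers={"User-Agent": "Mozilla/5.0"},
    )
    data = resp.json()
    print(data)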

Storing information from td tags with a specific width, in python

喜你入骨 submitted on 2021-01-28 10:25:45
Question: I am trying to store all the information from the td tags that have width="82" (or maybe there is a more efficient method). <a name="AAKER"> </a> <table border="" width="100%" cellpadding="5"><tbody><tr><td bgcolor="#FFFFFF"><b>AAKER</b> <small>(<a href="http://google.com">Soundex A260</a>) — <i>See also</i> <a href="http://google.com">ACKER</a>, <a href="http://google.com">KEAR</a>, <a href="http://google.com">TAAKE</a>. </small> </td></tr></tbody></table><br clear="all"> <table align="left"
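
A minimal BeautifulSoup sketch for this, assuming the markup shown above is loaded into a string:

    from bs4 import BeautifulSoup

    html = open("page.html").read()   # or a string holding the markup shown above
    soup = BeautifulSoup(html, "html.parser")

    # attrs matches the literal attribute value, so only <td width="82"> cells are returned.
    for td in soup.find_all("td", attrs={"width": "82"}):
        print(td.get_text(" ", strip=True))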

Scraping a React-table using Selenium

心已入冬 submitted on 2021-01-28 09:01:03
Question: I have written code for scraping an HTML React table using Python and Selenium, but it cannot capture the values in the table (only the DOM elements). Here is the website: https://nonfungible.com/market/history/decentraland?filter=saleType%3D&length=10&sort=blockTimestamp%3Ddesc&start=0 Here is my code: from selenium import webdriver dr = webdriver.PhantomJS(r'PATH_TO_PHANTOM/phantomjs-2.1.1-macosx/bin/phantomjs') dr.get("https://nonfungible.com/market/history/decentraland?filter=saleType%3D&length=10
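
PhantomJS is discontinued and often fails to render React apps, so a common fix is headless Chrome plus an explicit wait, so the rows exist before they are read. A sketch under that assumption follows; the CSS selector is a guess about the table's markup and would need adjusting:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    dr = webdriver.Chrome(options=options)
    dr.get("https://nonfungible.com/market/history/decentraland"
           "?filter=saleType%3D&length=10&sort=blockTimestamp%3Ddesc&start=0")

    # Wait until React has rendered at least one data row; selector is an assumption.
    rows = WebDriverWait(dr, 20).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table tbody tr"))
    )
    for row in rows:
        print(row.text)
    dr.quit()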

How to use Selenium to click a button in a popup modal box

限于喜欢 submitted on 2021-01-28 08:51:00
Question: I am trying to use Selenium in Python to pull some data from https://www.seekingalpha.com. The front page has a "Sign-in/Join now" link. I used Selenium to click it, which brought up a popup asking for sign-in information with another "Sign in" button. It seems my code below can enter my username and password, but my attempt to click the "Sign in" button didn't get the right response (it clicked on the ad below the popup box). I am using Python 3.5. Here is my code: driver = webdriver.Chrome(
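
The usual fix is to wait for the modal's own button to become clickable before clicking, so the click does not land on whatever sits behind the overlay. A minimal sketch follows; the button selector is an assumption about the modal's markup, not taken from the site:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    driver.get("https://www.seekingalpha.com")

    # Wait for the modal's "Sign in" button (selector is a guess) to be clickable,
    # then click that element rather than the coordinates of the overlay.
    button = WebDriverWait(driver, 15).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']"))
    )
    button.click()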