web-scraping

Selenium's find_elements with two clauses for text?

本小妞迷上赌 submitted on 2021-01-28 12:14:04
Question: How can I use Selenium's find_elements_by_xpath() to match text that may or may not contain an extra word? For example, the text can be either #1 here or #1 is here, and I want both to end up in the same list, since that function returns a list. At the moment I have driver.find_elements_by_xpath("//*[contains(text(), '#1 here')]"), but that only finds the first case, not the ones with an "is". Basically, I want something like driver.find_elements_by_xpath("//*[contains(text(), '#1 here' or '#1 is here')]"). How could I do that? I
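
One approach, a minimal sketch of the usual XPath answer (the URL is a placeholder, and this is not tested against the asker's page), is to write two complete contains() clauses and join them with or inside the predicate:

    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://example.com")  # placeholder URL; the asker's page is not shown

    # Each contains() is a full clause; "or" joins the two predicates, so elements
    # containing either text variant come back in one list.
    elements = driver.find_elements_by_xpath(
        "//*[contains(text(), '#1 here') or contains(text(), '#1 is here')]"
    )
    print(len(elements))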

Scraping table with BeautifulSoup

岁酱吖の submitted on 2021-01-28 12:01:03
Question: In this first snippet, I can use BeautifulSoup to get all the info within the table of interest: from urllib import urlopen from bs4 import BeautifulSoup html = urlopen("http://www.pythonscraping.com/pages/page3.html") soup = BeautifulSoup(html) for child in soup.find("table",{"id":"giftList"}).children: print child That prints the product list. I want to print the rows of the tournamentTable here (the desired info is in class=deactivate , class=odd deactivate , and the date in class=center nob-border ): from urllib
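
A rough sketch of how the second table could be handled, using the class names quoted in the question (the URL is a placeholder, since the excerpt does not show the real page, and the selectors assume the markup matches those class names):

    from urllib.request import urlopen   # Python 3 counterpart of the Python 2 import above
    from bs4 import BeautifulSoup

    html = urlopen("http://example.com/odds-page")   # placeholder; the real URL is not in the excerpt
    soup = BeautifulSoup(html, "html.parser")

    table = soup.find("table", {"id": "tournamentTable"})
    # select() accepts comma-separated CSS selectors, so result rows (deactivate,
    # including "odd deactivate") and date rows (center nob-border) come in one pass.
    for row in table.select("tr.deactivate, tr.center.nob-border"):
        print(row.get_text(" ", strip=True))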

How to scrape a table and its links

蓝咒 submitted on 2021-01-28 12:00:32
Question: What I want to do is to take the following website https://www.tdcj.texas.gov/death_row/dr_executed_offenders.html view-source:https://www.tdcj.texas.gov/death_row/dr_executed_offenders.html and pick the year of execution, enter the Last Statement link, and retrieve the statement... perhaps I would create two dictionaries, both keyed by the execution number. Afterwards, I would classify the statements by length, besides "flagging" the refusals to give one or the cases where it simply was not given.
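
One possible sketch of the first half of that plan, fetching the index table and following each Last Statement link. The column position and the link text are assumptions about the page's markup, not something confirmed in the excerpt:

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    BASE = "https://www.tdcj.texas.gov/death_row/"
    index = BeautifulSoup(requests.get(BASE + "dr_executed_offenders.html").text, "html.parser")

    statements = {}
    for row in index.select("table tr")[1:]:            # skip the header row
        cells = row.find_all("td")
        if not cells:
            continue
        execution_no = cells[0].get_text(strip=True)    # first column assumed to be the execution number
        link = row.find("a", string="Last Statement")   # link text is an assumption
        if link is None:
            continue
        page = BeautifulSoup(requests.get(urljoin(BASE, link["href"])).text, "html.parser")
        statements[execution_no] = page.get_text(" ", strip=True)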

How to fix '$(…).click is not a function' in Node/Cheerio

人盡茶涼 submitted on 2021-01-28 11:40:34
Question: I am writing an application in Node.js that will navigate to a website, click a button on the website, and then extract certain pieces of data from it. All is going well except for the button-clicking aspect: I cannot seem to simulate a button click. I'm extremely new at this, so I'd appreciate any suggestions you have! Sadly, I've scoured the internet looking for a solution to this issue and have been unable to find one. I have used .click() and .bind('click', ...) in a .js file
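
Cheerio, like BeautifulSoup, only parses static markup; it has no event system, so a click cannot be simulated with it, and a real browser driver is needed instead. Keeping with the Python used elsewhere on this page, the equivalent pattern would look roughly like the Selenium sketch below (the URL and selector are placeholders, not taken from the question):

    from selenium import webdriver
    from bs4 import BeautifulSoup

    driver = webdriver.Chrome()
    driver.get("https://example.com")                           # placeholder URL
    driver.find_element_by_css_selector("#load-more").click()   # placeholder selector

    # After the click, hand the rendered DOM to a parser for extraction.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    print(soup.title.string)
    driver.quit()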

HTML Agility Pack Screen Scraping XPATH isn't returning data

懵懂的女人 submitted on 2021-01-28 11:36:02
Question: I'm attempting to write a screen scraper for Digikey that will allow our company to keep accurate track of pricing, part availability, and product replacements when a part is discontinued. There seems to be a discrepancy between the XPath that I'm seeing in Chrome DevTools (and Firebug on Firefox) and what my C# program is seeing. The page I'm currently scraping is http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=296-12602-1-ND The code I'm currently using is pretty
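
A common cause of this kind of mismatch is that DevTools shows the browser-rendered DOM (with injected tbody elements and any JavaScript changes), not the raw HTML the scraper downloads. Keeping with Python rather than the asker's C#, one quick way to check what the downloaded markup actually supports is to run the XPath against the raw response; the XPaths below are illustrative only:

    import requests
    from lxml import html

    raw = requests.get(
        "http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=296-12602-1-ND"
    ).content
    tree = html.fromstring(raw)

    # An XPath copied from DevTools often contains /tbody/ that the raw HTML lacks;
    # testing both variants against the downloaded source shows which one is real.
    print(len(tree.xpath("//table/tbody/tr")))   # illustrative XPath only
    print(len(tree.xpath("//table/tr")))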

Webscraping data from an interactive graph from a website

柔情痞子 submitted on 2021-01-28 11:17:22
Question: I am trying to access data from the graph on the website below: https://www.prisjakt.nu/produkt.php?pu=5183925 I am able to access and extract data from the table below the graph, but I am unable to fetch data from the graph itself, which is drawn dynamically with JavaScript. I know that the BeautifulSoup API alone is not sufficient here. I tried poking around in the page's console to see the contents of the graph, but I was not successful. I also tried to look into view-source
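
JavaScript-drawn charts usually pull their numbers from a JSON endpoint that can be spotted in the Network tab of the browser's developer tools while the graph loads, and that endpoint can then be requested directly. A minimal sketch of that pattern follows; the endpoint path and parameter names are purely hypothetical:

    import requests

    # Hypothetical endpoint: the real URL has to be read from the Network tab
    # while the graph is loading.
    resp = requests.get(
        "https://www.prisjakt.nu/some/json/endpoint",   # placeholder, not a real path
        params={"product": "5183925"},
        headers={"User-Agent": "Mozilla/5.0"},
    )
    data = resp.json()
    print(data)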

Storing information from td tags with a specific width, in python

喜你入骨 submitted on 2021-01-28 10:25:45
Question: I am trying to store all the information from the td tags that have width="82" (or maybe there is a more efficient method). <a name="AAKER"> </a> <table border="" width="100%" cellpadding="5"><tbody><tr><td bgcolor="#FFFFFF"><b>AAKER</b> <small>(<a href="http://google.com">Soundex A260</a>) — <i>See also</i> <a href="http://google.com">ACKER</a>, <a href="http://google.com">KEAR</a>, <a href="http://google.com">TAAKE</a>. </small> </td></tr></tbody></table><br clear="all"> <table align="left"
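
A minimal BeautifulSoup sketch for this, assuming the markup shown above is loaded into a string:

    from bs4 import BeautifulSoup

    html = open("page.html").read()   # or a string holding the markup shown above
    soup = BeautifulSoup(html, "html.parser")

    # attrs matches the literal attribute value, so only <td width="82"> cells are returned.
    for td in soup.find_all("td", attrs={"width": "82"}):
        print(td.get_text(" ", strip=True))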

Scraping a React-table using Selenium

心已入冬 submitted on 2021-01-28 09:01:03
Question: I have written code for scraping an HTML React table using Python and Selenium, but it cannot capture the values in the table (only the DOM elements). Here is the website: https://nonfungible.com/market/history/decentraland?filter=saleType%3D&length=10&sort=blockTimestamp%3Ddesc&start=0 Here is my code: from selenium import webdriver dr = webdriver.PhantomJS(r'PATH_TO_PHANTOM/phantomjs-2.1.1-macosx/bin/phantomjs') dr.get("https://nonfungible.com/market/history/decentraland?filter=saleType%3D&length=10
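
PhantomJS is discontinued and often fails to render React apps, so a common fix is headless Chrome plus an explicit wait, so the rows exist before they are read. A sketch under that assumption follows; the CSS selector is a guess about the table's markup and would need adjusting:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    dr = webdriver.Chrome(options=options)
    dr.get("https://nonfungible.com/market/history/decentraland"
           "?filter=saleType%3D&length=10&sort=blockTimestamp%3Ddesc&start=0")

    # Wait until React has rendered at least one data row; selector is an assumption.
    rows = WebDriverWait(dr, 20).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "table tbody tr"))
    )
    for row in rows:
        print(row.text)
    dr.quit()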

How to use Selenium to click a button in a popup modal box

限于喜欢 submitted on 2021-01-28 08:51:00
Question: I am trying to use Selenium in Python to pull some data from https://www.seekingalpha.com. The front page has a "Sign-in/Join now" link. I used Selenium to click it, which brought up a popup asking for sign-in information with another "Sign in" button. It seems my code below can enter my username and password, but my attempt to click the "Sign in" button didn't get the right response (it clicked on the ad below the popup box). I am using Python 3.5. Here is my code: driver = webdriver.Chrome(
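
The usual fix is to wait for the modal's own button to become clickable before clicking, so the click does not land on whatever sits behind the overlay. A minimal sketch follows; the button selector is an assumption about the modal's markup, not taken from the site:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    driver.get("https://www.seekingalpha.com")

    # Wait for the modal's "Sign in" button (selector is a guess) to be clickable,
    # then click that element rather than the coordinates of the overlay.
    button = WebDriverWait(driver, 15).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']"))
    )
    button.click()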