web-scraping | 易学教程

Scrapy does not find text in Xpath or Css

Question: I've been at this one for a few days, and no matter what I try, I cannot get Scrapy to extract the text that is in one element. To spare you all the code, here are the important pieces. The setup does grab everything else off the page, just not this text. from scrapy.selector import Selector start_url = "https://www.tripadvisor.com/VacationRentalReview-g34416-d12428323-On_the_Beach_Wide_flat_beach_Sunsets_Gulf_view_Sharks_teeth_Shells_Fish-Manasota_Key_F.html" #BASIC ITEM AND SPIDER YADA, SPARE

How can I check if either xpath exists and then return the value if text is present?

Question: I'm having trouble with the second r.html.xpath request. When there is a special deal on an item, the second XPath changes from //*[@id="priceblock_ourprice"] to //*[@id="priceblock_dealprice"]. This causes the script to fail, since the right XPath cannot be returned. How can I include this second XPath that only shows up occasionally? I would like to check whether either XPath exists and, if so, return its value, or otherwise return N/A. The first url that is searched has the ourprice xpath and the second url has
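One way to express the "either XPath, else N/A" logic is to try each expression in order and return the first non-empty text. The following is a minimal sketch, assuming the standard library's xml.etree.ElementTree on a well-formed snippet in place of the r.html.xpath call from the question, so that it runs offline. The helper name first_text and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# XPath expressions from the question, written in the limited XPath
# dialect that ElementTree supports (relative paths starting with ".").
PRICE_XPATHS = [
    ".//*[@id='priceblock_ourprice']",
    ".//*[@id='priceblock_dealprice']",
]


def first_text(root, xpaths, default="N/A"):
    """Return the stripped text of the first matching element, else default."""
    for xpath in xpaths:
        element = root.find(xpath)
        if element is not None and element.text and element.text.strip():
            return element.text.strip()
    return default


# Hypothetical sample pages standing in for the two product URLs.
regular_page = ET.fromstring(
    "<html><body><span id='priceblock_ourprice'>$19.99</span></body></html>"
)
deal_page = ET.fromstring(
    "<html><body><span id='priceblock_dealprice'>$14.99</span></body></html>"
)
no_price_page = ET.fromstring("<html><body><p>Unavailable</p></body></html>")

print(first_text(regular_page, PRICE_XPATHS))   # $19.99
print(first_text(deal_page, PRICE_XPATHS))      # $14.99
print(first_text(no_price_page, PRICE_XPATHS))  # N/A
```

The same loop should carry over to requests_html if each lookup asks for a single element (for example via the first=True argument of r.html.xpath) and treats a missing result as "try the next expression".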

trying to close popover - python - selenium - Glassdoor

Question: I'm trying to close a popover while scraping Glassdoor for jobs. It keeps popping up from time to time, so I need to close it every time. I've tried quite a few things. Please help! First, I tried closing it by looking for the close button: driver.find_element_by_class_name("SVG_Inline modal_closeIcon").click() Then I tried catching an ElementClickInterceptedException when the bot couldn't click on the next company, and everywhere else there was a click: element = WebDriverWait(driver, 3).until(EC.presence_of
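The class-name lookup in the question passes two class names separated by a space, while Selenium's class-name locator accepts only a single class. A CSS selector that chains both classes is a common alternative. The sketch below assumes a Selenium 4 style driver exposing find_elements(by, value); it spells out the string value of By.CSS_SELECTOR so the snippet itself does not import Selenium. The function name close_popover_if_present is hypothetical.

```python
# Value of selenium.webdriver.common.by.By.CSS_SELECTOR, spelled out so
# this sketch has no import-time dependency on Selenium.
CSS_SELECTOR = "css selector"

# Both class names from the question, chained into one CSS selector.
CLOSE_ICON = ".SVG_Inline.modal_closeIcon"


def close_popover_if_present(driver, selector=CLOSE_ICON):
    """Click the popover's close icon if it is on the page.

    Returns True if a click was attempted, False if no icon was found.
    """
    # find_elements returns an empty list when nothing matches,
    # so no exception handling is needed for the "absent" case.
    icons = driver.find_elements(CSS_SELECTOR, selector)
    if not icons:
        return False
    icons[0].click()
    return True
```

Calling this helper right before each click in the scraping loop is one option, since the popover is said to appear at unpredictable times.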

How to scrape an iframe using Selenium?

Question: I want to extract all the comments from a website. The website uses an iframe for the comment section. I already tried to scrape it using Selenium, but unfortunately I can only scrape one comment. How can I scrape the rest of the comments and save them to csv or xmls? Code: from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC driver = webdriver.Chrome() page = driver
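The question's code is cut off, so the cause of the single-comment result is not known. One possible cause is a singular find_element call, which returns only the first match. The sketch below assumes a Selenium 4 style driver (find_element, find_elements, switch_to.frame, switch_to.default_content) and hypothetical CSS selectors supplied by the caller; it switches into the iframe, collects every matching element's text, and writes the result to CSV with the standard library.

```python
import csv

CSS_SELECTOR = "css selector"  # string value of Selenium's By.CSS_SELECTOR


def collect_iframe_texts(driver, frame_selector, item_selector):
    """Return the text of every element matching item_selector inside the iframe."""
    frame = driver.find_element(CSS_SELECTOR, frame_selector)
    driver.switch_to.frame(frame)
    try:
        # find_elements (plural) returns all matches, not just the first.
        return [el.text for el in driver.find_elements(CSS_SELECTOR, item_selector)]
    finally:
        # Always return to the top-level document afterwards.
        driver.switch_to.default_content()


def write_comments_csv(path, comments):
    """Write one comment per row under a single 'comment' header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["comment"])
        writer.writerows([text] for text in comments)
```

If the comments load lazily, an explicit wait before collecting them may also be needed; that part is left out here because the page's behavior is not described in the excerpt.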

PHP: Simple HTML DOM Parser - how to get the element which has certain content?

Question: In PHP I'm using the Simple HTML DOM Parser class. I have an HTML file which has multiple A tags. Now I need to find the tag that has a certain text inside. For example: $html = "<a id='tag1'>A</a> <a id='tag2'>B</a> <a id='tag3'>C</a> "; $dom = str_get_html($html); $tag = $dom->find("a[plaintext=B]"); The above example doesn't work, since plaintext can only be used as an attribute. Any ideas? Answer 1: <?php include("simple_html_dom.php"); $html = "<a id='tag1'>A</a> <a id='tag2'>B</a> <a id=

Python requests 401 error but url opens in browser

Question: I am trying to pull the JSON from this location: https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY This opens fine in my browser, but using requests in Python throws a 401 permission error. I have tried adding headers with different arguments, but to no avail. Interestingly, the JSON on this page does not open in the browser either until https://www.nseindia.com is opened separately. I believe it requires some kind of authentication, but I'm surprised it works in the browser
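The observation that the JSON only opens after the home page has been visited suggests, though it does not prove, that the site sets cookies on the first visit and expects them on the API call. The sketch below illustrates that idea with the standard library's urllib and http.cookiejar rather than requests, so it has no third-party dependency. The header values are assumptions, the function names are hypothetical, and whether this is enough for this particular site is not confirmed by the excerpt.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

HOME_URL = "https://www.nseindia.com"
API_URL = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"

# Assumed browser-like headers; the exact set the site requires is not known.
BROWSER_HEADERS = [
    ("User-Agent", "Mozilla/5.0"),
    ("Accept", "application/json, text/plain, */*"),
    ("Accept-Language", "en-US,en;q=0.9"),
]


def build_cookie_opener():
    """Create an opener that stores cookies and sends browser-like headers."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = list(BROWSER_HEADERS)
    return opener, jar


def fetch_option_chain(opener, timeout=10):
    """Visit the home page first so cookies are set, then request the JSON."""
    opener.open(HOME_URL, timeout=timeout).close()
    with opener.open(API_URL, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call would be `opener, jar = build_cookie_opener()` followed by `fetch_option_chain(opener)`. With requests, a Session object plays the same role of carrying cookies from the first request to the second.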
