web-scraping

rvest function html_nodes returns {xml_nodeset (0)}

余生长醉 submitted on 2021-02-10 06:13:05
Problem: I am trying to scrape data from the following website: http://stats.nba.com/game/0041700404/playbyplay/. I'd like to create a table that includes the date of the game, the scores throughout the game, and the team names. I am using the following code: game1 <- read_html("http://stats.nba.com/game/0041700404/playbyplay/") #Extracts the Date html_nodes(game1, xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "game-summary-team--vtm", " " ))]//*[contains(concat( " ", @class, " " ),
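
An empty {xml_nodeset (0)} here is characteristic of a page rendered client-side: read_html() receives only the bare HTML shell, so the classes the XPath targets never exist in the parsed document. A minimal Python sketch of the usual workaround, requesting the play-by-play JSON directly instead of the rendered page; the endpoint name, parameters, and headers are assumptions about how stats.nba.com serves its data, not details confirmed in the question:

    import requests

    # The tables on stats.nba.com are built in the browser from a JSON
    # endpoint, so we query that endpoint rather than the page HTML.
    # Endpoint, parameters, and headers below are assumptions.
    url = "https://stats.nba.com/stats/playbyplayv2"
    params = {"GameID": "0041700404", "StartPeriod": 1, "EndPeriod": 10}
    headers = {
        "User-Agent": "Mozilla/5.0",         # the API rejects bare clients
        "Referer": "https://stats.nba.com/",
    }
    resp = requests.get(url, params=params, headers=headers, timeout=30)
    resp.raise_for_status()

    result = resp.json()["resultSets"][0]
    # Pair column names with values: period, clock, score, descriptions.
    rows = [dict(zip(result["headers"], row)) for row in result["rowSet"]]
    print(rows[0])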

Scrape tables by passing multiple search requests using R

馋奶兔 submitted on 2021-02-10 05:27:08
Problem: I'm trying to run multiple searches on a website using first and last names (https://npiregistry.cms.hhs.gov/registry/) and then create a data frame of the output. I figured out that this is similar to what is described in "How to automate multiple requests to a web search form using R", but for some reason I've been getting the error "Error: failed to load external entity". Below is the code that I'm using to pull records: fn = rep(c('HARVEY','HARVEY')); ln = rep(c('BIDWELL','ADELSON'))
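
The "failed to load external entity" error typically means the content handed to the parser is not the page the search form returns. The registry also publishes a JSON API, which sidesteps the form entirely. A sketch in Python, assuming the documented v2.1 query parameters (first_name, last_name, limit) are still current:

    import requests
    import pandas as pd

    # One API call per first/last name pair; collect matches into rows.
    names = [("HARVEY", "BIDWELL"), ("HARVEY", "ADELSON")]
    records = []
    for first, last in names:
        resp = requests.get(
            "https://npiregistry.cms.hhs.gov/api/",
            params={"version": "2.1", "first_name": first,
                    "last_name": last, "limit": 50},
            timeout=30,
        )
        resp.raise_for_status()
        for result in resp.json().get("results", []):
            basic = result.get("basic", {})
            records.append({"npi": result.get("number"),
                            "first_name": basic.get("first_name"),
                            "last_name": basic.get("last_name")})

    df = pd.DataFrame(records)  # one row per matching provider
    print(df)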

Webscraper won't loop from page 2 to page 5

六月ゝ 毕业季﹏ submitted on 2021-02-10 05:13:23
Problem: I am using https://www.realtor.com/realestateagents/phoenix_az//pg-2 as my starting point. I want to go from page 2 to page 5, and every page in between, while collecting names and numbers. I collect the information on page 2 perfectly, but I cannot get the scraper to move to the next page without plugging in a new URL. I tried to set up a loop to do this automatically, but after writing what I thought was a loop I'm still only getting the information on page 2 (the starting point).
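
Getting page 2 on every pass points to a URL that is built once, outside the loop; the fix is to rebuild it from the loop counter on each iteration. A minimal Python sketch of that structure; the CSS selectors are placeholders, since the real class names on realtor.com are not shown in the question:

    import requests
    from bs4 import BeautifulSoup

    headers = {"User-Agent": "Mozilla/5.0"}
    for page in range(2, 6):  # pages 2 through 5 inclusive
        # Rebuild the URL from the counter on every iteration.
        url = f"https://www.realtor.com/realestateagents/phoenix_az/pg-{page}"
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        for card in soup.select(".agent-list-card"):       # placeholder selector
            name = card.select_one(".agent-name")          # placeholder selector
            phone = card.select_one(".agent-phone")        # placeholder selector
            if name and phone:
                print(name.get_text(strip=True), phone.get_text(strip=True))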

Trouble getting the desired response when issuing a POST request

强颜欢笑 submitted on 2021-02-10 03:26:42
Problem: I've created a script in Python to get a 200 status code by issuing a POST HTTP request, but when I run my script I get 403 instead. It seems to me that I followed the way the request is sent in Chrome dev tools. To do it manually: go to the page, select 6 as the size, and then hit the add-to-cart button. How can I do the same using the script below? Webpage address. I've tried with: import requests from bs4 import BeautifulSoup main_url = 'https://www.footlocker.co.uk/en/homepage' post_url =
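
A 403 on a replayed POST usually means the request lacks state the browser had: the session cookies set during the earlier page visit and the AJAX headers visible in dev tools. A sketch of the usual structure, using a single requests.Session so cookies carry over; the post URL and payload fields are placeholders to be copied from the Network tab, not values taken from the site:

    import requests

    main_url = "https://www.footlocker.co.uk/en/homepage"
    # Placeholder: copy the exact add-to-cart URL from the Network tab.
    post_url = "https://www.footlocker.co.uk/en/addtocart"

    with requests.Session() as s:
        s.headers.update({
            "User-Agent": "Mozilla/5.0",
            "X-Requested-With": "XMLHttpRequest",  # marks the call as AJAX
            "Referer": main_url,
        })
        s.get(main_url, timeout=30)  # visit the page first to pick up cookies
        # Placeholder payload: copy the exact form fields from dev tools.
        payload = {"productId": "12345", "size": "6", "quantity": "1"}
        resp = s.post(post_url, data=payload, timeout=30)
        print(resp.status_code)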

Scraping Data from a Tableau Map

廉价感情. submitted on 2021-02-09 08:46:27
Problem: I am trying to pull the locations and names of naloxone distribution centers in Illinois for a research project on the opioid crisis. This Tableau-generated dashboard from the Department of Public Health is accessible here: https://idph.illinois.gov/OpioidDataDashboard/. I've tried everything I could find. First, changing the URL to "download" the data using Tableau's interface; that only let me download a PDF map, not the actual dataset behind it. Second, I modified the Python script I've seen
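
Tableau dashboards fetch their data through a separate bootstrap-session request, which is why neither the page HTML nor the PDF export contains the dataset. One commonly used shortcut is the third-party tableauscraper package, which performs that handshake and exposes worksheet data as data frames. A sketch assuming that package's documented interface, with a placeholder for the direct viz URL (which has to be dug out of the dashboard's embed code):

    from tableauscraper import TableauScraper  # pip install tableauscraper

    # Placeholder: the direct URL of the Tableau viz behind the IDPH page.
    viz_url = "https://public.tableau.com/views/ExampleWorkbook/ExampleSheet"

    ts = TableauScraper()
    ts.loads(viz_url)        # performs the bootstrap-session request
    workbook = ts.getWorkbook()

    for sheet in workbook.worksheets:
        print(sheet.name)
        print(sheet.data)    # worksheet contents as a pandas DataFrame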

How to zoom out page using RSelenium library in R?

痴心易碎 submitted on 2021-02-08 20:38:29
Problem: I am trying to write a web scraper using the RSelenium library in R. The last step of my work involves taking a screenshot of a table on a web page. To fit the whole table into the window I need to zoom out the web browser (in this case Firefox). I tried: webElem <- remDR$findElement("css", "body") webElem$clickElement() webElem$sendKeysToElement(list(key = "control", "-")) however, it doesn't work. I also saw this thread, "Zoom out shiny app at default in browser", and found there promising
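
Keystrokes sent through WebDriver go to the page, not to the browser chrome, so Ctrl+'-' never changes the browser zoom. Two approaches avoid browser zoom entirely: scale the document with CSS before the screenshot, or screenshot just the table element. A Python Selenium sketch of both (RSelenium exposes the equivalent script call as remDr$executeScript); the page URL and the table selector are placeholders:

    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("https://example.com/page-with-table")  # placeholder URL

    # Option 1: shrink the whole document with CSS, then screenshot.
    driver.execute_script(
        "document.body.style.transform = 'scale(0.5)';"
        "document.body.style.transformOrigin = '0 0';"
    )
    driver.save_screenshot("page_zoomed_out.png")

    # Option 2: screenshot only the table element; no zoom needed.
    table = driver.find_element("css selector", "table")  # placeholder selector
    table.screenshot("table_only.png")

    driver.quit()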

Python: download an image with lxml

情到浓时终转凉″ submitted on 2021-02-08 15:48:11
Problem: I need to find an image in HTML code similar to this one: ... <a href="/example/1"> <img id="img" src="http://example.net/example.jpg" alt="Example" /> </a> ... I am using lxml and requests. Here is the code: import lxml from lxml import html import requests url = 'http://www.example.com' r = requests.get(url) tree = lxml.html.fromstring(r.content) img = tree.get_element_by_id("img") f = open("image.jpg",'wb') f.write(requests.get(img['src']).content) But I am getting an error: Traceback
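
The traceback comes from img['src']: lxml elements are subscripted by child position, not by attribute name, so indexing with a string fails; attributes are read with .get() (or via .attrib). A corrected sketch of the same script:

    import requests
    from lxml import html

    url = "http://www.example.com"
    r = requests.get(url, timeout=30)
    tree = html.fromstring(r.content)

    img = tree.get_element_by_id("img")
    src = img.get("src")  # read the attribute; img['src'] raises an error

    # A context manager guarantees the file is closed after writing.
    with open("image.jpg", "wb") as f:
        f.write(requests.get(src, timeout=30).content)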