web-scraping

rvest function html_nodes returns {xml_nodeset (0)}

余生长醉 submitted on 2021-02-10 06:13:05
Problem: I am trying to scrape data from the following website: http://stats.nba.com/game/0041700404/playbyplay/. I'd like to create a table that includes the date of the game, the scores throughout the game, and the team names. I am using the following code: game1 <- read_html("http://stats.nba.com/game/0041700404/playbyplay/") #Extracts the Date html_nodes(game1, xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "game-summary-team--vtm", " " ))]//*[contains(concat( " ", @class, " " ),
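
An empty {xml_nodeset (0)} here is characteristic of a page rendered client-side: read_html() receives only the bare HTML shell, so the classes the XPath targets never exist in the parsed document. A minimal Python sketch of the usual workaround, requesting the play-by-play JSON directly instead of the rendered page; the endpoint name, parameters, and headers are assumptions about how stats.nba.com serves its data, not details confirmed in the question:

    import requests

    # The tables on stats.nba.com are built in the browser from a JSON
    # endpoint, so we query that endpoint rather than the page HTML.
    # Endpoint, parameters, and headers below are assumptions.
    url = "https://stats.nba.com/stats/playbyplayv2"
    params = {"GameID": "0041700404", "StartPeriod": 1, "EndPeriod": 10}
    headers = {
        "User-Agent": "Mozilla/5.0",         # the API rejects bare clients
        "Referer": "https://stats.nba.com/",
    }
    resp = requests.get(url, params=params, headers=headers, timeout=30)
    resp.raise_for_status()

    result = resp.json()["resultSets"][0]
    # Pair column names with values: period, clock, score, descriptions.
    rows = [dict(zip(result["headers"], row)) for row in result["rowSet"]]
    print(rows[0])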

Scrape tables by passing multiple search requests using R

馋奶兔 submitted on 2021-02-10 05:27:08
Problem: I'm trying to run multiple searches on a website using first and last names (https://npiregistry.cms.hhs.gov/registry/) and then create a data frame of the output. I figured out that this is similar to what is described in "How to automate multiple requests to a web search form using R", but for some reason I've been getting the error "Error: failed to load external entity". Below is the code that I'm using to pull records: fn = rep(c('HARVEY','HARVEY')); ln = rep(c('BIDWELL','ADELSON'))
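
The "failed to load external entity" error typically means the content handed to the parser is not the page the search form returns. The registry also publishes a JSON API, which sidesteps the form entirely. A sketch in Python, assuming the documented v2.1 query parameters (first_name, last_name, limit) are still current:

    import requests
    import pandas as pd

    # One API call per first/last name pair; collect matches into rows.
    names = [("HARVEY", "BIDWELL"), ("HARVEY", "ADELSON")]
    records = []
    for first, last in names:
        resp = requests.get(
            "https://npiregistry.cms.hhs.gov/api/",
            params={"version": "2.1", "first_name": first,
                    "last_name": last, "limit": 50},
            timeout=30,
        )
        resp.raise_for_status()
        for result in resp.json().get("results", []):
            basic = result.get("basic", {})
            records.append({"npi": result.get("number"),
                            "first_name": basic.get("first_name"),
                            "last_name": basic.get("last_name")})

    df = pd.DataFrame(records)  # one row per matching provider
    print(df)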

Webscraper won't loop from page 2 to page 5

六月ゝ 毕业季﹏ submitted on 2021-02-10 05:13:23
Problem: I am using https://www.realtor.com/realestateagents/phoenix_az//pg-2 as my starting point. I want to go from page 2 to page 5, and every page in between, while collecting names and numbers. I collect the information on page 2 perfectly, but I cannot get the scraper to move to the next page without plugging in a new URL. I tried to set up a loop to do this automatically, but after writing what I thought was a loop I'm still only getting the information on page 2 (the starting point).
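
Getting page 2 on every pass points to a URL that is built once, outside the loop; the fix is to rebuild it from the loop counter on each iteration. A minimal Python sketch of that structure; the CSS selectors are placeholders, since the real class names on realtor.com are not shown in the question:

    import requests
    from bs4 import BeautifulSoup

    headers = {"User-Agent": "Mozilla/5.0"}
    for page in range(2, 6):  # pages 2 through 5 inclusive
        # Rebuild the URL from the counter on every iteration.
        url = f"https://www.realtor.com/realestateagents/phoenix_az/pg-{page}"
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        for card in soup.select(".agent-list-card"):       # placeholder selector
            name = card.select_one(".agent-name")          # placeholder selector
            phone = card.select_one(".agent-phone")        # placeholder selector
            if name and phone:
                print(name.get_text(strip=True), phone.get_text(strip=True))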

Trouble getting the desired response when issuing a POST request

强颜欢笑 submitted on 2021-02-10 03:26:42
Problem: I've created a script in Python to get a 200 status code by issuing a POST HTTP request, but when I run my script I get 403 instead. It seems to me that I followed the way the request is sent in Chrome dev tools. To do it manually: go to the page, select 6 as the size, and then hit the add-to-cart button. How can I do the same using the script below? Webpage address. I've tried with: import requests from bs4 import BeautifulSoup main_url = 'https://www.footlocker.co.uk/en/homepage' post_url =
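
A 403 on a replayed POST usually means the request lacks state the browser had: the session cookies set during the earlier page visit and the AJAX headers visible in dev tools. A sketch of the usual structure, using a single requests.Session so cookies carry over; the post URL and payload fields are placeholders to be copied from the Network tab, not values taken from the site:

    import requests

    main_url = "https://www.footlocker.co.uk/en/homepage"
    # Placeholder: copy the exact add-to-cart URL from the Network tab.
    post_url = "https://www.footlocker.co.uk/en/addtocart"

    with requests.Session() as s:
        s.headers.update({
            "User-Agent": "Mozilla/5.0",
            "X-Requested-With": "XMLHttpRequest",  # marks the call as AJAX
            "Referer": main_url,
        })
        s.get(main_url, timeout=30)  # visit the page first to pick up cookies
        # Placeholder payload: copy the exact form fields from dev tools.
        payload = {"productId": "12345", "size": "6", "quantity": "1"}
        resp = s.post(post_url, data=payload, timeout=30)
        print(resp.status_code)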

Scraping Data from a Tableau Map

廉价感情. submitted on 2021-02-09 08:46:27
Problem: I am trying to pull the locations and names of naloxone distribution centers in Illinois for a research project on the opioid crisis. This Tableau-generated dashboard from the Department of Public Health is accessible here: https://idph.illinois.gov/OpioidDataDashboard/. I've tried everything I could find. First, changing the URL to "download" the data using Tableau's interface; that only let me download a PDF map, not the actual dataset behind it. Second, I modified the Python script I've seen
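
Tableau dashboards fetch their data through a separate bootstrap-session request, which is why neither the page HTML nor the PDF export contains the dataset. One commonly used shortcut is the third-party tableauscraper package, which performs that handshake and exposes worksheet data as data frames. A sketch assuming that package's documented interface, with a placeholder for the direct viz URL (which has to be dug out of the dashboard's embed code):

    from tableauscraper import TableauScraper  # pip install tableauscraper

    # Placeholder: the direct URL of the Tableau viz behind the IDPH page.
    viz_url = "https://public.tableau.com/views/ExampleWorkbook/ExampleSheet"

    ts = TableauScraper()
    ts.loads(viz_url)        # performs the bootstrap-session request
    workbook = ts.getWorkbook()

    for sheet in workbook.worksheets:
        print(sheet.name)
        print(sheet.data)    # worksheet contents as a pandas DataFrame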

How to zoom out page using RSelenium library in R?

痴心易碎 submitted on 2021-02-08 20:38:29
Problem: I am trying to write a web scraper using the RSelenium library in R. The last step of my work involves taking a screenshot of a table on a web page. To fit the whole table into the window I need to zoom out the web browser (in this case Firefox). I tried: webElem <- remDR$findElement("css", "body") webElem$clickElement() webElem$sendKeysToElement(list(key = "control", "-")) however, it doesn't work. I also saw this thread, "Zoom out shiny app at default in browser", and found there promising
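
Keystrokes sent through WebDriver go to the page, not to the browser chrome, so Ctrl+'-' never changes the browser zoom. Two approaches avoid browser zoom entirely: scale the document with CSS before the screenshot, or screenshot just the table element. A Python Selenium sketch of both (RSelenium exposes the equivalent script call as remDr$executeScript); the page URL and the table selector are placeholders:

    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("https://example.com/page-with-table")  # placeholder URL

    # Option 1: shrink the whole document with CSS, then screenshot.
    driver.execute_script(
        "document.body.style.transform = 'scale(0.5)';"
        "document.body.style.transformOrigin = '0 0';"
    )
    driver.save_screenshot("page_zoomed_out.png")

    # Option 2: screenshot only the table element; no zoom needed.
    table = driver.find_element("css selector", "table")  # placeholder selector
    table.screenshot("table_only.png")

    driver.quit()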

Python: download an image with lxml

情到浓时终转凉″ submitted on 2021-02-08 15:48:11
Problem: I need to find an image in HTML code similar to this one: ... <a href="/example/1"> <img id="img" src="http://example.net/example.jpg" alt="Example" /> </a> ... I am using lxml and requests. Here is the code: import lxml from lxml import html import requests url = 'http://www.example.com' r = requests.get(url) tree = lxml.html.fromstring(r.content) img = tree.get_element_by_id("img") f = open("image.jpg",'wb') f.write(requests.get(img['src']).content) But I am getting an error: Traceback
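
The traceback comes from img['src']: lxml elements are subscripted by child position, not by attribute name, so indexing with a string fails; attributes are read with .get() (or via .attrib). A corrected sketch of the same script:

    import requests
    from lxml import html

    url = "http://www.example.com"
    r = requests.get(url, timeout=30)
    tree = html.fromstring(r.content)

    img = tree.get_element_by_id("img")
    src = img.get("src")  # read the attribute; img['src'] raises an error

    # A context manager guarantees the file is closed after writing.
    with open("image.jpg", "wb") as f:
        f.write(requests.get(src, timeout=30).content)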