rvest

Using rvest to scrape a website - Selecting html node?

Submitted by 与世无争的帅哥 on 2019-12-08 13:57:32
Question: I have a question about my latest rvest scrape. I want to scrape this page (and some other stocks as well): http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1 I need a list of the market capitalization, which is the first box in the second line. This list should contain approx. 50-100 stocks. I am using rvest for that.

```r
library(rvest)
html <- read_html("http://www.finviz.com/quote.ashx?t=A")
cast <- html_nodes(html, "table-dark-row")
```

The problem is, I cannot get around html_nodes. Any idea about
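A likely culprit (a sketch, not a verified fix): `html_nodes()` takes a CSS selector, and `"table-dark-row"` matches a tag named `table-dark-row`, not the class. A leading dot selects by class:

```r
library(rvest)

html <- read_html("http://www.finviz.com/quote.ashx?t=AA")
# ".table-dark-row" (leading dot) selects elements whose class is
# table-dark-row; "table-dark-row" alone would match a tag of that name.
cells <- html %>%
  html_nodes(".table-dark-row td") %>%
  html_text()
# If the usual label/value layout holds, the value follows its label:
market_cap <- cells[which(cells == "Market Cap") + 1]
```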

Scrape with a loop and avoid 404 error

Submitted by 喜夏-厌秋 on 2019-12-08 12:11:05
Question: I am trying to scrape Wikipedia for certain astronomy-related definitions for my project. The code works pretty well, but I am not able to avoid 404s. I tried tryCatch. I think I am missing something here. I am looking for a way to overcome 404s while running a loop. Here is my code:

```r
library(rvest)
library(httr)
library(XML)
library(tm)

topic <- c("Neutron star", "Black hole", "sagittarius A")
for (i in topic) {
  site <- paste("https://en.wikipedia.org/wiki/", i)
  site <- read_html(site)
  stats <- xmlValue
```
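One common way to survive a 404 inside a loop (a sketch using the topics from the question, not the asker's full solution): wrap `read_html()` in `tryCatch()` so a failing page yields `NULL` instead of aborting. Note also that `paste()` with its default separator puts a space into the URL; `paste0()` plus underscores matches Wikipedia's title format.

```r
library(rvest)

topic <- c("Neutron star", "Black hole", "sagittarius A")
pages <- list()
for (i in topic) {
  # paste0() avoids the stray space; Wikipedia titles use underscores
  url <- paste0("https://en.wikipedia.org/wiki/", gsub(" ", "_", i))
  pages[[i]] <- tryCatch(
    read_html(url),
    error = function(e) {
      message("Skipping ", i, ": ", conditionMessage(e))
      NULL   # a 404 becomes a NULL entry instead of stopping the loop
    }
  )
}
```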

Web scraping a password-protected website using R

Submitted by 孤街醉人 on 2019-12-08 11:31:42
Question: I would like to web-scrape Yammer data using R, but in order to do so I first have to log in to this page (which is the authentication for an app that I created): https://www.yammer.com/dialog/authenticate?client_id=iVGCK1tOhbZGS7zC8dPjg I am able to get the Yammer data once I log in to this page, but all of this is in the browser via the standard Yammer URLs (https://www.yammer.com/api/v1/messages/received.json). I have read through similar questions and tried the suggestions, but still can't get through this
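For the API side, one approach (a sketch; the token value is a placeholder you would obtain from Yammer's OAuth flow for the registered app) is to skip the browser login entirely and send an OAuth bearer token with httr:

```r
library(httr)

token <- "YOUR_ACCESS_TOKEN"  # placeholder: obtain via Yammer's OAuth flow

resp <- GET(
  "https://www.yammer.com/api/v1/messages/received.json",
  add_headers(Authorization = paste("Bearer", token))
)
messages <- content(resp, as = "parsed")
```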

Scraping a JavaScript object and converting to JSON within R/Rvest

Submitted by 我是研究僧i on 2019-12-08 11:23:53
Question: I am scraping the following website: https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio I am trying to get the table of currency exchange rates into an R data frame via the rvest package, but the table itself is built from a JavaScript variable within the HTML code. I located the relevant CSS selector and now I have this:

```r
library(rvest)
banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>%
  read_html() %>%
  html_nodes('#indicadores
```
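A common pattern for this situation (a sketch; the variable name and regex below are assumptions about the page's script, not verified against it): pull the `<script>` text with rvest, cut the JSON literal out with a regex, and parse it with jsonlite.

```r
library(rvest)
library(jsonlite)

page <- read_html("https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio")
scripts <- html_text(html_nodes(page, "script"))
# "datos" is a hypothetical variable name; inspect the page source to
# find the real variable that holds the exchange-rate array.
js <- scripts[grepl("datos", scripts)][1]
json_txt <- sub(".*datos\\s*=\\s*(\\[.*?\\]);.*", "\\1", js)
rates <- fromJSON(json_txt)
```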

rvest: how to avoid "Error in open.connection(x, "rb") : HTTP error 404" in R

Submitted by 这一生的挚爱 on 2019-12-08 03:26:19
Question: I'd like to take some information from a list of websites. I have a list of URLs, but some of them don't work/exist. The error is:

Error in open.connection(x, "rb") : HTTP error 404

```r
library(rvest)
url_web <- c("https://it.wikipedia.org/wiki/Roma",
             "https://it.wikipedia.org/wiki/Milano",
             "https://it.wikipedia.org/wiki/Napoli",
             "https://it.wikipedia.org/wiki/Torinoooo",  # for example, this one is an error
             "https://it.wikipedia.org/wiki/Palermo",
             "https://it.wikipedia.org/wiki/Venezia")
```
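One way to keep going past the dead URLs (a sketch, using `url_web` from the question): `purrr::possibly()` wraps `read_html()` so that a 404 returns `NULL` instead of raising the error.

```r
library(rvest)
library(purrr)

# possibly() turns errors into the "otherwise" value, so the bad
# Torinoooo URL yields NULL rather than stopping the whole map().
safe_read <- possibly(read_html, otherwise = NULL)
pages <- map(url_web, safe_read)
ok <- !map_lgl(pages, is.null)   # which URLs actually resolved
```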

With rvest, how to extract html contents from the object returned by submit_form()

Submitted by 妖精的绣舞 on 2019-12-08 02:43:47
Question: I am trying to download some traffic data from pems.dot.ca.gov, following this topic.

```r
rm(list = ls())
library(rvest)
library(xml2)
library(httr)

url <- "http://pems.dot.ca.gov/?report_form=1&dnode=tmgs&content=tmg_volumes&tab=tmg_vol_ts&export=&tmg_station_id=74250&s_time_id=1369094400&s_time_id_f=05%2F21%2F2013&e_time_id=1371772740&e_time_id_f=06%2F20%2F2013&tod=all&tod_from=0&tod_to=0&dow_5=on&dow_6=on&tmg_sub_id=all&q=obs_flow&gn=hour&html.x=34&html.y=8"
pgsession <- html_session(url)
pgform
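The usual continuation of this flow (a sketch; the field names passed to `set_values()` are assumptions, not the real PeMS form fields): `submit_form()` returns a new session, and that session can be queried with `html_nodes()`/`html_table()` directly, just like a parsed page.

```r
library(rvest)

pgsession <- html_session(url)            # url as defined in the question
pgform <- html_form(pgsession)[[1]]
filled <- set_values(pgform,
                     username = "user",   # assumed field names; print
                     password = "pass")   # html_form() output for the real ones
result <- submit_form(pgsession, filled)
# The returned session behaves like a parsed page:
tables <- result %>% html_nodes("table") %>% html_table(fill = TRUE)
```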

How to fetch headlines from Google News using rvest in R?

Submitted by 一曲冷凌霜 on 2019-12-08 02:14:58
Question: I want to fetch headlines from Google News using rvest in R. I have done this so far:

```r
library(rvest)
url <- read_html("https://www.google.com/search?hl=en&tbm=nws&authuser=0&q=american+president")
selector_name <- "r"
fnames <- html_nodes(x = url, css = selector_name) %>% html_text()
```

but the result is:

```
> fnames
character(0)
```

This is the inspect-element view of a headline:

```
<h3 class="r"><a href="/browse.php/PbtvpluS/QDvUJpC7/KoWCA9QE/VTTOFmVJ/bIp8sMa8/qKjgkcAu/Hgcr9lyg/4bibGCOO/nZ82ojLo/_2B602Vo/0sOSEbba
```
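A likely fix (a sketch, with the caveat that Google's markup changes often and may be served differently to non-browser clients): the selector `"r"` matches a nonexistent `<r>` tag; a class selector needs a leading dot, and the inspected markup puts the headline text in an anchor inside `<h3 class="r">`.

```r
library(rvest)

url <- read_html("https://www.google.com/search?hl=en&tbm=nws&q=american+president")
# "h3.r a" = anchors inside <h3 class="r">, matching the inspected markup
fnames <- url %>%
  html_nodes("h3.r a") %>%
  html_text()
```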

Web Scrape: Select Fields from Drop Downs, Extract Resulting Data

Submitted by 喜你入骨 on 2019-12-08 01:01:04
Question: Trying to do some web scraping in R and could use some help. I would like to extract the data in the table on this page: http://droughtmonitor.unl.edu/MapsAndData/DataTables.aspx But I would like to first select County from the left-most drop-down, then select Alameda County (CA) from the next drop-down, then scrape the data in the table. This is what I have so far, but I think I know why it's not working: rvest's form functions are suited to filling out a basic form, not selecting from drop-downs on
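When drop-downs fire JavaScript rather than a plain form POST, rvest alone cannot drive them. A common workaround (a sketch only — the endpoint and parameters below are placeholders you would discover in the browser's network tab, not a documented API) is to call the XHR endpoint the page itself uses:

```r
library(httr)
library(jsonlite)

# Placeholder endpoint/parameters: watch the network tab while choosing
# "County" and "Alameda County (CA)" to find the real request the page makes.
resp <- GET("http://droughtmonitor.unl.edu/SOME/ENDPOINT",
            query = list(area = "county", fips = "06001"))
dat <- fromJSON(content(resp, as = "text"))
```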

Web scraping the data behind every url from a list of urls

Submitted by 主宰稳场 on 2019-12-07 18:01:25
I am trying to gather a dataset from a site called ICObench. I've managed to extract the names of each ICO across the 91 pages using rvest and purrr, but I'm confused as to how I can extract the data behind each name in the list. All the names are clickable links. This is the code so far:

```r
url_base <- "https://icobench.com/icos?page=%d&filterBonus=&filterBounty=&filterTeam=&filterExpert=&filterSort=&filterCategory=all&filterRating=any&filterStatus=ended&filterCountry=any&filterRegistration=0&filterExcludeArea=none&filterPlatform=any&filterCurrency=any&filterTrading=any&s=&filterStartAfter=
```
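One way to follow the links (a sketch; the `"^/ico/"` pattern is an assumption about ICObench's profile paths, and the bare `<a>` selector should be narrowed to the name links): build the 91 page URLs with `sprintf()`, harvest the hrefs, then map `read_html()` over the resulting profile URLs.

```r
library(rvest)
library(purrr)

pages <- sprintf(url_base, 1:91)          # url_base from the question
links <- map(pages, ~ read_html(.x) %>%
               html_nodes("a") %>%        # narrow this to the ICO-name links
               html_attr("href")) %>%
  flatten_chr() %>%
  unique()
profile_urls <- paste0("https://icobench.com",
                       links[grepl("^/ico/", links)])  # assumed path pattern
profiles <- map(profile_urls, read_html)  # then html_nodes() each profile
```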

Downloading a file after login using an HTTPS URL

Submitted by 孤人 on 2019-12-07 07:05:36
Question: I am trying to download an Excel file which I have the link to, but I am required to log in to the page before I can download it. I have successfully gotten past the login page with rvest, RCurl, and httr, but I am having an extremely difficult time downloading the file after I have logged in.

```r
url <- "https://website.com/console/login.do"
download_url <- "https://website.com/file.xls"

session <- html_session(url)
form <- html_form(session)[[1]]
filled_form <- set_values(form, userid = user,
```
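A sketch of the usual continuation ("password" is an assumed field name; check `html_form()` output for the real one): submit the form, then fetch the file through the same session with `jump_to()`, which carries the login cookies, and write the raw response bytes to disk.

```r
library(rvest)

filled_form <- set_values(form, userid = "user", password = "pass")
session <- submit_form(session, filled_form)
download <- jump_to(session, download_url)   # reuses the session's cookies
writeBin(download$response$content, "file.xls")
```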