web-scraping | 易学教程

How to iterate over divs in Scrapy?

Question: This is probably a very trivial question, but I am new to Scrapy. I've tried to find a solution to my problem, but I just can't see what is wrong with this code. My goal is to scrape all of the opera shows from a given website. The data for every show is inside one div with class "row-fluid row-performance ". I am trying to iterate over them to retrieve it, but it doesn't work. It gives me the content of the first div in each iteration (I am getting the same show 19 times instead of different items). Thanks
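
The question's code is not shown, so the cause can only be guessed. One common way to get the first item on every iteration is to run the inner query against the whole document instead of the current div (in Scrapy, an XPath starting with `//` rather than `.//` inside the loop). The sketch below reproduces that symptom with the standard library only; the markup and show titles are hypothetical, and only the class name comes from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page. Only the class name
# "row-fluid row-performance " (with its trailing space) is from the question.
PAGE = """
<html><body>
  <div class="row-fluid row-performance "><h2>Tosca</h2></div>
  <div class="row-fluid row-performance "><h2>Carmen</h2></div>
  <div class="row-fluid row-performance "><h2>Aida</h2></div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Select every div whose class list contains "row-performance".
# Splitting on whitespace makes the trailing space irrelevant.
rows = [
    div for div in root.iter("div")
    if "row-performance" in div.get("class", "").split()
]

# Buggy pattern: the inner query starts from the document root, so every
# iteration returns the first <h2> on the page.
buggy = [root.find(".//h2").text for _ in rows]

# Fixed pattern: the inner query starts from the current row.
fixed = [row.find(".//h2").text for row in rows]

print(buggy)  # the same title repeated
print(fixed)  # one title per row
```

The same idea carries over to Scrapy selectors: inside the loop, call the query on the loop variable and keep the path relative to it.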

webscraping data tables and data from a web page

Question: I am trying to webscrape real-time streaming data tables and data from a web page. I tried: library(XML) webpage <- "http://www.investing.com/indices/us-30" tables <- readHTMLTable(webpage) n.rows <- unlist(lapply(tables, function(t) dim(t)[1])) tables n.rows but I get an error. Thank you for your help. Answer 1: I'm actually proud of myself for getting this to work (I have been trying to get my head around these kinds of things for a long time now....) library(rvest) url4 <- "http://www.investing

Getting embedded Facebook comments from site

Question: I would like to retrieve the embedded Facebook comments from this web page: (http://www.example.com/sub_page_wFBcomments) What I know: I can use the Facebook Graph API to retrieve Facebook comments directly from facebook.com. The same is not true when the comments are embedded in the website of the Facebook page's owner. What I've tried: When using the Graph API like this: https://graph.facebook.com/v2.7/[apikey]/?key=value&access_token=[MyToken] { "link": "http://www.example.con/", "name":

Web-scraping a table to a list

Question: I'm trying to extract a table from a webpage. I have managed to get all the data in the table into a list. However, all the table data is being put into one list element. I need assistance getting the 'clean' data (i.e. the strings, without all the HTML packaging) from the rows of the table into their own list elements. So instead of... list = [<tr> <th><a href="/7.62x25mm_TT_AKBS" title="7.62x25mm TT AKBS"><img alt="TTAKBS.png" decoding="async" height="64" src="https://static.wikia.nocookie
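
The question's own code is cut off, so what follows is only a sketch of one way to get one list of clean strings per table row, using the standard library's `html.parser`. The class name `TableRows`, the sample table, and the "Value" column are hypothetical; only the link `href` and `title` are taken from the excerpt above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample; the "Value" column and its content are made up.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr>
    <th><a href="/7.62x25mm_TT_AKBS" title="7.62x25mm TT AKBS">7.62x25mm TT AKBS</a></th>
    <td>placeholder</td>
  </tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE)
rows = parser.rows
print(rows)
```

Each row ends up as its own list element, and the tag markup around the cell text is dropped because only `handle_data` output is kept.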

Scrape website's Power BI dashboard using R

Question: I have been trying to scrape my local government's Power BI dashboard using R, but it seems like it might be impossible. I've read on the Microsoft site that it is not possible to scrape Power BI dashboards, but several forums show that it is possible; however, I keep going in circles. I am trying to scrape the Zip Code tab data from this dashboard: https://app.powerbigov.us/view?r

rvest Webscraping in R with form inputs

Question: I can't get my head around this problem in R, and I would really appreciate it if you could leave a piece of advice for me here. I am trying to scrape historical bond yield data from https://www.investing.com/rates-bonds/spain-5-year-bond-yield-historical-data for personal use only (of course). The solution provided here works really well but only goes as far as scraping the first 24 time stamps of daily data: webscraping data tables and data from a web page What I am trying to achieve is to

How to get ETF Financial information (e.g. NAV) from Yahoo (with Quantmod)?

Question: I know that I can use the quantmod package to get stock financial information easily from Yahoo. For example, if I want to get the Volume, P/E ratio and Dividend Yield: > library(quantmod) > AAPL <- getSymbols("AAPL") Warning message: In download.file(paste(yahoo.URL, "s=", Symbols.name, "&a=", from.m, : downloaded length 167808 != reported length 200 > what_metrics <- yahooQF(c("Name", + "Volume", + "P/E Ratio", + "Dividend Yield" + + )) > > getQuote(AAPL, what=what_metrics) Trade Time Name

Any way to scrape a link that redirects?

Question: Is there any way that I can make Python click a link, such as a bit.ly link, and then scrape the resulting link? When I am scraping a certain page, the only link I can scrape is a link that redirects, and the information I need is located at the page it redirects to. Answer 1: There are three types of redirection: HTTP, as information in the response headers (with code 301, 302, 3xx); HTML, as a <meta> tag in the HTML (Wikipedia: Meta refresh); JavaScript, as code like window.location = new_url. requests execute
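
The three redirect types listed in the answer can be told apart from an already-fetched response. The following is a rough sketch using only the standard library; `find_redirect`, `MetaRefreshFinder`, and the example URLs are hypothetical names introduced here, the headers are assumed to be a plain dict with a `Location` key, and the JavaScript pattern only covers the simple `window.location = "..."` form with a quoted literal.

```python
import re
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_refresh = (attrs.get("http-equiv") or "").lower() == "refresh"
        if tag == "meta" and is_refresh and self.target is None:
            # content looks like "0; url=https://example.com/next"
            match = re.search(
                r"url\s*=\s*['\"]?([^'\";]+)", attrs.get("content") or "", re.I
            )
            if match:
                self.target = match.group(1).strip()


# Matches window.location = "..." and window.location.href = "..."
JS_REDIRECT = re.compile(
    r"window\.location(?:\.href)?\s*=\s*['\"]([^'\"]+)['\"]"
)


def find_redirect(status, headers, body):
    """Return (kind, url) for the first redirect found, or None."""
    # 1. HTTP: a 3xx status code with a Location header.
    if 300 <= status < 400 and "Location" in headers:
        return "http", headers["Location"]
    # 2. HTML: a meta refresh tag in the page.
    finder = MetaRefreshFinder()
    finder.feed(body)
    if finder.target:
        return "html", finder.target
    # 3. JavaScript: an assignment to window.location.
    match = JS_REDIRECT.search(body)
    if match:
        return "javascript", match.group(1)
    return None


http_case = find_redirect(301, {"Location": "https://example.com/a"}, "")
html_case = find_redirect(
    200, {}, '<meta http-equiv="refresh" content="0; url=https://example.com/b">'
)
js_case = find_redirect(
    200, {}, '<script>window.location = "https://example.com/c";</script>'
)
print(http_case, html_case, js_case)
```

A scraper would call something like this after each fetch and request the returned URL again until no further redirect is found. Redirects built dynamically in JavaScript are beyond what a regular expression can recover and would need a browser engine.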