How to get data from a Wikipedia page using the WikipediR package in R?

Submitted by 和自甴很熟 on 2021-02-08 06:33:09

Question


I need to fetch a certain part of the data from multiple Wikipedia pages. How can I do that using the WikipediR package? Or is there a better option for this? To be precise, I only need the part marked below from each page.

Wikipedia page on Sachin Tendulkar

How can I get that? Any help would be appreciated.


Answer 1:


Can you be a little more specific as to what you want? Here's a simple way to import data from the web, and specifically from Wikipedia.

library(rvest)

scotusURL <- "https://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States"

## ********************
## Option 1: Grab the tables from the page and use the html_table function
## to extract the tables you're interested in.

## read_html() (from xml2, re-exported by rvest) replaces the old html() function
temp <- scotusURL %>%
  read_html() %>%
  html_nodes("table")

html_table(temp[[1]]) ## Just the "legend" table
html_table(temp[[2]]) ## THE MAIN TABLE
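
If the part marked in the question is the infobox of the Sachin Tendulkar article, the same approach applies directly. Below is a minimal sketch; the CSS selector "table.infobox" and the assumption that the infobox is the target part are illustrative, not from the original question.

library(rvest)

## Hypothetical example: pull the infobox table from the page in the question
sachinURL <- "https://en.wikipedia.org/wiki/Sachin_Tendulkar"

infobox <- sachinURL %>%
  read_html() %>%
  html_node("table.infobox") %>%  ## first table with class "infobox" (assumed target)
  html_table()

head(infobox)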

Now, if you want to import data from multiple pages that have essentially the same structure and differ only by, say, a page number in the URL, try this method.

library(RCurl)
library(XML)

## Build one URL per results page by appending the page number
pageNum <- 1:10
url <- paste0("http://www.totaljobs.com/JobSearch/Results.aspx?Keywords=Leadership&LTxt=&Radius=10&RateType=0&JobType1=CompanyType=&PageNum=")
urls <- paste0(url, pageNum)

## Download each page, then parse the HTML so its nodes can be queried later
allPages <- lapply(urls, function(x) getURLContent(x)[[1]])
xmlDocs <- lapply(allPages, function(x) XML::htmlParse(x))
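
Since the question asks about WikipediR specifically: the package's page_content() function fetches a rendered article through the MediaWiki parse API, and the result can then be parsed with rvest. The sketch below is a rough illustration; the exact location of the HTML inside the returned list ($parse$text$`*`) and the choice of the infobox as the target are assumptions and may vary by package version.

library(WikipediR)
library(rvest)

## Fetch the rendered HTML of the article via the MediaWiki parse API
resp <- page_content("en", "wikipedia", page_name = "Sachin Tendulkar")

## Assumption: the HTML string sits under $parse$text$`*` in the returned list
page_html <- resp$parse$text$`*`

## Parse the HTML and pull out the infobox table (assumed to be the marked part)
infobox <- read_html(page_html) %>%
  html_node("table.infobox") %>%
  html_table()

head(infobox)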


Source: https://stackoverflow.com/questions/31330540/how-to-get-data-from-wikipedia-page-using-wikipedir-package-in-r
