R: LinkedIn scraping using rvest

Submitted by 雨燕双飞 on 2019-12-23 03:24:06

Question


Using rvest package, I am trying to scrape data from my LinkedIn profile.

These attempts:

library(rvest)
url = "https://www.linkedin.com/profile/view?id=AAIAAAFqgUsBB2262LNIUKpTcr0cF_ekoX9ZJh0&trk=nav_responsive_tab_profile"
li = read_html(url)
html_nodes(li, "#experience-316254584-view span.field-text")
html_nodes(li, xpath='//*[@id="experience-610617015-view"]/p/span/text()')

don't find any nodes:

#> {xml_nodeset (0)}

Q: How can I return just the text, like this?

#> "Quantitative hedge fund manager selection for $650m portfolio of alternative investments"
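Here is a minimal sketch of the text-extraction step. It assumes the markup looks the way the selectors in the question imply. The HTML string below is hypothetical and is not LinkedIn's real page source. The empty nodeset above is most likely because LinkedIn does not serve the profile content to an anonymous read_html() request, so no selector can match anything. Once a selector does match, html_text() returns just the text:

```r
library(rvest)

# Hypothetical markup modelled on the CSS selector and XPath in the question;
# it is NOT LinkedIn's actual page source.
page <- read_html('<div id="experience-316254584-view">
  <p><span class="field-text">Quantitative hedge fund manager selection for $650m portfolio of alternative investments</span></p>
</div>')

# CSS selector: select the <span> element, then extract its text content.
nodes <- html_nodes(page, "#experience-316254584-view span.field-text")
desc_css <- html_text(nodes, trim = TRUE)

# XPath: select the same <span> element (no text() step) and extract its text.
spans <- html_nodes(page, xpath = '//*[@id="experience-316254584-view"]/p/span')
desc_xpath <- html_text(spans, trim = TRUE)

print(desc_css)
```

Because you select the element and then call html_text() on it, the CSS and XPath forms behave the same way. The text() step in the question's XPath is therefore not needed.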

EDIT:

LinkedIn has an API. However, for some reason the code below returns only the first two positions under experience and no other items (such as education or projects), hence the scraping approach.

library("Rlinkedin")
auth = inOAuth(application_name, consumer_key, consumer_secret)
getProfile(auth, connections = FALSE, id = NULL) # returns very limited data

Answer 1:


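You are making things unnecessarily difficult. All you need to do is obtain an OAuth 2.0 access token from LinkedIn and then issue a GET request to https://api.linkedin.com/v1/people/~?format=json, sending the token in an "Authorization: Bearer" header. jsonlite's fromJSON() cannot attach that header on its own, so in R you can send the request with httr and parse the response with jsonlite:

library(httr)
library(jsonlite)
token <- "YOUR_ACCESS_TOKEN"  # placeholder: the OAuth 2.0 access token obtained from LinkedIn
resp <- GET("https://api.linkedin.com/v1/people/~?format=json", add_headers(Authorization = paste("Bearer", token)))
linkedin <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))  # parse the JSON body into a list
position <- linkedin$headline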

Your OAuth token must have the 'r_basicprofile' member permission.



Source: https://stackoverflow.com/questions/33457650/r-linkedin-scraping-using-rvest
