Question
Using the rvest package, I am trying to scrape data from my LinkedIn profile.
These attempts:
library(rvest)
url = "https://www.linkedin.com/profile/view?id=AAIAAAFqgUsBB2262LNIUKpTcr0cF_ekoX9ZJh0&trk=nav_responsive_tab_profile"
li = read_html(url)
html_nodes(li, "#experience-316254584-view span.field-text")
html_nodes(li, xpath='//*[@id="experience-610617015-view"]/p/span/text()')
don't find any nodes:
#> {xml_nodeset (0)}
Q: How to return just the text?
#> "Quantitative hedge fund manager selection for $650m portfolio of alternative investments"
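On the "just the text" part: once a node is matched, html_text() returns its text content. A minimal sketch follows, run against an inline HTML fragment rather than the live page. The fragment's markup is a hypothetical stand-in built from the selector in the question, not LinkedIn's real page structure. An empty node set from the live URL most likely means the HTML that read_html() downloaded (for example, a login page served to an unauthenticated request) does not contain those ids at all.

```r
library(rvest)

# Hypothetical fragment mimicking the markup the CSS selector expects;
# the structure of the real profile page is an assumption.
fragment <- '<div id="experience-316254584-view">
  <p class="description">
    <span class="field-text">Quantitative hedge fund manager selection for $650m portfolio of alternative investments</span>
  </p>
</div>'

# read_html() accepts a literal HTML string as well as a URL.
page <- read_html(fragment)

# html_nodes() selects the <span>; html_text() drops the tags and keeps the text.
nodes <- html_nodes(page, "#experience-316254584-view span.field-text")
description <- html_text(nodes, trim = TRUE)
print(description)
```

If the same two calls still return an empty set on the live page, inspecting as.character(li) should show whether the expected ids are present in the downloaded HTML.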
EDIT:
LinkedIn has an API; however, for some reason the call below returns only the first two experience positions and no other items (such as education or projects). Hence the scraping approach.
library("Rlinkedin")
auth = inOAuth(application_name, consumer_key, consumer_secret)
getProfile(auth, connections = FALSE, id = NULL) # returns very limited data
Answer 1:
You are making things unnecessarily difficult... All you need to do is issue an authenticated GET request to https://api.linkedin.com/v1/people/~?format=json after obtaining an OAuth 2.0 access token from LinkedIn. In R, you can do this using jsonlite:
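library(jsonlite)
# fromJSON() sends no credentials on its own, so pass the OAuth 2.0 access token in the query string
token <- "YOUR_ACCESS_TOKEN"  # placeholder: substitute the token issued for your application
linkedin <- fromJSON(paste0('https://api.linkedin.com/v1/people/~?format=json&oauth2_access_token=', token))
position <- linkedin$headline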
Your OAuth token must carry the 'r_basicprofile' member permission.
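One way to attach the token explicitly is an Authorization header sent with httr. The sketch below assumes that the v1 endpoint from this answer is still served and accepts a bearer token, and that the token was obtained beforehand. The environment variable LINKEDIN_TOKEN and the helpers bearer_header() and get_profile() are hypothetical names introduced for illustration.

```r
library(httr)
library(jsonlite)

# Assumption: the access token was obtained beforehand and stored in a
# (hypothetical) environment variable named LINKEDIN_TOKEN.
token <- Sys.getenv("LINKEDIN_TOKEN")

# Build the value of the Authorization header for a bearer token.
bearer_header <- function(token) paste("Bearer", token)

# Request the basic profile and parse the JSON body into an R list.
get_profile <- function(token) {
  resp <- GET(
    "https://api.linkedin.com/v1/people/~",
    query = list(format = "json"),
    add_headers(Authorization = bearer_header(token))
  )
  stop_for_status(resp)  # raise an R error on 4xx/5xx instead of parsing an error body
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}

# Only touch the network when a token is actually available.
if (nzchar(token)) {
  profile <- get_profile(token)
  print(profile$headline)
}
```

Sending the token in a header keeps it out of URLs that may end up in logs, and stop_for_status() makes a missing permission or an expired token show up as an error rather than as an empty result.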
Source: https://stackoverflow.com/questions/33457650/r-linkedin-scraping-using-rvest