There are good answers on SO about how to use readHTMLTable from the XML package, and I have used it successfully with regular http pages. However, I am not able to solve my problem with an https page.
Using Andrie's great way to get past the https, I can fetch the page with GET from the httr package.
A way to get at the data without readHTMLTable is also below.
A table in HTML may have an ID. In this case the table has a nice one, and an XPath expression passed to the getNodeSet function selects it nicely.
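# Load httr for GET()/config()/content() and XML for htmlParse()/getNodeSet()
library(httr)
library(XML)
# Define certificate file
cafile <- system.file("CurlSSL", "cacert.pem", package = "RCurl")
# Read page
page <- GET(
  "https://ned.nih.gov/",
  path = "search/ViewDetails.aspx",
  query = "NIHID=0010121048",
  config(cainfo = cafile, ssl.verifypeer = FALSE)
)
# Parse the response body (not the response object itself) as HTML
h <- htmlParse(content(page, as = "text"), asText = TRUE)
# Select the table by its id attribute
ns <- getNodeSet(h, "//table[@id = 'ctl00_ContentPlaceHolder_dvPerson']")
ns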
I still need to extract the IDs behind the hyperlinks.
For example, instead of the name Colleen Barros as manager, I need to get the ID 0010080638 that her link points to:
Manager:Colleen Barros
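One possible way to get at the link targets is to select the a nodes inside the table and read their href attribute with xmlGetAttr from the XML package. The sketch below does not hit the real site: the inline HTML is a made-up stand-in, and the href format (an NIHID=... query parameter) is an assumption about how the page builds its links, so the pattern may need adjusting against the actual markup.

```r
library(XML)

# Hypothetical stand-in for the real page; the href format is an assumption
html <- '
<table id="ctl00_ContentPlaceHolder_dvPerson">
  <tr><td>Manager:</td>
      <td><a href="ViewDetails.aspx?NIHID=0010080638">Colleen Barros</a></td></tr>
</table>'

h <- htmlParse(html, asText = TRUE)

# Select every link with an href inside the table identified by its id
links <- getNodeSet(
  h, "//table[@id = 'ctl00_ContentPlaceHolder_dvPerson']//a[@href]"
)

# Visible text and target of each link
link_text <- sapply(links, xmlValue)
link_href <- sapply(links, xmlGetAttr, "href")

# Pull the digits that follow "NIHID=" out of each href
# (an href without that parameter is returned unchanged by sub())
link_id <- sub(".*NIHID=([0-9]+).*", "\\1", link_href)

data.frame(name = link_text, id = link_id, stringsAsFactors = FALSE)
```

With the real page, the same getNodeSet and xmlGetAttr calls should work on the h object parsed from the GET response, provided the links really do carry the ID in their href.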