R not accepting xpath query

不知归路 2021-01-28 12:51

Hi, I am using the XML package in R to scrape HTML pages. The page of interest is http://www.ncbi.nlm.nih.gov/protein/225903367?report=fasta and on that page there is a sequence

4 Answers

悲&欢浪女 (OP)

2021-01-28 13:31

@brucezepplin, I feel your frustration. @Mathias Muller, I worked with what you wrote and ran the following:

library(XML)

test <- "http://www.ncbi.nlm.nih.gov/protein/225903367?report=fasta"
doc <- htmlTreeParse(test, asText = TRUE, useInternalNodes = TRUE)
xpathSApply(doc, "//div[@id = 'viewercontent1']", xmlValue)
xpathSApply(doc, "//div[@id = 'viewercontent1']//span[@id = 'gi_225903367_1']", xmlValue)
xpathSApply(doc, "//div[@id = 'viewercontent1']/gi/span", xmlValue)

First, when I looked at "doc" it only showed a couple of header lines, not the full page.

But the first XPath query returned list(), so at least it was functioning. The next two returned NULL. There is a

before the desired span nodes as well as a >gi.

In short, this is not an answer but perhaps will make it easier for someone else to provide a solution.
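
One possible reason "doc" looked so short: with asText = TRUE, the XML package treats its first argument as the markup itself rather than as a file name or URL to read. The sketch below illustrates that difference offline. The HTML string and the sequence "MKVLA" are made-up placeholders, not the real NCBI page, and whether the real page carries the span in its static HTML is not verified here.

```r
library(XML)

# Hypothetical markup standing in for the real page (assumed structure,
# placeholder sequence "MKVLA").
html <- paste0(
  "<html><body>",
  "<div id='viewercontent1'><pre>&gt;gi|225903367 ",
  "<span id='gi_225903367_1'>MKVLA</span></pre></div>",
  "</body></html>"
)

# asText = TRUE is appropriate here: the first argument really is markup.
doc <- htmlParse(html, asText = TRUE)
seq_text <- xpathSApply(
  doc,
  "//div[@id = 'viewercontent1']//span[@id = 'gi_225903367_1']",
  xmlValue
)

# With a URL string and asText = TRUE, the URL text itself is parsed as the
# document, so nothing is fetched and the div cannot be found.
url <- "http://www.ncbi.nlm.nih.gov/protein/225903367?report=fasta"
url_doc <- htmlParse(url, asText = TRUE)
hits <- xpathSApply(url_doc, "//div[@id = 'viewercontent1']", xmlValue)

print(seq_text)
print(length(hits))
```

If this is what happened, dropping asText = TRUE (or downloading the page content first and passing that string with asText = TRUE) would be the thing to try before adjusting the XPath expressions.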
