xml2

How to get HTML element that is before a certain class?

大憨熊 posted on 2021-02-20 04:30:28
Question: I'm scraping and having trouble getting the "th" element that comes just before the "th" element with the class "type2". I want to identify it as the "th" before the "th" with class "type2" because my HTML has many "th" elements and that was the only difference I found between the tables. Using rvest or xml2 (or another R package), can I get this element? The content I want is "text_that_I_want". Thank you! <tr> <th class="array">text_that_I
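One possible approach, sketched with xml2 and an XPath `preceding-sibling` axis. The question's HTML is cut off, so the row below is a hypothetical reconstruction: the second "th" and its text are assumptions, and the predicate assumes the class attribute is exactly "type2".

```r
library(xml2)

# Hypothetical row: the original snippet is truncated, so the second <th>
# and its text are assumptions made for illustration.
html <- read_html('
<table>
  <tr>
    <th class="array">text_that_I_want</th>
    <th class="type2">other_text</th>
  </tr>
</table>')

# Find the <th> whose class attribute is exactly "type2", then step back to
# its nearest preceding <th> sibling.
target <- xml_find_first(html, "//th[@class='type2']/preceding-sibling::th[1]")

result <- xml_text(target)
print(result)
```

If the real markup puts several classes on the same element, the `@class='type2'` test would need to be loosened, for example with `contains()`.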

How to get HTML element that is before a certain class?

给你一囗甜甜゛ posted on 2021-02-20 04:29:20
Question: I'm scraping and having trouble getting the "th" element that comes just before the "th" element with the class "type2". I want to identify it as the "th" before the "th" with class "type2" because my HTML has many "th" elements and that was the only difference I found between the tables. Using rvest or xml2 (or another R package), can I get this element? The content I want is "text_that_I_want". Thank you! <tr> <th class="array">text_that_I

Error message when installing xml2 R package

南笙酒味 posted on 2021-02-04 08:13:06
Question: After updating to R 4.0.0 on my Windows machine, I can't install some packages such as xml2 (the same goes for foreign and nnet). When I try to install, I get this error message: * installing *source* package 'foreign' ... ** package 'foreign' successfully unpacked and MD5 sums checked ** using staged installation ** libs *** arch - i386 "c:/rtools40/mingw32/bin/"gcc -I"C:/PROGRA~1/R/R-40~1.0/include" -DNDEBUG -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c R_systat.c -o R_systat.o

Efficiently transform XML to data frame

隐身守侯 posted on 2021-01-29 18:51:02
Question: I need to transform some vanilla XML into a data frame. The XML is a simple representation of rectangular data (see example below). I can achieve this fairly easily in R with xml2 and a couple of for loops. However, I'm sure there is a much better/faster way (purrr?). The XML files I will ultimately be working with are very large, so more efficient methods are preferred. I would be grateful for any advice from the community. library(tidyverse) library(xml2) demo_xml <- "<DEMO> <EPISODE>
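A minimal sketch of a loop-free variant, issuing one XPath query per column instead of iterating record by record. The question's `demo_xml` is truncated, so the document below is hypothetical: the `ID` and `AGE` fields are invented, and the sketch assumes every `EPISODE` holds flat, uniquely named child elements. Whether it is fast enough for very large files would need to be benchmarked.

```r
library(xml2)

# Hypothetical stand-in for the truncated demo_xml; ID and AGE are assumed
# field names, not taken from the question.
demo_xml <- "<DEMO>
  <EPISODE><ID>1</ID><AGE>34</AGE></EPISODE>
  <EPISODE><ID>2</ID><AGE>58</AGE></EPISODE>
</DEMO>"

doc <- read_xml(demo_xml)
episodes <- xml_find_all(doc, "//EPISODE")

# Take the column names from the first record (assumes all records share them).
fields <- xml_name(xml_children(episodes[[1]]))

# One query per column: xml_find_first() on a nodeset returns one result per
# record, and xml_text() yields NA where a field is missing.
cols <- lapply(fields, function(f) xml_text(xml_find_first(episodes, paste0("./", f))))
df <- as.data.frame(setNames(cols, fields), stringsAsFactors = FALSE)

print(df)
```

All columns come back as character vectors; converting them (for example with `type.convert()`) is left as a separate step.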

Why is “link” faster than “//link” in XPath?

北战南征 posted on 2021-01-28 07:35:47
Question: Given this XML, library(xml2) text = paste0( '<?xml version="1.0" encoding="UTF-8"?><items>', paste(rep( '<item type="greeting" id="9273938"> <link type="1" id="139" value="Hi"/> <link type="1" id="142" value="Hello"/> </item>', 100), collapse = "\n"), '</items>') x = xml_children(read_xml(text)) I can select all the link nodes by using "link" or "//link" and get the same result, but with very different speed: bench::mark( link = xml_find_all(x, "link"), `//link` = xml_find_all(x, "//link"))
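A small sketch that may help explain the difference. In XPath, a path starting with `//` is absolute: it searches from the document root no matter which node is the context. Run against a single item, `//link` therefore returns every link in the document, while `link` and `.//link` stay inside that item. Applied to a 100-item nodeset, this suggests the whole document is searched once per item before duplicates are dropped, which is a plausible reason for the timing gap. The two-item document here is a hypothetical, shortened version of the one in the question.

```r
library(xml2)

# Shortened, hypothetical version of the question's document.
doc <- read_xml('<items>
  <item id="a"><link id="1"/><link id="2"/></item>
  <item id="b"><link id="3"/><link id="4"/></item>
</items>')

first_item <- xml_children(doc)[[1]]

# Relative paths are evaluated from the context node.
n_relative   <- length(xml_find_all(first_item, "link"))
n_descendant <- length(xml_find_all(first_item, ".//link"))

# An absolute path ignores the context node and scans the whole document.
n_absolute   <- length(xml_find_all(first_item, "//link"))

print(c(relative = n_relative, descendant = n_descendant, absolute = n_absolute))
```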

xml_find_all function from xml2 package (R) does not find relevant nodes

蓝咒 posted on 2021-01-02 08:31:53
Question: I am using the xml2 package in R to access XML data, and found that it behaves differently on different xml_document objects. On this pet example library(xml2) doc <- read_xml( "<MEMBERS> <CUSTOMER> <ID>178</ID> <FIRST.NAME>Alvaro</FIRST.NAME> <LAST.NAME>Juarez</LAST.NAME> <ADDRESS>123 Park Ave</ADDRESS> <ZIP>57701</ZIP> </CUSTOMER> <CUSTOMER> <ID>934</ID> <FIRST.NAME>Janette</FIRST.NAME> <LAST.NAME>Johnson</LAST.NAME> <ADDRESS>456 Candy Ln</ADDRESS> <ZIP>57701</ZIP> </CUSTOMER> </MEMBERS>") doc {xml

r rvest error: “Error in doc_namespaces(doc) : external pointer is not valid”

删除回忆录丶 posted on 2020-04-16 05:42:12
Question: My question is similar to this one, but the latter did not receive an answer I can work with. I am scraping thousands of URLs with xml2::read_html. This works fine. But when I try to parse the resulting HTML documents using purrr::map_df and html_nodes, I get the following error: Error in doc_namespaces(doc) : external pointer is not valid For some reason, I am unable to reproduce the error using examples. The example below is not good, because it works totally fine. But if someone could
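One situation known to produce this message: xml2 documents are external pointers into memory owned by libxml2, so they do not survive being saved to disk, restored in a new session, or shipped to another process. The excerpt does not say whether that applies here, so the following is only a sketch under that assumption. It stores the serialized HTML as text and re-parses it when needed; the sample page and the `msg` class are hypothetical.

```r
library(xml2)

# Hypothetical page standing in for one scraped URL.
doc <- read_html("<html><body><p class='msg'>hello</p></body></html>")

# Persist the document as plain text rather than as the pointer-backed object.
path <- tempfile(fileext = ".rds")
saveRDS(as.character(doc), path)

# Later (possibly in another session): re-parse the text to get a valid document.
restored <- read_html(readRDS(path))
msg <- xml_text(xml_find_first(restored, "//p[@class='msg']"))

print(msg)
```

Another option under the same assumption is to extract the needed values right after each `read_html()` call, so that only plain R data is kept around.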

How to configure the curl package in R with default web proxy settings?

我怕爱的太早我们不能终老 posted on 2020-02-24 10:17:33
Question: I'm using R in a commercial environment where all external connectivity goes via a web proxy, so we need to specify the proxy server address and ensure we connect to it with Windows authentication. I already have code that will configure the RCurl and httr packages to use those settings by default, i.e. httr::set_config(config( proxy = "my.proxy.address", proxyuserpwd = ":", proxyauth = 4 )) or opts <- list( proxy = "my.proxy.address", proxyuserpwd = ":", proxyauth = 4 ) RCurl::options
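A partial sketch for the curl package, reusing the placeholder values from the question. It sets the same three libcurl options on a single handle with `curl::new_handle()`; it does not establish a package-wide default, so the handle has to be passed to each request explicitly. Whether these options behave as intended behind a given corporate proxy is an assumption that would need checking on site.

```r
library(curl)

# Same placeholder values as in the question's httr/RCurl configuration.
h <- new_handle(
  proxy = "my.proxy.address",
  proxyuserpwd = ":",
  proxyauth = 4
)

# The handle is then supplied per request, for example (not run here because
# it needs network access):
# res <- curl_fetch_memory("https://example.com", handle = h)

is_handle <- inherits(h, "curl_handle")
print(is_handle)
```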

How to configure the curl package in R with default web proxy settings?

我与影子孤独终老i posted on 2020-02-24 10:15:11
Question: I'm using R in a commercial environment where all external connectivity goes via a web proxy, so we need to specify the proxy server address and ensure we connect to it with Windows authentication. I already have code that will configure the RCurl and httr packages to use those settings by default, i.e. httr::set_config(config( proxy = "my.proxy.address", proxyuserpwd = ":", proxyauth = 4 )) or opts <- list( proxy = "my.proxy.address", proxyuserpwd = ":", proxyauth = 4 ) RCurl::options