Prevent adding first line when using htmlParse() from 'XML' package

拈花ヽ惹草 submitted on 2019-12-13 04:32:15

Question


I have a problem when calling htmlParse() on an XHTML document.

When the document is loaded into R (as an 'externalptr'), I can see that one line has been added at the top:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

I don't want this line to appear because it breaks my application. I would like to suppress it within the htmlParse() call itself, rather than having to delete it manually from every XHTML file I have.

Any suggestions? I've tried changing some of the parameters passed to htmlParse(), but so far none of them has done it.

If it helps, here are the first lines of the XHTML file I am parsing:

<?xml version="1.0" encoding="utf-8" ?>
<html dir="ltr" xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="es">
<head>
<meta charset="utf-8" />
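The extra line appears to be the default HTML 4.0 Transitional DOCTYPE that libxml2's HTML parser, which htmlParse() relies on, adds whenever the input does not declare one (the snippet above has no DOCTYPE). Because XHTML is well-formed XML, one possible workaround, not the one from the accepted answer, is to parse the files with xmlParse() instead. A minimal sketch, assuming the XML package is installed; the short XHTML string is a hypothetical stand-in for one of the real files:

```r
library(XML)

# Hypothetical stand-in for one of the XHTML files (it has no DOCTYPE of its own)
xhtml <- '<?xml version="1.0" encoding="utf-8" ?>
<html dir="ltr" xmlns="http://www.w3.org/1999/xhtml" xml:lang="es">
<head><meta charset="utf-8" /><title>Ejemplo</title></head>
<body><p>Hola</p></body>
</html>'

# htmlParse() uses libxml2's HTML parser, which attaches a default
# HTML 4.0 Transitional DOCTYPE when the input does not declare one
html_doc <- htmlParse(xhtml, asText = TRUE)
html_out <- saveXML(html_doc)

# xmlParse() uses the XML parser instead; it keeps the document as
# written, so no DOCTYPE is added
xml_doc <- xmlParse(xhtml, asText = TRUE)
xml_out <- saveXML(xml_doc)
```

Comparing html_out and xml_out shows the DOCTYPE line only in the first. This only helps if every file really is well-formed XML; for genuinely malformed HTML, xmlParse() will raise errors where htmlParse() would have recovered.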

Answer 1:


I took the root node with xmlRoot() and then saved it with saveXML(), passing <?xml version="1.0" encoding="utf-8" ?> as the prefix.
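The approach in this answer can be sketched roughly as follows, assuming the document was read with htmlParse() as in the question. Serializing only the root element leaves the document-level DOCTYPE behind. Here the XML declaration is prepended with paste0() rather than through saveXML()'s prefix argument, so the sketch does not depend on how that argument is handled for a single node; the input string and the output file name chapter_clean.xhtml are hypothetical.

```r
library(XML)

# Hypothetical input; in practice this would be htmlParse("some_file.xhtml")
xhtml <- '<?xml version="1.0" encoding="utf-8" ?>
<html dir="ltr" xmlns="http://www.w3.org/1999/xhtml" xml:lang="es">
<head><meta charset="utf-8" /><title>Ejemplo</title></head>
<body><p>Hola</p></body>
</html>'
doc <- htmlParse(xhtml, asText = TRUE)

# Serialize only the <html> root element; the DOCTYPE that htmlParse()
# attached to the document is not part of this node, so it is not written
root <- xmlRoot(doc)
decl <- '<?xml version="1.0" encoding="utf-8" ?>'
out <- paste0(decl, "\n", saveXML(root))

# Write the result through a connection that encodes as UTF-8
# (hypothetical output file in the session's temporary directory)
path <- file.path(tempdir(), "chapter_clean.xhtml")
con <- file(path, open = "w", encoding = "UTF-8")
writeLines(out, con)
close(con)
```

Opening the connection with an explicit encoding = "UTF-8" is one way to avoid depending on the platform's native encoding when writing, which may be related to the Windows/Ubuntu difference mentioned below.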

There was also an encoding problem, but that's another story: it didn't work on Windows, but it finally worked on Ubuntu.

Thank you all.



Source: https://stackoverflow.com/questions/31906034/prevent-adding-first-line-when-using-htmlparse-from-xml-package
