Regex to remove white space between tags in gsub R

断了今生、忘了曾经 submitted on 2019-12-13 15:26:52

Question


How can I remove white space or tabs between tags without removing it from inside the tags? I tried gsub, but it didn't work:

gsub("(^>)\\s(^<)", "", x)

Given a string like:

 "<div class=\"panel\">\n   <div class=\"shortcode\">\n\t    <div class=\"article-\"> text text text text </div> \n    </div>\n    </div>"

Desired output:

<div class=\"panel\"><div class=\"shortcode\"><div class=\"article-\"> text text text text </div></div></div>

Answer 1:


You could try using lookarounds, which require perl = TRUE:

gsub("(?<=\\>)(\\s*)(?=\\<)", "", x, perl = TRUE)
## [1] "<div class=\"panel\"><div class=\"shortcode\"><div class=\"article-\"> text text text text </div></div></div>"
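
To see why the lookaround works where the pattern in the question does not, here is a minimal self-contained sketch. It uses the input string from the question; the names res, expected and unchanged are only illustrative, and the pattern is a slightly simplified equivalent of the one above (no escapes or capture group, \\s+ instead of \\s*). In the question's pattern, (^>) is a group containing a start-of-string anchor followed by a literal ">", not a negated class, so it never matches here.

```r
# Input string from the question
x <- "<div class=\"panel\">\n   <div class=\"shortcode\">\n\t    <div class=\"article-\"> text text text text </div> \n    </div>\n    </div>"

# (?<=>) asserts that a ">" comes right before, (?=<) that a "<" comes right after.
# Neither assertion consumes characters, so only the whitespace run is removed.
res <- gsub("(?<=>)\\s+(?=<)", "", x, perl = TRUE)

# Desired output from the question
expected <- "<div class=\"panel\"><div class=\"shortcode\"><div class=\"article-\"> text text text text </div></div></div>"
print(identical(res, expected))

# The pattern tried in the question finds no match, so x comes back unchanged
unchanged <- gsub("(^>)\\s(^<)", "", x)
print(identical(unchanged, x))
```

The spaces around "text text text text" survive because each of them is missing either the ">" before it or the "<" after it.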



Answer 2:


We can use the fact that the tags have \n between them, which gives particularly simple solutions:

1) If s is the input string then:

gsub("\\s*\n\\s*", "", s)

(If \t cannot appear within tags, as is the case in the question, then the pattern could alternatively be written as " *[\n\t] *".)

2) Another way is:

paste(sapply(strsplit(s, "\n"), trimws), collapse = "")
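
As a quick check, the sketch below runs the variants from this answer on the question's input and compares each with the desired output. The variable names (by_newline, by_class, by_split, same_line) are made up for illustration. The last lines show the assumption these approaches rest on: when two tags are separated only by a space on the same line, there is no \n to match, so the string is left alone, whereas a lookaround pattern still removes the gap.

```r
# Input string and desired output from the question
s <- "<div class=\"panel\">\n   <div class=\"shortcode\">\n\t    <div class=\"article-\"> text text text text </div> \n    </div>\n    </div>"
expected <- "<div class=\"panel\"><div class=\"shortcode\"><div class=\"article-\"> text text text text </div></div></div>"

# 1) Remove every newline together with the whitespace around it
by_newline <- gsub("\\s*\n\\s*", "", s)

# 1, alternative) Same idea with an explicit class for newline and tab
by_class <- gsub(" *[\n\t] *", "", s)

# 2) Split on newlines, trim each piece, and glue the pieces back together
by_split <- paste(sapply(strsplit(s, "\n"), trimws), collapse = "")

print(c(identical(by_newline, expected),
        identical(by_class, expected),
        identical(by_split, expected)))

# Limitation: tags separated only by a space, with no newline between them
same_line <- "<p>a</p> <p>b</p>"
print(gsub("\\s*\n\\s*", "", same_line))                  # space is kept
print(gsub("(?<=>)\\s+(?=<)", "", same_line, perl = TRUE)) # space is removed
```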


Source: https://stackoverflow.com/questions/34706573/regex-to-remove-white-space-between-tags-in-gsub-r
