How do I wrap HTML untagged text with <p> tag using Nokogiri?

邮差的信 提交于 2019-12-30 06:55:20

问题


I have to parse an HTML document into different new files. The problem is that there are text nodes which have not been wrapped with "<p>" tags, instead they having "<br>" tags at the end of each paragraph.

I want to wrap this text with <p> tags using Nokogiri:

<div id="f15"><b>Footnote 15</b>: Catullus iii, 12.</div>
<div class="pgmonospaced pgheader"><br/>
<br/>
End of the Project abc<br/>
<br/>
*** END OF THIS PROJECT XYZ ***<br/>
<br/>
***** This file should be named new file.html... *****<br/>
<br/></div>

回答1:


After searching around some forums and doing some debugging locally, i have found the following solution to my problem.

html_doc = Nokogiri::HTML.parse('path/to/html_file')
html_doc
html_doc.search("//br/preceding-sibling::text()|//br/following-sibling::text()").each do |node|
    node.replace(Nokogiri.make("<p>#{node.to_html}</p>"))
end


来源:https://stackoverflow.com/questions/8937846/how-do-i-wrap-html-untagged-text-with-p-tag-using-nokogiri

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!