Get content between a pair of HTML tags using Bash

野趣味 2020-11-30 09:53

I need to get the HTML contents between a pair of given tags using a bash script. For example, given the following HTML:



         


        
6 Answers
  •  难免孤独
    2020-11-30 10:57

    Personally, I find the hxselect command from the html-xml-utils package very useful, often combined with hxclean from the same package. hxclean repairs a (possibly broken) HTML file into well-formed XML, and hxselect then lets you use CSS selectors to extract the node(s) you need. With the -c option, hxselect prints only the content of the matched elements, stripping the enclosing tags. Both commands can read from stdin and write to stdout. So in your case, run:

    $ hxselect -c body <<HTML
    <html>
    <head></head>
    <body>
      text
      <div>
        text2
        text3
      </div>
    </body>
    </html>
    HTML

    to get what you need. Plain and simple.
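If you need this repeatedly, the hxclean | hxselect pipeline can be wrapped in a small shell function. This is a minimal sketch, not something shipped with html-xml-utils: it assumes hxclean and hxselect are installed and on your PATH, and `extract_inner` is a hypothetical helper name. It reads a file if one is given (otherwise stdin), repairs the markup with hxclean, and prints the inner content of every element matching the CSS selector.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the inner content of elements matching a CSS selector.
# Assumption: html-xml-utils (hxclean, hxselect) is installed and on PATH.
# extract_inner is a hypothetical helper name, not a command from that package.
extract_inner() {
  if [ "$#" -lt 1 ]; then
    echo "usage: extract_inner SELECTOR [FILE]" >&2
    return 2
  fi
  local selector="$1"
  shift

  # Fail early with the conventional "command not found" status.
  if ! command -v hxclean >/dev/null 2>&1 || ! command -v hxselect >/dev/null 2>&1; then
    echo "extract_inner: hxclean/hxselect not found (install html-xml-utils)" >&2
    return 127
  fi

  # cat reads the given file, or stdin when no file is given.
  # hxclean repairs the HTML; hxselect -c prints matches without their enclosing tags.
  cat -- "$@" | hxclean | hxselect -c "$selector"
}
```

For example, `extract_inner body page.html` would print everything inside the body of page.html, and `some_command | extract_inner 'div.content'` works on piped input. Running hxclean first lets slightly malformed pages still be queried, at the cost of relying on hxclean's repair heuristics.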
