How to append a newline after every match using xmllint --xpath

Asked by 别跟我提以往, 2021-01-02 01:15

I have the following HTML code:

4 answers

Answer by 天命终不由人 (OP), 2021-01-02 01:39

Hello from the year 2020!

As of v2.9.9 of libxml2, this behavior has been fixed in xmllint itself: `xmllint --xpath` now prints each matching node on its own line.
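If a script has to run on both old and new systems, it can check the libxml2 version before deciding whether a workaround is needed. A minimal sketch, assuming `xmllint --version` reports a line such as `xmllint: using libxml version 20909` (libxml2 encodes 2.9.9 as 20909) and taking 2.9.9 as the cutoff stated above; the function names `libxml_version_number` and `xpath_prints_newlines` are hypothetical:

```sh
#!/bin/sh
# Hypothetical helpers; assumes the version banner contains
# "using libxml version NNNNN", e.g. 20907 for libxml2 2.9.7.

# Read `xmllint --version` text on stdin and print the numeric version.
libxml_version_number() {
  sed -n 's/.*using libxml version \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Exit 0 if the numeric version is 20909 (2.9.9) or newer, i.e. an xmllint
# whose --xpath output should already put each match on its own line.
xpath_prints_newlines() {
  [ "${1:-0}" -ge 20909 ]
}
```

For example, `v=$(xmllint --version 2>&1 | libxml_version_number)` and then branch on `xpath_prints_newlines "$v"`. The `2>&1` is there because xmllint may print its version banner on standard error.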

However, if you're using anything older than that and don't want to build libxml2 from source just to get the fixed `xmllint`, you'll need one of the workarounds on this page. As of this writing, the latest CentOS 8, for example, still ships a version of libxml2 (2.9.7) that behaves the way the OP describes.
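One workaround that does not depend on the xmllint version is to avoid printing a node-set at all: ask for `count()` of the matches, then request `string()` of each match by index and print the newline yourself. A sketch, assuming a POSIX shell and an HTML input file; `print_each_match` is a hypothetical helper name, and because it runs xmllint once per match it will be slow for large node-sets:

```sh
#!/bin/sh
# Hypothetical helper: print_each_match FILE XPATH
# Prints the string value of every node matching XPATH, one per line.
# The body is a subshell ( ... ), so file/xpath/n/i don't leak to the caller.
print_each_match() (
  file=$1
  xpath=$2
  # count() yields a single number (e.g. 2), so there is nothing to concatenate.
  n=$(xmllint --html --xpath "count($xpath)" "$file" 2>/dev/null)
  i=1
  while [ "$i" -le "${n:-0}" ]; do
    # string() of the i-th match is one string; printf appends the newline.
    printf '%s\n' "$(xmllint --html --xpath "string(($xpath)[$i])" "$file" 2>/dev/null)"
    i=$((i + 1))
  done
)
```

Usage, with a hypothetical file name: `print_each_match page.html '//textarea[@name="command"]'`.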

As I gather from another SO answer, it's theoretically possible to feed a command into the `--shell` option of older (< 2.9.9) versions of `xmllint`, which prints each node on a separate line. However, you then have to post-process the output with `sed` or `grep` to strip the visual detritus of shell mode's human-oriented output. It's not ideal.

---

XMLStarlet, if available, offers another solution, but you need to run `xmlstarlet fo` first to convert your HTML fragment into well-formed XML before using `xmlstarlet sel` to extract the nodes:

```sh
echo '
<textarea name="command" class="setting-input fixed-width"
 rows="9">1</textarea>
<textarea name="command" class="setting-input fixed-width">2</textarea>' \
  | xmlstarlet fo -H -R \
  | xmlstarlet sel -T -t -v '//textarea[@name="command"]' -n
```

If the "Attempt to load network entity" message from the second `xmlstarlet` invocation annoys you, add `2>/dev/null` at the very end of the command to suppress it (at the risk of also suppressing any other messages printed to standard error).
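If you would rather keep other error messages visible, you can filter out just that one warning instead of discarding all of standard error. A sketch in POSIX sh using a spare file descriptor; `run_quietly` is a hypothetical name, and the match string assumes the warning text quoted above:

```sh
#!/bin/sh
# Hypothetical helper: run_quietly CMD [ARGS...]
# Passes CMD's stdout through unchanged and drops only the stderr lines that
# contain "Attempt to load network entity"; other stderr lines are kept.
run_quietly() {
  # fd 3 holds the real stdout. CMD's stderr goes into the pipe, its stdout
  # goes to fd 3, and grep writes the surviving lines back to stderr.
  # "|| :" hides grep's status 1 when no lines remain, so the function
  # always returns 0 and CMD's own exit status is lost.
  { "$@" 2>&1 1>&3 3>&- | { grep -v 'Attempt to load network entity' >&2 || :; }; } 3>&1
}
```

For example, wrap the pipeline in a function such as `extract() { xmlstarlet fo -H -R | xmlstarlet sel -T -t -v '//textarea[@name="command"]' -n; }` and pipe the HTML into `run_quietly extract`.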

The XMLStarlet options explained (see also the XMLStarlet user's guide):

- `fo -H -R`: format the input, treating it as HTML (`-H`) and recovering as much of any malformed input as possible (`-R`)
  - this adds an `<html>` root node, which makes the fragment in the OP's example well-formed XML
- `sel -T -t -v //xpath -n`: select nodes matching the XPath expression `//xpath`
  - output plain text (`-T`) instead of XML
  - using the given template (`-t`), return the value (`-v`) of the matched nodes rather than the nodes themselves (so you can omit `text()` from the XPath expression)
  - finally, add a newline (`-n`)
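XMLStarlet can also express "a newline after every match" directly, with a match loop instead of `-v` over the whole node-set. A sketch, assuming the same kind of input as the example above; `extract_commands` is a hypothetical function name:

```sh
#!/bin/sh
# Hypothetical helper: reads an HTML fragment on stdin and prints the text of
# every <textarea name="command"> on its own line.
# -m loops over each matching node, -v . prints the current node's value,
# and -n inside the loop appends a newline after every match.
# 2>/dev/null hides the network-entity warning (and any other stderr output).
extract_commands() {
  xmlstarlet fo -H -R 2>/dev/null \
    | xmlstarlet sel -T -t -m '//textarea[@name="command"]' -v . -n 2>/dev/null
}
```

Usage: `printf '%s\n' '<textarea name="command">1</textarea>' '<textarea name="command">2</textarea>' | extract_commands`.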

Edit(s): Removed a half-implemented `xmllint --shell` solution because it was just bad. Added an XMLStarlet example that actually works with the OP's data.
