How to append a newline after every match using xmllint --xpath

Submitted by 与世无争的帅哥 on 2020-02-20 07:26:45

Question


I have the following HTML code:

<textarea name="command" class="setting-input   fixed-width" rows="9">1</textarea><textarea name="command" class="setting-input   fixed-width" rows="5">2</textarea>

I would like to parse it to receive such output:

1
2

Currently I am using:

xmllint --xpath '//textarea[@name="command"]/text()' --html

but it does not append a newline after each match.


Answer 1:


Hello from the year 2020!

As of v2.9.9 of libxml2, this behavior has been fixed in xmllint itself.

However, if you're using anything older than that and don't want to build libxml2 from source just to get the fixed xmllint, you'll need one of the other workarounds here. As of this writing, the latest CentOS 8, for example, still ships a version of libxml2 (2.9.7) that behaves the way the OP describes.
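
To decide whether a workaround is needed at all, a script can check the libxml2 version that xmllint reports. The following is a minimal sketch, assuming the banner from xmllint --version contains the text "using libxml version" followed by a number such as 20907 (for 2.9.7). The function names libxml_version and needs_workaround are made up for illustration.

```shell
# Extract the numeric libxml2 version (e.g. 20907 for 2.9.7) from the
# banner text on stdin. The banner format is an assumption; adjust the
# pattern if your build prints something different.
libxml_version() {
  sed -n 's/.*using libxml version \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed (exit status 0) when the given numeric version predates
# 2.9.9 (20909), i.e. when --xpath results are not newline-separated.
needs_workaround() {
  [ "$1" -lt 20909 ]
}

# Usage sketch (the banner may go to stderr, hence 2>&1):
#   v=$(xmllint --version 2>&1 | libxml_version)
#   if needs_workaround "$v"; then echo "use a workaround"; fi
```

Keeping the parsing and the comparison in separate functions makes it possible to exercise the logic on canned banner text without having xmllint installed.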

As I gather from this SO answer, it's theoretically possible to feed a command into the --shell option of older (<2.9.9) versions of xmllint, and this will produce each node on a separate line. However, you end up having to post-process it with sed or grep to remove the visual detritus of shell mode's (human-oriented) output. It's not ideal.


XMLStarlet, if available, offers another solution, but you do need to use xmlstarlet fo to format your HTML fragment into valid XML before using xmlstarlet sel to extract nodes:

echo '
<textarea name="command" class="setting-input fixed-width"
 rows="9">1</textarea>
<textarea name="command" class="setting-input fixed-width"
 rows="5">2</textarea>' \
  | xmlstarlet fo -H -R \
  | xmlstarlet sel -T -t -v '//textarea[@name="command"]' -n

If the "Attempt to load network entity" message from the second xmlstarlet invocation annoys you, just add 2>/dev/null at the very end to suppress it (at the risk of suppressing other messages printed to standard error).

The XMLStarlet options explained (see also the user's guide):

  • fo -H -R: format the output, expecting HTML input (-H) and recovering as much bad input as possible (-R)
    • this will add an <html> root node, making the fragment in the OP's example valid XML
  • sel -T -t -v //xpath -n: select nodes based on the XPath expression //xpath
    • output plain text (-T) instead of XML
    • using the given template (-t) that returns the value (-v) of the node rather than the node itself (allowing you to forgo using text() in the XPath expression)
    • finally, add a newline (-n)

Edit(s): Removed half-implemented xmllint --shell solution because it was just bad. Added an XMLStarlet example that actually works with the OP's data.




Answer 2:


Try this patch, which provides 2 options:

  • --xpath: same as old --xpath, with nodes separated by \n.

  • --xpath0: same as old --xpath, with nodes separated by \0.

Test input (a.html):

<textarea name="command" class="setting-input   fixed-width" rows="9">1</textarea><textarea name="command" class="setting-input   fixed-width" rows="5">2</textarea>

Test command 1:

# xmllint --xpath '//textarea[@name="command"]/text()' --html a.html

Test output 1:

 1
 2

Test command 2:

# xmllint --xpath0 '//textarea[@name="command"]/text()' --html a.html | xargs -0 -n1

Test output 2:

 1
 2
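
The \0 separator matters when a match itself contains newlines, because newline-separated output then becomes ambiguous. The sketch below illustrates this with canned data standing in for the output of the patched xmllint --xpath0; the sample content and the function names sample_xpath0_output and count_matches are hypothetical, and xargs -0 is assumed to be available (it is in GNU, BSD, and BusyBox xargs).

```shell
# Stand-in for output of the patched `xmllint --xpath0` (hypothetical data):
# two matches separated by NUL; the second contains an embedded newline.
sample_xpath0_output() {
  printf 'ls -l\000echo a\necho b'
}

# Count matches with xargs -0, which splits on NUL only, so the embedded
# newline does not create an extra record.
count_matches() {
  xargs -0 sh -c 'echo "$#"' _
}

# NUL-based count: prints 2
sample_xpath0_output | count_matches

# Newline-based count after converting NUL to newline: prints 3
sample_xpath0_output | tr '\000' '\n' | awk 'END { print NR }'
```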



Answer 3:


I used the following ugly trick; please feel free to suggest a better solution.

I changed the HTML by replacing </textarea> with \n</textarea>, using the following command (the $'\n' quoting requires a shell such as bash):

sed 's/<\/textarea/\'$'\n''<\/textarea/g' f
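
A variant of the same trick that avoids the shell-specific $'\n' quoting is sketched below. It uses a backslash followed by a literal newline in the sed replacement, which POSIX sed accepts. The function name add_newlines is made up for illustration.

```shell
# Insert a newline before every closing </textarea> tag.
# In a sed replacement, a backslash followed by a literal newline emits
# a newline; the continuation line must start at column 0 so that no
# extra spaces end up in the output.
add_newlines() {
  sed 's/<\/textarea>/\
<\/textarea>/g'
}

# Usage sketch (requires xmllint; "-" reads the document from stdin):
#   add_newlines < f | xmllint --html --xpath '//textarea[@name="command"]/text()' -
```

Note that the extra newline becomes part of each text node, so this only suits cases where trailing whitespace in the extracted values is acceptable.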


Source: https://stackoverflow.com/questions/18532948/how-to-append-a-newline-after-every-match-using-xmlint-xpath