How to extract from a file text between tokens using bash scripts

纵然是瞬间 submitted on 2019-12-22 12:36:35

Question


I was reading this question: "Extract lines between 2 tokens in a text file using bash", because I have a very similar problem. I need to extract text from this XML file (and save it to $variable before printing):

<!--more labels up this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->

I only need the content of value= (obviously without the quotes or angle brackets, and without the 'value='). I think the search first has to find "GUI/LastVMSelected" to get to this line, because other lines could contain a similar value field, and the value of that particular label is the one I want.


Answer 1:


If they are on the same line (as they seem to be from your example), it's even easier. Just:

sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p'

Explanation:

  • -n: Suppress default print
  • /name="GUI\/LastVMSelected"/: only lines matching this pattern
  • s/.*value="\([^"]*\)".*/\1/p
    • substitute everything, capturing the parenthesized part (the value of value)
    • and print the result
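
Since the question asks for the value to be saved in a variable before printing, here is a minimal sketch of the same sed command wrapped in command substitution. The temporary sample file, the extra "GUI/Other" line and the variable name last_vm are assumptions made for illustration only; they are not part of the original question.

```shell
#!/usr/bin/env bash
# Hypothetical sample file standing in for the real XML document.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<ExtraDataItem name="GUI/Other" value="not-this-one"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
EOF

# Capture sed's output in a variable instead of printing it directly.
# Only lines containing name="GUI/LastVMSelected" are rewritten and printed.
last_vm=$(sed -n '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' "$xml_file")

printf '%s\n' "$last_vm"
rm -f "$xml_file"
```

If several lines match the name, the variable will hold one value per line, so a real script may want to check for that case.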



Answer 2:


I'm assuming that you're extracting from an XML document. If that is the case, have a look at the XMLStarlet command-line tools for processing XML. There's some documentation for querying XML docs here.
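
As a rough sketch of what such a query might look like, the following uses xmlstarlet's sel subcommand with an XPath expression that selects the value attribute of the element whose name attribute is "GUI/LastVMSelected". The sample document, its ExtraData root element and the variable name last_vm are assumptions for illustration. The script checks whether xmlstarlet is installed and skips the query otherwise; on some systems the binary may be installed under a different name.

```shell
#!/usr/bin/env bash
# Hypothetical, well-formed sample document (the root element is an assumption).
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<ExtraData>
  <ExtraDataItem name="GUI/Other" value="not-this-one"/>
  <ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
</ExtraData>
EOF

last_vm=""
if command -v xmlstarlet >/dev/null 2>&1; then
  # sel selects data; -t starts a template; -v prints the value of an XPath expression.
  last_vm=$(xmlstarlet sel -t -v '//ExtraDataItem[@name="GUI/LastVMSelected"]/@value' "$xml_file")
  printf '%s\n' "$last_vm"
else
  echo "xmlstarlet is not installed; skipping the query" >&2
fi
rm -f "$xml_file"
```

Unlike the text-based approaches, this does not depend on attribute order or on the element fitting on one line. If the real document declares a default XML namespace, the XPath would also need a prefix bound with the -N option.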




Answer 3:


Use this:

for f in $(grep "GUI/LastVMSelected" filename.txt | cut -d " " -f3); do echo "${f:7:36}"; done

  • grep gets you only the lines you need
  • cut splits the lines using some separator, and returns the Nth result of the split
    • -d " " sets the separator to space
    • -f3 returns the third result (1-based indexing)
  • ${f:7:36} extracts the substring starting at index 7 that is 36 characters long. This gets rid of the leading value=" and the trailing "/>.

Obviously, if the order of the fields changes (or the line is indented with spaces, which shifts cut's field numbers), this will break, but if you're just after something quick and dirty that works, this should be it.
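
For comparison, here is a sketch of a variant that does not depend on field position or on the value being exactly 36 characters long. It uses shell parameter expansion instead of cut and a fixed substring. The sample file (with the attributes deliberately swapped and the line indented) and the variable names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample: the line is indented and the attributes are in reverse order.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
  <ExtraDataItem value="14cd3204-4774-46b8-be89-cc834efcba89" name="GUI/LastVMSelected"/>
EOF

# Take the first line that mentions the label we are after.
line=$(grep 'name="GUI/LastVMSelected"' "$xml_file" | head -n 1)

rest=${line#*value=\"}   # drop everything up to and including value="
last_vm=${rest%%\"*}     # drop the closing quote and everything after it

printf '%s\n' "$last_vm"
rm -f "$xml_file"
```

This is still plain text matching, so it shares the other limitation of the approach above: the element has to sit on a single line.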




Answer 4:


Using my answer from the question you linked:

sed -n '/<!--more labels up this line-->/{:a;n;/<!--more labels and text down this line-->/b;\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p;ba}' inputfile

Explanation:

  • -n - don't do an implicit print
  • /<!--more labels up this line-->/{ - if the starting marker is found, then
    • :a - label "a"
      • n - read the next line
      • /<!--more labels and text down this line-->/b - if it's the ending marker, branch to the end of the script, which leaves the loop
      • \|GUI/LastVMSelected| - if the line matches the string
        • s/.*value="\([^"]*\)".*/\1/p - replace the whole line with the text between 'value="' and the next quote, and print it
    • ba - branch to label "a"
  • } end if
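
To see the effect of the two markers, the sketch below runs a multi-line form of this kind of script against a hypothetical sample file that has matching "GUI/LastVMSelected" lines both outside and inside the marked region. The sample contents and the variable name between_markers are assumptions for illustration. The script is split across lines rather than joined with semicolons, which tends to be more portable across sed implementations, and it replaces the whole line so that only the value is printed.

```shell
#!/usr/bin/env bash
# Hypothetical sample: decoy lines sit outside the two marker comments.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<ExtraDataItem name="GUI/LastVMSelected" value="decoy-before-the-markers"/>
<!--more labels up this line-->
<ExtraDataItem name="GUI/Other" value="not-this-one"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="decoy-after-the-markers"/>
EOF

# Loop over the lines after the starting marker; leave the loop at the
# ending marker; print only the value from the matching line in between.
between_markers=$(sed -n '/<!--more labels up this line-->/{
:a
n
/<!--more labels and text down this line-->/b
\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p
ba
}' "$xml_file")

printf '%s\n' "$between_markers"
rm -f "$xml_file"
```

Only the value from the line between the markers ends up in the variable; the two decoy lines are ignored because they never enter the loop.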


Source: https://stackoverflow.com/questions/4860228/how-to-extract-from-a-file-text-between-tokens-using-bash-scripts
