Non greedy text matching and extrapolating in bash

Submitted by 前提是你 on 2020-07-04 02:05:12

Question


I have a utility that can parse the attributes within a large xml tag, such as

<mytag name="value" name="value"../>

I have written the utility in perl. It uses basic non-greedy text matching, like this:

while(<XML>)
{
    # Capture the shortest run of characters between the quotes.
    if (/name=\"(.+?)\"/)
    {
        print("Value is: $1\n");
    }
}

I know the names of the name/value pairs I want to extract, so I hardcode them into the utility.
This works quite well for me. How can I do the same thing in bash? I have tried awk, but it does not work when an attribute looks like name="value with spaces in the middle".
With awk, I need to know the index of the name/value pair I want to extract, such as awk '{ print $3 }' for the third pair. This works if the values contain no spaces, but if they do, it fails and returns only the part of the value up to the first whitespace character.

Anything would be helpful, thanks.


Answer 1:


In simple cases, it is really easy to avoid using non-greedy matches (which are not available in many standard utilities, such as sed, awk and grep).

For example, the "regular" expression (with non-greedy match):

 name=\"(.+?)\"

matches the same attribute values as the ordinary regular expression

name=\"([^"]+)\"

Although it is possible that what you really want is the following, which also accepts an empty value (name=""):

name=\"([^"]*)\"
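
As a rough sketch of how the negated character class could be used from a shell script, the snippet below wraps sed in a small helper. The function name attr_value and the sample tag are made up for illustration, and the helper assumes the attribute name contains no regex metacharacters or slashes and that each tag sits on a single line.

```shell
#!/usr/bin/env bash
# attr_value is a hypothetical helper: read text on stdin and print the value
# of the attribute named by $1 (one output line per matching input line).
attr_value() {
    # [^"]* cannot run past the closing quote, so no non-greedy operator is needed.
    # The [[:space:]] before the name keeps "name" from matching inside "surname".
    sed -n "s/.*[[:space:]]$1=\"\([^\"]*\)\".*/\1/p"
}

line='<mytag first="value with spaces" second="x"/>'
printf '%s\n' "$line" | attr_value first    # prints: value with spaces
```

Because the match is driven by the attribute name rather than by a field index, spaces inside the value no longer matter.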

This is not so easy if the termination of the internal repeated pattern is not a single character, but there is always a regular expression which will work. For example, C-style comments can be recognized with the non-greedy match:

/[*].*?[*]/

and the regular expression

/[*][^*]*[*]+([^*/][^*]*[*]+)*/

which is a bit hard to read but should work just fine. (I prefer [*] to \* but they both do the same thing.)
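
To see the longer expression at work, here is a minimal sketch using grep. The helper name c_comments and the sample input are invented for the example, and it relies on the -o option, which GNU and BSD grep provide but which is not guaranteed everywhere. Since grep processes one line at a time, only comments that fit on a single line are found.

```shell
#!/usr/bin/env bash
# c_comments is a hypothetical helper: print each /* ... */ comment found on stdin.
# -E selects extended regular expressions; -o prints only the matched text.
c_comments() {
    grep -oE '/[*][^*]*[*]+([^*/][^*]*[*]+)*/'
}

# Two comments on one line; each is matched separately, as a non-greedy
# pattern would do, instead of being merged into one long match.
printf '%s\n' 'int a; /* one */ int b; /* two ** */' | c_comments
# prints:
# /* one */
# /* two ** */
```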



Source: https://stackoverflow.com/questions/25981398/non-greedy-text-matching-and-extrapolating-in-bash
