Non-greedy text matching and extrapolating in bash

Question

I have a utility that parses the attributes within a large XML tag, such as:

<mytag name="value" name="value"../>

I wrote the utility in Perl. It uses basic non-greedy text matching, like this:

while (<XML>)
{
    # .+? is non-greedy: it stops at the first quote after at least one character
    if (/name="(.+?)"/)
    {
        print("Value is: $1\n");
    }
}

I know the names of the name/value pairs I want to extract, so I hardcode them into the utility.
This works quite well for me. How can I do the same in bash? I have tried awk, but it fails when an attribute looks like name="value with spaces in the middle".
With awk, I need to know the position of the name/value pair I want to extract, such as awk '{ print $3 }' for my third pair. This works fine as long as the values contain no spaces, but when they do, it fails and only gives me the value up to the first whitespace character.
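A quick way to see the failure described above, assuming awk's default field splitting (on runs of spaces and tabs). The tag and attribute names here are made up for illustration:

```bash
#!/usr/bin/env bash
# awk splits each input line on runs of blanks by default, so a quoted value
# that contains spaces is spread across several fields.
line='<mytag first="a" second="value with spaces" third="b"/>'

# Fields: $1=<mytag  $2=first="a"  $3=second="value  $4=with  $5=spaces"  $6=third="b"/>
printf '%s\n' "$line" | awk '{ print $3 }'   # prints: second="value
printf '%s\n' "$line" | awk '{ print NF }'   # prints: 6
```

Because the field boundaries depend on how many spaces the values contain, a fixed index such as $3 cannot reliably select one attribute.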

Anything would be helpful, thanks.

Answer 1:

In simple cases, it is really easy to avoid using non-greedy matches (which are not available in many standard utilities, such as sed, awk and grep).

For example, the "regular" expression (with non-greedy match):

name="(.+?)"

is equivalent, for any non-empty value that contains no embedded quote, to the ordinary regular expression

name="([^"]+)"

Although what you probably want is the following, which also accepts an empty value such as name="":

name="([^"]*)"
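A minimal sketch of the question's Perl loop in bash, using the [^"]* pattern with bash's own =~ operator. That operator uses POSIX extended regular expressions, which have no non-greedy quantifier. The sketch assumes one tag per line on standard input and an attribute literally called name; the function name print_name_values is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the value of the first name="..." on each input line.
# [^"]* cannot run past a quote, so it does the job that .+? does in Perl.
print_name_values() {
    # Unquoted $re on the right of =~ is treated as a regex;
    # quoting it would force a literal string comparison instead.
    local re='name="([^"]*)"'
    local line
    # The || test also processes a final line that lacks a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            printf 'Value is: %s\n' "${BASH_REMATCH[1]}"
        fi
    done
}

printf '%s\n' \
    '<mytag name="value with spaces" other="x"/>' \
    '<mytag name="" other="x"/>' | print_name_values
```

The first input line prints "Value is: value with spaces" and the second prints an empty value. With [^"]+ the second line would not match at all, and Perl's .+? would capture `" other=` instead, which is why [^"]* is usually the safer choice.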

This is harder when the terminator of the repeated pattern is more than a single character, but an ordinary regular expression that works always exists. For example, a C-style comment (here the leading and trailing / are literal characters, not regex delimiters) can be recognized with the non-greedy match:

/[*].*?[*]/

and the regular expression

/[*][^*]*[*]+([^*/][^*]*[*]+)*/

which is a bit hard to read but should work just fine. (I prefer [*] to \* but both match a literal asterisk.)
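The comment regex can be tried directly with grep -oE, which prints each match on its own line. This is a sketch that assumes every comment fits on one line, since grep matches line by line; the sample C code is made up:

```bash
#!/usr/bin/env bash
# grep has no non-greedy operator, but this regex can never extend past the
# first closing */, so -o reports each comment separately.
comment_re='/[*][^*]*[*]+([^*/][^*]*[*]+)*/'
code='int x; /* first */ int y; /** second **/ int z;'

printf '%s\n' "$code" | grep -oE "$comment_re"

# For contrast, a greedy .* runs from the first /* to the last */
printf '%s\n' "$code" | grep -oE '/[*].*[*]/'
```

The first grep prints /* first */ and /** second **/ on separate lines, while the greedy version prints a single match spanning both comments and the code between them.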

Source: https://stackoverflow.com/questions/25981398/non-greedy-text-matching-and-extrapolating-in-bash

Tags

bash

perl