Matching optional parameters with non-capturing groups in Bash regular expression

与世无争的帅哥 提交于 2020-12-26 06:34:18

问题


I want to parse strings similar to the following into separate variables using regular expressions from within Bash:

Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";

or

Category: resource;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Resource";rel="http://schemas.ogf.org/occi/core#entity";attributes="occi.core.summary";

The first part before "title" is common to all strings, the parts title and attributes are optional.

I managed to extract the mandatory parameters common to all strings, but I have trouble with optional parameters not necessarily present for all strings. As far as I found out, Bash doesn't support Non-capturing parentheses which I would use for this purpose.

Here is what I achieved thus far:

CATEGORY_REGEX='Category:\s*([^;]*);scheme="([^"]*)";class="([^"]*)";'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
[[ $category_string =~ $CATEGORY_REGEX ]]
echo ${BASH_REMATCH[0]}
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
echo ${BASH_REMATCH[3]}

The regular expression I would like to use (and which is working for me in Ruby) would be:

CATEGORY_REGEX='Category:\s*([^;]*);\s*scheme="([^"]*)";\s*class="([^"]*)";\s*(?:title="([^"]*)";)?\s*(?:rel="([^"]*)";)?\s*(?:location="([^"]*)";)?\s*(?:attributes="([^"]*)";)?\s*(?:actions="([^"]*)";)?'

Is there any other solution to parse the string with command line tools without having to fall back on perl, python or ruby?


回答1:


I don't think non-capturing groups exist in bash regex, so your options are to use a scripting language or to remove the ?: from all of the (?:...) groups and just be careful about which groups you reference, for example:

CATEGORY_REGEX='Category:\s*([^;]*);\s*scheme="([^"]*)";\s*class="([^"]*)";\s*(title="([^"]*)";)?\s*(rel="([^"]*)";)?\s*(location="([^"]*)";)?\s*(attributes="([^"]*)";)?\s*(actions="([^"]*)";)?'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
[[ $category_string =~ $CATEGORY_REGEX ]]
echo "full:       ${BASH_REMATCH[0]}"
echo "category:   ${BASH_REMATCH[1]}"
echo "scheme:     ${BASH_REMATCH[2]}"
echo "class:      ${BASH_REMATCH[3]}"
echo "title:      ${BASH_REMATCH[5]}"
echo "rel:        ${BASH_REMATCH[7]}"
echo "location:   ${BASH_REMATCH[9]}"
echo "attributes: ${BASH_REMATCH[11]}"
echo "actions:    ${BASH_REMATCH[13]}"

Note that starting with the optional parameters we need to skip a group each time, because the even numbered groups from 4 on contain the parameter name as well as the value (if the parameter is present).




回答2:


You can emulate non-matching groups in bash using a little bit of regexp magic:

              _2__    _4__   _5__
[[ "fu@k" =~ ((.+)@|)((.+)/|)(.+) ]];
echo "${BASH_REMATCH[2]:--} ${BASH_REMATCH[4]:--} ${BASH_REMATCH[5]:--}"
# Output: fu - k

Characters @ and / are parts of string we parse. Regexp pipe | is used for either left or right (empty) part matching.

For curious, ${VAR:-<default value>} is variable expansion with default value in case $VAR is empty.



来源:https://stackoverflow.com/questions/8718851/matching-optional-parameters-with-non-capturing-groups-in-bash-regular-expressio

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!