How to parse a command line with regular expressions?

离开以前 2020-12-03 16:02

I want to split a command-line-like string into individual string parameters. What would the regular expression for this look like? The problem is that the parameters can be quoted. For exampl

13 Answers
  • 2020-12-03 16:20

    You should not use regular expressions for this. Write a parser instead, or use one provided by your language.

    I don't see why I get downvoted for this. This is how it could be done in Python:

    >>> import shlex
    >>> shlex.split('"param 1" param2 "param 3"')
    ['param 1', 'param2', 'param 3']
    >>> shlex.split('"param 1" param2 "param 3')
    Traceback (most recent call last):
        [...]
    ValueError: No closing quotation
    >>> shlex.split('"param 1" param2 "param 3\\""')
    ['param 1', 'param2', 'param 3"']
    

    Now tell me that racking your brain over how a regex could solve this problem is ever worth the hassle.

  • 2020-12-03 16:20
    ("[^"]+"|[^\s"]+)
    

    This is what I use in C++:

    #include <iostream>
    #include <iterator>
    #include <string>
    #include <regex>
    
    void foo()
    {
        std::string strArg = " \"par   1\"  par2 par3 \"par 4\""; 
    
        std::regex word_regex( "(\"[^\"]+\"|[^\\s\"]+)" );
        auto words_begin = 
            std::sregex_iterator(strArg.begin(), strArg.end(), word_regex);
        auto words_end = std::sregex_iterator();
        for (std::sregex_iterator i = words_begin; i != words_end; ++i)
        {
            std::smatch match = *i;
            std::string match_str = match.str();
            std::cout << match_str << '\n';
        }
    }
    

    Output:

    "par   1"
    par2
    par3
    "par 4"
    
  • 2020-12-03 16:31

    Something like:

    "(?:(?<=")([^"]+)"\s*)|\s*([^"\s]+)
    

    or a simpler one:

    "([^"]+)"|\s*([^"\s]+)
    

    (just for the sake of finding a regexp ;) )

    Apply it repeatedly: each match gives you one parameter, in group 1 if it was surrounded by double quotes and in group 2 if it was not.
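    As a rough illustration, the simpler pattern can be applied repeatedly in Python with `re.finditer`. The names `TOKEN` and `split_args` are hypothetical, and this sketch does not handle escaped quotes or empty quoted strings:

```python
import re

# The "simpler" pattern from above: group 1 captures quoted text (without the
# quotes), group 2 captures a bare word.
TOKEN = re.compile(r'"([^"]+)"|\s*([^"\s]+)')

def split_args(line):
    """Return the parameters of `line`, with surrounding double quotes removed."""
    args = []
    for m in TOKEN.finditer(line):
        # Only one of the two groups participates in any given match.
        args.append(m.group(1) if m.group(1) is not None else m.group(2))
    return args

print(split_args('"param 1" param2 "param 3"'))
# ['param 1', 'param2', 'param 3']
```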

  • 2020-12-03 16:35

    There's a Python answer, so we should have a Ruby answer as well :)

    require 'shellwords'
    Shellwords.shellsplit '"param 1" param2 "param 3"'
    #=> ["param 1", "param2", "param 3"]
    # or, called directly on the string:
    '"param 1" param2 "param 3"'.shellsplit
    
  • 2020-12-03 16:35

    To separate the command from its parameters, I use the following (with ^ and $ matching at line breaks, i.e. multiline mode):

    (?<cmd>^"[^"]*"|\S*) *(?<prm>.*)?
    

    In case you want to use it in your C# code, here it is properly escaped:

    try {
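        // RegexOptions.Multiline makes ^ and $ match at line breaks, as noted above.
        Regex RegexObj = new Regex("(?<cmd>^\\\"[^\\\"]*\\\"|\\S*) *(?<prm>.*)?", RegexOptions.Multiline);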
    } catch (ArgumentException ex) {
        // Syntax error in the regular expression
    }
    

    It parses each of the following lines and separates the command from the parameters:

    "c:\program files\myapp\app.exe" p1 p2 "p3 with space"
    app.exe p1 p2 "p3 with space"
    app.exe
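
    For comparison, here is a minimal Python sketch of the same pattern. Python writes named groups as `(?P<name>...)`; the names `CMD_RE` and `split_command` are hypothetical, and the function assumes it receives one command line at a time:

```python
import re

# Same pattern as above, with Python's (?P<name>...) syntax for named groups.
CMD_RE = re.compile(r'(?P<cmd>^"[^"]*"|\S*) *(?P<prm>.*)?', re.MULTILINE)

def split_command(line):
    """Split one command line into (command, parameter string)."""
    m = CMD_RE.match(line)  # every part of the pattern is optional, so this always matches
    return m.group("cmd"), (m.group("prm") or "")

for line in (r'"c:\program files\myapp\app.exe" p1 p2 "p3 with space"',
             'app.exe p1 p2 "p3 with space"',
             'app.exe'):
    print(split_command(line))
```

    Note that a quoted command keeps its surrounding quotes in `cmd`, and `prm` is returned as one unsplit string.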
    
  • 2020-12-03 16:36

    Regex: /[\/-]?((\w+)(?:[=:]("[^"]+"|[^\s"]+))?)(?:\s+|$)/g

    Sample: /P1="Long value" /P2=3 /P3=short PwithoutSwitch1=any PwithoutSwitch2

    This regex parses a parameter list built according to the following rules:

    • Parameters are separated by one or more spaces.
    • A parameter may start with a switch symbol (/ or -).
    • A parameter consists of a name and a value, separated by = or :.
    • The name is a sequence of alphanumeric characters and underscores.
    • The value may be absent.
    • If a value is present it can contain any characters, but it must be quoted if it contains a space.

    The regex has three groups:

    • the first group contains the whole parameter without the switch symbol,
    • the second group contains only the name,
    • the third group contains only the value (if present).

    For the sample above:

    1. Whole match: /P1="Long value"
      • Group#1: P1="Long value",
      • Group#2: P1,
      • Group#3: "Long value".
    2. Whole match: /P2=3
      • Group#1: P2=3,
      • Group#2: P2,
      • Group#3: 3.
    3. Whole match: /P3=short
      • Group#1: P3=short,
      • Group#2: P3,
      • Group#3: short.
    4. Whole match: PwithoutSwitch1=any
      • Group#1: PwithoutSwitch1=any,
      • Group#2: PwithoutSwitch1,
      • Group#3: any.
    5. Whole match: PwithoutSwitch2
      • Group#1: PwithoutSwitch2,
      • Group#2: PwithoutSwitch2,
      • Group#3: absent.
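
    The group breakdown above can be reproduced with a short Python sketch. `PARAM_RE` and `parse_params` are hypothetical names; the pattern is the one above without the `/.../g` delimiters, since `re.finditer` already iterates over all matches:

```python
import re

# Group 1: whole parameter without the switch, group 2: name, group 3: value.
PARAM_RE = re.compile(r'[/-]?((\w+)(?:[=:]("[^"]+"|[^\s"]+))?)(?:\s+|$)')

def parse_params(line):
    """Return (name, value) pairs; value is None when the parameter has no value."""
    return [(m.group(2), m.group(3)) for m in PARAM_RE.finditer(line)]

sample = '/P1="Long value" /P2=3 /P3=short PwithoutSwitch1=any PwithoutSwitch2'
for name, value in parse_params(sample):
    print(name, value)
```

    Quoted values keep their quotes in group 3, so `P1` is reported with the value `"Long value"` including the double quotes.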