Question
I am breaking up the code below to check my understanding of the regex and gsub:
str = "abc/def/ghi.rb"
str = str.gsub(/^.*\//, '')
#str = ghi.rb
^ : beginning of the string
\/ : escape character for /
^.*\/ : everything from the beginning to the last occurrence of / in the string
Is my understanding of the expression right? How does .* work exactly?
Answer 1:
Your general understanding is correct. The entire regex will match abc/def/ and String#gsub will replace it with an empty string.
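However, note that String#gsub doesn't change the string in place; it returns a new string. Your code works because it assigns the result back to str. Without that assignment, str would still contain the original value ("abc/def/ghi.rb") after the substitution. To change the string in place, you can use String#gsub!.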
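The difference can be checked with a small sketch. The variable names are only for illustration, and .dup is used so the example also works where string literals are frozen:

```ruby
str = "abc/def/ghi.rb".dup

# gsub returns a new string and leaves the receiver untouched.
result     = str.gsub(/^.*\//, '')
after_gsub = str.dup            # still "abc/def/ghi.rb"

# gsub! modifies the receiver itself.
str.gsub!(/^.*\//, '')
after_bang = str.dup            # now "ghi.rb"

# gsub! returns nil when nothing was replaced, so avoid chaining on it.
no_match = "ghi.rb".dup.gsub!(/^.*\//, '')

puts result      # ghi.rb
puts after_gsub  # abc/def/ghi.rb
puts after_bang  # ghi.rb
p no_match       # nil
```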
As to how .* works: the algorithm the regex engine uses is called backtracking. Since .* is greedy (it tries to match as many characters as possible), you can think of the match as proceeding like this:
Step 1: .* matches the entire string, "abc/def/ghi.rb". Afterwards, \/ tries to match a forward slash but fails (nothing is left to match). .* has to backtrack.
Step 2: .* matches the entire string except the last character, "abc/def/ghi.r". Afterwards, \/ tries to match a forward slash but fails (/ != b). .* has to backtrack.
Step 3: .* matches the entire string except the last two characters, "abc/def/ghi.". Afterwards, \/ tries to match a forward slash but fails (/ != r). .* has to backtrack.
...
Step n: .* matches "abc/def". Afterwards, \/ tries to match a forward slash and succeeds. The matching ends here.
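The steps above can be mimicked with a naive loop. This is only an illustration of the idea, not how Ruby's regex engine is actually implemented, and simulate_greedy_match is a hypothetical helper name:

```ruby
# Naive simulation of a greedy ".*" followed by a literal "/".
# Illustration only: the real engine is far more sophisticated.
def simulate_greedy_match(str)
  steps = []
  # Let ".*" consume everything first, then give back one character at a time.
  str.length.downto(0) do |len|
    consumed  = str[0, len]
    next_char = str[len] # nil when nothing is left to match
    if next_char == "/"
      steps << "'.*' = #{consumed.inspect}, '/' matches"
      return [consumed + "/", steps]
    end
    steps << "'.*' = #{consumed.inspect}, '/' fails on #{next_char.inspect}"
  end
  [nil, steps] # no slash anywhere in the string
end

match, steps = simulate_greedy_match("abc/def/ghi.rb")
puts steps
puts "match: #{match.inspect}"
```

The simulated match is the same substring that the real regex returns for "abc/def/ghi.rb"[/^.*\//].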
Answer 2:
No, not quite.
^ : beginning of a line
\/ : escaped slash (the escape character is \ alone)
^.*\/ : everything from the beginning of a line to the last occurrence of / that .* can reach (on that line, unless the m option is used)
.* depends on the mode of the regex. In singleline mode (i.e., without the m option), it means the longest possible sequence of zero or more non-newline characters. In multiline mode (i.e., with the m option), it means the longest possible sequence of zero or more characters, newlines included.
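A two-line input makes these distinctions visible. The sample text is made up for illustration, and \A (beginning of the string) is added only for contrast with ^:

```ruby
text = "a/b/c.rb\nd/e/f.rb"

# ^ matches at the start of every line; . stops at the newline.
per_line   = text.gsub(/^.*\//, '')

# \A matches only at the very start of the string.
first_only = text.gsub(/\A.*\//, '')

# With the m option, . also matches "\n", so .* runs to the last / overall.
multiline  = text.gsub(/^.*\//m, '')

p per_line    # "c.rb\nf.rb"
p first_only  # "c.rb\nd/e/f.rb"
p multiline   # "f.rb"
```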
Answer 3:
Your understanding is correct, but you should also note that the last statement is true because:
"Repetition is greedy by default: as many occurrences as possible are matched while still allowing the overall match to succeed."
Quoted from the Regexp documentation.
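For contrast, here is a small sketch comparing the greedy .* with its lazy counterpart .*? (the lazy form is not part of the question and is shown only to make the greedy behaviour stand out):

```ruby
str = "abc/def/ghi.rb"

# Greedy: .* takes as much as it can while still allowing \/ to match.
greedy_match  = str[/^.*\//]
greedy_result = str.sub(/^.*\//, '')

# Lazy: .*? takes as little as it can, so it stops at the first slash.
lazy_match  = str[/^.*?\//]
lazy_result = str.sub(/^.*?\//, '')

p greedy_match   # "abc/def/"
p greedy_result  # "ghi.rb"
p lazy_match     # "abc/"
p lazy_result    # "def/ghi.rb"
```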
Answer 4:
Yes. In short, it matches any number of any characters (.*) ending with a literal / (\/).
gsub replaces the match with the second argument (the empty string '').
Answer 5:
Nothing wrong with your regex, but File.basename(str) might be more appropriate.
To expound on what @Stefen said: it really looks like you're dealing with a file path, which makes your question an XY problem, where you're asking about Y when you should ask about X. Rather than how to use and understand a regex, the question should be which tool to use to manage paths.
Instead of rolling your own code, use code already written that comes with the language:
str = "abc/def/ghi.rb"
File.basename(str) # => "ghi.rb"
File.dirname(str) # => "abc/def"
File.split(str) # => ["abc/def", "ghi.rb"]
The reason to take advantage of File's built-in code is that it accounts for the difference between directory delimiters on *nix-style OSes and Windows. Ruby's File::SEPARATOR constant is "/" on every platform:
File::SEPARATOR # => "/"
On Windows, Ruby also defines File::ALT_SEPARATOR as "\\", and the File methods accept both delimiters there. If your code moves from one system to another, it will continue working if you use the built-in methods, whereas a regex that hard-codes / can break when paths use a different delimiter.
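As a quick check, the sketch below compares the regex from the question with File.basename on the sample path, and shows two related File methods that cover the usual follow-up needs (the variable names are illustrative):

```ruby
str = "abc/def/ghi.rb"

via_regex    = str.gsub(/^.*\//, '')
via_basename = File.basename(str)

# Related helpers: the extension, and the name with the extension stripped.
extension = File.extname(str)
stem      = File.basename(str, ".rb")

# A bare file name without any directory part is returned unchanged.
no_dir = File.basename("ghi.rb")

p via_regex     # "ghi.rb"
p via_basename  # "ghi.rb"
p extension     # ".rb"
p stem          # "ghi"
p no_dir        # "ghi.rb"
```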
Source: https://stackoverflow.com/questions/34395026/how-to-understand-gsub-or-the-regex