Question
I'm breaking down the code below to check my understanding of the regex and gsub:
str = "abc/def/ghi.rb"
str = str.gsub(/^.*\//, '')
# str => "ghi.rb"
^ : beginning of the string
\/ : escape character for /
^.*\/ : everything from beginning to the last occurrence of / in the string
Is my understanding of the expression right?
How does .* work exactly?
Answer 1:
Your general understanding is correct. The entire regex will match abc/def/ and String#gsub will replace it with an empty string.
However, note that String#gsub doesn't change the string in place; it returns a new string. Without the assignment back to str (as in str = str.gsub(...)), str would still contain the original value ("abc/def/ghi.rb") after the substitution. To change the string in place, you can use String#gsub!.
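To make the difference concrete, here is a minimal sketch. The variable names original, result and mutable are only for illustration, and .dup is used so the example also works when string literals are frozen:

```ruby
# gsub returns a new string; the receiver is left untouched.
original = "abc/def/ghi.rb"
result = original.gsub(/^.*\//, '')
puts original  # abc/def/ghi.rb
puts result    # ghi.rb

# gsub! modifies the receiver in place.
mutable = "abc/def/ghi.rb".dup
mutable.gsub!(/^.*\//, '')
puts mutable   # ghi.rb

# gsub! returns nil when nothing was substituted, so avoid chaining on it.
no_match = "abc".dup.gsub!(/x/, '')
puts no_match.inspect  # nil
```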
As to how .* works: the algorithm the regex engine uses is called backtracking. Since .* is greedy (it will try to match as many characters as possible), you can think that something like this will happen:
Step 1: .* matches the entire string abc/def/ghi.rb. Afterwards \/ tries to match a forward slash, but fails (nothing is left to match). .* has to backtrack.
Step 2: .* matches the entire string except the last character - abc/def/ghi.r. Afterwards \/ tries to match a forward slash, but fails (/ != b). .* has to backtrack.
Step 3: .* matches the entire string except the last two characters - abc/def/ghi. (including the dot). Afterwards \/ tries to match a forward slash, but fails (/ != r). .* has to backtrack.
...
Step n: .* matches abc/def. Afterwards \/ tries to match a forward slash and succeeds. The matching ends here.
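The steps above can be replayed with a small simulation. This is only a simplified model for this one pattern on a single-line string, not how Ruby's regex engine is actually implemented (a real engine applies optimizations), and trace_greedy_match is a hypothetical helper written for this illustration:

```ruby
# Simplified model of matching /^.*\// against a single-line string.
# trace_greedy_match is a hypothetical helper, not part of Ruby.
def trace_greedy_match(str)
  steps = []
  # .* first grabs everything, then gives back one character at a time.
  str.length.downto(0) do |len|
    taken = str[0, len]    # what .* currently holds
    next_char = str[len]   # what \/ has to match (nil at end of string)
    matched = (next_char == "/")
    steps << [taken, next_char, matched]
    return [taken + "/", steps] if matched
  end
  [nil, steps]  # no slash anywhere: the overall match fails
end

match, steps = trace_greedy_match("abc/def/ghi.rb")
steps.each_with_index do |(taken, next_char, ok), i|
  outcome = ok ? 'success' : 'backtrack'
  puts "Step #{i + 1}: .* = #{taken.inspect}, next = #{next_char.inspect}, #{outcome}"
end
puts "Matched: #{match.inspect}"  # Matched: "abc/def/"
```

For this input the loop needs 8 steps, and the final result is the same substring the real regex returns for "abc/def/ghi.rb"[/^.*\//].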
Answer 2:
No, not quite.
^ : beginning of a line
\/ : escaped slash (the escape character is \ alone)
^.*\/ : everything from the beginning of a line to the last occurrence of / in the string
.* depends on the mode of the regex. In single-line mode (i.e., without the m option), it means the longest possible sequence of zero or more non-newline characters. In multiline mode (i.e., with the m option), it means the longest possible sequence of zero or more characters, newlines included.
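A short sketch of how this plays out on a string that contains a newline. The sample string text is made up for illustration; \A (start of string) is shown only for contrast with ^:

```ruby
text = "a/b\nc/d/e.rb"

# Without m: . stops at newlines and ^ matches at the start of every line,
# so gsub strips the directory part of each line separately.
per_line = text.gsub(/^.*\//, '')
puts per_line.inspect     # "b\ne.rb"

# With m: . also matches "\n", so a single match runs to the last slash overall.
whole = text.gsub(/^.*\//m, '')
puts whole.inspect        # "e.rb"

# \A anchors to the start of the string only, so just the first line is affected.
first_only = text.gsub(/\A.*\//, '')
puts first_only.inspect   # "b\nc/d/e.rb"
```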
Answer 3:
Your understanding is correct, but you should also note that the last statement is true because:
Repetition is greedy by default: as many occurrences as possible
are matched while still allowing the overall match to succeed.
Quoted from the Regexp documentation.
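For comparison, a minimal sketch of the greedy quantifier next to its lazy counterpart .*? (the lazy form is not used in the question; it is shown here only to make the default behavior visible):

```ruby
str = "abc/def/ghi.rb"

# Greedy: .* takes as much as possible, so the match ends at the LAST slash.
greedy = str[/^.*\//]
puts greedy.inspect   # "abc/def/"

# Lazy: .*? takes as little as possible, so the match ends at the FIRST slash.
lazy = str[/^.*?\//]
puts lazy.inspect     # "abc/"

# Consequently the two substitutions leave different remainders.
puts str.sub(/^.*\//, '')    # ghi.rb
puts str.sub(/^.*?\//, '')   # def/ghi.rb
```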
Answer 4:
Yes. In short, it matches any number of any characters (.*) ending with a literal / (\/).
gsub replaces the match with the second argument (empty string '').
Answer 5:
Nothing wrong with your regex, but File.basename(str) might be more appropriate.
To expound on what @Stefen said: It really looks like you're dealing with a file path, and that makes your question an XY problem where you're asking about Y when you should ask about X: Rather than how to use and understand a regex, the question should be what tool is used to manage paths.
Instead of rolling your own code, use code already written that comes with the language:
str = "abc/def/ghi.rb"
File.basename(str) # => "ghi.rb"
File.dirname(str) # => "abc/def"
File.split(str) # => ["abc/def", "ghi.rb"]
The reason you want to take advantage of File's built-in code is that it takes into account the difference between directory delimiters in *nix-style OSes and Windows. File::SEPARATOR is "/" on every platform; on Windows, Ruby additionally sets File::ALT_SEPARATOR to "\\", and File's methods recognize both delimiters:
File::SEPARATOR # => "/"
If your code moves from one system to another, it will continue working if you use the built-in methods, whereas a regex with a hard-coded / will break as soon as the paths use a different delimiter.
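A few related File helpers cover the other things people commonly reach for a regex to do. This is a small sketch rather than an exhaustive list, and the path components are just the ones from the question:

```ruby
# File.join builds the path using File::SEPARATOR.
path = File.join("abc", "def", "ghi.rb")
puts path                         # abc/def/ghi.rb

# Strip the directory part (what the regex in the question does).
puts File.basename(path)          # ghi.rb

# Strip the directory part and a known extension.
puts File.basename(path, ".rb")   # ghi

# ".*" strips whatever the extension happens to be.
puts File.basename(path, ".*")    # ghi

# Get just the extension, including the leading dot.
puts File.extname(path)           # .rb
```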
Source: https://stackoverflow.com/questions/34395026/how-to-understand-gsub-or-the-regex