How to understand gsub(/^.*\//, '') or the regex

Question

I'm breaking down the code below to check my understanding of the regex and gsub:

str = "abc/def/ghi.rb"
str = str.gsub(/^.*\//, '')
# str is now "ghi.rb"

^ : beginning of the string

\/ : escape character for /

^.*\/ : everything from beginning to the last occurrence of / in the string

Is my understanding of the expression right?

How does .* work exactly?

Answer 1:

Your general understanding is correct. The entire regex will match abc/def/ and String#gsub will replace it with an empty string.

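However, note that String#gsub doesn't change the string in place; it returns a new string. In your code the result is assigned back to str, so str ends up as "ghi.rb". Without that assignment, str would still contain the original value ("abc/def/ghi.rb") after the substitution. To change the string in place, you can use String#gsub!, which returns nil when no substitution is made.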
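A minimal sketch of the difference between the two methods. The variable names are illustrative only, and the unary plus is there just to get an unfrozen string in case string literals are frozen:

```ruby
original = "abc/def/ghi.rb"

# gsub returns a new string; the receiver is left untouched.
stripped = original.gsub(/^.*\//, '')
original  # => "abc/def/ghi.rb"
stripped  # => "ghi.rb"

# gsub! modifies the receiver in place.
mutable = +"abc/def/ghi.rb"
mutable.gsub!(/^.*\//, '')
mutable   # => "ghi.rb"

# gsub! returns nil when nothing was replaced, so avoid chaining on its result.
no_match = +"ghi.rb"
result = no_match.gsub!(/^.*\//, '')
result    # => nil
```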

As to how .* works: the algorithm the regex engine uses is called backtracking. Since .* is greedy (it tries to match as many characters as possible), you can picture the match proceeding roughly like this:

Step 1: .* matches the entire string abc/def/ghi.rb. Afterwards \/ tries to match a forward slash, but fails (nothing is left to match). .* has to backtrack.
Step 2: .* matches the entire string except the last character (abc/def/ghi.r). Afterwards \/ tries to match a forward slash, but fails (/ != b). .* has to backtrack.
Step 3: .* matches the entire string except the last two characters (abc/def/ghi.). Afterwards \/ tries to match a forward slash, but fails (/ != r). .* has to backtrack.
...
Step n: .* matches abc/def. Afterwards \/ tries to match a forward slash and succeeds. The matching ends here.
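The steps above can be imitated in plain Ruby. This is only a sketch of the idea, not how the engine is actually implemented (real engines apply optimizations), and greedy_prefix_before_slash is a hypothetical helper written for this illustration:

```ruby
str = "abc/def/ghi.rb"

# Hypothetical helper: mimic greedy .* giving back one character at a time
# until the next pattern element (a literal "/") can match.
def greedy_prefix_before_slash(str)
  attempts = 0
  str.length.downto(0) do |len|   # len = number of characters .* currently holds
    attempts += 1
    return [str[0, len], attempts] if str[len] == "/"
  end
  [nil, attempts]                 # no slash anywhere: the overall match fails
end

dot_star, attempts = greedy_prefix_before_slash(str)
dot_star   # => "abc/def"
attempts   # => 8

# The real engine agrees: the whole match is what .* kept plus the slash.
whole_match = str[/^.*\//]   # => "abc/def/"
```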

Answer 2:

No, not quite.

^: beginning of a line
\/: escaped slash (escape character is \ alone)
^.*\/ : everything from the beginning of a line to the last occurrence of / on that line (without the m option, . does not match a newline)

.* depends on the mode of the regex. In singleline mode (i.e., without m option), it means the longest possible sequence of zero or more non-newline characters. In multiline mode (i.e., with m option), it means the longest possible sequence of zero or more characters.
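A short sketch of how the anchor and the m option change the result on a multi-line string. The sample text is made up for illustration:

```ruby
text = "abc/def\nghi/jkl.rb"

# ^ anchors at the start of every line, so gsub strips a prefix from each line.
per_line = text.gsub(/^.*\//, '')      # => "def\njkl.rb"

# \A anchors only at the start of the string; without m, . stops at the newline.
string_start = text.gsub(/\A.*\//, '') # => "def\nghi/jkl.rb"

# With the m option, . also matches newlines, so .* runs to the last / overall.
dot_all = text.gsub(/^.*\//m, '')      # => "jkl.rb"
```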

Answer 3:

Your understanding is correct, but you should also note that the last statement is true because:

Repetition is greedy by default: as many occurrences as possible 
are matched while still allowing the overall match to succeed.

Quoted from the Regexp documentation.
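To see what greedy means in practice, it can help to compare .* with its lazy counterpart .*?, which matches as few characters as possible. A small sketch using the string from the question:

```ruby
str = "abc/def/ghi.rb"

greedy = str[/^.*\//]    # .* takes as much as it can   => "abc/def/"
lazy   = str[/^.*?\//]   # .*? takes as little as it can => "abc/"

greedy_result = str.sub(/^.*\//, '')    # => "ghi.rb"
lazy_result   = str.sub(/^.*?\//, '')   # => "def/ghi.rb"
```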

Answer 4:

Yes. In short, it matches any number of any characters (.*) ending with a literal / (\/).

gsub replaces the match with the second argument (empty string '').

Answer 5:

Nothing wrong with your regex, but File.basename(str) might be more appropriate.

To expound on what @Stefen said: It really looks like you're dealing with a file path, and that makes your question an XY problem where you're asking about Y when you should ask about X: Rather than how to use and understand a regex, the question should be what tool is used to manage paths.

Instead of rolling your own code, use code already written that comes with the language:

str = "abc/def/ghi.rb"
File.basename(str) # => "ghi.rb"
File.dirname(str) # => "abc/def"
File.split(str) # => ["abc/def", "ghi.rb"]

The reason you want to take advantage of File's built-in code is that it takes into account the difference between directory delimiters in *nix-style OSes and Windows. File::SEPARATOR is "/" on every platform, and on Windows File::ALT_SEPARATOR is additionally set to "\" (it is nil elsewhere):

File::SEPARATOR # => "/"

If your code moves from one system to another, it will continue working if you use the built-in methods, whereas a regex that hard-codes / will miss Windows paths written with backslashes.
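If a regex is still wanted, one option is to build it from the separator constants instead of hard-coding the delimiter. This is only a sketch: the names separators, sep and strip_dir are made up for the example, and File.basename remains the simpler choice.

```ruby
path = "abc/def/ghi.rb"

# Collect the separators Ruby knows about on this platform.
# File::ALT_SEPARATOR is nil outside Windows, so compact drops it there.
separators = [File::SEPARATOR, File::ALT_SEPARATOR].compact
sep = Regexp.union(separators)

# Everything from the start of the string up to the last separator.
strip_dir = /\A.*#{sep}/m

from_regex    = path.sub(strip_dir, '')  # => "ghi.rb"
from_basename = File.basename(path)      # => "ghi.rb"
```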

Source: https://stackoverflow.com/questions/34395026/how-to-understand-gsub-or-the-regex

Tags

ruby

regex

gsub