How do I fix this multiline regular expression in Ruby?

Submitted by 两盒软妹~` on 2019-12-05 23:39:32

Question


I have a regular expression in Ruby that isn't working properly in multiline mode.

I'm trying to convert Markdown text into the Textile-esque markup used in Redmine. The problem is in my regular expression for converting code blocks. It should find any lines beginning with 4 spaces or a tab, then wrap them in pre tags.

markdownText = '# header

some text that precedes code

    var foo = 9;
    var fn = function() {}

    fn();

some post text'

puts markdownText.gsub!(/(^(?:\s{4}|\t).*?$)+/m,"<pre>\n\\1\n</pre>")

Intended result:

# header

some text that precedes code

<pre>
    var foo = 9;
    var fn = function() {}

    fn();
</pre>

some post text

The problem is that the closing pre tag is printed at the end of the document instead of after "fn();". I tried some variations of the following expression, but it doesn't match:

gsub!(/(^(?:\s{4}|\t).*?$)+^(\S)/m, "<pre>\n\\1\n</pre>\\2")

How do I get the regular expression to match just the indented code block? You can test this regular expression on Rubular.


Answer 1:


First, note that 'm' multi-line mode in Ruby is equivalent to 's' single-line mode of other languages. In other words, 'm' mode in Ruby means "dot matches all". (In Ruby, ^ and $ always match at line boundaries, so no flag is needed for that.)
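A quick check of the difference (an illustrative sketch using a made-up two-line string): under /m the dot also matches a newline, while ^ already matches at the start of every line without any flag.

```ruby
# In Ruby, /m only changes what "." matches; it does not change ^ or $.
text = "first line\nsecond line"

# Without /m, "." cannot match the newline, so this pattern finds nothing.
without_m = text =~ /line.second/

# With /m, "." also matches "\n", so the pattern spans both lines.
with_m = text =~ /line.second/m

# ^ matches at the start of every line even without any flag.
second_line_start = text =~ /^second/

p without_m          # => nil
p with_m             # => 6
p second_line_start  # => 11
```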

This regex will do a pretty good job of matching a markdown-like code section:

re = / # Match a MARKDOWN CODE section.
    (\r?\n)              # $1: CODE must be preceded by blank line
    (                    # $2: CODE contents
      (?:                # Group for multiple lines of code.
        (?:\r?\n)+       # Each line preceded by a newline,
        (?:[ ]{4}|\t).*  # and begins with four spaces or tab.
      )+                 # One or more CODE lines
      \r?\n              # CODE followed by a newline,
    )                    # End $2: CODE contents
    (?=\r?\n)            # then by a blank line.
    /x
result = markdownText.gsub(re, '\1<pre>\2</pre>')

This requires a blank line before and after the code section and allows blank lines within the code section itself. It allows for either \r\n or \n line terminations. Note that this does not strip the leading 4 spaces (or tab) from each line; doing that would require more code. (I am not a Ruby guy, so I can't help out with that.)

I would recommend looking at the Markdown source itself to see how it's really being done.
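Below is a self-contained sketch that runs the same pattern on the question's sample text (rebuilt here as a string literal named `markdown_text`). The second substitution is an assumption, not part of the answer: it uses the block form of `gsub` to remove one level of indentation (four spaces or a tab) from each code line before wrapping it.

```ruby
markdown_text = "# header\n\nsome text that precedes code\n\n" \
                "    var foo = 9;\n    var fn = function() {}\n\n    fn();\n\n" \
                "some post text"

code_section = /
  (\r?\n)                  # $1: CODE must be preceded by a blank line
  (                        # $2: CODE contents
    (?:
      (?:\r?\n)+           # each line preceded by one or more newlines
      (?:[ ]{4}|\t).*      # and beginning with four spaces or a tab
    )+
    \r?\n                  # CODE followed by a newline...
  )
  (?=\r?\n)                # ...and then by a blank line
/x

# Same substitution as the answer: '\1' and '\2' are expanded by gsub.
wrapped = markdown_text.gsub(code_section, '\1<pre>\2</pre>')

# Hypothetical variant: strip one indentation level from each code line.
dedented = markdown_text.gsub(code_section) do
  lead, code = $1, $2  # copy captures before the inner gsub resets them
  "#{lead}<pre>#{code.gsub(/^(?:[ ]{4}|\t)/, '')}</pre>"
end

puts wrapped
puts dedented
```

Copying `$1` and `$2` into locals first is a defensive choice: the inner `gsub` replaces the current match data, so reading the captures up front keeps the block easy to reason about.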




Answer 2:


/^(\s{4}|\t)+.+\;\n$/m

This works a little better, but it still picks up a newline that we don't want. I tested it on Rubular.
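The stray newline comes from `\s`, which in Ruby also matches `\n`: at the start of the blank line before the code, `\s{4}` consumes that newline plus three spaces. The sketch below (sample text assumed from the question) shows this, next to a hypothetical variant that uses a literal space class `[ ]{4}` instead. Both still rely on the last code line ending in a semicolon, and under /m the greedy `.+` would merge several separate code blocks into one match.

```ruby
sample = "# header\n\nsome text that precedes code\n\n" \
         "    var foo = 9;\n    var fn = function() {}\n\n    fn();\n\n" \
         "some post text"

# The answer's pattern: \s{4} can match "\n" followed by three spaces.
with_s = sample[/^(\s{4}|\t)+.+\;\n$/m]

# Hypothetical variant: only literal spaces or a tab count as indentation.
with_spaces = sample[/^([ ]{4}|\t)+.+;\n$/m]

p with_s       # begins with "\n" (the blank line's newline)
p with_spaces  # begins with "    var foo"
```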




Answer 3:


This is working for me with your sample input (using the block form, so that $1 refers to the current match):

markdownText.gsub(/\n?((\s{4}.+)+)/) { "\n<pre>#{$1}\n</pre>" }
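In Ruby, a double-quoted replacement argument such as "\n<pre>#{$1}\n</pre>" is interpolated once, before gsub runs, so $1 holds whatever the previous match in the current method left behind (nil in a fresh method). The hypothetical methods below compare that with the block form and with a '\1' backreference on a tiny made-up input.

```ruby
# The replacement string is built once, before gsub runs, so $1 is stale.
def interpolated_once(text)
  text.gsub(/(\d)/, "<#{$1}>")
end

# The block is evaluated for each match, so $1 is the current capture.
def per_match_block(text)
  text.gsub(/(\d)/) { "<#{$1}>" }
end

# A backreference in a single-quoted string is expanded by gsub itself.
def backreference(text)
  text.gsub(/(\d)/, '<\1>')
end

p interpolated_once("a1b2")  # => "a<>b<>"
p per_match_block("a1b2")    # => "a<1>b<2>"
p backreference("a1b2")      # => "a<1>b<2>"
```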



Answer 4:


Here's another one that captures a run of consecutive indented lines as a single block:

((?:^(?: {4}|\t)[^\n]*$\n?)+)
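A usage sketch on the question's sample text (rebuilt as a string literal; the `<pre>` wrapping format and the second pattern are assumptions). A blank line does not start with four spaces or a tab, so the pattern above ends its run there and the sample yields two separate blocks. The hypothetical second pattern allows blank lines between indented lines, which produces the single block the question asks for.

```ruby
sample = "# header\n\nsome text that precedes code\n\n" \
         "    var foo = 9;\n    var fn = function() {}\n\n    fn();\n\n" \
         "some post text"

# Answer 4's pattern: one run of consecutive indented lines.
consecutive = /((?:^(?: {4}|\t)[^\n]*$\n?)+)/

# Hypothetical variant: an indented line, then any number of
# (zero or more blank lines followed by another indented line).
with_blank_lines = /^(?: {4}|\t).*\n?(?:(?:^[ \t]*\n)*^(?: {4}|\t).*\n?)*/

p sample.scan(consecutive).length  # => 2 (the blank line splits the code)

result = sample.gsub(with_blank_lines) { |code| "<pre>\n#{code}</pre>\n" }
puts result
```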


Source: https://stackoverflow.com/questions/5719353/how-do-i-fix-this-multiline-regular-expression-in-ruby
