Regex - Matching text AFTER certain characters

梦如初夏 2020-12-19 01:15

I want to scrape data from some text and dump it into an array. Consider the following text as example data:

| Example Data
| Title: This is a sample title
|         


        
4 Answers
  • 2020-12-19 01:36

    In Ruby, as in PCRE and Boost, you may make use of the \K match reset operator:

    \K keeps the text matched so far out of the overall regex match. h\Kd matches only the second d in adhd.

    So, you may use

    /:[[:blank:]]*\K.+/     # To only match horizontal whitespaces with `[[:blank:]]`
    /:\s*\K.+/              # To match any whitespace with `\s`
    

    See Rubular demo #1 and Rubular demo #2.

    Details

    • : - a colon
    • [[:blank:]]* - 0 or more horizontal whitespace chars
    • \K - match reset operator discarding the text matched so far from the overall match memory buffer
    • .+ - matches and consumes any 1 or more chars other than line break chars (use /m modifier to match any chars including line break chars).
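
    As a minimal sketch of how the first pattern above might be used from Ruby code: the sample line comes from the question, while the variable names and the second "Author" line are assumptions added for illustration.

```ruby
# Sample line taken from the question's example data.
line = "Title: This is a sample title"

# \K drops the colon and the blanks from the reported match,
# so String#[] returns only the text after them.
value = line[/:[[:blank:]]*\K.+/]
puts value

# With several "Key: value" lines, scan returns one value per line,
# because . stops at line breaks unless the /m modifier is used.
text = "Title: This is a sample title\nAuthor: Jane Doe" # second line is hypothetical
values = text.scan(/:[[:blank:]]*\K.+/)
p values
```

    Since the pattern contains no capture groups, scan returns the matches themselves as a flat array of strings.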
  • 2020-12-19 01:39

    You could change it to:

    /: (.+)/
    

    and grab the contents of group 1. A lookbehind works too, though, and does just what you're asking:

    /(?<=: ).+/
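
    A short sketch of both variants in Ruby, using the sample line from the question (the variable names are illustrative, not from the original post):

```ruby
line = "Title: This is a sample title"

# Capture group: the overall match includes ": ", group 1 holds the value.
captured = line[/: (.+)/, 1]

# Fixed-width lookbehind: ": " must precede the match but is not part of it.
looked_behind = line[/(?<=: ).+/]

puts captured
puts looked_behind
```

    Both calls should yield the same string; the difference is only whether you read group 1 or the whole match.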
    
  • 2020-12-19 01:41

    In addition to @minitech's answer, you can also make a 3rd variation:

    /(?<=: ?)(.+)/
    

    The difference here is that you grab the group using a look-behind.

    If you still prefer the look-ahead concept to the look-behind one:

    /(?=: ?(.+))/
    

    This places a capture group around your existing regex, inside the look-ahead, so the text is caught in a group.

    And yes, the outer parentheses in your code will create a group. Compare that with the latter example I gave, where the capture sits inside the look-ahead. Wrapping the whole pattern in /( ... )/ without the /(?= ... )/ is needless, since the first result in most regular expression engines is already the entire matched string.
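
    To illustrate the look-ahead form in Ruby, here is a minimal sketch on the question's sample line. Only the look-ahead pattern is exercised, since Ruby's regex engine generally expects look-behinds to be fixed-width; the variable names are assumptions for illustration.

```ruby
line = "Title: This is a sample title"

# The look-ahead itself consumes nothing, so the overall match is empty;
# the text after the colon is only available through group 1.
m = line.match(/(?=: ?(.+))/)

p m[0] # whole match (zero-width)
p m[1] # captured text after ": "
```

    This shows why the group matters here: with a look-ahead, the whole match carries no text, so group 1 is the only place to read the value from.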

  • 2020-12-19 01:42

    I know you are asking for regex but I just saw the regex solution and found that it is rather hard to read for those unfamiliar with regex.

    I'm also using Ruby and I decided to do it with:

    line_as_string.split(": ")[-1]
    

    This does what you require and IMHO it's far more readable. It might be inefficient for a very long string, but not for this purpose.
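
    One thing to watch for, sketched below: if the value itself contains ": ", taking the last piece drops part of it. Passing a limit of 2 to split is one possible way around that. The "tricky" line is a hypothetical input, not from the question.

```ruby
line = "Title: This is a sample title"
simple = line.split(": ")[-1]
puts simple

# Hypothetical value that itself contains ": ".
tricky = "Title: Regex: a primer"
last_piece  = tricky.split(": ")[-1]    # keeps only the final piece
after_first = tricky.split(": ", 2)[-1] # keeps everything after the first ": "

puts last_piece
puts after_first
```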
