Regex to search a string with spaces and backslashes: performance issue

喜你入骨 submitted on 2020-08-05 04:54:19

Question


These are the lines of text I have:

Region\ name=Provence\ Alpes\ Cote\ d'Azur shops=350,City=Nice 12345
Region\ name=Provence\ Alpes\ Cote\ d'Azur,City=Nice shopsabcdabcdabcdasssss=350 13456
City=Nice,Region\ name=Provence\ Alpes\ Cote\ d'Azur shopsabcdabcdabcdasssss=350 23456

Input: Region\ name
Output: Provence\ Alpes\ Cote\ d'Azur

Input: City
Output: Nice
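For comparison, the lookup described above can also be done without a regex. The following is a minimal sketch, not the asker's method. Based only on the sample lines, it assumes that an unescaped space or comma separates key=value tokens and that a backslash escapes the character after it. The name parsePairs is hypothetical. Because it makes a single pass over the line, a long token cannot make it slow down disproportionately.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper (not from the question). Assumes, based only on the
// sample lines, that an unescaped ' ' or ',' ends a token and that a
// backslash escapes the character after it.
def parsePairs(line: String): Map[String, String] = {
  val tokens  = ListBuffer.empty[String]
  val current = new StringBuilder
  var i = 0
  while (i < line.length) {
    val c = line.charAt(i)
    if (c == '\\' && i + 1 < line.length) {
      // Escaped character: keep the backslash and the next char verbatim,
      // so "Provence\ Alpes" stays one token and keeps its backslashes.
      current.append(c).append(line.charAt(i + 1))
      i += 2
    } else {
      if (c == ' ' || c == ',') {
        // Unescaped separator: close the current token, if any.
        if (current.length > 0) tokens += current.toString
        current.clear()
      } else current.append(c)
      i += 1
    }
  }
  if (current.length > 0) tokens += current.toString
  // Keep only key=value tokens (this drops the trailing 23456) and split at the first '='.
  tokens.toList.collect {
    case t if t.indexOf('=') > 0 =>
      val eq = t.indexOf('=')
      t.substring(0, eq) -> t.substring(eq + 1)
  }.toMap
}

val sample = "City=Nice,Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shopsabcdabcdabcdasssss=350 23456"
println(parsePairs(sample).get("City"))         // Some(Nice)
println(parsePairs(sample).get("Region\\ name")) // Some(Provence\ Alpes\ Cote\ d'Azur)
```

Unlike the regex approach, this sketch does not accept whitespace before the = sign (as in Region\ name =unknown). It keeps the backslashes in the returned value so that the output matches the expected output above.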

The solution below produces this result:

val data = List(
  "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shops=350,City=Nice",
  "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur,City=Nice shopsabcdabcdabcdasssss=350",
  "City=Nice,Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shopsabcdabcdabcdasssss=350",
  "City=Nice,Region\\ name =unknown shops=350")
// Extract all the values whose key is target.
val target  = """Region\\ name"""
val pattern = raw"$target\s*=((?:[\w'\\ -]+)+)(?:[ ,]+\w+=|,|$$)".r.unanchored
val output  = data.collect { case pattern(m) => m }

However, extracting the result with .r.unanchored takes much longer, or hangs, when a line contains a long token such as shopsabcdabcdabcdasssss or shopsabcdabcdabcdasssssssssssssssssssssss.

Can it be replaced with better code? (Update: this has been resolved; thanks to everyone who contributed an answer.)

regex101.com/r/nSYxfj/6: will this also work for extracting an integer value, or do I have to modify something?
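Here is a minimal sketch for the integer case. It assumes the key is shops, the only numeric field in the sample data, and it reuses the answer's corrected pattern below with the key spliced in. The names valuePattern and shops are hypothetical. The capture group is still text, so the value has to be converted explicitly.

```scala
import scala.util.matching.Regex

// Hypothetical helper: the answer's pattern with an arbitrary (regex-safe) key spliced in.
def valuePattern(key: String): Regex =
  raw"$key\s*=([\w'\\\s-]+)(?:[\s,]+\w+=|,|$$)".r.unanchored

val lines = List(
  "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shops=350,City=Nice",
  "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur,City=Nice shopsabcdabcdabcdasssss=350",
  "City=Nice,Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shopsabcdabcdabcdasssss=350",
  "City=Nice,Region\\ name =unknown shops=350")

val shopsPattern = valuePattern("shops")
// The capture is a String; toIntOption (Scala 2.13+) skips anything non-numeric.
val shops: List[Int] = lines.collect { case shopsPattern(v) => v }.flatMap(_.trim.toIntOption)
println(shops) // List(350, 350)
```

The key must match exactly, so shopsabcdabcdabcdasssss=350 is not picked up. The pattern also has no word boundary in front of the key, so a key that merely ends in shops (for example, a hypothetical myshops=) would match as well.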


Answer 1:


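The ((?:[\w'\\ -]+)+) part of the pattern causes catastrophic backtracking. The inner + and the outer + can divide the same run of characters between their iterations in many different ways. When the rest of the pattern fails to match (for example, at the = that follows a long key such as shopsabcdabcdabcdasssss), the engine tries those divisions one after another before giving up.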
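A simplified counting argument shows why the nesting hurts. It counts possible splits and is not an exact model of how many steps a particular engine takes. A run of n characters that the inner [...]+ can match may be cut into consecutive non-empty pieces, one piece per iteration of the outer +. Each of the n - 1 gaps between characters is either a cut or not, so:

```latex
\#\{\text{splits of a run of } n \text{ characters}\}
  = \sum_{k=0}^{n-1} \binom{n-1}{k}
  = 2^{\,n-1}
```

For a run of 23 characters, the length of shopsabcdabcdabcdasssss, the formula already gives 2^{22} = 4,194,304 possible splits. With a single quantifier, each shorter match can be reached in only one way.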

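Use this pattern instead. It matches the value with one character class under a single + (with \s in place of the literal space), so there is no nested quantifier: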

Region\\ name\s*=([\w'\\\s-]+)(?:[\s,]+\w+=|,|$)

See the regex demo.

In Scala, define the pattern like this:

val pattern = raw"$target\s*=([\w'\\\s-]+)(?:[\s,]+\w+=|,|$$)".r.unanchored
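The following is a self-contained Scala script that runs the answer's pattern over the question's data. The names valuesFor and longLine are hypothetical additions for illustration. The use of java.util.regex.Pattern.quote to avoid hand-escaping the key is one possible approach and is not part of the answer.

```scala
import java.util.regex.Pattern

val data = List(
  "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shops=350,City=Nice",
  "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur,City=Nice shopsabcdabcdabcdasssss=350",
  "City=Nice,Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shopsabcdabcdabcdasssss=350",
  "City=Nice,Region\\ name =unknown shops=350")

// The answer's pattern: the value is one character class under a single +,
// so there is no nested quantifier for the engine to backtrack through.
val target  = """Region\\ name"""
val pattern = raw"$target\s*=([\w'\\\s-]+)(?:[\s,]+\w+=|,|$$)".r.unanchored
val output  = data.collect { case pattern(m) => m }
output.foreach(println)
// Prints:
// Provence\ Alpes\ Cote\ d'Azur
// Provence\ Alpes\ Cote\ d'Azur
// Provence\ Alpes\ Cote\ d'Azur
// unknown

// Hypothetical wrapper (not part of the answer): Pattern.quote wraps a plain key
// such as "Region\ name" in \Q...\E, so the key needs no hand-escaping.
def valuesFor(key: String, lines: List[String]): List[String] = {
  val p = raw"${Pattern.quote(key)}\s*=([\w'\\\s-]+)(?:[\s,]+\w+=|,|$$)".r.unanchored
  lines.collect { case p(m) => m }
}

// A token far longer than the ones in the question: each of its characters can
// be given back in only one way, so the search still finishes quickly.
val longLine = "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shops" + ("s" * 10000) + "=350"
println(valuesFor("Region\\ name", List(longLine))) // List(Provence\ Alpes\ Cote\ d'Azur)
```

The same wrapper answers the City lookup from the question: valuesFor("City", data) returns Nice for every line.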


Source: https://stackoverflow.com/questions/63079643/regex-program-to-search-a-string-with-spaces-and-back-slashes-performance-issue
