Question
These are the lines of text I have:
Region\ name=Provence\ Alpes\ Cote\ d'Azur shops=350,City=Nice 12345
Region\ name=Provence\ Alpes\ Cote\ d'Azur,City=Nice shopsabcdabcdabcdasssss=350 13456
City=Nice,Region\ name=Provence\ Alpes\ Cote\ d'Azur shopsabcdabcdabcdasssss=350 23456
Input: Region\ name
Output: Provence\ Alpes\ Cote\ d'Azur
Input: City
Output: Nice
The solution below produces the expected result:
val data =List("Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shops=350,City=Nice"
,"Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur,City=Nice shopsabcdabcdabcdasssss=350"
,"City=Nice,Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shopsabcdabcdabcdasssss=350"
,"City=Nice,Region\\ name =unknown shops=350")
// Extract every value whose key is target.
val target = """Region\\ name"""
val pattern =raw"$target\s*=((?:[\w'\\ -]+)+)(?:[ ,]+\w+=|,|$$)".r.unanchored
val output = data.collect{ case pattern(m) => m }
However, extracting the result with .r.unanchored becomes very slow, or hangs completely, when a line contains a long token such as shopsabcdabcdabcdasssss or shopsabcdabcdabcdasssssssssssssssssssssss.
Can it be replaced with better code? (Update: this has been resolved; thanks to everyone who contributed an answer.)
Follow-up: will regex101.com/r/nSYxfj/6 also work for extracting an integer value, or do I have to modify something?
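The regex behind the regex101 link is not reproduced here. As a sketch of one way to pull out an integer value, assuming the value is a plain run of ASCII digits right after `key=`, the same key-based pattern can end in `(\d+)`. The helper name `intValuesFor` is hypothetical, and the key is matched anywhere in the line (no token-boundary check).

```scala
object ExtractInt {
  // Sample lines from the question (trailing timestamps omitted, as in the original List).
  val data = List(
    "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shops=350,City=Nice",
    "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur,City=Nice shopsabcdabcdabcdasssss=350",
    "City=Nice,Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shopsabcdabcdabcdasssss=350",
    "City=Nice,Region\\ name =unknown shops=350"
  )

  // Hypothetical helper: returns the integer values stored under `key`.
  // Assumption: the value is a run of ASCII digits directly after `key=`
  // (optional whitespace is allowed before `=`, as in the answer's pattern).
  def intValuesFor(key: String, lines: List[String]): List[Int] = {
    // (\d+) is a single quantifier, so there is no nested repetition to backtrack into.
    // (?!\w) rejects values such as "350abc" that merely start with digits.
    val pattern = raw"$key\s*=(\d+)(?!\w)".r.unanchored
    lines.collect { case pattern(n) => n.toInt }
  }

  def main(args: Array[String]): Unit = {
    println(intValuesFor("shops", data))
    println(intValuesFor("shopsabcdabcdabcdasssss", data))
  }
}
```

Here `"shops"` does not match `shopsabcdabcdabcdasssss=350`, because the character after `shops` is `a`, not `=`.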
Answer 1:
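The ((?:[\w'\\ -]+)+) part of the pattern causes catastrophic backtracking. A + quantifier nested inside another + can divide the same run of characters between the inner and outer repetitions in exponentially many ways, and the engine tries those divisions whenever the rest of the pattern fails to match, so the longer the run the character class can absorb (for example a value followed by shopsabcdabcdabcdasssss), the worse it gets. A single quantified character class removes that ambiguity.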
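To get a feel for the growth, the sketch below counts how many ways a run of n characters can be split into consecutive non-empty pieces, which is the choice an inner `+` nested in an outer `+` gives the engine when the inner class matches every character. The helper `splits` is hypothetical; it illustrates the growth rate (2^(n-1)) and is not an exact count of the steps Java's regex engine performs.

```scala
object NestedQuantifierSplits {
  // Number of ways to cut a run of n characters into consecutive, non-empty pieces:
  // the divisions available to (?:X+)+ when X matches every character of the run.
  def splits(n: Int): BigInt = {
    // ways(i) = number of ways to split the first i characters
    val ways = Array.fill[BigInt](n + 1)(BigInt(0))
    ways(0) = BigInt(1)
    // The last piece covers characters j until i, so add the ways to split the first j.
    for (i <- 1 to n; j <- 0 until i) ways(i) += ways(j)
    ways(n)
  }

  def main(args: Array[String]): Unit =
    for (n <- List(1, 5, 10, 23)) println(s"n=$n -> ${splits(n)}")
}
```

Each extra character doubles the count, which is why a slightly longer token can turn a fast match into an apparent hang.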
Use this pattern instead:
Region\\ name\s*=([\w'\\\s-]+)(?:[\s,]+\w+=|,|$)
See the regex demo.
In Scala, define the pattern like this:
val pattern =raw"$target\s*=([\w'\\\s-]+)(?:[\s,]+\w+=|,|$$)".r.unanchored
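Putting the answer's pattern together with the question's data gives a self-contained program to check the results. The helper `valuesFor` is a hypothetical wrapper so that the same pattern can be reused for any key; the key must be written with doubled backslashes (e.g. `"""Region\\ name"""`) because it is spliced into the regex as-is.

```scala
object ExtractValue {
  // Sample lines from the question (trailing timestamps omitted, as in the original List).
  val data = List(
    "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shops=350,City=Nice",
    "Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur,City=Nice shopsabcdabcdabcdasssss=350",
    "City=Nice,Region\\ name=Provence\\ Alpes\\ Cote\\ d'Azur shopsabcdabcdabcdasssss=350",
    "City=Nice,Region\\ name =unknown shops=350"
  )

  // Hypothetical helper around the answer's pattern.
  def valuesFor(target: String, lines: List[String]): List[String] = {
    // One + on the character class (no nested quantifier): the group grabs as much
    // as it can, then gives characters back one at a time until the value is followed
    // by " key=", by a comma, or by the end of the line ($$ is a literal $ in raw"").
    val pattern = raw"$target\s*=([\w'\\\s-]+)(?:[\s,]+\w+=|,|$$)".r.unanchored
    lines.collect { case pattern(m) => m }
  }

  def main(args: Array[String]): Unit = {
    valuesFor("""Region\\ name""", data).foreach(println)
    valuesFor("City", data).foreach(println)
  }
}
```

For `Region\ name` this yields `Provence\ Alpes\ Cote\ d'Azur` for the first three lines and `unknown` for the last one; for `City` it yields `Nice` for every line.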
Source: https://stackoverflow.com/questions/63079643/regex-program-to-search-a-string-with-spaces-and-back-slashes-performance-issue