Fuzzy string match in PowerShell

Posted by 拜拜、爱过 on 2020-07-07 14:30:20

Question


How can I do fuzzy string matching within PowerShell scripts?

I have several sets of people's names scraped from different sources, stored in an array. When I add a new name, I would like to compare it with the existing names and, if it fuzzily matches any of them, treat them as the same person. For example, given this data set:

@("George Herbert Walker Bush",
  "Barbara Pierce Bush",
  "George Walker Bush",
  "John Ellis (Jeb) Bush"  )

I would like to see the following outputs for the given inputs:

"Barbara Bush" -> @("Barbara Pierce Bush")
"George Takei" -> @("")
"George Bush"  -> @("George Herbert Walker Bush","George Walker Bush")

At a minimum, the matching should be case-insensitive, and ideally flexible enough to tolerate some misspelling.

As far as I can tell, the standard libraries do not provide this functionality. Is there an easy-to-install module that can accomplish this?


Answer 1:


Searching the PowerShell Gallery for the term "fuzzy", I found this package: Communary.PASM.

It can be installed with:

PS> Install-Package Communary.PASM

The project is hosted on GitHub. I used its examples file for reference.

Here are my examples:

$colors = @("Red", "Orange", "Yellow", "Green", "Blue", "Violet", "Sky Blue" )

PS> $colors | Select-FuzzyString Red

Score Result
----- ------   
  300 Red

This is a perfect match: each of the three characters scores the maximum of 100.

PS> $colors | Select-FuzzyString gren

Score Result
----- ------
  295 Green 

It tolerates a few missing characters.

PS> $colors | Select-FuzzyString blue

Score Result  
----- ------     
  400 Blue       
  376 Sky Blue

Multiple matches can be returned, each with its own score.

PS> $colors | Select-FuzzyString vioret

# No output

However, it does not tolerate even a small misspelling such as a substituted character. I then tried Select-ApproximateString:

PS> $colors | Select-ApproximateString vioret
Violet

This command has a different API: it returns only a single match or nothing. It may also return nothing in cases where Select-FuzzyString does return a match.
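
Since the two commands complement each other, one option is to try Select-FuzzyString first and fall back to Select-ApproximateString only when it finds nothing. The following is a minimal sketch, not part of Communary.PASM: the function name Find-FuzzyMatch and its parameters are hypothetical, and it assumes only what the outputs above show, namely that Select-FuzzyString emits objects with Score and Result properties and that Select-ApproximateString emits at most one plain string.

```powershell
# Hypothetical helper (not part of Communary.PASM).
# Assumes the module is already installed and imported.
function Find-FuzzyMatch {
    param(
        [Parameter(Mandatory)] [string[]] $Candidates,
        [Parameter(Mandatory)] [string]   $Search
    )

    # Select-FuzzyString emits objects with Score and Result properties;
    # sort them so the best score comes first.
    $hits = @($Candidates | Select-FuzzyString $Search |
        Sort-Object Score -Descending)
    if ($hits.Count -gt 0) {
        return $hits.Result
    }

    # Nothing found: fall back to Select-ApproximateString, which
    # emits at most one plain string.
    return @($Candidates | Select-ApproximateString $Search)
}

# Usage (requires the module):
#   Find-FuzzyMatch -Candidates $colors -Search blue
#   Find-FuzzyMatch -Candidates $colors -Search vioret
```

With the $colors array above, the first call should take the Select-FuzzyString path and the second should fall through to Select-ApproximateString.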

This was tested with PowerShell Core v6.0.0-beta.9 on macOS and Communary.PASM 1.0.43.
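
The examples above use colors rather than the names from the question, and I have not checked how the module's commands behave on that data. As a module-free alternative, here is a minimal sketch based on Levenshtein edit distance. This is a different technique from whatever Communary.PASM uses internally. The function names Get-EditDistance and Find-SimilarName, the -MaxDistance parameter, and the matching rule (every word of the query must be within one edit of some word in the candidate name) are all assumptions made for illustration.

```powershell
# Levenshtein distance via dynamic programming, case-insensitive.
function Get-EditDistance {
    param([string] $Left, [string] $Right)

    $s = $Left.ToLowerInvariant()
    $t = $Right.ToLowerInvariant()

    # Row 0: distance from the empty string to each prefix of $t.
    $prev = @(0..$t.Length)
    for ($i = 1; $i -le $s.Length; $i++) {
        $curr = @($i) + (@(0) * $t.Length)
        for ($j = 1; $j -le $t.Length; $j++) {
            $cost = if ($s[$i - 1] -ceq $t[$j - 1]) { 0 } else { 1 }
            # Minimum of deletion, insertion, and substitution.
            $curr[$j] = [Math]::Min(
                [Math]::Min($prev[$j] + 1, $curr[$j - 1] + 1),
                $prev[$j - 1] + $cost)
        }
        $prev = $curr
    }
    return $prev[$t.Length]
}

# Hypothetical rule: a name matches when every word of the query is
# within $MaxDistance edits of at least one word in the name.
function Find-SimilarName {
    param(
        [string[]] $Names,
        [string]   $Query,
        [int]      $MaxDistance = 1
    )

    $queryWords = $Query -split '\W+' | Where-Object { $_ }
    foreach ($name in $Names) {
        $nameWords = $name -split '\W+' | Where-Object { $_ }
        # Collect the query words that have no close word in this name.
        $unmatched = $queryWords | Where-Object {
            $word = $_
            -not ($nameWords | Where-Object {
                (Get-EditDistance $word $_) -le $MaxDistance
            })
        }
        if (-not $unmatched) { $name }
    }
}

$names = @("George Herbert Walker Bush",
           "Barbara Pierce Bush",
           "George Walker Bush",
           "John Ellis (Jeb) Bush")

Find-SimilarName -Names $names -Query "Barbara Bush"  # Barbara Pierce Bush
Find-SimilarName -Names $names -Query "George Takei"  # (no output)
Find-SimilarName -Names $names -Query "gorge bsh"     # both George ... Bush names
```

A fixed threshold of one edit per word is crude: it is generous for very short words and strict for long ones, so scaling the threshold by word length may suit real data better.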



Source: https://stackoverflow.com/questions/47256003/fuzzy-string-match-in-powershell
