Extract string from text file via Powershell

Posted by 冷眼眸甩不掉的悲伤 on 2021-02-04 18:51:28

Question


I have been trying to extract certain values from multiple lines inside a .txt file with PowerShell.

Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"

This is what I want :

server01
server02
server03 test

I have code so far :

$Regex = [Regex]::new("(?<=Equal)(.*)(?=OR)")
$Match = $Regex.Match($String)

Answer 1:


You may use

[regex]::matches($String, '(?<=Equal\s*")[^"]+')

See the regex demo.

See more ways to extract multiple matches here. However, your main problem is the regex pattern. The (?<=Equal\s*")[^"]+ pattern matches:

  • (?<=Equal\s*") - a position immediately preceded by Equal, zero or more whitespace characters, and then a "
  • [^"]+ - consumes one or more characters other than a double quotation mark.
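To make the difference concrete, here is a minimal sketch that runs the pattern from the question and the suggested pattern against the same sample text. The variable names ($sample, $greedy, $names) are illustrative only, and the question's pattern is shown with its closing parenthesis restored.

```powershell
# Sample input from the question, as a single-quoted here-string.
$sample = @'
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
'@

# Pattern from the question: the greedy .* runs to the last "OR" on the line,
# and Match() returns only the first match.
$greedy = [regex]::Match($sample, '(?<=Equal)(.*)(?=OR)').Value

# Suggested pattern: start right after Equal, optional whitespace and a ",
# then take everything up to the next ".
$names = [regex]::Matches($sample, '(?<=Equal\s*")[^"]+') | ForEach-Object { $_.Value }

"[$greedy]"   # brackets make the leading and trailing spaces visible
$names
```

The first result is one long span covering the first two quoted names and the text between them, while the second yields the three server names separately.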

Demo:

$String = "Host`nClass`nINCLUDE vmware:/?filter=Displayname Equal ""server01"" OR Displayname Equal ""server02"" OR Displayname Equal ""server03 test"""
[regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value}

Output:

server01
server02
server03 test

Here is a full snippet reading the file in, getting all matches and saving to file:

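$newfile = 'file.txt'   # output file that receives the extracted values
$file = 'newtext.txt'   # input file to read
$regex = '(?<=Equal\s*")[^"]+'
# Stream the input line by line, collect every match per line, and save the values.
Get-Content $file |
     Select-String $regex -AllMatches |
     Select-Object -Expand Matches |
     ForEach-Object { $_.Value } |
     Set-Content $newfile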



Answer 2:


Another option (PSv3+), combining [regex]::Matches() with the -replace operator for a concise solution:

$str = @'
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
'@ 

[regex]::Matches($str, '".*?"').Value -replace '"'

Regex ".*?" matches all "..."-enclosed tokens; .Value extracts them, and -replace '"' strips the " chars.
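The two steps can be looked at separately. The sketch below is only an illustration with made-up variable names ($line, $found, $quoted, $bare); it relies on member enumeration (PSv3+) to read .Value from every match, and on -replace acting on each element when its left-hand side is an array.

```powershell
$line = 'Displayname Equal "server01" OR Displayname Equal "server02"'

# Step 1: collect all "..."-enclosed tokens.
$found = [regex]::Matches($line, '".*?"')

# Step 2: member enumeration returns the Value of every match, quotes included.
$quoted = $found.Value

# Step 3: with an array on the left, -replace processes each element;
# omitting the replacement operand removes what the pattern matched.
$bare = $quoted -replace '"'

$quoted
$bare
```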

It may not be obvious, but this happens to be the fastest solution among the answers here, based on my tests - see bottom.


As an aside: The above would be even more PowerShell-idiomatic if the -match operator - which only looks for a (one) match - had a variant named, say, -matchall, so that one could write:

# WISHFUL THINKING (as of PowerShell Core 6.2)
$str -matchall '".*?"' -replace '"'

See this feature suggestion on GitHub.
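Until such an operator exists, a small helper function can approximate it. The following is a rough sketch only: Get-AllMatch is a hypothetical name, not a built-in command, and the choice to ignore case (to mirror the default behavior of -match) is an assumption.

```powershell
# Hypothetical helper: returns the text of every match of $Pattern in each input string.
function Get-AllMatch {
    param(
        [Parameter(Mandatory, Position = 0)]
        [string] $Pattern,

        [Parameter(Mandatory, ValueFromPipeline)]
        [string] $InputString
    )
    process {
        # IgnoreCase mirrors the case-insensitivity of the -match operator.
        $options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
        foreach ($m in [regex]::Matches($InputString, $Pattern, $options)) {
            $m.Value
        }
    }
}

$result = ('Displayname Equal "server01" OR Displayname Equal "server02"' |
    Get-AllMatch '".*?"') -replace '"'
$result
```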


Optional reading: performance comparison

Pragmatically speaking, all solutions here are helpful and may be fast enough, but there may be situations where performance must be optimized.

Generally, using Select-String (and the pipeline in general) comes with a performance penalty - while offering elegance and memory-efficient streaming processing.

Also, repeated invocation of script blocks (e.g., { $_.Value }) tends to be slow - especially in a pipeline with ForEach-Object or Where-Object, but also - to a lesser degree - with the .ForEach() and .Where() collection methods (PSv4+).
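For reference, the approaches mentioned here produce the same values and differ only in how each match is visited. A minimal sketch, with illustrative variable names and no timing claims of its own:

```powershell
$line = 'Displayname Equal "server01" OR Displayname Equal "server02"'
$found = [regex]::Matches($line, '(?<=Equal\s*")[^"]+')

# Pipeline with a script block, invoked once per match.
$viaPipeline = $found | ForEach-Object { $_.Value }

# .ForEach() collection method (PSv4+); @() ensures an array is used.
$viaMethod = @($found).ForEach({ $_.Value })

# Member enumeration (PSv3+); no script block involved.
$viaEnumeration = $found.Value

$viaPipeline
```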

In the realm of regexes, you pay a performance penalty for variable-length look-behind expressions (e.g. (?<=EQUAL\s*")) and the use of capture groups (e.g., (.*?)).
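The three regex styles discussed in the answers can be put side by side. This sketch, using a shortened sample line and illustrative variable names, only shows that they extract the same values; it does not measure anything.

```powershell
$line = 'Displayname Equal "server01" OR Displayname Equal "server03 test"'

# Variable-length look-behind: the match itself is the bare value.
$viaLookbehind = [regex]::Matches($line, '(?<=Equal\s*")[^"]+').Value

# Capture group: the bare value is in group 1 of each match.
$viaGroup = foreach ($m in [regex]::Matches($line, '"(.*?)"')) { $m.Groups[1].Value }

# Plain pattern: match the quoted token, then strip the quotes afterwards.
$viaReplace = [regex]::Matches($line, '".*?"').Value -replace '"'

$viaLookbehind
```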

Here is a performance comparison using the Time-Command function (a custom helper function, not a built-in cmdlet), averaging 1000 runs:

Time-Command -Count 1e3 { [regex]::Matches($str, '".*?"').Value -replace '"' },
   { [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} },
   { [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value },
   { $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} } |
     Format-Table Factor, Command

Sample timings from my MacBook Pro; the exact times aren't important (you can remove the Format-Table call to see them), but the relative performance is reflected in the Factor column, from fastest to slowest.

Factor Command
------ -------
1.00   [regex]::Matches($str, '".*?"').Value -replace '"' # this answer
2.85   [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value # AdminOfThings'
6.07   [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} # Wiktor's
8.35   $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} # LotPings'
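If Time-Command is not at hand, a rough comparison can be sketched with the built-in Measure-Command cmdlet. The run count, the shortened sample string, and the variable names below are assumptions made for illustration; absolute numbers will vary by machine and PowerShell version, so no expected timings are shown.

```powershell
$str = 'Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"'
$runs = 1000

# Plain pattern plus -replace, repeated $runs times.
$plainTime = Measure-Command {
    foreach ($i in 1..$runs) {
        $null = [regex]::Matches($str, '".*?"').Value -replace '"'
    }
}

# Look-behind pattern piped through ForEach-Object, repeated $runs times.
$lookbehindTime = Measure-Command {
    foreach ($i in 1..$runs) {
        $null = [regex]::Matches($str, '(?<=Equal\s*")[^"]+') | ForEach-Object { $_.Value }
    }
}

'{0:N1} ms vs. {1:N1} ms' -f $plainTime.TotalMilliseconds, $lookbehindTime.TotalMilliseconds
```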



Answer 3:


You can modify your regex to use a capture group, which is indicated by the parentheses. The backslashes just escape the quotes. This allows you to just capture what you are looking for and then filter it further. The capture group here is automatically named 1 since I didn't provide a name. Capture group 0 is the entire match including quotes. I switched to the Matches method because that encompasses all matches for the string whereas Match only captures the first match.

$regex = [regex]'\"(.*?)\"'    
$regex.matches($string).groups.where{$_.name -eq 1}.value
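To see what the groups contain, the following sketch inspects a single match and then shows a variant with an explicitly named group. The group name server and the variable names are illustrative choices, not part of the answer above.

```powershell
$line = 'Displayname Equal "server01" OR Displayname Equal "server02"'

# Match() returns only the first match.
$first = [regex]::Match($line, '"(.*?)"')
$whole = $first.Groups[0].Value   # entire match, quotes included
$inner = $first.Groups[1].Value   # contents of the capture group only

# A named group can be looked up by name instead of by number.
$byName = foreach ($m in [regex]::Matches($line, '"(?<server>.*?)"')) {
    $m.Groups['server'].Value
}

$whole
$inner
$byName
```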

If you want to export the results, you can do the following:

$regex = [regex]'\"(.*?)\"'    
$regex.matches($string).groups.where{$_.name -eq 1}.value | Set-Content "c:\temp\export.txt"



Answer 4:


An alternative that reads the file directly with Select-String, using Wiktor's good RegEx:

Select-String -Path .\file.txt -Pattern '(?<=Equal\s*")[^"]+' -AllMatches|
    ForEach-Object{$_.Matches.Value} | Set-Content NewFile.txt

Sample output:

> Get-Content .\NewFile.txt
server01
server02
server03 test
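To try this without touching real files, here is a self-contained sketch that writes the sample text to a temporary file first. The temporary file and the variable names are assumptions made for the demonstration.

```powershell
# Create a temporary input file containing the sample text from the question.
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value @(
    'Host'
    'Class'
    'INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"'
)

# Select-String emits one object per matching line; with -AllMatches,
# .Matches holds every match found on that line.
$servers = Select-String -Path $tmp -Pattern '(?<=Equal\s*")[^"]+' -AllMatches |
    ForEach-Object { $_.Matches.Value }

Remove-Item $tmp
$servers
```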


Source: https://stackoverflow.com/questions/54606709/extract-string-from-text-file-via-powershell
