Replacing any content in between the second and third underscore

不羁岁月 submitted on 2021-01-27 20:55:48

Question


I have a PowerShell one-liner that replaces (deletes) the characters between the second and third underscore with an "_":

get-childitem *.pdf | rename-item -newname { $_.name -replace '_\p{L}+, \p{L}+_', "_"}

Examples:

12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf
12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf
12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf

This _\p{L}+, \p{L}+_ regex only works for the first example. To replace everything in between, I used _(?:[^_]*)_([^_]*)_ (according to regex101 this should almost work), but the output is:

12345_09_MoreText.pdf

The desired output would be:

 12345_00001_09_2018_Text_MoreText.pdf
 12345_00002_09_2018_Text_MoreText.pdf
 12345_00003_09_2018_Text_MoreText.pdf

How do I correctly replace the second and third underscore and everything in between with a single "_"?


Answer 1:


You may use

-replace '^((?:[^_]*_){2})[^_]+_', '$1'

See the regex demo

Details

  • ^ - start of the string
  • ((?:[^_]*_){2}) - Group 1 (its value is referenced with $1 in the replacement pattern): two repetitions of
    • [^_]* - 0+ chars other than an underscore
    • _ - an underscore
  • [^_]+ - 1 or more chars other than _
  • _ - an underscore
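
As a quick check, the pattern can be run against the three sample names from the question without touching any files. This is only a sketch; the variable names $names and $renamed are made up for the demo.

```powershell
# Sample names copied from the question; $names and $renamed are illustrative only.
$names = @(
    '12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf'
    '12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf'
    '12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'
)

# Group 1 keeps the first two "token_" pairs; the third token and its
# trailing underscore are matched but not put back.
$renamed = $names -replace '^((?:[^_]*_){2})[^_]+_', '$1'

$renamed
# 12345_00001_09_2018_Text_MoreText.pdf
# 12345_00002_09_2018_Text_MoreText.pdf
# 12345_00003_09_2018_Text_MoreText.pdf
```

Because the pattern is anchored with ^, it can match only once per name, so later underscore-delimited parts are left alone.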



Answer 2:


If you don't want to use regex -

$files = Get-ChildItem *.pdf        # get all PDF files
foreach ($file in $files)
{
    $tokens = $file.Name.Split('_')     # split the name string; a FileInfo object has no Split method
    # keep every token except index 2, i.e. the text between the second and third underscore
    $kept = for ($i = 0; $i -lt $tokens.Count; $i++) { if ($i -ne 2) { $tokens[$i] } }
    $New = $kept -join '_'              # -join leaves spaces inside the remaining tokens alone
    Rename-Item -LiteralPath $file.FullName -NewName $New
}

Sample Data -

$files = "12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf", "12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf", "12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf"
foreach ($file in $files)
{
    $tokens = $file.Split('_')
    # keep every token except index 2, i.e. the text between the second and third underscore
    $kept = for ($i = 0; $i -lt $tokens.Count; $i++) { if ($i -ne 2) { $tokens[$i] } }
    $New = $kept -join '_'
    $New                                # output the new name, e.g. 12345_00001_09_2018_Text_MoreText.pdf
}



Answer 3:


To offer an alternative solution that avoids a complex regex: The following is based on the -split and -join operators and shows PowerShell's flexibility with respect to array slicing:

Get-ChildItem *.pdf | Rename-Item -NewName { ($_.Name -split '_')[0..1 + 3..6] -join '_' } -WhatIf

  • $_.Name -split '_' splits the filename by _ into an array of tokens (substrings).
  • Array slice [0..1 + 3..6] combines two range expressions (..) to essentially remove the token with index 2 from the array.
  • -join '_' reassembles the modified array into a _-separated string, yielding the desired result.
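
The three steps above can be traced on a single sample name from the question. This is a sketch for illustration; $name, $tokens, $kept and $result are names invented here.

```powershell
# One sample name from the question; all variable names are illustrative.
$name = '12345_00001_LastName, FirstName_09_2018_Text_MoreText.pdf'

$tokens = $name -split '_'      # 7 tokens, indices 0..6
$tokens[2]                      # LastName, FirstName  (the token to drop)

# '..' binds tighter than '+', so this is (0,1) + (3,4,5,6)
$kept = $tokens[0..1 + 3..6]

$result = $kept -join '_'
$result                         # 12345_00001_09_2018_Text_MoreText.pdf
```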

Note: 6, the upper array bound, is hard-coded above, which is suboptimal, but sufficient with input as predictable as in this case.

As of Windows PowerShell v5.1 / PowerShell Core 6.1.0, in order to determine the upper bound dynamically, you require the help of an auxiliary variable, which is clumsy:

Get-ChildItem *.pdf |
  Rename-Item -NewName { ($arr = $_.Name -split '_')[0..1 + 3..($arr.Count-1)] -join '_' } -WhatIf

Wouldn't it be nice if we could write [0..1 + 3..] instead? This and other improvements to PowerShell's slicing syntax are the subject of this feature suggestion on GitHub.
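
A different way to sidestep the upper bound, not part of the original answer, is to cap the number of substrings that -split returns and discard the third one during assignment. The sketch below assumes the name contains at least three underscores; $name, $first, $second, $rest and $result are illustrative names.

```powershell
# Sketch only: variable names are invented for this example.
$name = '12345_00002_LastName, FirstName-SecondName_09_2018_Text_MoreText.pdf'

# Split into at most 4 substrings; the 4th holds everything after the third underscore.
# Assigning to $null discards the 3rd substring (the name part).
$first, $second, $null, $rest = $name -split '_', 4

# The comma operator binds tighter than -join, so all three values are joined.
$result = $first, $second, $rest -join '_'
$result    # 12345_00002_09_2018_Text_MoreText.pdf
```

With fewer than three underscores in the name, $rest would be empty and the result would end in a stray underscore, so this variant fits input as predictable as the question's.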




Answer 4:


here's one other way ... using string methods.

'12345_00003_LastName, FirstName SecondName_09_2018_Text_MoreText.pdf'.
    Split('_').
    Where({
        $_ -notmatch ','
        }) -join '_'

result = 12345_00003_09_2018_Text_MoreText.pdf

that does the following ...

  • split on the underscores
  • toss out any item that has a comma in it
  • join the remaining items back into a string with underscores

i suspect that the pure regex solution will be faster, but you may want to use this simply to have something that is easier to understand when you next need to modify it. [grin]



Source: https://stackoverflow.com/questions/52813671/replacing-any-content-inbetween-second-and-third-underscore
