Question
I'm curious whether there's a way to extract email:password pairs from a big list.
Each pair appears in the text in that format, but with a few other unusable parts in front (such as first name and last name).
The format is mostly:
xx:Mxx:Support:xx:support@xx.com:x19000
But sometimes can be like this as well:
xxxx::gexrge@xxnt.com:111111
I have tried with EmEditor and if I search for
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]).*$
it does find it. I then have to replace with \1
- however, this takes literally ages and finally crashes (the file is 17 GB).
Knowing that PowerShell could do this too, I'm looking for the right command.
Answer 1:
The switch statement allows efficient line-by-line processing of files (via the -File parameter), optionally combined with regex matching (via the -Regex option):
& {
  switch -regex -file in.txt {
    '(?<=:)[^@:]+@[^:]+:.*' { $Matches[0] }
  }
} | Set-Content -Encoding utf8 out.txt
Adjust the -Encoding argument as needed; note that in Windows PowerShell utf8 creates a file with a BOM, whereas PowerShell [Core] v6+ creates one without a BOM. By default, Set-Content uses the system's active ANSI code page in Windows PowerShell, whereas PowerShell [Core] v6+ consistently defaults to BOM-less UTF-8, across all cmdlets.
The above extracts the email-password pairs from file in.txt and writes them as individual lines to file out.txt.
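To see what the regex actually captures, here is a minimal sketch that applies it to the two sample lines from the question, without touching any file. The variable names ($pattern, $samples, $results) are illustrative and not part of the original answer.

```powershell
# The same regex as in the switch statement: starting right after a ':',
# match <non-@, non-colon chars>@<non-colon chars>:<rest of line>.
$pattern = '(?<=:)[^@:]+@[^:]+:.*'

# The two line formats quoted in the question.
$samples = @(
  'xx:Mxx:Support:xx:support@xx.com:x19000',
  'xxxx::gexrge@xxnt.com:111111'
)

# -match fills the automatic $Matches variable; index 0 is the whole match.
$results = foreach ($line in $samples) {
  if ($line -match $pattern) { $Matches[0] }
}

$results
# support@xx.com:x19000
# gexrge@xxnt.com:111111
```

Fields that contain no @ cannot satisfy the pattern, so the match can only start at the field holding the address, regardless of how many fields precede it.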
Note: Even though the above performs line-by-line processing, an out-of-memory exception can apparently still occur in Set-Content
with very large input files; the .NET-based solution in the next section should fix that, while also significantly speeding up the operation.
Performance caveat: While the above is memory-efficient, it will be slow with large files; to address that, you must make direct use of the .NET framework, via a System.IO.StreamWriter instance:
# Create the output file.
# Note:
#  * Be sure to use a *full* path, because .NET's current dir. usually differs
#    from PowerShell's.
#  * UTF-8 *without a BOM* is used as the character encoding by default,
#    but you may pass a [System.Text.Encoding] instance as needed.
$sw = [System.IO.StreamWriter]::new("$PWD/out.txt")

switch -regex -file in.txt {
  '(?<=:)[^@:]+@[^:]+:.*' { $sw.WriteLine($Matches[0]) }
}

$sw.Close()
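The comments above mention that an encoding instance can be passed to the writer. A possible variant, sketched under assumptions, passes one explicitly and wraps the loop in try/finally so that the file is closed even if an error occurs partway through. The temp-file paths and the three-line sample input are made up for the demo; substitute your real full paths.

```powershell
# Hypothetical demo paths; replace with the full paths of your real files.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'in-demo.txt'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'out-demo.txt'

# Small made-up input so the sketch is self-contained.
Set-Content -LiteralPath $inPath -Value @(
  'xx:Mxx:Support:xx:support@xx.com:x19000',
  'a line without an address',
  'xxxx::gexrge@xxnt.com:111111'
)

# Arguments: path, append (false = overwrite), encoding.
# UTF8Encoding($true) requests UTF-8 *with* a BOM; use $false for no BOM.
$sw = [System.IO.StreamWriter]::new($outPath, $false, [System.Text.UTF8Encoding]::new($true))
try {
  switch -regex -file $inPath {
    '(?<=:)[^@:]+@[^:]+:.*' { $sw.WriteLine($Matches[0]) }
  }
}
finally {
  # Runs even if the loop throws, so the file handle is always released.
  $sw.Dispose()
}

Get-Content -LiteralPath $outPath
```

Lines that do not match the pattern are skipped, so only the two email:password pairs end up in the output file.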
Answer 2:
What you need in PowerShell is a string split plus array indexing. Here is the code:
$String = "xx:Mxx:Support:xx:support@xx.com:x19000 xx:Jeremy:xxx:1977-07-22:xxx@gmail.com:bar_baz xxx:Yuxxya:xx:1975-03-28:liz@gdddaxxt.com:loddta999"
$first_separator = ":"   # separates the fields within one entry
$sec_separator = " "     # separates the entries from each other
Function separate ($string, $separator) {
  $string.Split($separator)
}
$mails = separate $String $sec_separator
foreach ($entry in $mails) {
  $mail = separate $entry $first_separator
  # Index from the end so that entries with fewer leading fields still work;
  # this assumes that the password itself contains no ':'.
  Write-Host "The complete mail info is $entry" -ForegroundColor Green
  Write-Host "The mail is:- $($mail[-2])"
  Write-Host "The password is :- $($mail[-1])"
}
The output is as follows:
The complete mail info is xx:Mxx:Support:xx:support@xx.com:x19000
The mail is:- support@xx.com
The password is :- x19000
The complete mail info is xx:Jeremy:xxx:1977-07-22:xxx@gmail.com:bar_baz
The mail is:- xxx@gmail.com
The password is :- bar_baz
The complete mail info is xxx:Yuxxya:xx:1975-03-28:liz@gdddaxxt.com:loddta999
The mail is:- liz@gdddaxxt.com
The password is :- loddta999
I hope that answers your question. Feel free to ask for clarification, and mark this as the answer if it helps.
Source: https://stackoverflow.com/questions/65448263/extract-emailpassword