Extract email:password

不羁岁月 submitted on 2021-02-11 12:03:11

Question


I'm curious whether there's a way to extract email:password pairs from a big list. Each line of the text contains the pair in that format, but with a few other unusable fields in front (such as first name and last name).

The format is mostly:

xx:Mxx:Support:xx:support@xx.com:x19000

But sometimes can be like this as well:

xxxx::gexrge@xxnt.com:111111

I have tried with EmEditor and if I search for

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]).*$

it does find the match. I then have to replace with \1; however, this takes ages and finally crashes (the file is 17 GB).

Knowing that PowerShell could do this too, I'm looking for the right command.


Answer 1:


The switch statement allows combining efficient line-by-line processing of files (via the -File parameter), optionally combined with regex-matching (via the -Regex option):

& { 
  switch -regex -file in.txt { 
   '(?<=:)[^@:]+@[^:]+:.*' { $Matches[0] } 
  }
} | Set-Content -Encoding utf8 out.txt

Adjust the -Encoding argument as needed; note that in Windows PowerShell, utf8 creates a file with a BOM, whereas PowerShell [Core] v6+ creates one without a BOM. By default, Set-Content uses the system's active ANSI code page in Windows PowerShell, whereas PowerShell [Core] v6+ consistently defaults to BOM-less UTF-8, across all cmdlets.

The above extracts the email-password pairs from file in.txt and writes them as individual lines to file out.txt.
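
As a quick sanity check, the same regex can be tried on the two sample lines from the question before running it against the full file. This is only a sketch: the variable names $samples and $found are made up for illustration, and it assumes the email is always preceded by a colon and that everything after the email's trailing colon is the password.

```powershell
# The two sample formats from the question (illustrative data only).
$samples = @(
  'xx:Mxx:Support:xx:support@xx.com:x19000'
  'xxxx::gexrge@xxnt.com:111111'
)

# (?<=:)   the match must start right after a colon
# [^@:]+@  local part of the address: no '@' or ':' until the '@'
# [^:]+:   domain, up to the next colon
# .*       the rest of the line, taken as the password
$found = foreach ($line in $samples) {
  if ($line -match '(?<=:)[^@:]+@[^:]+:.*') { $Matches[0] }
}

$found
# support@xx.com:x19000
# gexrge@xxnt.com:111111
```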

Note: Even though the above performs line-by-line processing, an out-of-memory exception can apparently still occur in Set-Content with very large input files; the .NET-based solution in the next section should fix that, while also significantly speeding up the operation.


Performance caveat: While the above is memory-efficient, it will be slow with large files; to address that, you must make direct use of the .NET framework, via a System.IO.StreamWriter instance:

# Create the output file.
# Note:
#  * Be sure to use a *full* path, because .NET's current dir. usually differs
#    from PowerShell's
#  * UTF-8 *without a BOM* is used as the character encoding by default,
#    but you may pass a [System.Text.Encoding] instance as needed.
$sw = [System.IO.StreamWriter]::new("$PWD/out.txt")

switch -regex -file in.txt { 
   '(?<=:)[^@:]+@[^:]+:.*' { $sw.WriteLine($Matches[0]) } 
}

$sw.Close()
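
If the loop above throws partway through, $sw.Close() is never reached and the output file stays open. One possible variant, offered as a sketch rather than a measured improvement, wraps the writer in try/finally and reads the input lazily with [System.IO.File]::ReadLines and a precompiled [regex] object. The temp directory, the sample lines, and the names $dir, $inPath, $outPath and $re are assumptions added so the snippet runs on its own; with real data, point $inPath and $outPath at full paths of your files.

```powershell
# Hypothetical demo location and sample input, so the sketch is self-contained.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'extract-demo'
New-Item -ItemType Directory -Force -Path $dir | Out-Null
$inPath  = Join-Path $dir 'in.txt'
$outPath = Join-Path $dir 'out.txt'
Set-Content -Path $inPath -Value 'xx:Mxx:Support:xx:support@xx.com:x19000', 'xxxx::gexrge@xxnt.com:111111', 'no email here'

# Same pattern as above, compiled once up front.
$re = [regex]::new('(?<=:)[^@:]+@[^:]+:.*', 'Compiled')

$sw = [System.IO.StreamWriter]::new($outPath)
try {
  # ReadLines enumerates the file lazily, one line at a time.
  foreach ($line in [System.IO.File]::ReadLines($inPath)) {
    $m = $re.Match($line)
    if ($m.Success) { $sw.WriteLine($m.Value) }
  }
}
finally {
  # Flush and release the file even if the loop throws.
  $sw.Dispose()
}
```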



Answer 2:


What you need in PowerShell is a split combined with array indexing. Here is the code:

$String = "xx:Mxx:Support:xx:support@xx.com:x19000 xx:Jeremy:xxx:1977-07-22:xxx@gmail.com:bar_baz xxx:Yuxxya:xx:1975-03-28:liz@gdddaxxt.com:loddta999"
$first_separator = ":"
$sec_separator = " "

# Split a string on the given separator and return the parts.
Function separate ($string, $separator) {
    $string.Split($separator)
}

# Split the input into records, then split each record into its fields.
$mails = separate $String $sec_separator
foreach ($entry in $mails) {
    $mail = separate $entry $first_separator
    Write-Host "The complete mail info is " $entry -ForegroundColor Green
    Write-Host "The mail is:- " $mail[4]
    Write-Host "The password is :-" $mail[5]
}

The output is as follows:

The complete mail info is  xx:Mxx:Support:xx:support@xx.com:x19000
The mail is:-  support@xx.com
The password is :- x19000
The complete mail info is  xx:Jeremy:xxx:1977-07-22:xxx@gmail.com:bar_baz
The mail is:-  xxx@gmail.com
The password is :- bar_baz
The complete mail info is  xxx:Yuxxya:xx:1975-03-28:liz@gdddaxxt.com:loddta999
The mail is:-  liz@gdddaxxt.com
The password is :- loddta999
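
Note that fixed indices 4 and 5 only fit lines with exactly six fields; the question's second format (xxxx::gexrge@xxnt.com:111111) has four. A possible adjustment, sketched below, counts from the end of the array instead. It assumes the email and password are always the last two colon-separated fields and that the password itself contains no colon; the names $lines, $fields and $pairs are illustrative.

```powershell
# Both sample formats from the question (illustrative data only).
$lines = @(
  'xx:Mxx:Support:xx:support@xx.com:x19000'
  'xxxx::gexrge@xxnt.com:111111'
)

$pairs = foreach ($line in $lines) {
  $fields = $line.Split(':')
  # Negative indices count from the end: [-2] is the email, [-1] the password.
  '{0}:{1}' -f $fields[-2], $fields[-1]
}

$pairs
# support@xx.com:x19000
# gexrge@xxnt.com:111111
```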

I hope that answers your question. Feel free to ask for clarification, and mark this as the answer if it helps.



Source: https://stackoverflow.com/questions/65448263/extract-emailpassword
