Fast and simple way to concatenate binary files in PowerShell

耶瑟儿~ 2020-12-05 10:32

What's the best way of concatenating binary files using PowerShell? I'd prefer a one-liner that is simple to remember and fast to execute.

The best I've come up with is:
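
    gc File1.bin,File2.bin -Encoding Byte | sc new.bin -Encoding Byte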

4 Answers
  • 2020-12-05 10:47

    I had a similar problem recently, where I wanted to append two large (2GB) files into a single file (4GB).

    I tried adjusting the -ReadCount parameter of Get-Content, but I couldn't get it to improve performance on files that large.

    I went with the following solution:

    function Join-File (
        [parameter(Position=0,Mandatory=$true,ValueFromPipeline=$true)]
        [string[]] $Path,
        [parameter(Position=1,Mandatory=$true)]
        [string] $Destination
    )
    {
        write-verbose "Join-File: Open Destination $Destination"
        # Create (or overwrite) the destination, then stream each source into it
        $OutFile = [System.IO.File]::Create($Destination)
        foreach ( $File in $Path ) {
            write-verbose "   Join-File: Open Source $File"
            $InFile = [System.IO.File]::OpenRead($File)
            $InFile.CopyTo($OutFile)
            $InFile.Dispose()
        }
        $OutFile.Dispose()
        write-verbose "Join-File: finished"
    }
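
    For example (the file names here are just placeholders):

    Join-File -Path File1.bin,File2.bin -Destination File3.bin -Verbose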
    

    Performance:

    • cmd.exe /c copy file1+file2 File3: around 5 seconds (best)
    • gc file1,file2 | sc file3: around 1100 seconds (yuck)
    • join-file File1,File2 File3: around 16 seconds (OK)
  • 2020-12-05 10:55

    Performance is very much dependent on the buffer size used, and the defaults are fairly small. When concatenating 2×2GB files I'd use a buffer size of about 256KB. Going larger may sometimes fail; going smaller will give you less throughput than your drive is capable of.

    With gc, that'd be the -ReadCount parameter, not simply -Read (PowerShell 5.0):

    gc -ReadCount 256KB -Path $infile -Encoding Byte | ...
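
    For example, the full pipeline might look like this ($outfile and sc as the destination are assumptions):

    gc -ReadCount 256KB -Path $infile -Encoding Byte | sc -Path $outfile -Encoding Byte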
    

    I also found Add-Content, going file by file, to be better for a lot of small files: when piping even a moderate amount of data (200MB), my machine went OOM, PowerShell froze, and CPU usage hit 100%.

    However, Add-Content would randomly fail a few times out of a few hundred files with an error about the destination file being in use, so I added a while loop and a try/catch:

    # Empty the destination file first
    sc -Path "$path\video.ts" -Value @() -Encoding Byte
    $tsfiles | foreach {
        while ($true) {
            try { # -ReadCount 0 slurps the whole file at once; fine here since the files are smaller than 256KB
                gc -ReadCount 0 -Path "$path\$_" -Encoding Byte | `
                    Add-Content -Path "$path\video.ts" -Encoding Byte -ErrorAction Stop
                break;
            } catch {
                # Destination still in use by the previous append; loop and retry
            }
        }
    }
    

    Using a file stream is much faster still. You cannot specify a buffer size with [System.IO.File]::Open, but you can when constructing a [System.IO.FileStream] directly, like so:

    # $path = "C:\"
    $ins = @("a.ts", "b.ts")
    $outfile = "$path\out.mp4"
    # Destination stream: 256KB buffer, exclusive access while writing
    $out = New-Object -TypeName "System.IO.FileStream" -ArgumentList @(
        $outfile,
        [System.IO.FileMode]::Create,
        [System.IO.FileAccess]::Write,
        [System.IO.FileShare]::None,
        256KB,
        [System.IO.FileOptions]::None)
    try {
        foreach ($in in $ins) {
            # Source stream: same buffer size; SequentialScan hints the OS to read ahead
            $fs = New-Object -TypeName "System.IO.FileStream" -ArgumentList @(
                "$path\$in",
                [System.IO.FileMode]::Open,
                [System.IO.FileAccess]::Read,
                [System.IO.FileShare]::Read,
                256KB,
                [System.IO.FileOptions]::SequentialScan)
            try {
                $fs.CopyTo($out)
            } finally {
                $fs.Dispose()
            }
        }
    } finally {
        $out.Dispose()
    }
    
  • 2020-12-05 10:58

    It's not PowerShell, but if you have PowerShell you also have the command prompt:

    copy /b 1.bin+2.bin 3.bin
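
    The same + syntax chains any number of inputs, e.g.:

    copy /b 1.bin+2.bin+3.bin 4.bin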
    

    As Keith Hill pointed out, if you really need to run it from inside PowerShell, you can use:

    cmd /c copy /b 1.bin+2.bin 3.bin 
    
  • 2020-12-05 11:02

    The approach you're taking is the way I would do it in PowerShell. However, you should use the -ReadCount parameter to improve performance. You can also take advantage of positional parameters to shorten this even further:

    gc File1.bin,File2.bin -Enc Byte -Read 512 | sc new.bin -Enc Byte
    

    Regarding the use of the -ReadCount parameter, I did a blog post on this a while ago that folks might find useful: Optimizing Performance of Get-Content for Large Files.
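
    To find a good -ReadCount for your own disk and files, you can simply time a few values; a quick sketch (file names are placeholders):

    foreach ($rc in 512, 4096, 256KB) {
        (Measure-Command {
            gc File1.bin,File2.bin -Enc Byte -Read $rc | sc new.bin -Enc Byte
        }).TotalSeconds
    }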
