Looking for MORE/MOVE solutions that can handle files with more than 65534 rows

时光总嘲笑我的痴心妄想 提交于 2020-01-06 06:57:23

问题


I have numerous uniquely named .CSV files that I need to remove the first 17 lines from. Some of these files exceed 65534 rows so my MORE/MOVE Batch script is not working. Looking for alternative solutions.

@echo off

for %%a in (*.csv) do (
    more +17 "%%a" >"%%a.new"
    move /y "%%a.new" "%%a" >nul
)

Regardless of number of rows input I am looking to have the 17 header rows removed and new file with all remaining rows built.


回答1:


Here's a powershell option; this one uses a stream to cater for your large files:

$csvs = Get-ChildItem -Path "P:\ath to\your csvs" -Filter *.csv
foreach ( $csv in $csvs ) {
    $fin = New-Object System.IO.StreamReader( $csv.FullName )
    $fout = New-Object System.IO.StreamWriter( $csv.FullName+".new" )
    try {
        for( $s = 1; $s -le 17 -and !$fin.EndOfStream; $s++ ) {
            $fin.ReadLine()
        }
        while( !$fin.EndOfStream ) {
            $fout.WriteLine( $fin.ReadLine() )
        }
    }
    finally {
        $fout.Close()
        $fin.Close()
    }
}

Just change the path to your .csvs on the first line, before testing it.

I have purposely left out the deletion of the original files, simply appending .new to the new filenames to allow you time to check the results, test the speed etc. I will leave it to you to include a Rename/Delete or Move should you feel the need to extend the functionality.




回答2:


Here's a one-line solution

for %%a in (*.txt) do powershell -Com "sc -Path '%%a' -Value (gc '%%a' | select -Skip 17)"

where gc and sc are default aliases for Get-Content and Set-Content respectively. See also

  • Powershell select-object skip multiple lines?
  • Powershell skip first 2 lines of txt file when importing it

If your files are huge then it'll be better to read in lines or blocks which can also be implemented easily using file functions, [IO.File]::OpenText or the -ReadCount option of Get-Content in PowerShell

  • Reading large text files with Powershell
  • Reading very BIG text files using PowerShell
  • How to process a file in PowerShell line-by-line as a stream
  • How can I make this PowerShell script parse large files faster?

As Squashman mentioned, for /f also has an option to skip lines at the beginning of the file

for %%a in (*.csv) do (
    for /f "usebackq skip=17 delims=" %%l in ("%%f") do @echo(%%l>>"%%a.new"
    move /y "%%a.new" "%%a" >nul
)

But that won't work if your file contains lines with special characters like & or |. For more information about it run for /?




回答3:


Make your own cut command. This is VBScript ported to VB.NET.

Cut

cut {t|b} {i|x} NumOfLines

Cuts the number of lines from the top or bottom of file.

t - top of the file
b - bottom of the file
i - include n lines
x - exclude n lines

Example

cut t i 5 < "%systemroot%\win.ini"

Cut.bat

REM Cut.bat
REM This file compiles Cut.vb to Cut.exe
REM Cut.exe Removes specified from top or bottom of lines from StdIn and writes to StdOut 
REM To use 
REM cut {t|b} {i|x} NumOfLines
Rem Cuts the number of lines from the top or bottom of file.
Rem t - top of the file
Rem b - bottom of the file
Rem i - include n lines
Rem x - exclude n lines
Rem
Rem Example - Includes first 5 lines Win.ini
Rem 
Rem cut t i 5 < "%systemroot%\win.ini"
"C:\Windows\Microsoft.NET\Framework\v4.0.30319\vbc.exe" /target:exe /out:"%~dp0\Cut.exe" "%~dp0\Cut.vb" /verbose
pause

Cut.vb

'DeDup.vb
Imports System
Imports System.IO
Imports System.Runtime.InteropServices
Imports Microsoft.Win32

Public Module DeDup
Sub Main
    Dim Arg() As Object
    Dim RS as Object
    Dim LineCount as Object
    Dim Line as Object
    Arg = Split(Command(), " ")
    rs = CreateObject("ADODB.Recordset")
    With rs
        .Fields.Append("LineNumber", 4)
        .Fields.Append("Txt", 201, 5000) 
        .Open
        LineCount = 0
        Line=Console.readline
        Do Until Line = Nothing
            LineCount = LineCount + 1
            .AddNew
            .Fields("LineNumber").value = LineCount
            .Fields("Txt").value = Console.readline
            .UpDate
            Line = Console.ReadLine
        Loop

        .Sort = "LineNumber ASC"

        If LCase(Arg(0)) = "t" then
            If LCase(Arg(1)) = "i" then
                .filter = "LineNumber < " & LCase(Arg(2)) + 1
            ElseIf LCase(Arg(1)) = "x" then
                .filter = "LineNumber > " & LCase(Arg(2))
            End If
        ElseIf LCase(Arg(0)) = "b" then
            If LCase(Arg(1)) = "i" then
                .filter = "LineNumber > " & LineCount - LCase(Arg(2))
            ElseIf LCase(Arg(1)) = "x" then
                .filter = "LineNumber < " & LineCount - LCase(Arg(2)) + 1
            End If
        End If

        Do While not .EOF
            Console.writeline(.Fields("Txt").Value)
            .MoveNext
        Loop
    End With

End Sub 
End Module


来源:https://stackoverflow.com/questions/55620008/looking-for-more-move-solutions-that-can-handle-files-with-more-than-65534-rows

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!