How to split a large text file in Windows?

野的像风 2020-12-12 13:30

I have a 2.5 GB log file. Is there any way to split it into smaller files using the Windows command prompt?

6 Answers
  •  情深已故
    2020-12-12 14:20

    Of course there is! Win CMD can do a lot more than just split text files :)

    Split a text file into separate files of 'max' lines each:

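    REM Split text file (max lines each).
    REM Needs delayed expansion for !var!: start the prompt with cmd /v:on.
    REM In a .bat file, use setlocal EnableDelayedExpansion and %%i instead of %i.
    REM Initialize
    set input=file.txt
    set max=10000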
    
    set /a line=1 >nul
    set /a file=1 >nul
    set out=!file!_%input%
    set /a max+=1 >nul
    
    echo Number of lines in %input%:
    find /c /v "" < %input%
    
    REM Split file
    for /f "tokens=* delims=[" %i in ('type "%input%" ^| find /v /n ""') do (
    
    if !line!==%max% (
    set /a line=1 >nul
    set /a file+=1 >nul
    set out=!file!_%input%
    echo Writing file: !out!
    )
    
    REM Write next file
    set a=%i
    set a=!a:*]=]!
    echo:!a:~1!>>!out!
    set /a line+=1 >nul
    )
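
    A quick sanity check after the split, sketched under the assumption that the commands above were run in the same prompt session (so %input% is still set) and that the parts landed in the current directory as 1_file.txt, 2_file.txt and so on. It reuses the find /c /v "" line count from the script; %f is only an illustrative loop variable.

    ```batch
    REM Print the line count of every part written by the split.
    REM The counts should add up to the number reported for %input% earlier.
    for %f in (*_%input%) do @find /c /v "" "%f"
    ```

    In a .bat file the loop variable would be written %%f instead of %f.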
    

    If the code above hangs or crashes, the following example splits files faster (by writing data to intermediate files instead of keeping everything in memory):

    E.g., to split a file with 7,600 lines into smaller files of at most 3,000 lines:

    1. Generate regexp pattern files (e.g. with the set command), one per output file, to be fed to the /g flag of findstr:

    list1.txt (matches line numbers 1 to 2999)

    \[[0-9]\]
    \[[0-9][0-9]\]
    \[[0-9][0-9][0-9]\]
    \[[0-2][0-9][0-9][0-9]\]

    list2.txt (matches line numbers 3000 to 5999)

    \[[3-5][0-9][0-9][0-9]\]

    list3.txt (matches line numbers 6000 to 9999)

    \[[6-9][0-9][0-9][0-9]\]

    2. Split the file into smaller files:
    type "%input%" | find /v /n "" | findstr /b /r /g:list1.txt > file1.txt
    type "%input%" | find /v /n "" | findstr /b /r /g:list2.txt > file2.txt
    type "%input%" | find /v /n "" | findstr /b /r /g:list3.txt > file3.txt
    
    3. Remove the prefixed line numbers from each split file,
      e.g. for the first file:
    for /f "tokens=* delims=[" %i in ('type "%cd%\file1.txt"') do (
    set a=%i
    set a=!a:*]=]!
    echo:!a:~1!>>file_1.txt)
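
    The three steps can be strung together in one prompt session. This is only a sketch under a few assumptions: the prompt was started with cmd /v:on, %input% is set as in the first example, and the pattern files are written with plain echo redirection (a choice made here for illustration, not necessarily how they were produced originally). %n is an illustrative loop variable.

    ```batch
    REM Step 1: write the pattern files. The redirection comes first so that
    REM no trailing space ends up inside a pattern.
    >list1.txt echo \[[0-9]\]
    >>list1.txt echo \[[0-9][0-9]\]
    >>list1.txt echo \[[0-9][0-9][0-9]\]
    >>list1.txt echo \[[0-2][0-9][0-9][0-9]\]
    >list2.txt echo \[[3-5][0-9][0-9][0-9]\]
    >list3.txt echo \[[6-9][0-9][0-9][0-9]\]

    REM Steps 2 and 3 for each part: filter the numbered lines into fileN.txt,
    REM then strip the [N] prefix and append the result to file_N.txt.
    REM Output is appended, so remove old file_N.txt files before re-running.
    for %n in (1 2 3) do (
    type "%input%" | find /v /n "" | findstr /b /r /g:list%n.txt > file%n.txt
    for /f "tokens=* delims=[" %i in ('type "file%n.txt"') do (
    set a=%i
    set a=!a:*]=]!
    echo:!a:~1!>>file_%n.txt
    )
    )
    ```

    For a different file size or chunk size, only the digit ranges in the pattern files need to change; the loop stays the same as long as there is one listN.txt per part.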
    

    Notes:
    Works with leading whitespace, blank lines and whitespace-only lines.

    Tested on Win 10 x64 CMD, on 4.4GB text file, 5651982 lines.
