How can I extract a predetermined range of lines from a text file on Unix?

问题

I have a ~23000 line SQL dump containing several databases worth of data. I need to extract a certain section of this file (i.e. the data for a single database) and place it in a new file. I know both the start and end line numbers of the data that I want.

Does anyone know a Unix command (or series of commands) to extract all lines from a file between say line 16224 and 16482 and then redirect them into a new file?

回答1:

sed -n '16224,16482p;16483q' filename > newfile

From the sed manual:

p - Print out the pattern space (to the standard output). This command is usually only used in conjunction with the -n command-line option.

n - If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If there is no more input then sed exits without processing any more commands.

q - Exit sed without processing any more commands or input. Note that the current pattern space is printed if auto-print is not disabled with the -n option.

and

Addresses in a sed script can be in any of the following forms:

number Specifying a line number will match only that line in the input.

An address range can be specified by specifying two addresses separated by a comma (,). An address range matches lines starting from where the first address matches, and continues until the second address matches (inclusively).

回答2:

sed -n '16224,16482 p' orig-data-file > new-file

Where 16224,16482 are the start line number and end line number, inclusive. This is 1-indexed. -n suppresses echoing the input as output, which you clearly don't want; the numbers indicate the range of lines to make the following command operate on; the command p prints out the relevant lines.

回答3:

Quite simple using head/tail:

head -16482 in.sql | tail -258 > out.sql

using sed:

sed -n '16482,16482p' in.sql > out.sql

using awk:

awk 'NR>=10&&NR<=20' in.sql > out.sql

回答4:

You could use 'vi' and then the following command:

:16224,16482w!/tmp/some-file

Alternatively:

cat file | head -n 16482 | tail -n 258

EDIT:- Just to add explanation, you use head -n 16482 to display first 16482 lines then use tail -n 258 to get last 258 lines out of the first output.

回答5:

There is another approach with awk:

awk 'NR==16224, NR==16482' file

If the file is huge, it can be good to exit after reading the last desired line. This way, it won't read the following lines unnecessarily:

awk 'NR==16224, NR==16482-1; NR==16482 {print; exit}' file

回答6:

perl -ne 'print if 16224..16482' file.txt > new_file.txt

回答7:

 # print section of file based on line numbers
 sed -n '16224 ,16482p'               # method 1
 sed '16224,16482!d'                 # method 2

回答8:

cat dump.txt | head -16224 | tail -258

should do the trick. The downside of this approach is that you need to do the arithmetic to determine the argument for tail and to account for whether you want the 'between' to include the ending line or not.

回答9:

sed -n '16224,16482p' < dump.sql

回答10:

Standing on the shoulders of boxxar, I like this:

sed -n '<first line>,$p;<last line>q' input

e.g.

sed -n '16224,$p;16482q' input

The $ means "last line", so the first command makes sed print all lines starting with line 16224 and the second command makes sed quit after printing line 16428. (Adding 1 for the q-range in boxxar's solution does not seem to be necessary.)

I like this variant because I don't need to specify the ending line number twice. And I measured that using $ does not have detrimental effects on performance.

回答11:

Quick and dirty:

head -16428 < file.in | tail -259 > file.out

Probably not the best way to do it but it should work.

BTW: 259 = 16482-16224+1.

回答12:

I wrote a Haskell program called splitter that does exactly this: have a read through my release blog post.

You can use the program as follows:

$ cat somefile | splitter 16224-16482

And that is all that there is to it. You will need Haskell to install it. Just:

$ cabal install splitter

And you are done. I hope that you find this program useful.

回答13:

Even we can do this to check at command line:

cat filename|sed 'n1,n2!d' > abc.txt

For Example:

cat foo.pl|sed '100,200!d' > abc.txt

回答14:

Using ruby:

ruby -ne 'puts "#{$.}: #{$_}" if $. >= 32613500 && $. <= 32614500' < GND.rdf > GND.extract.rdf

回答15:

I was about to post the head/tail trick, but actually I'd probably just fire up emacs. ;-)

esc-x goto-line ret 16224
mark (ctrl-space)
esc-x goto-line ret 16482
esc-w

open the new output file, ctl-y save

Let's me see what's happening.

回答16:

I would use:

awk 'FNR >= 16224 && FNR <= 16482' my_file > extracted.txt

FNR contains the record (line) number of the line being read from the file.

回答17:

I wanted to do the same thing from a script using a variable and achieved it by putting quotes around the $variable to separate the variable name from the p:

sed -n "$first","$count"p imagelist.txt >"$imageblock"

I wanted to split a list into separate folders and found the initial question and answer a useful step. (split command not an option on the old os I have to port code to).

回答18:

I wrote a small bash script that you can run from your command line, so long as you update your PATH to include its directory (or you can place it in a directory that is already contained in the PATH).

Usage: $ pinch filename start-line end-line

#!/bin/bash
# Display line number ranges of a file to the terminal.
# Usage: $ pinch filename start-line end-line
# By Evan J. Coon

FILENAME=$1
START=$2
END=$3

ERROR="[PINCH ERROR]"

# Check that the number of arguments is 3
if [ $# -lt 3 ]; then
    echo "$ERROR Need three arguments: Filename Start-line End-line"
    exit 1
fi

# Check that the file exists.
if [ ! -f "$FILENAME" ]; then
    echo -e "$ERROR File does not exist. \n\t$FILENAME"
    exit 1
fi

# Check that start-line is not greater than end-line
if [ "$START" -gt "$END" ]; then
    echo -e "$ERROR Start line is greater than End line."
    exit 1
fi

# Check that start-line is positive.
if [ "$START" -lt 0 ]; then
    echo -e "$ERROR Start line is less than 0."
    exit 1
fi

# Check that end-line is positive.
if [ "$END" -lt 0 ]; then
    echo -e "$ERROR End line is less than 0."
    exit 1
fi

NUMOFLINES=$(wc -l < "$FILENAME")

# Check that end-line is not greater than the number of lines in the file.
if [ "$END" -gt "$NUMOFLINES" ]; then
    echo -e "$ERROR End line is greater than number of lines in file."
    exit 1
fi

# The distance from the end of the file to end-line
ENDDIFF=$(( NUMOFLINES - END ))

# For larger files, this will run more quickly. If the distance from the
# end of the file to the end-line is less than the distance from the
# start of the file to the start-line, then start pinching from the
# bottom as opposed to the top.
if [ "$START" -lt "$ENDDIFF" ]; then
    < "$FILENAME" head -n $END | tail -n +$START
else
    < "$FILENAME" tail -n +$START | head -n $(( END-START+1 ))
fi

# Success
exit 0

回答19:

This might work for you (GNU sed):

sed -ne '16224,16482w newfile' -e '16482q' file

or taking advantage of bash:

sed -n $'16224,16482w newfile\n16482q' file

回答20:

The -n in the accept answers work. Here's another way in case you're inclined.

cat $filename | sed "${linenum}p;d";

This does the following:

pipe in the contents of a file (or feed in the text however you want).
sed selects the given line, prints it
d is required to delete lines, otherwise sed will assume all lines will eventually be printed. i.e., without the d, you will get all lines printed by the selected line printed twice because you have the ${linenum}p part asking for it to be printed. I'm pretty sure the -n is basically doing the same thing as the d here.

回答21:

Since we are talking about extracting lines of text from a text file, I will give an special case where you want to extract all lines that match a certain pattern.

myfile content:
=====================
line1 not needed
line2 also discarded
[Data]
first data line
second data line
=====================
sed -n '/Data/,$p' myfile

Will print the [Data] line and the remaining. If you want the text from line1 to the pattern, you type: sed -n '1,/Data/p' myfile. Furthermore, if you know two pattern (better be unique in your text), both the beginning and end line of the range can be specified with matches.

sed -n '/BEGIN_MARK/,/END_MARK/p' myfile

回答22:

Using ed:

ed -s infile <<<'16224,16482p'

-s suppresses diagnostic output; the actual commands in a here-string. Specifically, 16224,16482p runs the p (print) command on the desired line address range.

来源：https://stackoverflow.com/questions/83329/how-can-i-extract-a-predetermined-range-of-lines-from-a-text-file-on-unix

标签

unix

command-line

sed

text-processing