Question
The question "How to separate tokens in line using Unix?" showed that a file can be tokenized using sed or xargs. Is there a way to do the reverse?
[in:]
some
sentences
are
like
this.

some
sentences
foo
bar
that
[out:]
some sentences are like this.
some sentences foo bar that
The only delimiter between sentences is a blank line (\n\n). I could have done the following in Python, but is there a Unix way?
import codecs

def per_section(it):
    """Read a file and yield sections, using an empty line as the delimiter."""
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

print(["".join(i).replace("\n", " ") for i in per_section(codecs.open('outfile.txt', 'r', 'utf8'))])
[out:]
[u'some sentences are like this. ', u'some sentences foo bar that ']
Answer 1:
Using awk is easier for this kind of task:
awk -v RS="" '{$1=$1}7' file
If you want to preserve runs of multiple spaces within each line, set the field separator to a newline:
awk -v RS="" -F'\n' '{$1=$1}7' file
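To make the difference between the two commands concrete, here is a small sketch. The sample text and the variable names (`input`, `collapsed`, `preserved`) are made up for illustration and are not from the question; the first line deliberately contains two spaces.

```shell
# Hypothetical sample: two blank-line-separated groups; the first line has a double space.
input=$(printf 'a  b\nc\n\nd\ne\n')

# Default field splitting: any run of whitespace separates fields,
# so rebuilding the record with $1=$1 collapses the double space.
collapsed=$(printf '%s\n' "$input" | awk -v RS="" '{$1=$1}7')

# Newline-only field splitting: each input line is one field,
# so spaces inside a line survive and only the newlines become spaces.
preserved=$(printf '%s\n' "$input" | awk -v RS="" -F'\n' '{$1=$1}7')

printf '%s\n' "$collapsed"
printf '%s\n' "$preserved"
```

In both cases `$1=$1` forces awk to rebuild the record using the output field separator (a single space by default), and the always-true pattern `7` triggers the default print action.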
With your example:
kent$ cat f
some
sentences
are
like
this.

some
sentences
foo
bar
that
kent$ awk -v RS="" '{$1=$1}7' f
some sentences are like this.
some sentences foo bar that
Answer 2:
You can do this with the awk command as follows:
awk -v RS="\n\n" '{gsub("\n"," ",$0);print $0}' file.txt
This sets the record separator to \n\n, so the input is split into groups of lines separated by a blank line. Each group is then printed after replacing every \n in it with a space.
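One caveat worth hedging: a multi-character RS is honoured by gawk and mawk, but POSIX leaves the behaviour unspecified, so other awk implementations may treat it differently. A possible variant of the same idea, assuming blank lines are the only separators, uses the empty-string RS (paragraph mode) instead. The sample input and the variable name `result` are illustrative only.

```shell
# Paragraph mode (RS="") splits records on blank lines in any POSIX awk.
# gsub with no target operates on $0; the record's trailing newline is
# not part of $0 in this mode, so no trailing space is produced.
result=$(printf 'some\nsentences\n\nfoo\nbar\n' |
  awk -v RS="" '{gsub("\n"," "); print}')

printf '%s\n' "$result"
```

With RS="\n\n", by contrast, the final record still carries the file's last newline, so the last output line ends with an extra space.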
Answer 3:
sed -n --posix 'H;$ {x;s/\n\([^[:cntrl:]]\{1,\}\)/\1 /gp;}' YourFile
This relies on blank-line separation, so each sentence can contain a different number of tokens.
Source: https://stackoverflow.com/questions/21779272/reverse-newline-tokenization-in-one-token-per-line-files-unix