Count the number of lines in a txt file when newlines appear inside the data

[亡魂溺海] submitted on 2021-01-02 18:05:29

Question


I have a txt file with the following data:

Name    mobile  url message text
test11  1234567890  www.google.com  "Data Test New
Date:27/02/2020
Items: 1
Total: 3
Regards
ABC DATa
Ph:091 : 123456789"
test12  1234567891  www.google.com  "Data Test New one
Date:17/02/2020
Items: 26
Total: 5
Regards
user test
Ph:091 : 433333333"

As you can see, the data in my last column contains newline characters, so when I use the following command:

awk 'END{print NR}' file.txt

it prints 15, but the actual number of records is 3 (the header plus two data rows). Please suggest a command that counts records instead.

Edit: The script below, based on the answer given, does not work if there is no newline at the end of the input file:

awk -v RS='"[^"]*"' '{gsub(/\n/, " ", RT); ORS=RT} END{print NR "\n"}' test.txt 

Also, my file may have 3-4 million records, so converting it to Unix format would take time, and I would prefer not to. Please suggest an efficient solution that works in both cases.

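head 5.csv | cat -A
The above command gives me this output (cat -A shows a carriage return as ^M and the end of a line as $, so the file has Windows-style CRLF line endings):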

Name mobile url message text^M$


Answer 1:


Using GNU awk, you can do this with a custom RS:

awk -v RS='"[^"]*"' '{gsub(/(\r?\n){2,}/, "\n"); n+=gsub(/\n/, "&")}
END {print n}' <(sed '$s/$//' file)

15001

Here:

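  • -v RS='"[^"]*"': Uses this regex, which matches a double-quoted string, as the input record separator. The quoted strings are consumed as separators, so the newlines inside them never reach $0 and are not counted
  • gsub(/(\r?\n){2,}/, "\n"): Collapses each run of two or more line breaks (each with an optional \r) into a single \n, so blank lines are not counted
  • n+=gsub(/\n/, "&"): Replaces each \n with itself (a no-op) and adds the number of replacements, i.e. the number of newlines, to n
  • END {print n}: Prints n at the end
  • sed '$s/$//' file: Adds a newline to the last line in case it is missing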
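A portable alternative, sketched here under stated assumptions, uses a different technique from the custom-RS approach: read the file line by line and track whether a double-quoted field is still open. A physical line ends a record only when the running number of quotes seen so far is even. The helper name count_records is hypothetical, and the sketch assumes double quotes appear only as field delimiters (an embedded quote written as "" keeps the count balanced). It needs no GNU awk or process substitution, and it reads one line at a time, so memory use does not grow with the file.

```shell
# count_records: hypothetical helper; prints the number of logical records in "$1".
# Assumption: double quotes only delimit fields; an embedded quote is doubled ("").
count_records() {
  awk '
    {
      # gsub(/"/, "&") replaces each quote with itself and returns how many it found;
      # only the parity of the running quote count is kept
      q = (q + gsub(/"/, "&")) % 2
      # an even count means no quoted field is open, so a record ends on this line
      if (q == 0) n++
    }
    # n + 0 prints 0 instead of an empty line for an empty file
    END { print n + 0 }
  ' "$1"
}
```

For the sample data in the question, count_records file.txt should print 3. The trailing \r of a CRLF line contains no quote, so it does not affect the count, and awk still reads a last line that has no trailing newline, so neither case needs the file to be converted first.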


Answer 2:


With perl, assuming the last line always ends with a newline character:

$ perl -0777 -nE 'say s/"[^"]+"(*SKIP)(*F)|\n//g' ip.txt
3
  • -0777 slurps the entire input file as a single string, so this isn't suitable if the input file is very large
  • the s command returns the number of substitutions made, which is used here as the count of newlines
  • "[^"]+"(*SKIP)(*F) causes newlines inside double quotes to be ignored: once a quoted string has matched, (*F) forces a failure and (*SKIP) makes the next match attempt start after that string

Use the following command if you want to count the last line even when it doesn't end with a newline character:

perl -0777 -nE 'say scalar split /"[^"]+"(*SKIP)(*F)|\n/' ip.txt



Answer 3:


The same idea as anubhava's answer, but with GNU sed:

<infile sed '/"/ { :a; N; /"$/!ba; s/\n/ /g; }' | wc -l

Output:

3
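A few details of the sed command may matter for the file in the question. With CRLF line endings every closing quote is followed by \r, so /"$/ does not match at the end of those lines. A line whose quoted field opens and closes on the same line still triggers N, so the following line gets joined to it. And wc -l counts newline characters, so a last line without one is not counted. The sketch below is a hedged variant, assuming GNU sed (for the ; after the label and the branch); count_records_sed is a hypothetical name. It strips \r with tr inside the pipeline (the file itself is not rewritten), keeps appending lines while the pattern space holds an odd number of quotes, and counts lines with grep -c ''.

```shell
# count_records_sed: hypothetical helper; assumes GNU sed.
# 1. tr -d removes carriage returns, so CRLF input behaves like LF input.
# 2. sed: while the pattern space holds an odd number of double quotes (a quoted
#    field is still open), N appends the next line and ba jumps back to test again;
#    once the quotes balance, the embedded newlines are replaced by spaces.
# 3. grep -c with an empty pattern counts lines, including a final line that has
#    no trailing newline (wc -l would miss that one).
count_records_sed() {
  tr -d '\r' < "$1" |
    sed ':a; /^[^"]*"\([^"]*"[^"]*"\)*[^"]*$/ { N; ba; }; s/\n/ /g' |
    grep -c ''
}
```

The odd-quote test assumes, as in the question's data, that quotes only delimit fields; a doubled "" inside a field leaves the parity unchanged.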


Source: https://stackoverflow.com/questions/65035029/count-number-of-line-in-txt-file-when-new-line-is-inside-data
