How to modify this sed awk command so that the output goes to a file of choice?

Submitted by 折月煮酒 on 2019-12-20 07:13:35

Question


I am using the last command from this SO answer https://stackoverflow.com/a/54818581/80353

cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'|tee cap)

What this command currently does

  1. This command will download captions for a youtube video as a .vtt file and
  2. then print out on the terminal the simplified version of the .vtt file

This command works as described.
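To make the pipeline's effect concrete, here is a minimal sketch that runs the same sed and awk stages on a small hand-written sample instead of a real download. The sample text and the names sample_vtt and simplify are made up for illustration; the sample assumes the 8-line repeating layout that the NR%8 tests in the command rely on.

```shell
# Hypothetical sample that mimics an auto-generated .vtt file:
# a header ending in a blank line, then cues in an 8-line cycle.
sample_vtt() {
  printf '%s\n' \
    'WEBVTT' \
    'Kind: captions' \
    'Language: en' \
    '' \
    '00:00:00.030 --> 00:00:03.500 align:start position:0%' \
    ' ' \
    'hello<00:00:00.500><c> world</c>' \
    '' \
    '00:00:03.500 --> 00:00:03.510 align:start position:0%' \
    'hello world' \
    ' ' \
    '' \
    '00:00:03.510 --> 00:00:06.000 align:start position:0%' \
    'hello world' \
    'second<00:00:04.000><c> line</c>' \
    '' \
    '00:00:06.000 --> 00:00:06.010 align:start position:0%' \
    'second line' \
    ' ' \
    ''
}

# Same three stages as in cap(), reading from stdin instead of *.vtt:
#   1. drop the header (line 1 through the first blank line)
#   2. strip inline <...> tags
#   3. print the start time (up to the first '.') from line 1 of each
#      8-line cycle, followed by the caption text from line 3
simplify() {
  sed '1,/^$/d' | sed 's/<[^>]*>//g' |
    awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'
}

sample_vtt | simplify
```

On this sample, each output line is a start time followed by one caption line.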

How to use this command

In the terminal, I run the above command once to define the function, and then run cap $youtube_url

What I would like to have

I would like to modify the original cap() function so that the original behavior remains, with one change:

  1. This command will download captions for a youtube video as a .vtt file (unchanged)
  2. then write the simplified version of the .vtt file to another file whose path is given as parameter $2 (changed)

How I expect to call the new command

Originally, I would call the original command as

cap $youtube_url

Now I would like to do this

cap $youtube_url $relative_or_absolute_path_of_text_or_markdown_file

How do I modify the original cap command to achieve the outcome I want?


Answer 1:


Since you want to see the output on screen and also save it to an output file, could you please try the following:

cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'|tee -a "$2")

Or, in non-one-liner form, use:

cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";\
sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'\
|tee -a "$2")

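Please make sure that you provide the complete path in your variable, e.g. relative_or_absolute_path_of_text_or_markdown_file="/full/path/output_file.txt" (just an example). The function runs cd /tmp first, so a relative path would be resolved against /tmp rather than your current directory. I couldn't test this since I don't have a setup for .vtt files on my machine.

Note that tee -a "$2" appends to the output file. As @oguz ismail's comment points out, use tee "$2" instead of tee -a "$2" if the file should be overwritten on each run. If you don't want to print anything on screen and simply want to save the output to the file, replace |tee -a "$2" with >"$2".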
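Because cap() changes into /tmp before it writes anything, a relative output path has to be resolved before the cd. The following is a minimal, untested-against-YouTube sketch of one way to do that; abs_path is a hypothetical helper introduced here, not part of the original answer, and the sketch uses tee without -a so the file is overwritten on each run.

```shell
# Hypothetical helper: turn a possibly relative path into an absolute
# one, relative to the directory the caller is in right now.
abs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;            # already absolute
    *)  printf '%s/%s\n' "$PWD" "$1" ;;  # prefix the current directory
  esac
}

cap() (
  out=$(abs_path "$2")   # resolve before cd, while $PWD is still the caller's
  cd /tmp || exit 1
  rm -f *.vtt
  youtube-dl --skip-download --write-auto-sub "$1"
  sed '1,/^$/d' *.vtt | sed 's/<[^>]*>//g' |
    awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3' |
    tee "$out"           # use tee -a to append, or >"$out" for no screen output
)

# Usage (not run here):
#   cap "$youtube_url" notes/captions.md
#   cap "$youtube_url" /full/path/output_file.txt
```

The directory that will hold the output file is assumed to exist already; tee does not create missing directories.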




Answer 2:


Thank you @KimStacks @RavinderSingh13 @Oguz-Ismail for posting these solutions above and in the previous post.

I managed to get results in the .vtt file with youtube-dl --skip-download --write-auto-sub $youtube_url

However, the format of the output is not ideal for my purpose. I have to go through it line by line to remove the times as well as the \n newlines. So I would like to customize the command to fit my requirements.

NOTE: Not sure whether it's a new query or not, so I will post it here for now:

  1. I have tried all the steps suggested in the previous post and here as well, but I still cannot understand:

    • How do I insert the "$youtube_url" inside the code below?

    cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'|tee -a "$2")

  2. I tried editing the numbers (from 0 to 3, and -1) in 'NR%8==1{printf"%s ",$1}NR%8==3' on both ends, but did not succeed in getting the right format inside the .vtt file. Is it possible to have:

    • the transcribed text printed continuously as sentences, rather than each subtitle printed on a new line?

    • the start time removed from the printout?
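One possible way to get both changes, sketched under the same assumption the original awk program makes (cues repeat in an 8-line cycle with the caption text on line 3): drop the NR%8==1 rule, which is what prints the start time, and join the caption lines with spaces instead of newlines. The function name join_captions and the sample text are made up for illustration. As for the URL, it is not inserted into the function body; it is passed as the first argument when calling, e.g. cap "$youtube_url" /full/path/output_file.txt, and arrives inside the function as "$1".

```shell
# Hypothetical variant of the sed/awk stages, reading .vtt text on stdin.
join_captions() {
  sed '1,/^$/d' |            # drop the header up to the first blank line
    sed 's/<[^>]*>//g' |     # strip inline <...> tags
    awk 'NR%8==3{printf "%s%s", sep, $0; sep=" "} END{print ""}'
    # ^ no NR%8==1 rule, so no start time; caption lines are joined by
    #   single spaces and terminated by one final newline
}

# Small hand-written sample in the assumed 8-line layout.
printf '%s\n' \
  'WEBVTT' \
  'Kind: captions' \
  'Language: en' \
  '' \
  '00:00:00.030 --> 00:00:03.500 align:start position:0%' \
  ' ' \
  'hello<00:00:00.500><c> world</c>' \
  '' \
  '00:00:03.500 --> 00:00:03.510 align:start position:0%' \
  'hello world' \
  ' ' \
  '' \
  '00:00:03.510 --> 00:00:06.000 align:start position:0%' \
  'hello world' \
  'second<00:00:04.000><c> line</c>' \
  '' > sample.vtt

join_captions < sample.vtt
```

Inside cap(), the awk stage would be swapped for the one shown here; real caption files may deviate from the assumed layout, so the result is worth checking on an actual download.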



Source: https://stackoverflow.com/questions/59244045/how-to-modify-this-sed-awk-command-so-that-the-output-goes-to-a-file-of-choice
