Bash command to convert an HTML page to a text file

醉梦人生 2020-12-09 07:38

I am a beginner with Linux. Could you please help me convert an HTML page to a text file? The text file should have any images and links from the web page removed. I want to use

10 answers
  •  情书的邮戳
    2020-12-09 08:21

    A Bash script that recursively converts HTML pages to text files, applied here to httpd-manual. It makes a search such as grep -Rhi 'LoadModule ssl' /usr/share/httpd/manual_dump -A 10 convenient.

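    #!/bin/bash
    # Adapted from ewwink: recursive HTML-to-text dump.
    # Dumps the httpd manual in ./manual (up to 4 directory levels deep)
    # into ./manual_dump as .txt files, keeping the directory layout.
    # Put this script in /usr/share/httpd for it to work (after installing the httpd-manual rpm).
    # Brace expansion needs bash, so the interpreter is bash rather than sh.
    
    for file in ./manual/*{,/*,/*/*,/*/*/*}.html
    do
        [ -f "$file" ] || continue            # skip patterns that matched nothing
        rel=${file#./manual/}                 # path relative to ./manual
        out=./manual_dump/${rel%.html}.txt    # same layout, .txt extension
        mkdir -p "$(dirname "$out")"          # create the target directory
        lynx --dump "$file" > "$out"
    done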
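
The fixed four-level glob above could also be replaced with find, which walks a tree of any depth. What follows is only a sketch, not the original poster's method: the function names txt_path and dump_tree and the command-line arguments are made up for illustration, and it assumes lynx is installed. The -nolist option tells lynx not to append its numbered list of links to the dump, which is close to what the question asks for.

```shell
#!/bin/bash
# Sketch: convert every .html file under a source tree to text, at any depth.
# txt_path, dump_tree and the argument layout are hypothetical names.

# Map an HTML path under directory $1 to the matching .txt path under $2.
txt_path() {
    local src=$1 dst=$2 file=$3
    local rel=${file#"$src"/}              # path relative to the source dir
    printf '%s/%s.txt\n' "$dst" "${rel%.html}"
}

# Convert the whole tree, keeping the directory layout.
dump_tree() {
    local src=${1%/} dst=${2%/} file out   # drop a trailing slash, if any
    find "$src" -type f -name '*.html' -print0 |
    while IFS= read -r -d '' file; do
        out=$(txt_path "$src" "$dst" "$file")
        mkdir -p "$(dirname "$out")"
        # -nolist: do not append the list of links to the dump
        lynx -dump -nolist "$file" > "$out"
    done
}

# Run only when lynx is installed and a source directory was given.
if command -v lynx >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    dump_tree "$1" "${2:-./manual_dump}"
fi
```

Saved under a name of your choice, for example html2txt.sh, it would be run from /usr/share/httpd as: bash html2txt.sh ./manual ./manual_dump. For a single page, lynx -dump -nolist page.html > page.txt does the same conversion.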
    
