Python - BeautifulSoup: apply to every text file in a folder and produce a new text file

暖寄归人 2021-01-21 18:15

I am using the following Python BeautifulSoup code to remove HTML elements from a text file:

from bs4 import BeautifulSoup

with open("textFileWithHtml.txt")

2 answers
  •  执念已碎
    2021-01-21 18:45

    I would leave that work to the shell: replace the hardcoded input filename with filenames taken from the command line (the sys.argv list), then invoke the script inside a loop or with a glob pattern that matches many files, like:

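    from bs4 import BeautifulSoup
    import os
    import sys

    # Each command-line argument is a file to strip of HTML.
    for fi in sys.argv[1:]:
        with open(fi) as markup:
            soup = BeautifulSoup(markup.read(), "html.parser")

        # Write the result next to the input, prefixed with "strip_".
        out = os.path.join(os.path.dirname(fi), "strip_" + os.path.basename(fi))
        with open(out, "w", encoding="utf-8") as f:
            f.write(soup.get_text())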
    

    And run it like:

    python script.py *.txt
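
If you would rather not depend on the shell expanding *.txt (for example when the script is started from a tool that passes arguments through unchanged), the same loop can live inside Python. This is only a minimal sketch: the function name strip_folder and its pattern and prefix parameters are made up for illustration, and it assumes the input files are UTF-8 text.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def strip_folder(folder, pattern="*.txt", prefix="strip_"):
    """Write an HTML-free copy of every matching file in `folder`.

    `strip_folder`, `pattern` and `prefix` are illustrative names only.
    Returns the list of output paths that were written.
    """
    written = []
    # sorted() collects the matches up front, so files created below
    # are not picked up during the same run.
    for src in sorted(Path(folder).glob(pattern)):
        # Skip outputs of a previous run so they are not stripped again.
        if not src.is_file() or src.name.startswith(prefix):
            continue
        # Assumption: the inputs are UTF-8 encoded text.
        soup = BeautifulSoup(src.read_text(encoding="utf-8"), "html.parser")
        dst = src.with_name(prefix + src.name)
        dst.write_text(soup.get_text(), encoding="utf-8")
        written.append(dst)
    return written
```

Calling strip_folder(".") would then process every .txt file in the current directory and write a strip_-prefixed copy next to each one.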
    
