How to remove special characters from txt files using Python

南笙 2021-01-06 04:32
from glob import glob
pattern = "D:\\report\\shakeall\\*.txt"
filelist = glob(pattern)
def countwords(fp):
    with open(fp) as fh:
        return len(fh.rea         


        
3 answers
  •  情书的邮戳
    2021-01-06 05:12

    import re
    

    Then replace

    [uniquewords.add(x) for x in open(os.path.join(root,name)).read().split()]
    

    By

    [uniquewords.add(re.sub('[^a-zA-Z0-9]*$', '', x)) for x in open(os.path.join(root, name)).read().split()]
    

    This strips all trailing characters that are not ASCII letters or digits from each word before adding it to the set.
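A self-contained sketch of the same idea, reading every file that matches a glob pattern. It uses the same regex as the answer, compiled once, and drops tokens that become empty after cleaning (for example a lone "--"). The function names and the UTF-8 encoding are assumptions for illustration, not part of the original code.

```python
import re
from glob import glob

# Same pattern as the answer: zero or more characters that are not
# ASCII letters or digits, anchored at the end of the word.
TRAILING_JUNK = re.compile(r'[^a-zA-Z0-9]*$')


def clean_word(word):
    """Strip trailing non-alphanumeric characters from a single word."""
    return TRAILING_JUNK.sub('', word)


def unique_words(text):
    """Return the set of cleaned words in text, skipping words that end up empty."""
    words = set()
    for token in text.split():
        cleaned = clean_word(token)
        if cleaned:  # a token made only of punctuation cleans to ""
            words.add(cleaned)
    return words


def unique_words_in_files(pattern):
    """Collect cleaned unique words from every file matching a glob pattern.

    Assumes the files are UTF-8 encoded (hypothetical; adjust as needed).
    """
    words = set()
    for path in glob(pattern):
        with open(path, encoding='utf-8') as fh:
            words |= unique_words(fh.read())
    return words


if __name__ == '__main__':
    print(sorted(unique_words('Hello, world! Hello again... -- (end)')))
    # prints ['(end', 'Hello', 'again', 'world']
```

With the pattern from the question, call `unique_words_in_files("D:\\report\\shakeall\\*.txt")`. Note that only trailing characters are removed, so "(end)" becomes "(end". To strip both ends, a pattern such as `r'^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$'` works with the same `re.sub` call.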
