Simple text-cleaning Python 3.6 script gives incorrect output when run from the command prompt on Windows 10

Submitted by 感情迁移 on 2019-12-11 05:49:39

Question


This script runs correctly when I execute it in Spyder, but the same script doesn't give the correct output when I run it from the command prompt on my Windows 10 machine. I have Python 3.6 and Anaconda 3.6 installed. It's really weird behaviour. I also tried running the script on an Ubuntu system, but it didn't give the correct output there either.

clean_data.py

import re
import argparse

def main(data):
    if data.strip():
        data = data.strip()
        emoji_pattern = re.compile("["
                "\U0001F600-\U0001F64F"  # emoticons
                "\U0001F000-\U0001F5FF"  # symbols & pictographs
                "\U0001F680-\U0001F6FF"  # transport & map symbols
                "\U0001F1E0-\U0001F1FF"  # flags (iOS)
                "\U0001F900-\U0001F9FF"  # extra emoticons
                "\U00002600-\U000026FF"
                "\U00002700-\U000027BF"
                "\U00002B00-\U00002BFF"
                "\U00003000-\U000032FF"
                "\U000025A0-\U000025FF"
                "\U000024C2-\U0001F251"
                "\U000020D0-\U000120FF"
                "\U00000000-\U0000001a"
                "]+", flags=re.UNICODE)
        data = emoji_pattern.sub("", data)
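        # Drop anything outside the allowed ASCII set. "-", "]" and "[" are escaped
        # so that the character class is not cut short at the first "]".
        data = re.sub(r"[^A-Za-z0-9 !@#$%^&*()_+=\-}\]{\[|':;?/>.<,]", "", data)
        data = data.encode("ascii", "ignore").decode("ascii")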
        print(data)
    else:
        print("Empty string!!")

#main("     ")
#main("i'm deciding between Firestik Firefly, 4' \u2248\u001a200w, \n\r& Firestik FS-3BK, 3' \u2248\u001a650w. Is one better? It's for recreational use on and off road. thank you!")

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('data', help='Simply the text that you want to clean.')
    args = parser.parse_args()
    main(args.data)

To reproduce, save the script as "clean_data.py".

To execute the script, open a terminal and type:

python clean_data.py "i'm deciding between Firestik Firefly, 4' \u2248\u001a200w, \n\r& Firestik FS-3BK, 3' \u2248\u001a650w. Is one better? It's for recreational use on and off road. thank you!"

The expected output is:

i'm deciding between Firestik Firefly, 4' 200w, & Firestik FS-3BK, 3' 650w. Is one better? It's for recreational use on and off road. thank you!


Answer 1:


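The cmd shell doesn't understand Python's Unicode escape sequences, so the script receives the escape codes as literal ASCII characters: \u2248 arrives as a backslash followed by "u2248", not as a single character. In Python source code, the interpreter converts these escapes itself when it parses the string literal, which is presumably why calling main() with the same text worked in Spyder.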
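To make the difference concrete, here is a small sketch comparing the two cases. The variable names are illustrative and not part of the original script; a raw string literal is used to model what arrives from the command line.

```python
# What Python sees when the escape is written in source code:
# a single character, U+2248.
from_source = "\u2248"

# What the script receives from the command line: six separate characters,
# a backslash followed by "u2248". A raw string literal models this.
from_shell = r"\u2248"

print(len(from_source))           # 1
print(len(from_shell))            # 6
print(from_source == from_shell)  # False
```

Because the regular expressions in the script look for the actual code points, they never see them in the command-line version of the text.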

If you want to support translating them, add import sys at the top of the script and change your main call to:

main(args.data.encode(sys.stdin.encoding).decode('unicode-escape'))

And then your output will be:

i'm deciding between Firestik Firefly, 4' 200w, & Firestik FS-3BK, 3' 650w. Is one better? It's for recreational use on and off road. thank you!
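The conversion in that call can be tried on its own, without the command line. The following is a minimal sketch: decode_cli_escapes is a hypothetical helper name, and the fallback to UTF-8 when stdin reports no encoding is an assumption added for this example, not something the answer specifies.

```python
import sys

def decode_cli_escapes(text, encoding=None):
    # Hypothetical helper: turn literal escape sequences (a backslash followed
    # by u2248, n, r, ...) into the characters they stand for.
    # Assumption: fall back to UTF-8 if stdin has no usable encoding.
    encoding = encoding or getattr(sys.stdin, "encoding", None) or "utf-8"
    return text.encode(encoding).decode("unicode-escape")

# A raw string stands in for the argument as it arrives from the shell.
raw = r"4' \u2248\u001a200w, \n\r& 3'"
decoded = decode_cli_escapes(raw)
print(repr(decoded))
```

After this step the text contains the real U+2248, U+001A, newline and carriage-return characters, so the patterns in main() can remove them. One caveat: the unicode-escape codec decodes bytes as Latin-1, so non-ASCII characters typed directly on the command line may not survive this round trip, and a stray backslash that does not start a valid escape can raise UnicodeDecodeError.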


Source: https://stackoverflow.com/questions/49339739/simple-text-cleaning-python-3-6-script-not-giving-correct-output-on-executing-th
