Python, remove all non-alphabet chars from string

时光说笑 2020-11-30 21:08

I am writing a Python MapReduce word count program. The problem is that many non-alphabet characters are strewn throughout the data. I have found this post, Stripping everything b

6 answers
  •  孤街浪徒
    2020-11-30 21:52

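    In the timings below, a precompiled regex is the fastest of the three methods, taking less than half the time of either join-based approach.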

    import timeit

    # Method 1: precompiled regex. The pattern is compiled once in setup,
    # so only the substitution is timed. It keeps digits as well as letters;
    # drop 0-9 from the character class to keep letters only.
    t0 = timeit.timeit("""
    s = r2.sub('', st)
    """, setup="""
    import re
    r2 = re.compile(r'[^a-zA-Z0-9]')  # re.MULTILINE has no effect without ^/$ anchors
    st = 'abcdefghijklmnopqrstuvwxyz123456789!@#$%^&*()-=_+'
    """, number=1000000)
    print(t0)

    # Method 2: filter() with str.isalnum, then join
    t0 = timeit.timeit("""
    s = ''.join(filter(str.isalnum, st))
    """, setup="""
    st = 'abcdefghijklmnopqrstuvwxyz123456789!@#$%^&*()-=_+'
    """, number=1000000)
    print(t0)

    # Method 3: generator expression with join
    t0 = timeit.timeit("""
    s = ''.join(c for c in st if c.isalnum())
    """, setup="""
    st = 'abcdefghijklmnopqrstuvwxyz123456789!@#$%^&*()-=_+'
    """, number=1000000)
    print(t0)
    
    
    Output (total seconds for 1,000,000 runs each):

    2.6002226710006653  Method 1: Regex
    5.739747313000407   Method 2: Filter + Join
    6.540099570000166   Method 3: Join
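
The benchmark above keeps digits as well as letters, while the question asks for alphabet characters only. A minimal sketch of how the regex approach might be adapted to a word count follows. The names `NON_ALPHA`, `strip_non_alpha`, and `count_words` are hypothetical, and the sketch assumes ASCII letters are all that should survive and that words are separated by whitespace; it is not the asker's actual MapReduce code.

```python
import re
from collections import Counter

# Assumption: only ASCII letters should be kept (digits are removed too).
NON_ALPHA = re.compile(r'[^a-zA-Z]')


def strip_non_alpha(token):
    """Return token with every character outside a-z / A-Z removed."""
    return NON_ALPHA.sub('', token)


def count_words(lines):
    """Count cleaned, lower-cased words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word = strip_non_alpha(token).lower()
            if word:  # skip tokens that were only punctuation or digits
                counts[word] += 1
    return counts


sample = ["Hello, world!", "hello... World 42"]
print(count_words(sample))
```

Note that the three timed methods are not interchangeable on non-ASCII input: `str.isalnum` accepts Unicode letters such as `é`, whereas an ASCII character class like the one above removes them.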
    
