Python replace function [replace once]

隐瞒了意图╮ 2020-12-01 12:40

I need help with a program I'm making in Python.

Assume I wanted to replace every instance of the word "steak" with "ghost" (just go wit

6 Answers
  •  遥遥无期
    2020-12-01 12:52

    Note: Given how many views this question gets, I undeleted this answer and rewrote it to cover different types of test cases.

    I have considered four competing implementations from the answers:

    >>> def sub_noregex(hay):
        """
        The join-and-replace routine, which outperforms the regex implementation.
        This version uses a generator expression
        """
        return 'steak'.join(e.replace('steak','ghost') for e in hay.split('ghost'))
    
    >>> import re
    >>> def sub_regex(hay):
        """
        This is a straightforward regex implementation as suggested by @mgilson.
        Note: so that the setup overhead doesn't add to the cumulative time, the
        compiled pattern (regex) and its lookup table (sub_dict) are created
        outside the function
        """
        return re.sub(regex,lambda m:sub_dict[m.group()],hay)
    
    >>> from uuid import uuid4
    >>> def sub_temp(hay, _uuid = str(uuid4())):
        """
        Similar to Mark Tolonen's implementation, but uses a uuid as the temporary
        string value to reduce the chance of a collision
        """
        # steak -> placeholder, ghost -> steak, then placeholder -> ghost
        hay = hay.replace("steak",_uuid).replace("ghost","steak").replace(_uuid,"ghost")
        return hay
    
    >>> def sub_noregex_LC(hay):
        """
        The join-and-replace routine, which outperforms the regex implementation.
        This version uses a list comprehension
        """
        return 'steak'.join([e.replace('steak','ghost') for e in hay.split('ghost')])
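
sub_regex relies on two module-level names, regex and sub_dict, that the answer does not show. A minimal sketch of how they might be defined, assuming a plain alternation over the two words is what was intended:

```python
import re

# Assumed definitions; the original answer does not show them.
sub_dict = {"steak": "ghost", "ghost": "steak"}
# Escape each key so it is matched literally, then join the keys with "|".
regex = re.compile("|".join(re.escape(k) for k in sub_dict))

def sub_regex(hay):
    # Every match is looked up in sub_dict, so both words are swapped in one pass.
    return regex.sub(lambda m: sub_dict[m.group()], hay)

print(sub_regex("The scary ghost ordered an expensive steak"))
# The scary steak ordered an expensive ghost
```

Because the text is scanned once, a word that has already been swapped is never matched again, which is what a chain of plain replace calls gets wrong.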
    

    A generalized timeit function

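    >>> from timeit import Timer
    >>> def compare(n, hay):
        # Pairs of (function name, extra name to import), listed in the same
        # order as the column headers printed by test(); a plain dict gives
        # no such ordering guarantee on older Pythons.
        foo = [("sub_temp", ""),
               ("sub_noregex", ""),
               ("sub_regex", "re"),
               ("sub_noregex_LC", ""),
               ]
        # The setup string imports hay from __main__, so publish the argument
        # there; otherwise whatever global hay happens to exist gets timed.
        import __main__
        __main__.hay = hay
        stmt = "{}(hay)"
        setup = "from __main__ import hay,"
        for k, v in foo:
            t = Timer(stmt = stmt.format(k), setup = setup+ ','.join([k, v] if v else [k]))
            yield t.timeit(n)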
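
On Python 3 the same comparison can be written without building import strings, because timeit accepts a callable. A rough sketch, where compare_callables is a hypothetical helper name rather than part of the answer, and the two functions are repeated only so that the block runs by itself:

```python
from timeit import timeit

def sub_noregex(hay):
    # Generator-expression variant from the answer.
    return 'steak'.join(e.replace('steak', 'ghost') for e in hay.split('ghost'))

def sub_noregex_LC(hay):
    # List-comprehension variant from the answer.
    return 'steak'.join([e.replace('steak', 'ghost') for e in hay.split('ghost')])

def compare_callables(n, hay, funcs):
    """Return {function name: seconds taken for n calls on hay}."""
    return {f.__name__: timeit(lambda: f(hay), number=n) for f in funcs}

if __name__ == "__main__":
    hay = ' '.join(['steak', 'ghost'] * 1000)
    for name, seconds in compare_callables(1000, hay, [sub_noregex, sub_noregex_LC]).items():
        print("{:20}{:15.10f}".format(name, seconds))
```

Passing the string as an argument avoids any dependence on a global hay, and returning a dict keyed by function name keeps each timing attached to the right label.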
    

    And the generalized test routine

    >>> def test(*args, **kwargs):
        n = kwargs['repeat']
        # Header columns must match the order in which compare() yields timings
        print("{:50}{:^15}{:^15}{:^15}{:^15}".format("Test Case", "sub_temp",
                                 "sub_noregex ", "sub_regex",
                                 "sub_noregex_LC "))
        for hay in args:
            hay, hay_str = hay
            print("{:50}{:15.10}{:15.10}{:15.10}{:15.10}".format(hay_str, *compare(n, hay)))
    

    And the test results are as follows:

    >>> test((' '.join(['steak', 'ghost']*1000), "Multiple repetitions of search key"),
             ('garbage '*998 + 'steak ghost', "Single repetition of search key at the end"),
             ('steak ' + 'garbage '*998 + 'ghost', "Single repetition at either end"),
             ("The scary ghost ordered an expensive steak", "Single repetition in a smaller string"),
             repeat = 100000)
    Test Case                                            sub_temp     sub_noregex      sub_regex   sub_noregex_LC 
    Multiple repetitions of search key                   0.2022748797   0.3517142003   0.4518992298   0.1812594258
    Single repetition of search key at the end           0.2026047957   0.3508259952   0.4399926194   0.1915298898
    Single repetition at either end                      0.1877455356   0.3561734007   0.4228843986   0.2164233388
    Single repetition in a smaller string                0.2061019057   0.3145984487   0.4252060592   0.1989413449
    

    Based on the test results:

    1. The non-regex list-comprehension version and the temporary-placeholder substitution perform best, though the placeholder version's timings are less consistent.

    2. The list-comprehension version is faster than the generator version (confirmed).

    3. Regex is more than two times slower than the fastest implementation in each test case (so if this piece of code is a bottleneck, switching implementations is worth considering).

    4. The regex and non-regex versions are equally robust and can scale.
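
Timings only mean something if every variant returns the same string, so a quick agreement check over the benchmark inputs may be worth running first. The sketch below is illustrative only: the swap_* names are hypothetical, the regex is an assumed alternation pattern, and the placeholder variant maps the uuid back to "ghost" in its last step.

```python
import re
from uuid import uuid4

# Assumed pattern and lookup table for the regex variant.
SWAP = {"steak": "ghost", "ghost": "steak"}
PATTERN = re.compile("|".join(re.escape(k) for k in SWAP))

def swap_split_join(hay):
    # Split on one word, replace the other inside each piece, rejoin.
    return 'steak'.join([e.replace('steak', 'ghost') for e in hay.split('ghost')])

def swap_regex(hay):
    # One pass over the text; each match is looked up in SWAP.
    return PATTERN.sub(lambda m: SWAP[m.group()], hay)

def swap_placeholder(hay, _tmp=str(uuid4())):
    # steak -> placeholder, ghost -> steak, placeholder -> ghost.
    return hay.replace("steak", _tmp).replace("ghost", "steak").replace(_tmp, "ghost")

cases = [
    ' '.join(['steak', 'ghost'] * 1000),
    'garbage ' * 998 + 'steak ghost',
    'steak ' + 'garbage ' * 998 + 'ghost',
    "The scary ghost ordered an expensive steak",
]

for hay in cases:
    outputs = {f(hay) for f in (swap_split_join, swap_regex, swap_placeholder)}
    # A single distinct output means all three variants agree on this input.
    assert len(outputs) == 1

print("all variants agree")
```

The placeholder variant is only safe as long as the temporary string never occurs in the input, which a random uuid makes very unlikely but does not strictly guarantee.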
