Which is the preferred way to concatenate a string in Python?

眼角桃花 2020-11-22 06:45

Since Python's strings can't be changed, I was wondering how to concatenate strings more efficiently.

I can write it like this:

s += string         


        
12 Answers
  •  一个人的身影
    2020-11-22 07:16

    As @jdi mentions, the Python documentation suggests using str.join or io.StringIO for string concatenation. It also says that a developer should expect quadratic time from += in a loop, even though there has been an optimisation since Python 2.4. As this answer says:

    If Python detects that the left argument has no other references, it calls realloc to attempt to avoid a copy by resizing the string in place. This is not something you should ever rely on, because it's an implementation detail and because if realloc ends up needing to move the string frequently, performance degrades to O(n^2) anyway.
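A minimal sketch of the two approaches the documentation recommends, for readers who have not used them. The helper names `concat_with_join` and `concat_with_stringio` are made up for illustration; only `str.join` and `io.StringIO` come from the standard library.

```python
import io

def concat_with_join(parts):
    """Collect the pieces first, then join them in a single pass."""
    return ''.join(parts)

def concat_with_stringio(parts):
    """Write the pieces into an in-memory text buffer, then read it back."""
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    return buf.getvalue()

parts = ['spam', 'eggs', 'ham']
print(concat_with_join(parts))      # spameggsham
print(concat_with_stringio(parts))  # spameggsham
```

Both build the result without repeatedly copying a growing string, so the total work stays roughly linear in the output length.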

    Here is an example of real-world code that naively relied on this += optimisation in a case where it did not apply. The code below converts an iterable of short strings into bigger chunks to be used in a bulk API.

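    def test_concat_chunk(seq, split_by):
        # Each chunk is grown in place as the last element of the list.
        result = ['']
        for item in seq:
            # Start a new chunk once the current one would exceed split_by.
            if len(result[-1]) + len(item) > split_by:
                result.append('')
            # The list also holds a reference to this string, so CPython
            # cannot resize it in place and copies it on every +=.
            result[-1] += item
        return result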
    

    This code can literally run for hours because of its quadratic time complexity. Below are alternatives that use the suggested data structures:

    import io
    
    def test_stringio_chunk(seq, split_by):
        def chunk():
            buf = io.StringIO()
            size = 0
            for item in seq:
                if size + len(item) <= split_by:
                    size += buf.write(item)
                else:
                    yield buf.getvalue()
                    buf = io.StringIO()
                    size = buf.write(item)
            if size:
                yield buf.getvalue()
    
        return list(chunk())
    
    def test_join_chunk(seq, split_by):
        def chunk():
            buf = []
            size = 0
            for item in seq:
                if size + len(item) <= split_by:
                    buf.append(item)
                    size += len(item)
                else:
                    yield ''.join(buf)                
                    buf.clear()
                    buf.append(item)
                    size = len(item)
            if size:
                yield ''.join(buf)
    
        return list(chunk())
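To make the chunking rule concrete, here is a small join-based sketch run on a tiny input. It is a variant, not the code above: the name `iter_chunks` is hypothetical, it yields chunks lazily instead of returning a list, and it adds a guard (an assumption about the desired behaviour) so that an item longer than `split_by` becomes a chunk of its own rather than being preceded by an empty chunk.

```python
def iter_chunks(seq, split_by):
    """Yield concatenated chunks of at most split_by characters.

    An item longer than split_by is emitted as a chunk of its own.
    """
    buf = []
    size = 0
    for item in seq:
        # Flush only when the buffer is non-empty and the item would overflow it.
        if buf and size + len(item) > split_by:
            yield ''.join(buf)
            buf = []
            size = 0
        buf.append(item)
        size += len(item)
    if buf:
        yield ''.join(buf)

print(list(iter_chunks(['ab', 'cd', 'ef', 'g'], 4)))  # ['abcd', 'efg']
print(list(iter_chunks(['abcdef', 'g'], 4)))          # ['abcdef', 'g']
```

Keeping the pieces in a list and joining once per chunk means each character is copied a bounded number of times, regardless of the chunk size.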
    

    And a micro-benchmark:

    import timeit
    import random
    import string
    import matplotlib.pyplot as plt
    
    line = ''.join(random.choices(
        string.ascii_uppercase + string.digits, k=512)) + '\n'
    x = []
    y_concat = []
    y_stringio = []
    y_join = []
    n = 5
    for i in range(1, 11):
        x.append(i)
        seq = [line] * (20 * 2 ** 20 // len(line))
        chunk_size = i * 2 ** 20
        y_concat.append(
            timeit.timeit(lambda: test_concat_chunk(seq, chunk_size), number=n) / n)
        y_stringio.append(
            timeit.timeit(lambda: test_stringio_chunk(seq, chunk_size), number=n) / n)
        y_join.append(
            timeit.timeit(lambda: test_join_chunk(seq, chunk_size), number=n) / n)
    plt.plot(x, y_concat)
    plt.plot(x, y_stringio)
    plt.plot(x, y_join)
    plt.legend(['concat', 'stringio', 'join'], loc='upper left')
    plt.show()
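The benchmark above needs matplotlib. As a plot-free sketch, the snippet below isolates the difference between growing a string held by a local variable and growing one stored in a list slot, which is the pattern used in `test_concat_chunk`. It assumes CPython; the function names and the workload size are made up for illustration, and the timings will vary with the interpreter version and the memory allocator.

```python
import timeit

def append_to_local(items):
    """Grow a string that only a local variable refers to."""
    s = ''
    for item in items:
        s += item
    return s

def append_to_list_slot(items):
    """Grow a string that the list `result` also refers to."""
    result = ['']
    for item in items:
        result[-1] += item
    return result[-1]

items = ['x' * 100] * 2000  # hypothetical workload, about 200 KB in total
for func in (append_to_local, append_to_list_slot):
    seconds = timeit.timeit(lambda: func(items), number=5) / 5
    print(f'{func.__name__}: {seconds:.4f} s per call')
```

Both functions return the same string. On CPython the list-slot version is expected to fall behind as the string grows, because the extra reference held by the list prevents the in-place resize described in the quote.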
    
