I'm able to find a bevy of information online (on Stack Overflow and otherwise) about how it's a very inefficient and bad practice to use + or +=
I think this behaviour is best explained in Lua's string buffer chapter.
To rewrite that explanation in context of Python, let's start with an innocent code snippet (a derivative of the one at Lua's docs):
s = ""
for l in some_list:
    s += l
Assume that each l is 20 bytes and that s has already grown to 50 KB. When Python concatenates s + l, it creates a new string of 50,020 bytes and copies the 50 KB from s into this new string. That is, for each new item, the program moves 50 KB of memory, and that amount keeps growing. After appending 100 new items (only 2 KB), the snippet has already moved more than 5 MB of memory. To make things worse, after the assignment
s += l
the old string is now garbage. After two loop cycles, there are two old strings making a total of more than 100 KB of garbage. So, the language runtime decides to run its garbage collector and frees those 100 KB. The problem is that this will happen every two cycles, and the program will run its garbage collector two thousand times before reading the whole list. Even with all this work, its memory usage will be a large multiple of the list's size.
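The "more than 5 MB" figure can be checked with a few lines of arithmetic. This is only a sketch of the simplified model described above, in which every concatenation writes out a complete new string; the function name bytes_copied and its parameters are made up for illustration, and the model ignores any in-place optimization a particular Python implementation may apply.

```python
def bytes_copied(initial_size, piece_size, pieces):
    """Total bytes written under the naive model, where each
    concatenation builds a brand-new string of the full new length."""
    total = 0
    size = initial_size
    for _ in range(pieces):
        size += piece_size  # length of the newly created string
        total += size       # the whole new string is written once
    return total


# 50 KB already accumulated, then 100 pieces of 20 bytes each
moved = bytes_copied(50_000, 20, 100)
print(moved)  # 5101000 bytes, i.e. a bit over 5 MB for 2 KB of new data
```

The total grows quadratically with the number of pieces, which is why the cost becomes noticeable on long lists.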
And, at the end:
This problem is not peculiar to Lua: Other languages with true garbage collection, and where strings are immutable objects, present a similar behavior, Java being the most famous example. (Java offers the structure StringBuffer to ameliorate the problem.)
Python strings are also immutable objects.
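For comparison, here is a minimal sketch of two common ways to avoid the repeated copying in Python: collecting the pieces and joining them once with str.join, or writing them into an io.StringIO buffer, which plays a role similar to Java's StringBuffer. The contents of some_list are hypothetical sample data chosen to match the 20-byte pieces used above.

```python
import io

# Hypothetical sample data: 100 pieces of 20 characters each
some_list = ["x" * 20 for _ in range(100)]

# Option 1: join all pieces in a single pass
joined = "".join(some_list)

# Option 2: accumulate into an in-memory text buffer
buf = io.StringIO()
for l in some_list:
    buf.write(l)
buffered = buf.getvalue()

# The naive loop from above, kept only to show the results match
s = ""
for l in some_list:
    s += l

print(joined == buffered == s)  # True
```

Both alternatives produce the same string as the loop with +=; the difference is only in how much intermediate copying the approach invites.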