String concatenation vs. string substitution in Python

前端未结

关注

 9  1637

In Python, the where and when of using string concatenation versus string substitution eludes me. As the string concatenation has seen large boosts in performance, is this (

相关标签:

9条回答

故里飘歌

2020-11-29 17:40
Don't forget about named substitution:
```
def so_question_uri_namedsub(q_num):
    return "%(domain)s%(questions)s/%(q_num)d" % locals()
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
执笔经年

2020-11-29 17:41

"As the string concatenation has seen large boosts in performance..."

If performance matters, this is good to know.

However, performance problems I've seen have never come down to string operations. I've generally gotten in trouble with I/O, sorting and O(n²) operations being the bottlenecks.

Until string operations are the performance limiters, I'll stick with things that are obvious. Mostly, that's substitution when it's one line or less, concatenation when it makes sense, and a template tool (like Mako) when it's large.

0 讨论(0)
发布评论:

提交评论
- 加载中...
既然无缘

2020-11-29 17:43
I was just testing the speed of different string concatenation/substitution methods out of curiosity. A google search on the subject brought me here. I thought I would post my test results in the hope that it might help someone decide.
```
    import timeit
    def percent_():
            return "test %s, with number %s" % (1,2)

    def format_():
            return "test {}, with number {}".format(1,2)

    def format2_():
            return "test {1}, with number {0}".format(2,1)

    def concat_():
            return "test " + str(1) + ", with number " + str(2)

    def dotimers(func_list):
            # runs a single test for all functions in the list
            for func in func_list:
                    tmr = timeit.Timer(func)
                    res = tmr.timeit()
                    print "test " + func.func_name + ": " + str(res)

    def runtests(func_list, runs=5):
            # runs multiple tests for all functions in the list
            for i in range(runs):
                    print "----------- TEST #" + str(i + 1)
                    dotimers(func_list)
```
...After running runtests((percent_, format_, format2_, concat_), runs=5), I found that the % method was about twice as fast as the others on these small strings. The concat method was always the slowest (barely). There were very tiny differences when switching the positions in the format() method, but switching positions was always at least .01 slower than the regular format method.

Sample of test results:
```
    test concat_()  : 0.62  (0.61 to 0.63)
    test format_()  : 0.56  (consistently 0.56)
    test format2_() : 0.58  (0.57 to 0.59)
    test percent_() : 0.34  (0.33 to 0.35)
```
I ran these because I do use string concatenation in my scripts, and I was wondering what the cost was. I ran them in different orders to make sure nothing was interfering, or getting better performance being first or last. On a side note, I threw in some longer string generators into those functions like "%s" + ("a" * 1024) and regular concat was almost 3 times as fast (1.1 vs 2.8) as using the format and % methods. I guess it depends on the strings, and what you are trying to achieve. If performance really matters, it might be better to try different things and test them. I tend to choose readability over speed, unless speed becomes a problem, but thats just me. SO didn't like my copy/paste, i had to put 8 spaces on everything to make it look right. I usually use 4.
0 讨论(0)
发布评论:

提交评论
- 加载中...
南旧

2020-11-29 17:45
What you want to concatenate/interpolate and how you want to format the result should drive your decision.
- String interpolation allows you to easily add formatting. In fact, your string interpolation version doesn't do the same thing as your concatenation version; it actually adds an extra forward slash before the q_num parameter. To do the same thing, you would have to write return DOMAIN + QUESTIONS + "/" + str(q_num) in that example.
- Interpolation makes it easier to format numerics; "%d of %d (%2.2f%%)" % (current, total, total/current) would be much less readable in concatenation form.
- Concatenation is useful when you don't have a fixed number of items to string-ize.
Also, know that Python 2.6 introduces a new version of string interpolation, called string templating:
```
def so_question_uri_template(q_num):
    return "{domain}/{questions}/{num}".format(domain=DOMAIN,
                                               questions=QUESTIONS,
                                               num=q_num)
```
String templating is slated to eventually replace %-interpolation, but that won't happen for quite a while, I think.
0 讨论(0)
发布评论:

提交评论
- 加载中...
一整个雨季

2020-11-29 17:57

Remember, stylistic decisions are practical decisions, if you ever plan on maintaining or debugging your code :-) There's a famous quote from Knuth (possibly quoting Hoare?): "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

As long as you're careful not to (say) turn a O(n) task into an O(n²) task, I would go with whichever you find easiest to understand..

0 讨论(0)
发布评论:

提交评论
- 加载中...

深忆病人

2020-11-29 17:58

Concatenation is (significantly) faster according to my machine. But stylistically, I'm willing to pay the price of substitution if performance is not critical. Well, and if I need formatting, there's no need to even ask the question... there's no option but to use interpolation/templating.

>>> import timeit
>>> def so_q_sub(n):
...  return "%s%s/%d" % (DOMAIN, QUESTIONS, n)
...
>>> so_q_sub(1000)
'http://stackoverflow.com/questions/1000'
>>> def so_q_cat(n):
...  return DOMAIN + QUESTIONS + '/' + str(n)
...
>>> so_q_cat(1000)
'http://stackoverflow.com/questions/1000'
>>> t1 = timeit.Timer('so_q_sub(1000)','from __main__ import so_q_sub')
>>> t2 = timeit.Timer('so_q_cat(1000)','from __main__ import so_q_cat')
>>> t1.timeit(number=10000000)
12.166618871951641
>>> t2.timeit(number=10000000)
5.7813972166853773
>>> t1.timeit(number=1)
1.103492206766532e-05
>>> t2.timeit(number=1)
8.5206360154188587e-06

>>> def so_q_tmp(n):
...  return "{d}{q}/{n}".format(d=DOMAIN,q=QUESTIONS,n=n)
...
>>> so_q_tmp(1000)
'http://stackoverflow.com/questions/1000'
>>> t3= timeit.Timer('so_q_tmp(1000)','from __main__ import so_q_tmp')
>>> t3.timeit(number=10000000)
14.564135316080637

>>> def so_q_join(n):
...  return ''.join([DOMAIN,QUESTIONS,'/',str(n)])
...
>>> so_q_join(1000)
'http://stackoverflow.com/questions/1000'
>>> t4= timeit.Timer('so_q_join(1000)','from __main__ import so_q_join')
>>> t4.timeit(number=10000000)
9.4431309007150048

0 讨论(0)

1 2 下一页