Why are slice and range upper-bound exclusive?

Submitted by 懵懂的女人 on 2019-12-17 04:30:16

Question


Disclaimer: I am not asking whether the upper-bound stop argument of slice() and range() is exclusive, or how to use these functions.

Calls to the range and slice functions, as well as the slice notation [start:stop], all refer to sets of integers.

range([start], stop[, step])
slice([start], stop[, step])

In all these, the stop integer is excluded.

I am wondering why the language is designed this way.

Is it to make stop equal to the number of elements in the represented integer set when start equals 0 or is omitted?

Is it to have:

for i in range(start, stop):

look like the following C code?

for (i = start ; i < stop; i++) {

Answer 1:


The documentation implies this has a few useful properties:

word[:2]    # The first two characters
word[2:]    # Everything except the first two characters

Here’s a useful invariant of slice operations: s[:i] + s[i:] equals s.

For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.

I think we can assume that the range functions act the same for consistency.
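As a quick check of the properties quoted above, here is a minimal sketch. The string "Python" and the index values are only illustrative choices, not something taken from the documentation.

```python
word = "Python"

# Invariant: splitting at any index i and rejoining gives back the original.
for i in range(len(word) + 1):
    assert word[:i] + word[i:] == word

# For in-bounds, non-negative indices, the slice length is stop - start.
assert len(word[1:3]) == 3 - 1

# range() follows the same half-open convention.
start, stop = 1, 3
assert list(range(start, stop)) == [1, 2]
assert len(range(start, stop)) == stop - start
assert len(range(stop)) == stop  # start omitted: stop is the element count
```

With an inclusive upper bound, each of these length checks would need a "+ 1" correction.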




Answer 2:


Here's the opinion of some Google+ user:

[...] I was swayed by the elegance of half-open intervals. Especially the invariant that when two slices are adjacent, the first slice's end index is the second slice's start index is just too beautiful to ignore. For example, suppose you split a string into three parts at indices i and j -- the parts would be a[:i], a[i:j], and a[j:].

Google+ has been shut down, so the link no longer works. Spoiler alert: that was Guido van Rossum.
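The three-way split mentioned in the quote can be sketched as follows. The string and the split points i and j are hypothetical values chosen for illustration, assuming 0 <= i <= j <= len(a).

```python
a = "halfopen"
i, j = 2, 5  # hypothetical split points, 0 <= i <= j <= len(a)

# Each part's end index is the next part's start index.
parts = [a[:i], a[i:j], a[j:]]
print(parts)  # ['ha', 'lfo', 'pen']

# Nothing is lost or duplicated at the seams.
assert "".join(parts) == a

# The part lengths follow directly from the indices.
assert [len(p) for p in parts] == [i, j - i, len(a) - j]
```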




Answer 3:


A bit late to this question, nonetheless, this attempts to answer the why-part of your question:

Part of the reason is that we use zero-based indexing/offsets when addressing memory.

The easiest example is an array. Think of an "array of 6 items" as a location to store 6 data items. If this array's start location is at memory address 100, then data, let's say the 6 characters 'apple\0', are stored like this:

memory/
array      contains
location   data
 100   ->   'a'
 101   ->   'p'
 102   ->   'p'
 103   ->   'l'
 104   ->   'e'
 105   ->   '\0'

So for 6 items, the addresses run from 100 to 105. Addresses are generated using base + offset, so the first item is at base memory location 100 + offset 0 (i.e., 100 + 0), the second at 100 + 1, the third at 100 + 2, and so on, until 100 + 5 is the last location.

This is the primary reason we use zero based indexing and leads to language constructs such as for loops in C:

for (int i = 0; i < LIMIT; i++)

or in Python:

for i in range(LIMIT):

When you program in a language like C where you deal with pointers more directly, or assembly even more so, this base+offset scheme becomes much more obvious.

Because of the above, many language constructs automatically use this range from start to length-1.
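The base + offset scheme described above can be imitated in Python. This is only a sketch: Python does not expose raw addresses this way, and base = 100 is the made-up address from the table.

```python
data = "apple\0"  # the 6 characters from the table above
base = 100        # hypothetical start address of the array

# Offsets run over the half-open range [0, len(data)), i.e. 0 .. len(data) - 1.
addresses = [base + offset for offset in range(len(data))]

for address, ch in zip(addresses, data):
    print(address, "->", repr(ch))

# The exclusive upper bound is the first address past the array, so the
# item count is simply end_address - base, with no "+ 1" adjustment.
end_address = base + len(data)
assert addresses[-1] == end_address - 1
assert end_address - base == len(data)
```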

You might find this article on Zero-based numbering on Wikipedia interesting, and also this question from Software Engineering SE.

Example:

In C, for instance, if you have an array ar and subscript it as ar[3], that is really equivalent to taking the (base) address of ar, adding 3 to it, and dereferencing the result: *(ar+3). This can lead to code like the following, which prints the contents of an array using the simple base+offset approach:

for(i = 0; i < 5; i++)
   printf("%c\n", *(ar + i));

which is really equivalent to:

for(i = 0; i < 5; i++)
   printf("%c\n", ar[i]);



Answer 4:


Here is another reason why an exclusive upper bound is a saner approach:

Suppose you wished to write a function that applies some transform to a subsequence of items in a list. If intervals were to use an inclusive upper bound as you suggest, you might naively try writing it as:

def apply_range_bad(lst, transform, start, end):
     """Applies a transform on the elements of a list in the range [start, end]"""
     left = lst[0 : start-1]
     middle = lst[start : end]
     right = lst[end+1 :]
     return left + [transform(i) for i in middle] + right

At first glance, this seems straightforward and correct, but unfortunately it is subtly wrong.

What would happen if:

  • start == 0
  • end == 0
  • end < 0

In general, there might be even more boundary cases that you should consider. Who wants to waste time thinking about all of that? (These problems arise because, with inclusive lower and upper bounds, there is no inherent way to express an empty interval.)

Instead, by using a model where upper bounds are exclusive, dividing a list into separate slices is simpler, more elegant, and thus less error-prone:

def apply_range_good(lst, transform, start, end):
     """Applies a transform on the elements of a list in the range [start, end)"""
     left = lst[0:start]
     middle = lst[start:end]
     right = lst[end:]
     return left + [transform(i) for i in middle] + right

(Note that apply_range_good does not transform lst[end]; it too treats end as an exclusive upper-bound. Trying to make it use an inclusive upper-bound would still have some of the problems I mentioned earlier. The moral is that inclusive upper-bounds are usually troublesome.)
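To see how the half-open version behaves at the boundaries, here is a small usage sketch. The helper times_ten and the sample list are hypothetical, and the checks assume 0 <= start <= end <= len(lst); the function is repeated so the snippet runs on its own.

```python
def apply_range_good(lst, transform, start, end):
    """Applies a transform on the elements of a list in the range [start, end)"""
    left = lst[0:start]
    middle = lst[start:end]
    right = lst[end:]
    return left + [transform(i) for i in middle] + right


def times_ten(value):
    # Hypothetical transform, used only for this demonstration.
    return value * 10


nums = [1, 2, 3, 4]

# Ordinary case: indices 1 and 2 are transformed, index 3 is not.
assert apply_range_good(nums, times_ten, 1, 3) == [1, 20, 30, 4]

# Empty intervals need no special handling: start == end changes nothing.
assert apply_range_good(nums, times_ten, 0, 0) == nums
assert apply_range_good(nums, times_ten, 2, 2) == nums

# The whole list is just [0, len(nums)).
assert apply_range_good(nums, times_ten, 0, len(nums)) == [10, 20, 30, 40]
```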

(Mostly adapted from an old post of mine about inclusive upper-bounds in another scripting language.)




Answer 5:


Elegant-ness VS Obvious-ness

To be honest, I think slicing in Python is quite counter-intuitive: it trades so-called elegant-ness for more brain-processing. That is why this StackOverflow question has more than 2K upvotes; I think it's because a lot of people don't understand it initially.

For example, the following code has already caused headaches for a lot of Python newbies.

x = [1,2,3,4]
print(x[0:1])
# Output is [1]

Not only is it hard to process, it is also hard to explain properly. For example, the explanation for the code above would be "take the zeroth element up to the element before the first element".

Now look at Ruby, which uses an inclusive upper bound.

x = [1,2,3,4]
p x[0..1]
# Output is [1, 2]

To be frank, I really think the Ruby way of slicing is easier on the brain.

Of course, when you are splitting a list into 2 parts based on an index, the exclusive upper bound approach would result in better-looking code.

# Python
x = [1,2,3,4]
pivot = 2
print(x[:pivot]) # [1,2]
print(x[pivot:]) # [3,4]

Now let's look at the inclusive upper bound approach:

# Ruby
x = [1,2,3,4]
pivot = 2
p x[0..(pivot-1)] # [1, 2]
p x[pivot..-1] # [3, 4]

Obviously, the code is less elegant, but there's not much brain-processing to be done here.

Conclusion

In the end, it's really a matter of Elegant-ness VS Obvious-ness, and the designers of Python prefer elegant-ness over obvious-ness. Why? Because the Zen of Python states that "Beautiful is better than ugly."



Source: https://stackoverflow.com/questions/11364533/why-are-slice-and-range-upper-bound-exclusive
