What happens at a hardware level when I access an element of an array?

寵の児 · Submitted on 2019-12-07 06:30:25

Question


int arr[] = {69, 1, 12, 10, 20, 113};

What happens when I do the following?

int x = arr[3];

I was always under the impression that arr[3] meant something like:

"Start at the memory address arr. Walk 3 memory addresses forward. Get the integer represented at that memory address."

But then I'm confused about how hash tables work. If hash tables are implemented as an array of "buckets" (as the professor says in this lecture: https://www.youtube.com/watch?v=UPo-M8bzRrc), then you still have to walk to the bucket you need; hence, they are no more efficient for access than an array would be.

Can someone clear this up for me?


Answer 1:


Imagine memory as a big, two-column table:

+---------+-------+
| ADDRESS | VALUE |
+---------+-------+
|     ... |   ... |
+---------+-------+
|     100 |    69 |  <-- &arr[0] is 100
+---------+-------+
|     101 |     1 |
+---------+-------+
|     102 |    12 |
+---------+-------+
|     103 |    10 |  <-- &arr[3] is 103
+---------+-------+
|     104 |    20 |
+---------+-------+
|     105 |   113 |
+---------+-------+
|     ... |   ... |
+---------+-------+

I want to emphasize that this is a highly simplified model, but it should give you an idea of what is going on. Your computer knows your array begins at, let's say, address 100. Because all of the elements in a given array are the same size, it can access the element at index 3 by adding 3 to that starting address. The computer does not need to "walk" to that element; it simply grabs the value stored in memory at address 100 + 3.

If you want to see an example of this in action, compile and run the following code:

#include <iostream>
using namespace std;

int main() {
    int a[] = { 1, 2, 3 };
    cout << "Address of a:\t\t" << &a[0] << endl;
    cout << "Address of a[2]:\t" << &a[2] << endl;
    return 0;
}

Make note of the address of a[0]. Assuming your computer uses 32-bit (4-byte) integers, you should see that the address of a[2] is simply the address of a[0] plus 2*4. The offset is 2*4 rather than just 2 because each integer occupies 4 bytes of memory (i.e. a single value spans 4 addresses).
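
To make the element-size scaling explicit, here is a small follow-up sketch (my addition, not part of the original answer). It shows that a[2] and *(a + 2) refer to the same element, and that the raw byte offset between a[0] and a[2] is 2 * sizeof(int) (typically 8 bytes, although the size of int is not guaranteed to be 4):

#include <iostream>

int main() {
    int a[] = { 1, 2, 3 };

    // Pointer arithmetic works in units of elements, so a + 2 points at a[2].
    std::cout << std::boolalpha
              << (&a[2] == a + 2) << '\n'     // true
              << (a[2] == *(a + 2)) << '\n';  // true

    // The raw byte offset is 2 elements * sizeof(int).
    std::cout << reinterpret_cast<char*>(&a[2]) - reinterpret_cast<char*>(&a[0])
              << " bytes\n";
    return 0;
}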




Answer 2:


int x = arr[3];

The hardware computes (address of arr) + (3 * sizeof(int)) and reads the value stored at that address.

This is a standard indexing operation, and dedicated hardware is generally available to do it in one step.
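
As an illustration of that computation (my sketch, not code from the answer above), the function below spells out the base-plus-scaled-offset arithmetic that arr[3] performs. With optimizations enabled, a compiler will typically collapse the whole body into a single load instruction using a scaled-index addressing mode:

#include <cstddef>

// Spells out what an indexed load does: base address + index * element size,
// followed by one memory read. load_element(arr, 3) yields the same value as arr[3].
int load_element(const int* a, std::size_t i) {
    const char* base = reinterpret_cast<const char*>(a);
    return *reinterpret_cast<const int*>(base + i * sizeof(int));
}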




Answer 3:


If you write something like this:

int x = arr[3];

then the compiler already knows at compile time where to look for the value, so it can encode the exact (or stack-relative) memory position directly into the generated code. The processor does not need to compute the element's location at run time.

"Start at the memory address arr. Walk 3 memory addresses forward. Get the integer represented at that memory address."

So that description is basically true, but it is phrased that way only for educational purposes; it is not what the processor actually does in this case.
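
A small illustration of the "known at compile time" point (my sketch, not from the original answer): when both the array and the index are compile-time constants, the compiler can resolve the entire access before the program ever runs:

constexpr int arr[] = {69, 1, 12, 10, 20, 113};
constexpr int x = arr[3];   // the load is resolved entirely at compile time
static_assert(x == 10, "the compiler already knows the answer");

int main() {}               // nothing left to compute at run time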

When you access an element through a hash table, a hash value is calculated from the key. Many keys may map to the same hash value, so there must be a place where the objects sharing a hash value are stored; that place is called a bucket. Because a bucket can hold several objects, all of them must be searched for the value you are looking for, but this is still much faster than keeping all the values in one flat array and scanning it by key (you would have to traverse every element).
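
Here is a rough sketch of the bucket idea (my own minimal illustration with made-up names; real implementations such as std::unordered_map also handle resizing, erasure, iterators, and so on):

#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Illustrative bucketed hash table: hash the key, jump straight to one bucket,
// and search only the few entries that landed there.
struct BucketTable {
    std::vector<std::vector<std::pair<std::string, int>>> buckets;

    explicit BucketTable(std::size_t bucket_count = 16) : buckets(bucket_count) {}

    void put(const std::string& key, int value) {
        auto& bucket = buckets[std::hash<std::string>{}(key) % buckets.size()];
        for (auto& kv : bucket)                 // search only this one bucket
            if (kv.first == key) { kv.second = value; return; }
        bucket.emplace_back(key, value);
    }

    const int* get(const std::string& key) const {
        const auto& bucket = buckets[std::hash<std::string>{}(key) % buckets.size()];
        for (const auto& kv : bucket)
            if (kv.first == key) return &kv.second;
        return nullptr;                         // key not present
    }
};

The point is that get only scans one bucket rather than the whole table; with a decent hash function and enough buckets, that scan is a handful of comparisons at most.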




Answer 4:


That is essentially how array access works, and it is quite fast. Hash tables are no faster than arrays; in fact, it is precisely because they come close to array speed that they are considered very fast. The key advantage of hash tables is that you can use any hashable type as the key, not just an integer. In addition, they support sparse data without wasting array space on the gaps in between.
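
For example (a brief sketch of my own using the standard library's hash map, not code from the answer), std::unordered_map lets you key on strings or on very sparse integers without allocating space for the gaps:

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    // Any hashable type can be the key, not just a small consecutive integer.
    std::unordered_map<std::string, int> ages{{"alice", 30}, {"bob", 25}};

    // Sparse keys: only two entries are stored, not a billion-element array.
    std::unordered_map<long long, int> sparse;
    sparse[5] = 1;
    sparse[1000000000LL] = 2;

    std::cout << ages["alice"] << ' ' << sparse[1000000000LL] << '\n';
    return 0;
}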




Answer 5:


they are no more efficient for access than an array would be.

That's not saying much, because arrays are blindingly fast. Indexing an array (that is, going from one object to a random other object in that array) is O(1): a single addition operation. Most processors even have dedicated instructions for indexing into arrays and subobjects, in various forms that can do even better.

The processor does not step through every address on the way; it jumps straight to the one it needs, regardless of how many addresses lie in between. "As efficient as array access" is high praise indeed.



Source: https://stackoverflow.com/questions/26898068/what-happens-at-a-hardware-level-when-i-access-an-element-of-an-array
