Question
I am looking for a gcc-supported C++ language extension to enable the allocation of non-zero-based array pointers. Ideally I could simply write:
#include <iostream>
using namespace std;

// Allocate elements array[lo..hi-1], and return the new array.
template<typename Elem>
Elem* Create_Array(int lo, int hi)
{
    return new Elem[hi-lo] - lo;
    // FIXME what about [expr.add]/4.
    // How do we create a pointer outside the array bounds?
}

// Deallocate an array previously allocated via Create_Array.
template<typename Elem>
void Destroy_Array(Elem* array, int lo, int hi)
{
    delete[](array + lo);
}

int main()
{
    const int LO = 1000000000;
    const int HI = LO + 10;
    int* array = Create_Array<int>(LO, HI);
    for (int i=LO; i<HI; i++)
        array[i] = i;
    for (int i=LO; i<HI; i++)
        cout << array[i] << "\n";
    Destroy_Array(array, LO, HI);
}
The above code seems to work, but is not defined by the C++ standard. Specifically, the issue is [expr.add]/4:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.
In other words, behavior is undefined for the line marked FIXME in the code above, because it calculates a pointer that is outside the range x[0..n] for the 0-based array x.
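To make the bounds in [expr.add]/4 concrete, here is a minimal sketch of which pointer values may be formed for an n-element array. The function name valid_pointer_span is hypothetical and exists only for illustration; the two commented-out lines are the cases the standard leaves undefined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: forms the two extreme pointers that
// [expr.add]/4 allows for an n-element array and returns their distance.
std::ptrdiff_t valid_pointer_span(std::ptrdiff_t n)
{
    assert(n >= 0);
    int* x = new int[n];

    int* first = x + 0;   // defined: points to x[0] (or one-past-the-end if n == 0)
    int* past  = x + n;   // defined: one-past-the-end; must not be dereferenced

    // int* before = x - 1;      // undefined: i - j = -1 < 0
    // int* beyond = x + n + 1;  // undefined: i + j = n + 1 > n

    std::ptrdiff_t span = past - first;   // defined: both point into the same array
    delete[] x;
    return span;
}
```

Calling valid_pointer_span(10) returns 10. The expression new Elem[hi-lo] - lo from the question corresponds to the commented-out x - 1 case with a much larger offset.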
Is there some --std=... option to gcc to tell it to allow non-zero-based array pointers to be directly calculated?
If not, is there a reasonably portable way to emulate the return new Elem[hi-lo] - lo; statement, perhaps by casting to long and back? (But then I would worry about introducing more bugs.)
Furthermore, can this be done in a way that requires only 1 register to keep track of each array, like the code above? For example, if I have array1[i], array2[i], array3[i], this should require only the 3 registers for the array pointers array1, array2, array3, plus one register for i. (Similarly, if cold-fetching the array references, we should be able to just fetch the non-zero-based pointer directly, without doing calculations merely to establish the reference in registers.)
Answer 1:
Assuming you're using gcc on Linux x86-64, it supports the intptr_t and uintptr_t types, which can hold any pointer value (valid or not) and also support integer arithmetic. uintptr_t is more suitable in this application because unsigned arithmetic has mod 2^64 semantics, while signed overflow on intptr_t is undefined behavior.
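The wrap-around property can be sketched in isolation. The helper names bias and unbias below are hypothetical, and the sketch assumes a platform such as x86-64 Linux where uintptr_t exists and is at least as wide as int, so the arithmetic stays unsigned.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: move an address down by lo elements of elem_size bytes.
// Unsigned arithmetic wraps modulo 2^N, so the subtraction is well defined
// even when the mathematical result would be negative.
std::uintptr_t bias(std::uintptr_t addr, std::ptrdiff_t lo, std::size_t elem_size)
{
    return addr - static_cast<std::uintptr_t>(lo) * elem_size;
}

// Hypothetical helper: move a biased address up by i elements.
// bias() followed by unbias() with the same index yields the original address.
std::uintptr_t unbias(std::uintptr_t biased, std::ptrdiff_t i, std::size_t elem_size)
{
    return biased + static_cast<std::uintptr_t>(i) * elem_size;
}
```

For instance, unbias(bias(16, 1000000000, 4), 1000000000, 4) gives back 16 although the intermediate value wrapped below zero. The same computation on intptr_t could overflow, which is undefined for signed types.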
As suggested in comments, we can use this to build a class that overloads operator[] and performs range checking:
#include <iostream>
#include <cassert>
#include <sstream>   // for ostringstream
#include <stdexcept> // range_error, out_of_range
#include <cstddef>   // ptrdiff_t
#include <cstdint>   // uintptr_t
using namespace std;

// Safe non-zero-based array. Includes bounds checking.
template<typename Elem>
class Array {
    uintptr_t array; // base value for non-zero-based access
    int lo;          // lowest valid index
    int hi;          // highest valid index plus 1
public:
    Array(int lo, int hi)
        : array(), lo(lo), hi(hi)
    {
        if (lo > hi)
        {
            ostringstream msg; msg<<"Array(): lo("<<lo<<") > hi("<<hi<< ")";
            throw range_error(msg.str());
        }
        static_assert(sizeof(uintptr_t) == sizeof(void*),
            "Array: uintptr_t size does not match ptr size");
        static_assert(sizeof(ptrdiff_t) == sizeof(uintptr_t),
            "Array: ptrdiff_t size does not match ptr (efficiency issue)");
        Elem* alloc = new Elem[hi-lo];
        assert(alloc); // this is redundant; new throws bad_alloc on failure
        array = (uintptr_t)(alloc) - (uintptr_t)(lo * sizeof(Elem));
        // Convert offset to unsigned to avoid overflow UB.
    }

    // The class owns its allocation; copying would lead to a double delete.
    Array(const Array&) = delete;
    Array& operator=(const Array&) = delete;

    //////////////////////////////////////////////////////////////////
    // UNCHECKED access utilities (these method names start with "_").
    uintptr_t _get_array() {return array;}
        // Provide direct access to the base pointer (be careful!)
    Elem& _at(ptrdiff_t i)
        {return *(Elem*)(array + (uintptr_t)(i * sizeof(Elem)));}
        // Return reference to element (no bounds checking)
        // On GCC 5.4.0 with -O3, this compiles to an 'lea' instruction
    Elem* _get_alloc() {return &_at(lo);}
        // Return zero-based array that was allocated
    ~Array() {delete[](_get_alloc());}

    //////////////////////////////
    // SAFE access utilities
    Elem& at(ptrdiff_t i)
    {
        if (i < lo || i >= hi)
        {
            ostringstream msg;
            msg << "Array.at(): " << i << " is not in range ["
                << lo << ", " << hi << ")";
            throw out_of_range(msg.str());
        }
        return _at(i);
    }
    int get_lo() const {return lo;}
    int get_hi() const {return hi;}
    int size() const {return hi - lo;}
    Elem& operator[](ptrdiff_t i) {return at(i);}
    // std::vector is wrong; operator[] is the typical use and should be safe.
    // It's good practice to fix mistakes as we go along.
};

// Test
int main()
{
    const int LO = 1000000000;
    const int HI = LO + 10;
    Array<int> array(LO, HI);
    for (int i=LO; i<HI; i++)
        array[i] = i;
    for (int i=LO; i<HI; i++)
        cout << array[i] << "\n";
}
Note that it is still not possible to cast the invalid "pointer" calculated with uintptr_t to a pointer type, due to the rule in the GCC manual, section 4.7 "Arrays and Pointers":
When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.
This is why the array field must be of type uintptr_t and not Elem*. In other words, behavior is defined so long as the uintptr_t is adjusted to point back to the original object before converting back to Elem*.
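The same rule can be shown without the class, closer to the shape of the code in the question. This is only a sketch under the assumptions above (gcc, x86-64, uintptr_t the same size as a pointer); the names Create_Biased, To_Ptr and Destroy_Biased are hypothetical. The biased value lives in a uintptr_t the whole time and becomes an Elem* only after the index has been added back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical: allocate elements [lo, hi) and return the biased base as an integer.
template<typename Elem>
std::uintptr_t Create_Biased(std::ptrdiff_t lo, std::ptrdiff_t hi)
{
    assert(lo <= hi);
    Elem* alloc = new Elem[hi - lo];
    // Unsigned subtraction wraps modulo 2^N; no out-of-bounds pointer is ever formed.
    return reinterpret_cast<std::uintptr_t>(alloc)
         - static_cast<std::uintptr_t>(lo) * sizeof(Elem);
}

// Hypothetical: convert back to a pointer only after adding the index, so the
// resulting pointer refers to the original allocation when lo <= i <= hi.
template<typename Elem>
Elem* To_Ptr(std::uintptr_t base, std::ptrdiff_t i)
{
    return reinterpret_cast<Elem*>(base + static_cast<std::uintptr_t>(i) * sizeof(Elem));
}

// Hypothetical: release an array obtained from Create_Biased with the same lo.
template<typename Elem>
void Destroy_Biased(std::uintptr_t base, std::ptrdiff_t lo)
{
    delete[] To_Ptr<Elem>(base, lo);
}
```

With std::uintptr_t a = Create_Biased<int>(LO, HI);, the element at index i is *To_Ptr<int>(a, i), and the array is released with Destroy_Biased<int>(a, LO). Each array is tracked by a single integer value, which is what the question asks for, at the cost of a cast on every access.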
Source: https://stackoverflow.com/questions/54951999/c-gcc-extension-for-non-zero-based-array-pointer-allocation