Use std::vector<double> to access data managed by std::unique_ptr<double[2]>

谁说我不能喝 submitted on 2021-02-08 05:17:36

Question


I have a complex class that holds a big block of double[2]-type data managed by a smart pointer, like this: std::unique_ptr<double[2]> m_data; I cannot change the type of the data structure.

I am using a library that gives me a function with the following signature: bool func_in_lib(std::vector<double>& data, double& res). I cannot change the signature of this function.

I want to pass the data managed by the unique_ptr to the function expecting a vector<double>& without breaking the connection to my complex class. I want the function to work directly on my m_data, rather than copying the data into a std::vector<double> and then copying it back into my complex class, because I have to do this many times.

Is there any way to do this?


Here is some code that shows the semantics I want to have. The line I am concerned with is

vector<double> access_vec = /* give access to my_data via vector interface */;


#include <iostream>
#include <memory>
#include <vector>

using namespace std;

//--------------------------------------------------------------------------//
//--- This function is given, I cannot change its signature.
bool
func_in_lib(std::vector<double>& data, double& res) {
  //--- check some properties of the vector
  if (data.size() < 10)
    return false;
  //--- do something magical with the data
  for (auto& d : data)
    d *= 2.0;
  res = 42.0;
  return true;
}

//--------------------------------------------------------------------------//
struct DataType {
  double a = 1.0;
  double b = 2.0;
  double c = 3.0;
};

//--------------------------------------------------------------------------//
ostream&
operator<<(ostream& out, const DataType& d) {
  out << d.a << " " << d.b << " " << d.c << endl;
  return out;
}

//--------------------------------------------------------------------------//
int
main(int argc, char const* argv[]) {
  int count = 20;
  //--- init and print my data
  unique_ptr<DataType[]> my_data = make_unique<DataType[]>(count);
  for (int i = 0; i < count; ++i)
    cout << my_data.get()[i];
  //---
  double         result     = 0.0;
  vector<double> access_vec = /* give access to my_data via vector interface */;
  func_in_lib(access_vec, result);

  return 0;
}


Answer 1:


tl;dr: Not possible in a standard-compliant way.

It's actually almost possible, but std::allocator limitations block your way. Let me explain.

  • An std::vector "owns" the memory it uses for element storage: a vector has the right to free that memory (e.g. on destruction, on destruction-after-move, on a .resize(), on a push_back, etc.) and to reallocate elsewhere. If you want your unique_ptr to keep ownership, you can't allow that to happen. And while it's true that your mock implementation of func_in_lib() doesn't do any of that, your code can't make these assumptions, because it must cater to the function's declaration, not its body.

But let's say you're willing to bend the rules a little, and assume that the vector won't replace its allocated memory while running. This is legitimate in the sense that, if you were somehow able to pass the memory for the vector to use and it replaced the memory region, you could detect that when func_in_lib() returns, and then either fix things in the unique_ptr or throw an exception (depending on whether other places in your code hold a pointer to the discarded memory). Or let's suppose that func_in_lib() took a const std::vector<double>& instead of a non-const reference. Our path would still be blocked. Why?

  • std::vector manages memory through an allocator object. The allocator is a template parameter, so in theory you could use a vector whose allocator does whatever you want: for example, starting with pre-allocated memory (which you give it, from unique_ptr::get()) and refusing to ever reallocate any memory, e.g. by throwing an exception. And since one of the std::vector constructors takes an allocator of the appropriate type, you could construct your desired allocator, create a vector with it, and pass a reference to that vector.

But alas - your library is cruel. func_in_lib isn't templated and can only take the default template parameter for its allocator: std::allocator.

  • The default allocator used for std::vector and other standard library containers is std::allocator. Now, allocators are a crooked idea generally, in my opinion; but std::allocator is particularly annoying. Specifically, it can't be constructed using a pre-existing memory region for it to use; it only ever holds memory it has allocated itself - never memory you gave it.

So, you'll never get an std::vector to use the memory you want to.
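The allocator idea from the second bullet can be sketched as follows. This is only an illustration under assumptions: BufferState, FixedBufferAllocator, scale_in_place and demo_fixed_buffer_vector are hypothetical names, the no-op construct() overload is a hack that leaves existing values in place, and debug-iterator builds that make extra allocations are not considered. The static_assert shows why it does not help with func_in_lib: the resulting vector is a different type from std::vector<double>.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical shared state: one caller-owned buffer that may be handed out once.
struct BufferState {
  void*       buffer = nullptr;
  std::size_t bytes  = 0;
  bool        in_use = false;
};

// Hypothetical allocator: hands out the caller's buffer and never frees it.
template <typename T>
struct FixedBufferAllocator {
  using value_type = T;
  BufferState* state;

  explicit FixedBufferAllocator(BufferState* s) noexcept : state(s) {}
  template <typename U>
  FixedBufferAllocator(const FixedBufferAllocator<U>& other) noexcept
    : state(other.state) {}

  T* allocate(std::size_t n) {
    // Refuse any second allocation (i.e. growth) and anything that does not fit.
    if (state->in_use || n > state->bytes / sizeof(T)) throw std::bad_alloc();
    state->in_use = true;
    return static_cast<T*>(state->buffer);
  }
  // The real owner (e.g. a unique_ptr) frees the memory, so only reset the flag.
  void deallocate(T*, std::size_t) noexcept { state->in_use = false; }

  // Hack: value-initialization becomes a no-op, so existing values survive.
  template <typename U>
  void construct(U*) noexcept {}
  template <typename U, typename... Args>
  void construct(U* p, Args&&... args) {
    ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
  }
};

template <typename T, typename U>
bool operator==(const FixedBufferAllocator<T>& a, const FixedBufferAllocator<U>& b) noexcept {
  return a.state == b.state;
}
template <typename T, typename U>
bool operator!=(const FixedBufferAllocator<T>& a, const FixedBufferAllocator<U>& b) noexcept {
  return !(a == b);
}

using BufferVector = std::vector<double, FixedBufferAllocator<double>>;

// The allocator is part of the vector's type, so func_in_lib cannot accept this.
static_assert(!std::is_same<BufferVector, std::vector<double>>::value,
              "a vector with a custom allocator is not std::vector<double>");

// Stand-in for a library function that *is* templated on the vector type.
template <typename Vec>
bool scale_in_place(Vec& data, double& res) {
  if (data.size() < 10) return false;
  for (auto& d : data) d *= 2.0;
  res = 42.0;
  return true;
}

// Returns true if the unique_ptr-owned data was modified in place.
bool demo_fixed_buffer_vector() {
  const std::size_t n = 12;
  std::unique_ptr<double[]> owned = std::make_unique<double[]>(n);
  for (std::size_t i = 0; i < n; ++i) owned[i] = static_cast<double>(i + 1);

  BufferState state{owned.get(), n * sizeof(double)};
  double res = 0.0;
  {
    BufferVector view(n, FixedBufferAllocator<double>(&state));
    scale_in_place(view, res);
  }  // the vector is destroyed here; the buffer stays owned by the unique_ptr
  return owned[0] == 2.0 && owned[n - 1] == 2.0 * static_cast<double>(n);
}
```

Calling demo_fixed_buffer_vector() from main exercises the sketch. It works only because scale_in_place is a template; the non-templated func_in_lib from the question would reject BufferVector at compile time.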

So what to do?

  1. Option 1: Your hack:

    • Figure out the concrete layout of std::vector on your system
    • Manually set field values to something useful
    • Use reinterpret_cast<std::vector>() on your raw data.
  2. Option 2: malloc() and free() hooks (if you're on a Unix-like system and/or using a compiler which uses libc)

    • See: Using Malloc Hooks

      The idea is to detect the allocation made by the std::vector you create, and give it your own unique_ptr-controlled memory instead of actually allocating anything. And when the vector asks to free the memory (e.g. on destruction), you do nothing.

  3. Switch libraries. The library exposing func_in_lib is poorly written. Unless it is a very niche library, I'm sure there are better alternatives. In fact, perhaps you could do a better job writing it yourself.

  4. Don't use that particular function in the library; stick to lower-level, simple primitives in the library and implement func_in_lib() using those. Not always feasible, but may be worth a shot.
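For options 3 and 4, a replacement that works on a plain range avoids the vector entirely. The sketch below is a minimal illustration, assuming the real library work can be re-expressed the way the mock func_in_lib body from the question is. process_records is a hypothetical name, and DataType mirrors the struct from the question.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the DataType struct from the question.
struct DataType {
  double a = 1.0;
  double b = 2.0;
  double c = 3.0;
};

// Hypothetical replacement for func_in_lib that works on the records in place.
// It applies the same size check and scaling as the mock function in the question.
bool process_records(DataType* first, std::size_t count, double& res) {
  // Each record contributes three doubles to the flattened data.
  if (count * 3 < 10) return false;
  for (std::size_t i = 0; i < count; ++i) {
    first[i].a *= 2.0;
    first[i].b *= 2.0;
    first[i].c *= 2.0;
  }
  res = 42.0;
  return true;
}
```

With a unique_ptr<DataType[]> my_data holding count elements, the call would be process_records(my_data.get(), count, result); nothing is copied and ownership never leaves the unique_ptr.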




Answer 2:


Together with a colleague of mine I found two solutions that solve my problem.

Solution 1 - The hacky one

The idea is to use the structure of the underlying implementation of the std::vector<double>, which consists in my case of 3 members containing 3 pointers to the data of the vector.

  1. start address of the data section
  2. end address of the data section
  3. address of the current maximum capacity of the data section

So I build a struct containing these three addresses and use a reinterpret_cast to a std::vector. This works with the current implementation of std::vector on my machine. This implementation can vary, depending on the installed version of the STL.

The nice thing here is that I can use the interface of std::vector without creating one. I also do not have to copy the data into a std::vector. I could also take just a part of the initial data stored in my complex class: I control which part is manipulated through the pointers I put into the struct.


This solves my problem, but it is a hack. I can use it, because the code is only relevant for myself. I still post it, because it could be of interest for others.
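Because the hack relies on std::vector<double> being exactly three pointers (begin, end, end of capacity), it may be worth checking that assumption before using it. The following is a minimal sketch, not part of the original solution. vector_looks_like_three_pointers is a hypothetical helper that inspects the bytes of a real vector, and a true result only describes the standard library it was compiled against.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical helper: returns true if std::vector<double> appears to be laid
// out as three pointers {begin, end, end-of-capacity} on this implementation.
bool vector_looks_like_three_pointers() {
  if (sizeof(std::vector<double>) != 3 * sizeof(double*)) return false;

  std::vector<double> probe;
  probe.reserve(8);
  probe.assign(5, 1.0);  // size 5, capacity at least 8, so end != end-of-capacity

  // Copy the object's bytes out and compare them with the known addresses.
  double* fields[3] = {nullptr, nullptr, nullptr};
  std::memcpy(fields, static_cast<const void*>(&probe), sizeof(fields));

  return fields[0] == probe.data() &&
         fields[1] == probe.data() + probe.size() &&
         fields[2] == probe.data() + probe.capacity();
}
```

A guard such as if (!vector_looks_like_three_pointers()) { /* fall back to copying */ } in front of the reinterpret_cast turns a silent layout mismatch into an explicit fallback. It does not make the cast itself any less undefined behaviour.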

#include <iostream>
#include <memory>
#include <vector>

using namespace std;

//--------------------------------------------------------------------------//
//--- This function is given, I cannot change its signature.
bool
func_in_lib(std::vector<double>& data, double& res) {
  //--- check some properties of the vector
  if (data.size() < 10)
    return false;
  //--- do something magical with the data
  for (auto& d : data)
    d *= 2.0;

  res = 42.0;
  return true;
}

//--------------------------------------------------------------------------//
struct DataType {
  double a = 1.0;
  double b = 2.0;
  double c = 3.0;
};

//--------------------------------------------------------------------------//
ostream&
operator<<(ostream& out, const DataType& d) {
  out << d.a << " " << d.b << " " << d.c << endl;
  return out;
}

//--------------------------------------------------------------------------//
int
main(int argc, char const* argv[]) {
  int count = 20;
  //--- init and print my data
  unique_ptr<DataType[]> my_data = make_unique<DataType[]>(count);
  for (int i = 0; i < count; ++i)
    cout << my_data.get()[i];
  
  //--------------------------------------------------------------------------//
  // HERE STARTS THE UGLY HACK, THAT CAN BE ERROR-PRONE BECAUSE IT DEPENDS ON
  // THE UNDERLYING IMPLEMENTATION OF std::vector<T>
  //--------------------------------------------------------------------------//
  struct VecAccess {
    double* start = nullptr; // address to the start of the data
    double* stop0 = nullptr; // address to the end of the data
    double* stop1 = nullptr; // address to the capacity of the vector
  };

  //---
  DataType*       p_data = my_data.get();
  VecAccess       va{ &(p_data[0].a),                //points at the 'front' of the vector
                      &(p_data[count - 1].c) + 1,    //points at the 'end' of the vector
                      &(p_data[count - 1].c) + 1 };
  vector<double>* p_vec_access = reinterpret_cast<vector<double>*>(&va);
  //--------------------------------------------------------------------------//
  // HERE ENDS THE UGLY HACK.
  //--------------------------------------------------------------------------//

  //---
  double dummy = 0.0;   // this is only relevant for the code used as minimum example
  func_in_lib(*p_vec_access, dummy);

  //--- print the modified data
  for (int i = 0; i < count; ++i)
    cout << my_data.get()[i];

  return 0;
}


Update: Analyzing the assembler code of the second solution shows that the content is still copied, even though the copy constructor of the data objects is not called. The copying happens at the machine-code level.

Solution 2 - The move semantic

For this solution I have to mark the move constructor of DataType with noexcept. The key idea is not to treat the DataType[] array as a std::vector<double>. Instead we treat the std::vector<double> as a std::vector<DataType>. We can then move the data into this vector (without copying), send it to the function, and move it back afterwards.

The data is not copied but moved into the std::vector, which is faster. Also relevant for my case: I can again take just a part of the initial data stored in my complex class. The drawback of this solution is that I have to create additional storage of the correct size for the moved data.

#include <algorithm>   // std::move (range overload)
#include <iostream>
#include <iterator>    // std::back_inserter
#include <memory>
#include <utility>
#include <vector>

using namespace std;

//--------------------------------------------------------------------------//
//--- This function is given, I cannot change its signature.
bool
func_in_lib(std::vector<double>& data, double& res) {
  //--- check some properties of the vector
  if (data.size() < 10)
    return false;
  //--- do something magical with the data
  for (auto& d : data)
    d *= 2.0;

  res = 42.0;
  return true;
}

//--------------------------------------------------------------------------//
class DataType {
public:
  double a = 1.0;
  double b = 2.0;
  double c = 3.0;

  // clang-format off
  DataType() = default;
  DataType(DataType const&) = default;
  DataType(DataType&&) noexcept = default;
  DataType& operator=(DataType const&) = default;
  DataType& operator=(DataType&&) noexcept  = default;
  ~DataType()  = default;
  // clang-format on
};

//--------------------------------------------------------------------------//
ostream&
operator<<(ostream& out, const DataType& d) {
  out << d.a << " " << d.b << " " << d.c << endl;
  return out;
}

//--------------------------------------------------------------------------//
int
main(int argc, char const* argv[]) {
  int count = 20;
  //--- init and print my data
  unique_ptr<DataType[]> my_data = make_unique<DataType[]>(count);
  for (int i = 0; i < count; ++i)
    cout << my_data.get()[i];
  //---
  vector<double> double_vec;
  double_vec.reserve(count * 3);
  //--- here starts the magic stuff
  auto& vec_as_datatype = *reinterpret_cast<vector<DataType>*>(&double_vec);
  auto* start_mv        = &(my_data.get()[0]);
  auto* stop_mv         = my_data.get() + count; // one past the last element
  //--- move the content to the vec
  move(start_mv, stop_mv, back_inserter(vec_as_datatype));
  //--- call the external func in the lib
  double dummy = 0.0; // is only needed for the code of the example
  func_in_lib(double_vec, dummy);
  //--- move the content to back
  move(begin(vec_as_datatype), end(vec_as_datatype), start_mv);
  //--- print modified the data
  for (int i = 0; i < count; ++i)
    cout << my_data.get()[i];
}



Answer 3:


This is not really an answer, because it does not directly solve your problem, but nobody has mentioned the C++17 polymorphic allocators with std::pmr::vector, which can easily do half of the work.

Unfortunately, it is not possible to get from there back to a usual std::vector.

I also came across an article on Bartek's coding blog, from which I took the code snippet below:

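#include <algorithm>         // std::fill_n
#include <cctype>            // std::toupper
#include <iostream>
#include <iterator>          // std::begin, std::size, std::data
#include <memory_resource>   // pmr core types
#include <vector>            // pmr::vector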

template <typename T> void MyToUpper(T& vec)    {
    for(auto & cr:vec)
        cr = std::toupper(cr);
}

//https://www.bfilipek.com/2020/06/pmr-hacking.html

int main() {
    char buffer[64] = {}; // a small buffer on the stack
    std::fill_n(std::begin(buffer), std::size(buffer) - 1, '_');
    std::cout << buffer << "\n\n";

    std::pmr::monotonic_buffer_resource pool{std::data(buffer), std::size(buffer)};

    std::pmr::vector<char> vec{ &pool };
    for (char ch = 'a'; ch <= 'z'; ++ch)
        vec.push_back(ch);
        
    std::cout << buffer << "\n\n";
    
    MyToUpper(vec);
    
    std::cout << buffer << '\n';
}

A possible result on Coliru (note: compiled as C++17):

_______________________________________________________________

aababcdabcdefghabcdefghijklmnopabcdefghijklmnopqrstuvwxyz______

aababcdabcdefghabcdefghijklmnopABCDEFGHIJKLMNOPQRSTUVWXYZ______

The article mentioned that the garbage part (aababcdabcdefghabcdefghijklmnop) is due to vector data reallocation while growing.

But what is interesting here is that the operation performed on the vector was indeed done on the original buffer ( abcdefghijklmnopqrstuvwxyz => ABCDEFGHIJKLMNOPQRSTUVWXYZ )

Unfortunately the std::pmr::vector would not fit your function func_in_lib(std::vector<double>& data, double& res)

I assume you bought the library, have no access to its source code, and cannot recompile it. Otherwise you could use templates, or maybe just ask your provider to add using std::pmr::vector; at the beginning of its code...
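Applied to the setting of the question, the same idea might look like the sketch below. It is an illustration under assumptions, not a solution: pmr_vector_uses_owned_buffer is a hypothetical name, the buffer is a plain unique_ptr<double[]>, and the vector constructs its own elements in that memory rather than adopting existing values. Calling reserve() up front avoids the leftover copies seen in the output above, and std::pmr::null_memory_resource() makes any growth beyond the buffer throw std::bad_alloc.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <memory_resource>
#include <type_traits>
#include <vector>

// std::pmr::vector<double> is std::vector<double, std::pmr::polymorphic_allocator<double>>,
// so it is a different type from std::vector<double> and func_in_lib rejects it.
static_assert(!std::is_same_v<std::pmr::vector<double>, std::vector<double>>,
              "pmr::vector is not the default-allocator vector");

// Returns true if the vector stored and modified its elements inside the
// unique_ptr-owned buffer.
bool pmr_vector_uses_owned_buffer() {
  const std::size_t n = 16;
  std::unique_ptr<double[]> owned = std::make_unique<double[]>(n);

  // The pool only hands out memory from the owned buffer; it never falls back
  // to the heap because the upstream resource always throws.
  std::pmr::monotonic_buffer_resource pool{owned.get(), n * sizeof(double),
                                           std::pmr::null_memory_resource()};

  std::pmr::vector<double> vec{&pool};
  vec.reserve(n);  // a single allocation up front, so nothing is reallocated
  for (std::size_t i = 0; i < n; ++i) vec.push_back(static_cast<double>(i));

  for (auto& d : vec) d *= 2.0;  // operates directly on the owned buffer

  return vec.data() == owned.get() && owned[3] == 6.0;
}
```

The unique_ptr keeps ownership throughout, because monotonic_buffer_resource never frees the initial buffer it was given.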



Source: https://stackoverflow.com/questions/62467215/use-stdvectordouble-to-access-data-managed-by-stdunique-ptrdouble2
