Assigning Rcpp objects into an Rcpp List yields duplicates of the last element

前端 未结 1 1186
刺人心
刺人心 2021-01-14 12:53

I am trying to take a Rcpp::CharacterMatrix and convert each row to its own element within an Rcpp::List.

However, the function that I hav

相关标签:
1条回答
  • 2021-01-14 13:26

    What's happening here is largely the result of how Rcpp objects work. In particular, CharacterVector acts as a pointer to a memory location. By defining this memory location outside the for loop, the result is a "global" pointer. That is, when an update to B occurs in the loop this subsequently updates all variants of B that have been conveniently stored in the Rcpp::List. Hence, the repeated lines of "c" throughout the list.

    With this being said, it is a very, very, very bad idea to use .push_back() on any Rcpp data types as you will end up copying to and fro the ever expanding object. Copying will occur as Rcpp data types hide the underlying SEXP that controls the R object, which must be recreated. As a result, you should try one of the following approaches:

    • Rearrange where the Rcpp::CharacterVector is created to be inside the first for loop and preallocate Rcpp::List space.
    • Switch to using only C++ standard library objects and convert at the end to the appropriate type.
      • std::list with std::vector<T> type T (i.e. std::string)
      • Rcpp::wrap(x) to return the correct object or modify the function return type from Rcpp::List to std::list<std::vector<T> >.
    • Preallocate Rcpp::List space and use std::vector<T> type T (i.e. std::string).
    • Preallocate Rcpp::List space and make a clone() of the Rcpp object before storing it in the list.

    Option 1

    Here we rearrange the function by moving the declaration for B into the first loop, preallocate the list space, and access the output list normally.

    #include <Rcpp.h>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    Rcpp::List char_expand_list_rearrange(Rcpp::CharacterMatrix A) {
      Rcpp::List output(A.nrow());
    
      for(int i = 0; i < A.nrow(); i++) {
        Rcpp::CharacterVector B(A.ncol());
    
        for(int j = 0; j < A.ncol(); j++) {
          B[j] = A(i, j);
        }
    
        output[i] = B;
      }
    
      return output;
    }
    

    Option 2

    Here we removed Rcpp::CharacterVector in favor of std::vector<std::string> and substituted Rcpp::List for std::list<std::vector<std::string> >. At the end, we convert the standard object to an Rcpp::List via Rcpp::wrap().

    #include <Rcpp.h>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    Rcpp::List char_expand_std_to_list(Rcpp::CharacterMatrix A) {
      std::vector<std::string> B(A.ncol());
    
      std::list<std::vector<std::string> > o;
    
      for(int i = 0 ;i < A.nrow(); i++) {
        for(int j = 0; j < A.ncol(); j++) {
          B[j] = A(i, j);
        }
    
        o.push_back(B);
      }
    
      return Rcpp::wrap(o);
    }
    

    Giving:

    mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
    char_expand_std_to_list(mat)
    # [[1]]
    # [1] "a" "a" "a"
    #
    # [[2]]
    # [1] "b" "b" "b"
    #
    # [[3]]
    # [1] "c" "c" "c"
    

    Option 3

    Alternatively, you could aim to keep the Rcpp::List, but just declare the size it is expecting ahead of time and still use a std::vector<T> element.

    #include <Rcpp.h>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    Rcpp::List char_expand_list_vec(Rcpp::CharacterMatrix A) {
      std::vector<std::string> B(A.ncol());
    
      Rcpp::List o(A.nrow());
    
      for(int i = 0; i < A.nrow(); i++) {
        for(int j = 0; j < A.ncol(); j++) {
          B[j] = A(i, j);
        }
    
        o[i] = B;
      }
    
      return o;
    }
    

    Option 4

    Lastly, with space predefined for a list, there is an explicit clone of the data at each iteration.

    #include <Rcpp.h>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    Rcpp::List char_expand_list_clone(Rcpp::CharacterMatrix A) {
      Rcpp::CharacterVector B(A.ncol());
      Rcpp::List output(A.nrow());
    
      for(int i = 0; i < A.nrow(); i++) {
    
        for(int j = 0; j < A.ncol(); j++) {
          B[j] = A(i, j);
        }
    
        output[i] = clone(B);
      }
    
      return output;
    }
    

    Benchmark

    The benchmark results show that Option 1 with a rearrangement and preallocation of space performs the best. The runner-up second is Option 4, which involves cloning each vector before saving it into the Rcpp::List.

    library("microbenchmark")
    library("ggplot2")
    
    mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
    
    micro_mat_to_list = 
      microbenchmark(char_expand_list_rearrange(mat),
                     char_expand_std_to_list(mat),
                     char_expand_list_vec(mat),
                     char_expand_list_clone(mat))
    micro_mat_to_list
    # Unit: microseconds
    #                             expr   min     lq    mean median     uq    max neval
    #  char_expand_list_rearrange(mat) 1.501 1.9255 3.22054 2.1965 4.8445  6.797   100
    #     char_expand_std_to_list(mat) 2.869 3.2035 4.90108 3.7740 6.4415 27.627   100
    #        char_expand_list_vec(mat) 1.948 2.2335 3.83939 2.7130 5.2585 24.814   100
    #      char_expand_list_clone(mat) 1.562 1.9225 3.60184 2.2370 4.8435 33.965   100
    

    0 讨论(0)
提交回复
热议问题