c++17

What is the optimum OpenCL 2 kernel to sum floats?

空扰寡人 submitted on 2019-12-24 03:32:36
Question: C++17 introduced a number of new algorithms to support parallel execution. In particular, std::reduce is a parallel version of std::accumulate that permits non-deterministic results for non-associative operations such as floating-point addition. I want to implement a reduce algorithm using OpenCL 2. Intel has an example that uses OpenCL 2 work-group kernel functions to implement a std::exclusive_scan OpenCL 2 kernel. Below is a kernel to sum floats, based on Intel's exclusive_scan
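The sketch below is not an OpenCL kernel. It is host-side C++17 that illustrates the point the question turns on: floating-point addition is not associative, so std::reduce, or a work-group reduction that adds values in a tree rather than left to right, may legitimately return a different sum than std::accumulate. The function names are hypothetical, and the pairwise loop only approximates the grouping a parallel reduction might use; real OpenCL implementations (for example of the OpenCL C 2.0 work_group_reduce_add built-in) choose their own order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Left-to-right sum: std::accumulate fixes the order ((x0 + x1) + x2) + ...
float sum_accumulate(const std::vector<float>& xs) {
    return std::accumulate(xs.begin(), xs.end(), 0.0f);
}

// std::reduce may regroup and reorder the additions. Passing
// std::execution::par (from <execution>) as the first argument would also
// allow parallel execution; it is omitted here to keep the sketch serial.
float sum_reduce(const std::vector<float>& xs) {
    return std::reduce(xs.begin(), xs.end(), 0.0f, std::plus<float>());
}

// Pairwise (tree) reduction, the kind of reassociation a parallel or
// work-group reduction may perform. Each pass halves the number of partial sums.
float sum_tree(std::vector<float> xs) {
    if (xs.empty()) return 0.0f;
    while (xs.size() > 1) {
        const std::size_t half = (xs.size() + 1) / 2;
        for (std::size_t i = 0; i + half < xs.size(); ++i)
            xs[i] += xs[i + half];   // fold the upper half onto the lower half
        xs.resize(half);
    }
    return xs[0];
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f}, sum_accumulate returns 1.0f (the first 1.0f is absorbed into 1e8f before the large terms cancel), while sum_tree returns 2.0f (the large terms cancel first). What sum_reduce returns for that input depends on the standard library implementation.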

The C++ detection idiom is not working as expected in Visual Studio 2015 Update 2 CTP

爱⌒轻易说出口 submitted on 2019-12-24 00:14:38
Question: I'm experimenting with the proposal for standard library support for the C++ detection idiom and compiled the following code with the Microsoft C/C++ Optimizing Compiler Version 19.00.23725 for x64: #include <iostream> template<class...> using void_t = void; template<class, template<class> class, class = void_t<>> struct detect : std::false_type { }; template<class T, template<class> class Operation> struct detect<T, Operation, void_t<Operation<T>>> : std::true_type { }; template<class T> using bar
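One likely culprit, though not confirmed for this particular compiler build, is CWG issue 1558: with a plain alias such as template<class...> using void_t = void;, unused alias-template arguments were not guaranteed to trigger SFINAE, and some compilers ignore them. The sketch below routes void_t through a helper struct, the workaround cppreference documents for std::void_t. The names make_void, bar_t, with_bar and without_bar are illustrative and not part of the question's (truncated) code.

```cpp
#include <type_traits>
#include <utility>

// CWG 1558 workaround: the pack is used by a class template, so an
// ill-formed argument reliably causes substitution failure.
template <class... Ts>
struct make_void { using type = void; };

template <class... Ts>
using void_t = typename make_void<Ts...>::type;

// Primary template: selected when Operation<T> is ill-formed.
template <class T, template <class> class Operation, class = void_t<>>
struct detect : std::false_type {};

// Partial specialization: viable only when Operation<T> is well-formed.
template <class T, template <class> class Operation>
struct detect<T, Operation, void_t<Operation<T>>> : std::true_type {};

// Hypothetical operation: the type of calling t.bar().
template <class T>
using bar_t = decltype(std::declval<T&>().bar());

struct with_bar { int bar(); };   // declared only; used in unevaluated context
struct without_bar {};
```

In C++17 the same alias is available as std::void_t in <type_traits>, so on a conforming compiler the helper struct is unnecessary.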

Why does std::optional::value() && return T&&?

喜你入骨 submitted on 2019-12-23 22:25:35
Question: I got a runtime error after replacing some code with std::optional: Old code: T getValue(); ... const auto& value = getValue(); value.get(); New code: std::optional<T> getValue(); ... const auto& value = getValue().value(); value.get(); // Runtime error, crash I did not expect this. The crash happens because the method returns T&&. My question is: in what cases is T&& useful, and why does the method not return a T? Complete code: #include <experimental/optional> #include
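The rvalue overload value() && returns T&& so that the contained object can be moved out of a temporary optional instead of copied; for example, std::string s = getValue().value(); moves the string. The downside is the case hit in the question: binding const auto& to that T&& does not extend the lifetime of the temporary optional, which is destroyed at the end of the full-expression, so the reference dangles. This sketch assumes T = std::string and a hypothetical getValue(); the helper names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for the question's getValue(), with T = std::string.
std::optional<std::string> getValue() { return std::string("hello"); }

// Dangling (do not do this): value() && returns std::string&& referring into
// the temporary optional, which is destroyed at the end of the statement.
//   const auto& value = getValue().value();

// Safe: take the value by value; the T&& overload lets it be moved, not copied.
std::string take_by_value() {
    auto value = getValue().value();
    return value;
}

// Safe: keep the optional alive in a named variable and refer into it.
std::size_t size_via_named_optional() {
    const auto opt = getValue();
    assert(opt.has_value());
    const auto& value = opt.value();  // refers into opt, which is still alive
    return value.size();
}
```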

Parallel algorithm to sum-assign the elements of a vector to the elements of another one

时间秒杀一切 submitted on 2019-12-23 20:49:32
Question: Consider: std::vector<double> u, v; #pragma omp parallel for for (std::size_t i = 0u; i < u.size(); ++i) u[i] += v[i]; To express similar code with the C++17 parallel algorithms, the solution I have found so far is to use the two-input-range version of std::transform: std::transform(std::execution::par_unseq, std::begin(u), std::end(u), std::begin(v), std::begin(u), std::plus()) which I don't like at all, because it bypasses the += operator of my types and in my real use case leads to much more
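One way to keep the type's own += is std::for_each over u, recovering each element's index from its address; this relies on std::vector storing its elements contiguously. The sketch below is a suggestion rather than an established answer, add_assign is a hypothetical name, and the execution policy is shown only in a comment so the example compiles without a parallel backend. Each invocation writes only its own element, so there is no data race when a parallel policy is added.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Adds v[i] into u[i] for every i by calling T::operator+= directly.
// The index is recovered from the element's address, which is valid only
// because std::vector stores its elements contiguously.
template <class T>
void add_assign(std::vector<T>& u, const std::vector<T>& v) {
    assert(u.size() == v.size());
    const T* base = u.data();
    // C++17: pass std::execution::par_unseq (from <execution>) as the first
    // argument to request parallel, vectorized execution.
    std::for_each(u.begin(), u.end(), [base, &v](T& x) {
        const auto i = static_cast<std::size_t>(&x - base);
        x += v[i];
    });
}
```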

Why aren't parts of the Concurrency TS going in C++17?

余生长醉 submitted on 2019-12-23 19:36:52
Question: According to Michael Wong, the Concurrency TS is not going in, despite being complete; apparently, "Although there is implementation experience, it was just approved and is too fresh to be voted to be added to C++17." My favourite proposal was originally in N3327, but I first read of it in N3857/N3784 and had expected it for C++14. Futures have had an implementation of .then() in Boost since 2013, and Microsoft has implemented a form of them in PPL, so any issues will have been hit upon, discussed and

Unpack parameter pack into string view

 ̄綄美尐妖づ submitted on 2019-12-23 18:04:05
Question: It is possible to unpack a value template parameter pack of type char into a (compile-time) string. How does one acquire a string_view into that string? What I want to do: int main() { constexpr auto s = stringify<'a', 'b', 'c'>(); constexpr std::string_view sv{ s.begin(), s.size() }; return 0; } Try: template<char ... chars> constexpr auto stringify() { std::array<char, sizeof...(chars)> array = { chars... }; return array; } Error: 15 : <source>:15:30: error: constexpr variable 'sv' must be
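The error arises because s is a local variable with automatic storage duration, so a pointer into it (s.begin() or s.data()) is not a constant expression. Giving the array static storage duration fixes this. The sketch keeps the question's stringify (with the local renamed to result) and uses data() rather than begin(), since std::array::iterator is not guaranteed to be a pointer; abc and abc_sv are illustrative names.

```cpp
#include <array>
#include <string_view>

// The question's helper: expands the char pack into a std::array.
template <char... chars>
constexpr auto stringify() {
    std::array<char, sizeof...(chars)> result = { chars... };
    return result;
}

// A namespace-scope constexpr object has static storage duration, so a
// pointer to its elements is a constant expression.
constexpr auto abc = stringify<'a', 'b', 'c'>();
constexpr std::string_view abc_sv{ abc.data(), abc.size() };

static_assert(abc_sv == "abc");
```

Inside a function such as main, declaring the array as static constexpr auto s = stringify<'a', 'b', 'c'>(); has the same effect.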

c++ parallel std::sort for floating values

≡放荡痞女 submitted on 2019-12-23 17:14:29
Question: I have a large file containing millions of floating-point values. For now I can easily sort them by reading the file into a vector and calling std::sort, e.g. std::vector<float> v; std::sort(v.begin(), v.end()); but is there a version of std::sort, or a similar algorithm, that takes advantage of the multiple cores available on my system? Since this is the only time-consuming step in the setup, I'm looking for performance improvements from having a multi-core CPU. I can use the latest releases of compilers on a
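C++17 answers this directly with an execution-policy overload, std::sort(std::execution::par, v.begin(), v.end()) from <execution>; how much it helps depends on the standard library (libstdc++, for example, relies on Intel TBB for its parallel backend). Where that overload is not usable, a hand-rolled alternative is to sort the two halves concurrently with std::async and merge them. In the sketch below, parallel_sort, the default recursion depth of 3 and the 10000-element cutoff are hypothetical choices, not tuned values.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <iterator>
#include <vector>

// Sorts [first, last): the left half is sorted on another thread while the
// current thread sorts the right half, then the halves are merged in place.
// `depth` bounds the number of extra threads to roughly 2^depth - 1.
template <class RandomIt>
void parallel_sort(RandomIt first, RandomIt last, int depth = 3) {
    const auto n = std::distance(first, last);
    if (depth <= 0 || n < 10000) {            // small input: plain std::sort
        std::sort(first, last);
        return;
    }
    RandomIt middle = first + n / 2;
    auto left = std::async(std::launch::async,
                           [=] { parallel_sort(first, middle, depth - 1); });
    parallel_sort(middle, last, depth - 1);
    left.get();                                // wait for the left half
    std::inplace_merge(first, middle, last);
}

// Convenience wrapper for the question's std::vector<float>.
void parallel_sort(std::vector<float>& v) { parallel_sort(v.begin(), v.end()); }
```

If the file can contain NaN values, filter them out first: operator< on floats is not a strict weak ordering when NaNs are present, which any std::sort-based approach requires.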

Conjunction template doesn't short-circuit

家住魔仙堡 submitted on 2019-12-23 16:40:06
Question: I want to be able to evaluate whether a function accepts one argument of type int and whether it returns void. To that end I used std::conjunction, since I believed it was supposed to short-circuit and not evaluate the second, ill-formed expression when the function is not callable with one argument of type int, but for some reason I get a compiler error: #include <iostream> #include <type_traits> template<typename Function> struct oneArgVoid { static constexpr bool value = std::conjunction
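std::conjunction short-circuits only the instantiation of the later traits' ::value; its template arguments are still formed eagerly. If the second argument is written as something like std::is_void<std::invoke_result_t<F, int>>, the ill-formed invoke_result_t is produced before std::conjunction is involved. Since the question's code is truncated, the sketch assumes that was the failing pattern. Moving the expression into a separate class template defers it until that class is instantiated, which std::conjunction skips once std::is_invocable is false; returns_void_with_int is an illustrative helper name.

```cpp
#include <type_traits>

// Deferred check: invoke_result_t<F, int> is only formed when this class is
// instantiated, i.e. when its ::value is actually needed.
template <class F>
struct returns_void_with_int
    : std::is_void<std::invoke_result_t<F, int>> {};

// If is_invocable<F, int> is false, conjunction stops there and never
// instantiates returns_void_with_int<F>.
template <class F>
struct oneArgVoid
    : std::conjunction<std::is_invocable<F, int>, returns_void_with_int<F>> {};
```

Note that std::is_invocable_r<void, F, int> is not a substitute: any return type is implicitly convertible to void, so it is true for every F callable with an int.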

How to deduce contiguous memory from iterator

偶尔善良 submitted on 2019-12-23 16:30:22
Question: Somehow, the native std::copy() algorithm in VC++ (Dinkumware) figures out that it can use memcpy() on data that is trivially copyable. Is it possible for a mere mortal to do that, assuming each element is_trivially_copyable? Does random_access_iterator imply contiguous memory? The standard is not clear to me. So, if all you have in a template is an iterator or two, is it possible to deduce at compile time that the underlying array can be copied with memcpy(), and if so, how? EDIT - Here's
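On the second question: random-access iterators do not imply contiguous storage; std::deque's iterators are random access, but its elements live in separate blocks. C++17 has no standard trait for contiguous iterators (C++20 adds std::contiguous_iterator), so one portable approach is to take the memmove path only when both iterators are raw pointers to the same trivially copyable type and otherwise defer to std::copy. copy_fast is a hypothetical name, and std::memmove is used instead of memcpy so that overlapping ranges remain well-defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copies [first, last) to out. Takes the memmove path only when both
// iterators are raw pointers to the same trivially copyable type, the one
// case C++17 guarantees to be contiguous; otherwise it calls std::copy.
template <class InIt, class OutIt>
OutIt copy_fast(InIt first, InIt last, OutIt out) {
    using In = std::remove_const_t<std::remove_pointer_t<InIt>>;
    using Out = std::remove_pointer_t<OutIt>;
    if constexpr (std::is_pointer_v<InIt> && std::is_pointer_v<OutIt> &&
                  std::is_same_v<In, Out> && std::is_trivially_copyable_v<In>) {
        assert(first <= last);
        const auto n = static_cast<std::size_t>(last - first);
        if (n != 0) std::memmove(out, first, n * sizeof(In));
        return out + n;
    } else {
        return std::copy(first, last, out);
    }
}
```

Because class-type iterators (such as std::vector<T>::iterator in common implementations) fall back to std::copy here, pass v.data() rather than v.begin() to reach the fast path.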

Standard C++ transactional memory status

只谈情不闲聊 submitted on 2019-12-23 13:27:30
Question: What is the current status of the transactional memory proposal for C++17? Is it going to be included in the standard, is it aimed at inclusion in some future version of standard C++, or is it only an experimental proof-of-concept feature whose standardization status is still undetermined? I'm asking because some of the standardization committee's documents seem to send contradictory signals here. On the one hand we have P0265R0 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0265r0