openmp

【OpenCV basics】Notes on code optimization and acceleration

落爺英雄遲暮 submitted on 2020-08-17 03:01:10
1. Fixed-point arithmetic: scale floating-point values into integers, do the multiply in integer math, then scale back:

   cv::Mat tmp1 = values * 1000000;
   tmp1.convertTo(tmp1, CV_32SC1);
   cv::Mat tmp2 = this->weights * 1000000;
   tmp2.convertTo(tmp2, CV_32SC1);
   cv::Mat tmp(tmp1.rows, tmp2.cols, CV_64FC1);
   tmp = tmp1 * tmp2 / 1000000000000.0;   // divide out the two 1e6 scale factors
   tmp.convertTo(tmp, CV_32FC1);

2. Use the Eigen library for the computation.
3. Use optimizing compiler options: hardware floating point, NEON, and the appropriate architecture/ARM flags.
4. Use multithreading; mind your mutexes and semaphores.
5. Use multi-core parallel programming: OpenMP.

References: 1. eigen; 2. GNU_GCC; 3. openmp_MSDN; 4. openmp_example_smallpt; 5. openmp_org
Source: oschina Link: https://my.oschina.net/u/4274413/blog/4309081

HPC Learning (Part 2)

家住魔仙堡 submitted on 2020-08-16 14:32:42
Contents: OMP learning; introduction; environment setup; OpenMP API; compiler directives; library functions; mutexes; data dependences and conflicts; atomic operations and locks; understanding atomic operations; atomics vs. locks; OpenMP in practice (matrix multiplication).

OMP learning. Motivation: having just started with high-performance computing and lacking a clear direction, I came across a blog post, "ASC18华农队长超算竞赛完整备战指南" (a complete supercomputing-contest preparation guide), and decided to organize my study along its outline. This post is mainly my own study notes; if there are mistakes, corrections are welcome. Continuing in the guide's order, this installment covers OpenMP programming. A recommended video course is the 新竹清华大学 parallel computing and parallel programming course on Bilibili; watch selectively, as it does not require much background.

introduction. OpenMP = Open specification for Multi-Processing. OpenMP is an application programming interface (API) jointly defined by a group of computer hardware and software vendors. It gives developers of shared-memory parallel programs a portable, scalable programming model, and its API supports C/C++ and Fortran on a wide range of architectures. OpenMP's parallel model is the fork-join model: the master thread creates a team of threads (the fork); when the parallel code finishes, only the master thread remains (the join).

Environment setup. On Windows it is simple: Visual Studio supports OpenMP out of the box. In the project property pages, choose "Configuration Properties" -> "C/C++" -> "Language"

An Introduction to MPI Basics

旧街凉风 submitted on 2020-08-06 04:36:08
MPI (Message Passing Interface). MPI is a language-independent communication protocol for programming parallel computers, supporting both point-to-point and broadcast communication. It is a message-passing application programming interface, together with protocol and semantic specifications of how its features must behave in any implementation. MPI's goals are high performance, scalability, and portability, and it remains the dominant model in high-performance computing today. The original MPI-1 model had no shared-memory concept, and MPI-2 has only a limited distributed shared-memory concept. Nevertheless, MPI programs are routinely run on shared-memory machines, and designing a program around the MPI model rather than for a NUMA architecture can be advantageous, because MPI encourages memory locality. Although MPI belongs to layer 5 or higher of the OSI reference model, implementations may cover most of the layers, with sockets and Transmission Control Protocol (TCP) used in the transport layer. Most MPI implementations consist of a specified set of routines (an API) directly callable from C, C++, Fortran, and any language able to interface with such libraries (such as C#, Java, or Python). MPI's advantages over older message-passing libraries are portability and speed. Unlike OpenMP parallel programs, MPI is a parallel programming technique based on message passing. The Message Passing Interface is a programming-interface standard, not a specific programming language; in short, the MPI standard defines a set of portable programming interfaces [1]. Source: oschina Link: https://my.oschina.net/u/4409965/blog/4303059

OpenMP GPU offloading math library?

一个人想着一个人 submitted on 2020-07-19 18:06:38
Question: I am trying to offload code to the GPU using OpenMP 4+ directives. I am using Ubuntu 16.04 with GCC 7.2, and for general cases it is working fine. My problem comes when I try to offload code that calls the sqrtf function defined in "math.h". The troubling code is this:

#pragma omp target teams distribute \
    map(to:posx[:n],posy[:n],posz[:n]) \
    map(from:frcx[:n],frcy[:n],frcz[:n])
for (int i = 0; i < n; i++) {
    frcx[i] = 0.0f;
    frcy[i] = 0.0f;
    frcz[i] = 0.0f;
    for (int j = 0;

Optimizing Numeric Program with SIMD

孤街醉人 submitted on 2020-06-27 03:58:05
Question: I am trying to optimize the performance of the following naive program without changing the algorithm:

naive(int n, const int *a, const int *b, int *c)  // a, b are two arrays of size n
{
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n - k; ++i)
            c[k] += a[i + k] * b[i];
}

My idea is as follows: First, I use OpenMP for the outer loop. For the inner loop, as it is imbalanced, I test n - k to decide whether to use AVX2 SIMD intrinsics or a simple reduce. And finally, I find that it

C++ thread-safe uniform distribution random number generation

大城市里の小女人 submitted on 2020-06-11 03:58:46
Question: I have a loop. In each iteration, I need to draw a number from U[0,1]. How can I use OpenMP while making sure that the random number generation is not contaminated? I was advised that I need a thread-safe random number generator, which may or may not be the solution to my problem. My question is closely related to another one, with the slight difference that I want to draw from the continuum U[0,1]. Additionally, I don't know how to seed the generator per thread; can someone

Using C++11 thread_local with other parallel libraries

[亡魂溺海] submitted on 2020-05-29 02:43:46
Question: I have a simple question: can C++11 thread_local be used with other parallel models? For example, can I use it within a function while using OpenMP or Intel TBB to parallelize the tasks? Most such parallel programming models hide hardware threads behind a higher-level API. My instinct is that they all have to map their task schedulers onto hardware threads. Can I expect that C++11 thread_local will have the expected effect? A simple example is:

void func() {
    static thread_local some_var = init_val;

OpenMP with Game of Life visualization using SFML

[亡魂溺海] submitted on 2020-05-17 07:43:26
Question: Hello, I'm trying to compare speeds between serial and parallel versions of the Game of Life. I used the SFML library to visualize it (SFML window). The serial logic is simple, as below:

for (int i = 0; i < height; i++) {
    for (int j = 0; j < width; j++) {
        int neighbor = 0;
        // check the 8 cells around (i, j):
        //   1 2 3   -1
        //   4 . 5    0
        //   6 7 8   +1
        // (1)
        if (gamefieldSerial.isAvailableCell(UP(i), LEFT(j))) {
            if (gamefieldSerial[UP(i)][LEFT(j)] == LIVE) neighbor++;
        }
        // (2)
        if (gamefieldSerial

Qt Creator, Compiler in kit for project is being ignored

给你一囗甜甜゛ submitted on 2020-05-13 06:37:35
Question: I am running macOS High Sierra (10.13.2) and Qt 5.10.0. I would like to use OpenMP in my application. I have added the following flags to my .pro file:

QMAKE_CXXFLAGS += -fopenmp
QMAKE_LFLAGS += -fopenmp
LIBS += -fopenmp

The default compilers on macOS do not support OpenMP. I installed gcc through Homebrew, which does support OpenMP. Under the Build & Run -> Compilers tab of Qt Creator, I added the Homebrew g++ and gcc compilers (/usr/local/Cellar/gcc/7.2.0/bin/{gcc-7,g++-7}). I then
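The question is truncated, but a workaround sometimes used when the kit's compiler setting appears to be ignored is to force qmake itself onto the Homebrew toolchain by overriding the compiler variables in the .pro file, alongside the flags already shown. This is a sketch, not a confirmed fix for this asker's setup; the paths are the version-specific ones from the question and must match your install:

```
# Hypothetical override: point qmake at the Homebrew GCC directly,
# bypassing whatever compiler the kit resolves to.
QMAKE_CC   = /usr/local/Cellar/gcc/7.2.0/bin/gcc-7
QMAKE_CXX  = /usr/local/Cellar/gcc/7.2.0/bin/g++-7
QMAKE_LINK = /usr/local/Cellar/gcc/7.2.0/bin/g++-7

QMAKE_CXXFLAGS += -fopenmp
QMAKE_LFLAGS   += -fopenmp
LIBS           += -fopenmp
```

The cleaner long-term route in Qt Creator is to make sure the project's active kit (not just the Compilers list) actually selects the added compilers, since adding a compiler alone does not change any kit.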