opencl | 易学教程

How to obtain OpenCL SDK?

I was perusing the http://www.khronos.org/ website and only found headers for OpenCL (not OpenGL, which I don't care about). How can I obtain an OpenCL SDK? AMD's ATI Stream SDK works perfectly for me, and it uses multicore CPUs. I have an Intel CPU and an NVIDIA card, but it works using the CPU. Only registration is required, with no special selection process like the one NVIDIA requires: http://developer.amd.com/GPU/ATISTREAMSDKBETAPROGRAM/Pages/default.aspx I got it to work on Ubuntu 9.04. Just download the installation instruction PDFs, also available on that page, and it should work. There isn't a Khronos

What future does the GPU have in computing? [closed]

Your CPU may be a quad-core, but did you know that some graphics cards today have over 200 cores? We've already seen what the GPUs in today's graphics cards can do when it comes to graphics. Now they can be used for non-graphical tasks as well, and in my opinion the results are nothing short of amazing. An algorithm that lends itself well to parallelism has the potential to be much, much faster on a GPU than it could ever be on a CPU. There are a few technologies that make all of this possible: 1.) CUDA by NVIDIA. It seems to be the most well-known and well-documented. Unfortunately, it'll only

Why aren't there bank conflicts in global memory for CUDA/OpenCL?

One thing I haven't figured out, and Google isn't helping me, is why it is possible to have bank conflicts with shared memory but not with global memory. Can there be bank conflicts with registers? UPDATE: Wow, I really appreciate the two answers from Tibbit and Grizzly. It seems that I can only give a green check mark to one answer, though. I am newish to Stack Overflow. I guess I have to pick one answer as the best. Can I do something to say thank you to the answer I don't give a green check to? Short Answer: There are no bank conflicts in either global memory or in registers. Explanation: The

NVIDIA vs AMD: GPGPU performance

I'd like to hear from people with experience of coding for both. Myself, I only have experience with NVIDIA. NVIDIA CUDA seems to be a lot more popular than the competition. (Just counting question tags on this forum, 'cuda' outperforms 'opencl' 3:1, and 'nvidia' outperforms 'ati' 15:1, and there's no tag for 'ati-stream' at all). On the other hand, according to Wikipedia, ATI/AMD cards should have a lot more potential, especially per dollar. The fastest NVIDIA card on the market as of today, GeForce 580 ($500), is rated at 1.6 single-precision TFlops. AMD Radeon 6970 can be had for $370 and

OpenCL vs OpenMP performance [closed]

Have there been any studies comparing OpenCL to OpenMP performance? Specifically, I am interested in the overhead cost of launching threads with OpenCL, e.g., if one were to decompose the domain into a very large number of individual work items (each run by a thread doing a small job), versus heavier-weight threads in OpenMP where the domain is decomposed into subdomains whose number equals the number of cores. It seems that the OpenCL programming model is more targeted towards massively parallel chips (GPUs, for instance) than towards CPUs that have fewer but more powerful cores. Can OpenCL be

Reduction with OpenMP: linear merging or log(number of threads) merging

Question: I have a general question about reductions with OpenMP that's bothered me for a while. My question is about merging the partial sums in a reduction. It can be done either linearly or in the log of the number of threads. Let's assume I want to do a reduction of some function double foo(int i). With OpenMP I could do it like this: double sum = 0.0; #pragma omp parallel for reduction (+:sum) for(int i=0; i<n; i++) { sum += foo(i); } However, I claim that the following code will be just as
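The excerpt is cut off before the alternative code appears, so the following is only a minimal sketch of what a linear merge of partial sums could look like next to the built-in reduction clause. It is an assumption, not the asker's code: foo is a hypothetical stand-in, and sum_builtin and sum_manual are illustrative names. In sum_manual each thread accumulates into a private variable and the partial sums are then added one at a time inside a critical section, so the merge cost grows linearly with the number of threads.

```c
#include <assert.h>

/* Hypothetical stand-in for the question's double foo(int i). */
static double foo(int i) { return (double)i; }

/* Built-in reduction: the OpenMP runtime decides how partial sums are merged. */
double sum_builtin(int n) {
    assert(n >= 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += foo(i);
    }
    return sum;
}

/* Manual reduction: private partial sums, merged one thread at a time. */
double sum_manual(int n) {
    assert(n >= 0);
    double sum = 0.0;
    #pragma omp parallel
    {
        double partial = 0.0;          /* private to each thread */
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            partial += foo(i);
        }
        #pragma omp critical
        sum += partial;                /* linear merge: one addition per thread */
    }
    return sum;
}
```

Built with an OpenMP flag such as -fopenmp, both functions run in parallel; without it the pragmas are ignored and both fall back to a plain serial loop.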

OpenGL vs. OpenCL, which to choose and why?

What features make OpenCL worth choosing over OpenGL with GLSL for calculations? Apart from the graphics-related terminology and impractical data types, is there any real drawback to OpenGL? For example, parallel function evaluation can be done by rendering to a texture using other textures. Reduction operations can be done by iteratively rendering to smaller and smaller textures. On the other hand, random write access is not possible in any efficient manner (the only way to do it is to render triangles using texture-driven vertex data). Is this possible with OpenCL? What else is possible that is not possible with
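The "smaller and smaller textures" reduction mentioned above can be illustrated on the CPU. This is a minimal sketch under assumptions, not OpenGL or OpenCL code: reduce_by_halving is a hypothetical name, and a plain float array stands in for the texture. Each pass produces a buffer about half the size of the previous one, and every output element of a pass depends on only two inputs, which is what makes the pass parallel-friendly.

```c
#include <assert.h>
#include <stddef.h>

/* Sums data[0..n) in place by repeated halving. The contents of data are
 * overwritten. Within one pass, writes go to indices below `half` and reads
 * come from indices at or above `half`, so the iterations are independent. */
float reduce_by_halving(float *data, size_t n) {
    assert(data != NULL && n > 0);
    while (n > 1) {
        size_t half = (n + 1) / 2;        /* size of the next, smaller buffer */
        for (size_t i = 0; i < n / 2; i++) {
            data[i] += data[i + half];    /* combine two inputs into one output */
        }
        n = half;                         /* an odd middle element carries over */
    }
    return data[0];
}
```

For an array of n elements this takes roughly log2(n) passes, which matches the number of render passes the texture-based approach would need.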

OpenCL - copy Tree to device memory

I've implemented a binary search tree in C. Each of my tree nodes looks like this: typedef struct treeNode { int key; struct treeNode *right; struct treeNode *left; } treeNode_t; The tree is constructed by the host and queried by the device. Now, let's assume that I've already finished building my tree in host memory. I want to copy the root of my tree to the memory of my device. Copying the root of the tree itself isn't enough, because the right and left children aren't located in device memory. This is a problem. So, my question is: what is the easiest way to
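One common way around the pointer problem is to flatten the tree into a single array in which children are referred to by index rather than by address, so the whole structure can be copied as one contiguous block. The sketch below is an assumption about how that could be done, not the answer from the original thread: flatNode_t, flatten, and flat_find are hypothetical names, and -1 is an arbitrary choice for "no child".

```c
#include <assert.h>
#include <stddef.h>

/* Host-side node, as given in the question. */
typedef struct treeNode {
    int key;
    struct treeNode *right;
    struct treeNode *left;
} treeNode_t;

/* Hypothetical device-friendly node: children are array indices, -1 = none. */
typedef struct {
    int key;
    int right;
    int left;
} flatNode_t;

/* Copies the subtree rooted at node into out[] in pre-order. *count is the
 * next free slot. Returns the index of node, or -1 for an empty subtree.
 * The caller must make sure out[] has room for every node. */
int flatten(const treeNode_t *node, flatNode_t *out, int *count) {
    assert(out != NULL && count != NULL);
    if (node == NULL) return -1;
    int idx = (*count)++;
    out[idx].key = node->key;
    out[idx].left = flatten(node->left, out, count);
    out[idx].right = flatten(node->right, out, count);
    return idx;
}

/* Index-based lookup. Returns the index of key, or -1 if it is absent. */
int flat_find(const flatNode_t *nodes, int root, int key) {
    int cur = root;
    while (cur != -1 && nodes[cur].key != key) {
        cur = (key < nodes[cur].key) ? nodes[cur].left : nodes[cur].right;
    }
    return cur;
}
```

Because flatNode_t contains only int members, the array could then be written to a device buffer in one transfer, and a loop like the one in flat_find could run inside a kernel over that buffer.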

SYCL exception caught: Error: [ComputeCpp:RT0101] Failed to create kernel ((Kernel Name: SYCL_class_multiply))

Question: I cloned https://github.com/codeplaysoftware/computecpp-sdk.git and modified the computecpp-sdk/samples/accessors/accessors.cpp file. I just added std::cout << "SYCL exception caught: " << e.get_cl_code() << '\n'; . See the fully modified code: /*************************************************************************** * * Copyright (C) 2016 Codeplay Software Limited * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the

OpenCL: using struct as kernel argument

Question: Can I use a struct as an OpenCL kernel argument? I want to use a struct type as an OpenCL kernel argument in NVIDIA OpenCL 1.2 (NVIDIA driver 352.39). I tried, but it produces a CL_OUT_OF_RESOURCES error. What is wrong in my code? [for struct definition] /* struct type definition */ typedef struct _st_foo { int aaa; int bbb; ..... int zzz; }st_foo; // st_foo doesn't have any pointer members [Host code] /* OpenCL initialize... */ st_foo stVar; cl_mem cm_buffer; cm_buffer = clCreateBuffer(cxContext, CL_MEM_READ
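The excerpt is cut off before the rest of the host code, so the cause of the error cannot be read from it. One thing that is generally worth ruling out when a struct travels through a buffer is a size or layout mismatch between host and device. The sketch below is an assumption for illustration only and does not diagnose the error above: st_foo_host is a hypothetical three-member version of st_foo (the original elides its members), and stage_struct is a hypothetical helper that copies the struct into a byte buffer of the kind a host would hand to a buffer-write call. It uses int32_t because int in OpenCL C is 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical three-member version of st_foo; the original elides members. */
typedef struct {
    int32_t aaa;
    int32_t bbb;
    int32_t zzz;
} st_foo_host;

/* Copies *src into dst as raw bytes. Returns the number of bytes written,
 * or 0 if dst is too small. The byte count returned here is the size the
 * device-side buffer would need to be created with. */
size_t stage_struct(const st_foo_host *src, unsigned char *dst, size_t dst_size) {
    assert(src != NULL && dst != NULL);
    /* Fixed-width members of equal size should leave no padding. */
    assert(sizeof(st_foo_host) == 3 * sizeof(int32_t));
    if (dst_size < sizeof *src) return 0;
    memcpy(dst, src, sizeof *src);
    return sizeof *src;
}
```

With mixed member types the padding rules matter more, and checking sizeof and offsetof on the host against the kernel-side definition is a cheap sanity test.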