nvidia

Visual Studio - filter nvcc warnings away

梦想与她 submitted on 2019-12-02 17:04:24
Question: I'm writing a CUDA program, but I'm getting the obnoxious warning: `Warning: Cannot tell what pointer points to, assuming global memory space`. This is coming from nvcc and I can't disable it. Is there any way to filter out warnings from third-party tools (like nvcc)? I'm asking for a way to filter errors/warnings coming from custom build tools out of the output window log. Answer 1: I had the same annoying warnings; I found help on this thread: link. You can either remove the -G flag on the nvcc
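For context, a minimal sketch of the two workarounds usually suggested for this warning (the flag spellings are standard nvcc usage, not taken from the linked thread): the message comes from the older nvcc front end guessing the memory space of a raw pointer, typically when device debugging (-G) is enabled for a pre-Fermi target.

```cuda
// kernel.cu -- a kernel taking a raw pointer can trigger the warning on
// older nvcc front ends, which must guess which memory space 'data' is in.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;   // nvcc assumes 'data' is in global memory
}

// Workaround 1: drop device debug info, i.e. compile without -G:
//   nvcc -O2 kernel.cu
// Workaround 2: keep -G but target an architecture with generic
// addressing (sm_20 or newer), so the compiler no longer has to guess:
//   nvcc -G -arch=sm_20 kernel.cu
```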

Difference between nVidia Quadro and Geforce cards? [closed]

╄→гoц情女王★ submitted on 2019-12-02 16:08:58
I'm not a 3D or HPC guy, but I've been tasked with doing some research into those fields for a possible HPC application. Reading benchmarks, comparisons and specs between nVidia Quadro and GeForce cards, it seems that for similar-generation cards: Quadro is 2x-3x the price of GeForce; hardware-wise, the differences are not that great; in benchmarks (3ds Max, Maya and some others), Quadro cards perform much better than GeForce ones. Does anyone know the exact and precise technical differences that can cause such better performance? My speculation (and what can generally be read on

NVIDIA vs AMD: GPGPU performance

喜夏-厌秋 submitted on 2019-12-02 15:45:24
I'd like to hear from people with experience coding for both. Myself, I only have experience with NVIDIA. NVIDIA CUDA seems to be a lot more popular than the competition. (Just counting question tags on this forum, 'cuda' outperforms 'opencl' 3:1, 'nvidia' outperforms 'ati' 15:1, and there's no tag for 'ati-stream' at all.) On the other hand, according to Wikipedia, ATI/AMD cards should have a lot more potential, especially per dollar. The fastest NVIDIA card on the market as of today, the GeForce 580 ($500), is rated at 1.6 single-precision TFlops. The AMD Radeon 6970 can be had for $370 and

Cuda program not working for more than 1024 threads

会有一股神秘感。 submitted on 2019-12-02 14:53:26
Question: My program is an odd-even merge sort, and it does not work for more than 1024 threads. I have already tried increasing the block size to 100, but it still does not work for more than 1024 threads. I'm using Visual Studio 2012 and I have an Nvidia GeForce 610M. This is my program: #include<stdio.h> #include<iostream> #include<conio.h> #include <random> #include <stdint.h> #include <driver_types.h> __global__ void odd(int *arr,int n){ int i=threadIdx.x; int temp; if(i%2==1&&i<n-1){ if(arr[i]>arr[i+1]
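A likely explanation, hedged since the excerpt is cut off: CUDA caps a single block at 1024 threads (512 on some older parts), so a launch such as `odd<<<1, n>>>` simply fails for n > 1024 rather than sorting incorrectly. A sketch of the usual fix, spreading the work over several blocks and deriving a global index:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void odd(int *arr, int n)
{
    // Global index across all blocks, instead of just threadIdx.x.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i % 2 == 1 && i < n - 1 && arr[i] > arr[i + 1]) {
        int temp = arr[i];
        arr[i] = arr[i + 1];
        arr[i + 1] = temp;
    }
}

int main()
{
    const int n = 4096;                 // more elements than one block allows
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    int *d_arr;
    cudaMalloc(&d_arr, n * sizeof(int));
    // ... copy the input to d_arr, then run one odd phase:
    odd<<<blocks, threadsPerBlock>>>(d_arr, n);
    cudaDeviceSynchronize();
    cudaFree(d_arr);
    return 0;
}
```

Checking the return value of each CUDA call, or calling `cudaGetLastError()` after the launch, would report the invalid launch configuration instead of failing silently.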

NVIDIA NVML Driver/library version mismatch

北城余情 submitted on 2019-12-02 13:53:28
When I run nvidia-smi I get the following message: Failed to initialize NVML: Driver/library version mismatch. An hour ago I received the same message, uninstalled my cuda library, and was then able to run nvidia-smi, getting the following result: After this I downloaded cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb from the official NVIDIA page and then simply: sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb sudo apt-get update sudo apt-get install cuda export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}} Now I have cuda installed, but I get the mentioned mismatch

SYCL exception caught: Error: [ComputeCpp:RT0101] Failed to create kernel ((Kernel Name: SYCL_class_multiply))

删除回忆录丶 submitted on 2019-12-02 12:16:33
Question: I cloned https://github.com/codeplaysoftware/computecpp-sdk.git and modified the computecpp-sdk/samples/accessors/accessors.cpp file. I just added std::cout << "SYCL exception caught: " << e.get_cl_code() << '\n';. See the fully modified code: /*************************************************************************** * * Copyright (C) 2016 Codeplay Software Limited * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the
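For reference, a hedged sketch of how such a runtime failure can be caught and reported in ComputeCpp (namespace and member names follow the SYCL 1.2.1 interface; the kernel name is the one from the error message). RT0101 typically means the driver could not build the kernel binary for the selected device, so printing which device the default selector picked is often the first diagnostic step:

```cpp
#include <CL/sycl.hpp>
#include <iostream>

int main()
{
    try {
        cl::sycl::queue q;   // default selector; may pick an unsupported device
        std::cout << "Running on: "
                  << q.get_device().get_info<cl::sycl::info::device::name>()
                  << '\n';
        q.submit([&](cl::sycl::handler &cgh) {
            cgh.single_task<class SYCL_class_multiply>([]() { /* ... */ });
        });
        q.wait_and_throw();   // surfaces asynchronous errors as exceptions
    } catch (const cl::sycl::exception &e) {
        std::cout << "SYCL exception caught: " << e.what()
                  << " (cl code " << e.get_cl_code() << ")\n";
    }
    return 0;
}
```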

Unable to execute device kernel in CUDA

十年热恋 submitted on 2019-12-02 09:53:06
I am trying to call a device kernel within a global kernel. My global kernel is a matrix multiplication, and my device kernel finds the maximum value and its index in each column of the product matrix. The following is the code: __device__ void MaxFunction(float* Pd, float* max) { int x = (threadIdx.x + blockIdx.x * blockDim.x); int y = (threadIdx.y + blockIdx.y * blockDim.y); int k = 0; int temp = 0; int temp_idx = 0; for (k = 0; k < wB; ++k) { if(Pd[x*wB + y] > temp){ temp = Pd[x*wB + y]; temp_idx = x*wB + y; } max[y*2 + 0] = temp; max[y*2 + 1] = temp_idx; } } __global__ void
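Two things are worth noting about the excerpt: a `__device__` function is an ordinary function called from kernel code (it is not "launched"), and because every thread of a column writes `max[y*2]` without synchronization, the loop above is a data race; `temp` is also an `int` holding `float` values. A hedged sketch that avoids both problems by giving each column to a single thread (`wB`/`hB` are assumed to be the product matrix's width and height):

```cuda
// One thread per column: each thread scans its own column, so there is
// no write race on max[], and the running maximum is kept as a float.
__global__ void columnMax(const float *Pd, float *max, int wB, int hB)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= wB) return;

    float best = Pd[col];      // row 0 of this column
    int bestIdx = col;
    for (int row = 1; row < hB; ++row) {
        float v = Pd[row * wB + col];
        if (v > best) {
            best = v;
            bestIdx = row * wB + col;
        }
    }
    max[col * 2 + 0] = best;
    max[col * 2 + 1] = (float)bestIdx;   // index stored as float, as in the original
}
```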

CUDA: Addition of two numbers giving wrong answer

断了今生、忘了曾经 submitted on 2019-12-02 08:35:10
Question: Here is the program: #include <stdio.h> #include <cuda.h> #include <cuda_runtime.h> #include <device_launch_parameters.h> __global__ void Addition(int *a,int *b,int *c) { *c = *a + *b; } int main() { int a,b,c; int *dev_a,*dev_b,*dev_c; int size = sizeof(int); cudaMalloc((void**)&dev_a, size); cudaMalloc((void**)&dev_b, size); cudaMalloc((void**)&dev_c, size); a=5,b=6; cudaMemcpy(dev_a, &a,sizeof(int), cudaMemcpyHostToDevice); cudaMemcpy(dev_b, &b,sizeof(int), cudaMemcpyHostToDevice); Addition
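The excerpt is cut off at the launch, but with this pattern the usual culprit is forgetting to copy `dev_c` back to the host, which leaves `c` uninitialized and produces a garbage answer. A hedged reconstruction of the complete, working flow:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void Addition(const int *a, const int *b, int *c)
{
    *c = *a + *b;
}

int main()
{
    int a = 5, b = 6, c = 0;
    int *dev_a, *dev_b, *dev_c;
    const int size = sizeof(int);

    cudaMalloc((void **)&dev_a, size);
    cudaMalloc((void **)&dev_b, size);
    cudaMalloc((void **)&dev_c, size);

    cudaMemcpy(dev_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, &b, size, cudaMemcpyHostToDevice);

    Addition<<<1, 1>>>(dev_a, dev_b, dev_c);             // launch configuration required
    cudaMemcpy(&c, dev_c, size, cudaMemcpyDeviceToHost); // copy the result back

    printf("%d + %d = %d\n", a, b, c);                   // 5 + 6 = 11

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}
```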

CUDA - Creating objects in kernel and using them at host [duplicate]

ぃ、小莉子 submitted on 2019-12-02 08:34:13
This question already has an answer here: "How to copy the memory allocated in device function back to main memory" (1 answer). I need to use polymorphism in my kernels. The only way of doing this is to create those objects on the device (to make a virtual method table available on the device). Here's the object being created: class Production { Vertex * boundVertex; }; class Vertex { Vertex * leftChild; Vertex * rightChild; }; Then on the host I do: Production* dProd; cudaMalloc(&dProd, sizeof(Production *)); createProduction<<<1,1>>>(dProd); where __global__ void createProduction(Production * prod) {
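Two details matter here, and the sketch below (a hedged reconstruction, since the original code is cut off) addresses both: the host-side `cudaMalloc(&dProd, sizeof(Production *))` allocates room for a pointer but then passes `dProd` as if it were the object itself, and an object whose virtual method table must be usable on the device has to be constructed with `new` inside device code. The linked duplicate's caveat still applies: the host can pass such a pointer between kernels but can never dereference it.

```cuda
#include <cuda_runtime.h>

class Production {
public:
    __device__ Production() : boundVertex(nullptr) {}
    __device__ virtual ~Production() {}   // vtable is set up in device code
    struct Vertex *boundVertex;
};

// The kernel receives a device-resident slot (Production**) and stores
// the address of a device-heap object into it.
__global__ void createProduction(Production **slot)
{
    *slot = new Production();
}

int main()
{
    Production **dSlot;
    cudaMalloc(&dSlot, sizeof(Production *));  // room for the pointer itself
    createProduction<<<1, 1>>>(dSlot);
    cudaDeviceSynchronize();
    // Later kernels can take dSlot and call virtual methods through *dSlot.
    // The host must not dereference *dSlot; the object itself would be
    // released with 'delete' inside a kernel, then the slot freed here.
    cudaFree(dSlot);
    return 0;
}
```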