intel

Android Emulation Issue; Application working on ARM system image but not Intel Atom system image

青春壹個敷衍的年華 submitted on 2020-01-15 12:24:25
Question: Frustrated with the speed of the Android emulator running an ARM image, I switched to the Intel Atom image as soon as I discovered it. There was an immediate speed boost (boot time went from 30 sec to 3 sec), but my application no longer works fully. All the XML portions of the app work, but as soon as I move to my activity, which is all SurfaceView and canvas work, the screen goes black. It does not crash and there are no errors in logcat. The stranger thing is that I have a

AMD CPU versus Intel CPU openCL

纵饮孤独 submitted on 2020-01-14 19:14:41
Question: Some friends and I want to use OpenCL. We plan to buy a new computer for this, and we are wondering whether AMD or Intel is the better choice for OpenCL. The graphics card will be an Nvidia card and we have no choice about that. We were leaning toward an Intel CPU, but after some research we found that AMD CPUs may be better with OpenCL. We did not find any benchmarks that compare the two. So here are our questions: Is AMD better than Intel with OpenCL? Does it matter to have a Nvidia

Python execution speed: laptop vs desktop

余生长醉 submitted on 2020-01-14 13:15:09
Question: I am running a program that does simple data processing: it parses text, populates dictionaries, and calculates some functions over the resulting data. The program only uses CPU, RAM, and HDD: it is run from the Windows command line, input/output goes to the local hard drive, nothing is displayed on or printed to the screen, and there is no networking. The same program is run on: desktop: Windows 7, i7-930 CPU overclocked @3.6 GHz (with matching memory speed), Intel X-25M SSD; laptop: Windows XP, Intel Core2 Duo T9300 @2.5GHz, 7200 rpm HDD

High performance implement of atomic minimal operation

泄露秘密 submitted on 2020-01-14 06:01:30
Question: There is no atomic minimum operation in OpenMP, and no intrinsic for it in Intel MIC's instruction set. #pragma omp critical performs very poorly. I want to know whether there is a high-performance implementation of atomic minimum for Intel MIC. Answer 1: According to the OpenMP 4.0 Specifications (Section 2.12.6), there are a lot of fast atomic minimal operations you can do by using the #pragma omp atomic construct in place of #pragma omp critical (and thereby avoid the huge overhead of its
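For readers unfamiliar with the problem, a common general-purpose way to build an atomic minimum is a compare-and-swap loop. The sketch below uses standard C++ std::atomic rather than OpenMP pragmas or MIC intrinsics, so it is not the approach from the truncated answer above; the function name atomic_fetch_min is made up for illustration.

```cpp
#include <atomic>

// Hypothetical helper (not part of OpenMP or any Intel library):
// atomically performs target = min(target, value) with a CAS loop.
inline void atomic_fetch_min(std::atomic<int>& target, int value) {
    int current = target.load(std::memory_order_relaxed);
    // Retry only while our value is still smaller than what is stored.
    // On failure, compare_exchange_weak reloads `current` with the
    // latest stored value, so the loop condition is re-evaluated.
    while (value < current &&
           !target.compare_exchange_weak(current, value,
                                         std::memory_order_relaxed)) {
    }
}
```

Each thread would call atomic_fetch_min(shared_min, local_value). When many threads contend on one location, it is usually cheaper to compute a per-thread minimum first and combine the results once at the end.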

Store forwarding Address vs Data: What the difference between STD and STA in the Intel Optimization guide?

旧巷老猫 submitted on 2020-01-13 09:24:17
Question: I'm wondering if any Intel experts out there can tell me the difference between STD and STA with respect to the Intel Skylake core. In the Intel optimization guide, there's a picture describing the "super-scalar ports" of the Intel Cores. Here's the PDF. The picture is on page 40. Here's another picture from page 78; this picture describes "Store Address" and "Store Data": Prepares the store forwarding and store retirement logic with the address of the data being stored. Prepares the store

Bypass delays when switching execution unit domains

◇◆丶佛笑我妖孽 submitted on 2020-01-12 19:01:12
Question: I'm trying to understand possible bypass delays when switching domains of execution units. For example, the following two lines of code give exactly the same result. _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 8))); _mm_add_ps(x, _mm_shuffle_ps(_mm_setzero_ps(), x, 0x40)); Which line of code is better to use? The assembly output for the first line gives: vpslldq xmm1, xmm0, 8 vaddps xmm0, xmm1, xmm0 The assembly output for the second line gives: vshufps xmm1, xmm0,
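The claim that both expressions give the same result can be checked with a small sketch. It assumes an x86 target with SSE2, and the two wrapper function names are invented here for illustration. It only shows that the values match and says nothing about bypass latency.

```cpp
#include <array>
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE ones)

// Variant 1: byte shift in the integer domain, then a float add.
std::array<float, 4> add_via_integer_shift(const std::array<float, 4>& in) {
    __m128 x = _mm_loadu_ps(in.data());
    __m128 shifted = _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 8));
    std::array<float, 4> out;
    _mm_storeu_ps(out.data(), _mm_add_ps(x, shifted));
    return out;
}

// Variant 2: float shuffle against zero, then a float add.
// Immediate 0x40 selects {zero[0], zero[0], x[0], x[1]}.
std::array<float, 4> add_via_shuffle(const std::array<float, 4>& in) {
    __m128 x = _mm_loadu_ps(in.data());
    __m128 shifted = _mm_shuffle_ps(_mm_setzero_ps(), x, 0x40);
    std::array<float, 4> out;
    _mm_storeu_ps(out.data(), _mm_add_ps(x, shifted));
    return out;
}
```

For the input {1, 2, 3, 4}, both variants add {0, 0, 1, 2} to x and return {1, 2, 4, 6}.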

Intel Fortran Composer 2011 and Linux Mint 12

[亡魂溺海] submitted on 2020-01-12 11:09:29
Question: I'm using Intel Fortran Composer 2011 on a Linux Mint 12 system. Every time I restart the computer (and for every user) I need to set the environment variables. source /opt/intel/composer_xe_2011_sp1.9.293/bin/compilervars.sh intel64 Is there any way to make it automatic for all users? Sorry for my poor English. Thanks, CP Answer 1: Put a file under /etc/profile.d with the following content (e.g. name it intel.sh) #!/bin/sh source /opt/intel/composer_xe_2011_sp1.9.293/bin/compilervars.sh intel64 Source:

Why is prefetch speedup not greater in this example?

我怕爱的太早我们不能终老 submitted on 2020-01-12 08:37:27
Question: In 6.3.2 of this excellent paper Ulrich Drepper writes about software prefetching. He says this is the "familiar pointer chasing framework", which I gather is the test he gives earlier about traversing randomized pointers. It makes sense in his graph that performance tails off when the working set exceeds the cache size, because then we are going to main memory more and more often. But why does prefetch help only 8% here? If we are telling the processor exactly what we want to load, and
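To make "pointer chasing" concrete, here is a minimal sketch of a traversal over nodes linked in random order, with an optional software prefetch of the next node. It is not Drepper's benchmark: the Node layout, the function names, and the choice to prefetch only one node ahead are assumptions made for illustration, and __builtin_prefetch is a GCC/Clang builtin.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical node layout for illustration only.
struct Node {
    Node* next;
    std::uint64_t payload;
};

// Link n nodes into one cycle whose order in memory is randomized.
std::vector<Node> make_random_cycle(std::size_t n, unsigned seed) {
    std::vector<Node> nodes(n);
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    for (std::size_t i = 0; i < n; ++i) {
        Node& cur = nodes[order[i]];
        cur.next = &nodes[order[(i + 1) % n]];
        cur.payload = order[i];
    }
    return nodes;
}

// Follow `steps` links from `start`, summing payloads. Each load address
// depends on the previous load, which is what makes this pointer chasing.
std::uint64_t chase(const Node* start, std::size_t steps, bool prefetch) {
    std::uint64_t sum = 0;
    const Node* p = start;
    for (std::size_t i = 0; i < steps; ++i) {
        if (prefetch) {
            __builtin_prefetch(p->next);  // hint only; never changes results
        }
        sum += p->payload;
        p = p->next;
    }
    return sum;
}
```

Timing chase with and without the prefetch flag on a working set larger than the cache is one way to explore the question. In this sketch the prefetch address only becomes known once the current node has been loaded, so the hint has little independent work to overlap with.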