intel

Why does this GLSL shader work fine with a GeForce but flickers strangely on an Intel HD 4000?

天涯浪子 submitted on 2019-12-10 23:49:18
Question: Using the OpenGL 3.3 core profile, I'm rendering a full-screen "quad" (as a single oversized triangle) via gl.DrawArrays(gl.TRIANGLES, 0, 3) with the following shaders.

Vertex shader:

    #version 330 core
    #line 1
    vec4 vx_Quad_gl_Position () {
        const float extent = 3;
        const vec2 pos[3] = vec2[](vec2(-1, -1), vec2(extent, -1), vec2(-1, extent));
        return vec4(pos[gl_VertexID], 0, 1);
    }
    void main () {
        gl_Position = vx_Quad_gl_Position();
    }

Fragment shader:

    #version 330 core
    #line 1
    out vec3 out_Color;

What level of the cache does PREFETCHT2 fetch into?

旧街凉风 submitted on 2019-12-10 23:32:35
Question: The documentation for PREFETCHT2, which is prefetch with a T2 hint, says (emphasis mine):

T0 (temporal data)—prefetch data into all levels of the cache hierarchy.
T1 (temporal data with respect to first level cache misses)—prefetch data into level 2 cache and higher.
T2 (temporal data with respect to second level cache misses)—prefetch data into level 3 cache and higher, or an implementation-specific choice.
NTA (non-temporal data with respect to all cache levels)—prefetch data into non-temporal
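The four hints in the quoted list map directly onto the `_mm_prefetch` intrinsic; a minimal sketch of using the T2 hint (the function name, lookahead distance, and buffer are arbitrary choices for illustration):

```c
#include <xmmintrin.h>  /* _mm_prefetch and the _MM_HINT_* constants */
#include <stddef.h>

/* Sums an array while prefetching a few cache lines ahead with the T2
   hint (PREFETCHT2). Prefetches are pure hints: they never fault and
   never change the result, only (potentially) the timing. */
float sum_with_prefetch(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            _mm_prefetch((const char *)&a[i + 16], _MM_HINT_T2);
        s += a[i];
    }
    return s;
}
```

_MM_HINT_T0, _MM_HINT_T1, and _MM_HINT_NTA select the other three encodings from the quoted list in the same way.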

What is an effective address?

百般思念 submitted on 2019-12-10 22:43:46
Question: While reading the Intel 64 and IA-32 Architectures Software Developer's Manual, the operation section for the LEA instruction (load effective address) uses a calculation called EffectiveAddress(SRC) which is not defined anywhere else. What is the definition of an effective address, and what does EffectiveAddress(SRC) do? Answer 1: Section 3.7.5 (Specifying an Offset) of the same document states: The offset part of a memory address can be specified directly as a static value (called a displacement) or
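An effective address is just the base + index×scale + displacement arithmetic of an x86 addressing mode, and LEA stores that result into its destination register without any memory access. A sketch of the calculation (the function is a model for illustration, not anything from the manual):

```c
#include <stdint.h>

/* Model of x86 effective-address computation:
   EA = base + index * scale + displacement.
   This is exactly what LEA writes to its destination register;
   unlike MOV, no memory is read or written. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int64_t disp) {
    return base + index * (uint64_t)scale + (uint64_t)disp;
}
```

For example, `lea rax, [rbx + rcx*4 + 16]` with rbx=0x1000 and rcx=3 computes 0x1000 + 3*4 + 16 = 0x101C.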

Why wasn't MASKMOVDQU extended to 256-bit and 512-bit stores?

 ̄綄美尐妖づ submitted on 2019-12-10 22:32:15
Question: The MASKMOVDQU 1 is special among x86 store instructions because, in principle, it allows you to store individual bytes in a cache line without first loading the entire cache line all the way to the core so that the written bytes can be merged with the not-overwritten existing bytes. It would seem to work using the same mechanism as an NT store: pushing the cache line down without first doing an RFO. Per the Intel software developer manual (emphasis mine): The MASKMOVQ instruction can be
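The instruction is exposed as the SSE2 intrinsic `_mm_maskmoveu_si128`; a minimal sketch of the byte-masked store described above (the wrapper function is an illustration, not a library API):

```c
#include <emmintrin.h>  /* SSE2: _mm_maskmoveu_si128 (MASKMOVDQU) */
#include <stdint.h>

/* Stores only the bytes of src whose corresponding mask byte has its
   high bit (0x80) set; all other bytes of dst are left untouched,
   which is the selective-byte behavior the question describes. */
void masked_store_16(const uint8_t src[16], const uint8_t mask[16],
                     uint8_t dst[16]) {
    __m128i vsrc  = _mm_loadu_si128((const __m128i *)src);
    __m128i vmask = _mm_loadu_si128((const __m128i *)mask);
    _mm_maskmoveu_si128(vsrc, vmask, (char *)dst);  /* MASKMOVDQU */
    _mm_sfence();  /* the store is weakly ordered, like NT stores */
}
```

Within a single thread the skipped destination bytes are guaranteed to keep their old values; the sfence only matters for visibility to other cores.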

What is the gcc cpu-type that includes support for RDTSCP?

ε祈祈猫儿з submitted on 2019-12-10 20:17:22
Question: I am using RDTSCP to replace LFENCE;RDTSC sequences and also to get the processor ID back, so that I know when I'm comparing TSC values after the thread was rescheduled to another CPU. To ensure I don't run RDTSCP on too old a machine, I fall back to RDTSC after a CPUID check (using libcpuid). I'd like to try using gcc's multiple-target attribute functionality instead of a CPUID call:

    int core2_func (void) __attribute__ ((__target__ ("arch=core2")));

The gcc manual lists a number of cpu families
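RDTSCP support is reported by CPUID leaf 80000001H, EDX bit 27, rather than being tied to one gcc `-march=` family, which is why a direct runtime check is the robust fallback. A minimal sketch using GCC/Clang's `<cpuid.h>` helper instead of libcpuid:

```c
#include <cpuid.h>  /* __get_cpuid (GCC/Clang built-in helper) */

/* Returns 1 if the CPU supports RDTSCP: CPUID.80000001H:EDX bit 27. */
int has_rdtscp(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return 0;  /* extended leaf not available */
    return (edx >> 27) & 1;
}
```

If it returns 1, `__rdtscp(&aux)` from `<x86intrin.h>` yields both the TSC value and the IA32_TSC_AUX core ID in one instruction.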

How to test AVX-512 instructions w/o supported hardware? [closed]

♀尐吖头ヾ submitted on 2019-12-10 20:13:14
Question: (Closed last year as off-topic.) I'm trying to learn x86-64's new AVX-512 instructions, but neither of my computers has support for them. I tried using various disassemblers (from Visual Studio to online ones: 1, 2) to see the instructions for specific opcode encodings, but I'm getting somewhat conflicting results. Plus, it would've been nice

What happens to a Startup IPI sent to an Active AP that is not in a Wait-for-SIPI state

别说谁变了你拦得住时间么 submitted on 2019-12-10 19:24:07
Question: In a previous Stack Overflow answer, Margaret Bloom says: Waking the APs This is achieved by issuing an INIT-SIPI-SIPI (ISS) sequence to all the APs. The BSP sends the ISS sequence using the shorthand "all excluding self" as the destination, thereby targeting all the APs. A SIPI (Startup Inter-Processor Interrupt) is ignored by all the CPUs that are awake by the time they receive it, so the second SIPI is ignored if the first one suffices to wake up the target processors. It is advised

Xcode Intel compiler icc cannot find #include <algorithm>

馋奶兔 submitted on 2019-12-10 18:43:56
Question: Hi, I'm trying to compile gcc-based code in Xcode with the icc compiler (11.1.088), but I get the following error: catastrophic error: could not open source file "algorithm" After looking for this file, it is located in the gcc include directory, but I get hundreds of errors... Does anyone have suggestions? Thanks. Answer 1: What do you have set as your base SDK? And what version of Xcode? FWIW, I just tried a test with Xcode 3.2.3 and ICC 11.1 (under OS X 10.6, of course) - created a new C++

Installing Intel's TBB 3.0 framework on MacOS 10.6 (Snow Leopard)

风流意气都作罢 submitted on 2019-12-10 18:33:24
Question: I'm having a bit of trouble installing Intel's Threading Building Blocks (TBB) 3.0 as a framework on my MacOS system. Does anyone know a good tutorial? I've tried using MacPorts, which has TBB 2.2: it installs all the libraries I need, but I don't get a framework. Also, there doesn't seem to be any .dmg installation file on Intel's site that could provide this framework. All the download files are zipped files containing the src code or the binaries. Any ideas? Thanks! Answer 1: There is a

Why is 1's complement still used for encoding vector instructions?

瘦欲@ submitted on 2019-12-10 17:53:37
Question: In an answer, jww points out that 1's complement is still used in encoding vector instructions on Intel architectures, and Ruslan clarifies that these instructions are being used more as auto-vectorization becomes common. Is there an advantage of 1's complement that causes it to continue to be used in these instructions, or is it simply being used for historical reasons? Quoting jww: From Intel® 64 and IA-32 Architectures Software Developer's Manual 2A, page 3-8: 3.1.1.8 Description Section
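Assuming the question is about the VEX prefix (the best-known case of this), the vvvv field in a two-byte VEX prefix holds an extra register specifier stored in one's complement, i.e. bit-inverted, so decoding it is a NOT plus a mask. A sketch:

```c
#include <stdint.h>

/* In a two-byte VEX prefix (C5 xx), bits 6..3 of the second byte hold
   VEX.vvvv, an extra register operand stored in one's complement
   (inverted). Inverting all 1s yields register 0, which is also the
   required encoding when the vvvv operand is unused. */
unsigned vex_vvvv(uint8_t vex_second_byte) {
    return (~(unsigned)vex_second_byte >> 3) & 0xF;
}
```

For example, vaddps xmm0, xmm1, xmm2 encodes as C5 F0 58 C2; the F0 byte's inverted vvvv field (1110b) decodes to register 1, i.e. xmm1, the first source operand.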