cpu

tensorflow code optimization strategy

最后都变了 · submitted on 2019-11-27 20:48:14
Question: Please excuse the broadness of this question; once I know more, perhaps I can ask more specifically. I have a performance-sensitive piece of TensorFlow code. From the perspective of someone who knows little about GPU programming, I would like to know what guides or strategies would be a good place to start for optimizing my code (single GPU). Perhaps even a readout of how long was spent on each TensorFlow op would be nice... I have a vague understanding that some operations go faster when
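For the per-op readout, one option in TF 1.x is the chrome-trace timeline built from RunMetadata. A minimal sketch, where the matmul workload and the output file name are placeholders rather than anything from the question:

    import tensorflow as tf
    from tensorflow.python.client import timeline

    # Placeholder workload standing in for the performance-sensitive graph.
    x = tf.random_normal([1000, 1000])
    y = tf.matmul(x, x)

    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    with tf.Session() as sess:
        sess.run(y, options=run_options, run_metadata=run_metadata)
        # Write per-op timings in a format that chrome://tracing can display.
        trace = timeline.Timeline(step_stats=run_metadata.step_stats)
        with open('timeline.json', 'w') as f:
            f.write(trace.generate_chrome_trace_format())

Loading timeline.json into chrome://tracing shows how long each op took and on which device it ran, which is usually the quickest way to see where the time actually goes.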

How are instructions differentiated from data?

[亡魂溺海] · submitted on 2019-11-27 20:42:13
While reading the ARM core document, I had this doubt: how does the CPU tell whether the data it reads from the data bus should be executed as an instruction or treated as data to operate on? Refer to this excerpt from the document - "Data enters the processor core through the Data bus. The data may be an instruction to execute or a data item." Thanks in advance for enlightening me! /MS Each opcode consists of an instruction of N bytes, which then expects the subsequent M bytes to be data (memory pointers etc.). So the CPU uses each opcode to determine how many of the following bytes are data.
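A toy decode loop may make the idea concrete: the opcode itself tells the decoder how many operand bytes follow. This is a made-up variable-length encoding purely for illustration (classic ARM instructions are actually fixed-width), sketched in Python:

    # Map each made-up opcode to the number of operand bytes it carries.
    OPERAND_BYTES = {0x01: 0,   # NOP  - no operands
                     0x10: 1,   # LOAD - one immediate byte
                     0x20: 2}   # JMP  - two address bytes

    def decode(stream):
        pc = 0
        while pc < len(stream):
            opcode = stream[pc]
            n = OPERAND_BYTES[opcode]            # opcode decides how many bytes are data
            operands = list(stream[pc + 1:pc + 1 + n])
            yield opcode, operands
            pc += 1 + n                          # skip past the instruction and its data

    print(list(decode(bytes([0x10, 0x2A, 0x20, 0x00, 0x80, 0x01]))))
    # [(16, [42]), (32, [0, 128]), (1, [])]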

Convert the output of os.cpus() in Node.js to a percentage

不羁的心 · submitted on 2019-11-27 19:51:28
Is there a way to convert the os.cpus() info to a percentage, like the CPU section of iostat's output? My code:

    var os = require('os');
    console.log(os.cpus());

The output:

    [ { model: 'MacBookAir4,2', speed: 1800, times: { user: 5264280, nice: 0, sys: 4001110, idle: 58703910, irq: 0 } },
      { model: 'MacBookAir4,2', speed: 1800, times: { user: 2215030, nice: 0, sys: 1072600, idle: 64657440, irq: 0 } },
      { model: 'MacBookAir4,2', speed: 1800, times: { user: 5973360, nice: 0, sys: 3197990, idle: 58773760, irq: 0 } },
      { model: 'MacBookAir4,2', speed: 1800, times: { user: 2187650, nice: 0,
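The times counters are cumulative milliseconds since boot, so a single snapshot cannot give a meaningful percentage; the usual approach is to take two snapshots a short interval apart and divide the busy delta by the total delta. The arithmetic is the same in Node; it is sketched here in Python on two hypothetical samples shaped like one core's times object:

    def cpu_percent(t0, t1):
        # Busy time is everything except idle; all fields are cumulative ms.
        busy0 = t0['user'] + t0['nice'] + t0['sys'] + t0['irq']
        busy1 = t1['user'] + t1['nice'] + t1['sys'] + t1['irq']
        total0 = busy0 + t0['idle']
        total1 = busy1 + t1['idle']
        return 100.0 * (busy1 - busy0) / (total1 - total0)

    # Hypothetical samples for one core, taken a few seconds apart.
    sample_then = {'user': 5264280, 'nice': 0, 'sys': 4001110, 'idle': 58703910, 'irq': 0}
    sample_now  = {'user': 5265280, 'nice': 0, 'sys': 4001310, 'idle': 58712710, 'irq': 0}
    print(cpu_percent(sample_then, sample_now))  # 12.0

Averaging the per-core percentages (or summing the deltas across all cores before dividing) gives an overall utilization figure comparable to iostat's.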

AES-NI intrinsics enabled by default?

拥有回忆 · submitted on 2019-11-27 19:40:24
Oracle has this to say about Java 8 with regard to AES-NI: "Hardware intrinsics were added to use Advanced Encryption Standard (AES). The UseAES and UseAESIntrinsics flags are available to enable the hardware-based AES intrinsics for Intel hardware. The hardware must be 2010 or newer Westmere hardware. For example, to enable hardware AES, use the following flags: -XX:+UseAES -XX:+UseAESIntrinsics To disable hardware AES use the following flags: -XX:-UseAES -XX:-UseAESIntrinsics" But it does not indicate whether the AES intrinsics are enabled by default (for processors that support them). So the question is: are they enabled by default?
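Whether the intrinsics can be used at all depends first on the CPU advertising the AES feature flag, and you can see what the JVM actually decided by running java -XX:+PrintFlagsFinal -version and looking for the UseAES and UseAESIntrinsics entries. As a rough, Linux-only illustration of the hardware check (sketched in Python; the /proc/cpuinfo path is a Linux-specific assumption):

    def cpu_has_aes_ni(cpuinfo_path='/proc/cpuinfo'):
        # The kernel lists an "aes" entry in the flags line when AES-NI
        # is available on this processor.
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith('flags'):
                    return 'aes' in line.split(':', 1)[1].split()
        return False

    print(cpu_has_aes_ni())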

Tensorflow: executing an op on a specific core of a CPU

自作多情 · submitted on 2019-11-27 19:32:12
It is currently possible to specify which CPU or GPU to use with the tf.device(...) function for specific ops, but is there any way to specify a particular core of a CPU? There's no API for pinning ops to a particular core at present, though this would make a good feature request. You could approximate this functionality by creating multiple CPU devices, each with a single-threaded threadpool, but this isn't guaranteed to maintain the locality of a core-pinning solution:

    with tf.device("/cpu:4"):
      # ...
    with tf.device("/cpu:7"):
      # ...
    with tf.device("/cpu:0"):
      # ...
    config = tf.ConfigProto
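A fuller sketch of the pattern described above (several CPU devices, each backed by a single-threaded pool), assuming TF 1.x, an illustrative device count of 8, and toy ops in place of real work:

    import tensorflow as tf

    # Expose 8 logical "CPU" devices and keep each one effectively single-threaded.
    config = tf.ConfigProto(device_count={"CPU": 8},
                            inter_op_parallelism_threads=1,
                            intra_op_parallelism_threads=1)

    with tf.device("/cpu:4"):
        a = tf.constant(1.0) * 2.0
    with tf.device("/cpu:7"):
        b = tf.constant(3.0) * 4.0

    with tf.Session(config=config) as sess:
        print(sess.run([a, b]))  # [2.0, 12.0]

As the answer notes, this only approximates core pinning: TensorFlow does not promise that a given logical CPU device stays on the same physical core.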

Why is a CPU branch instruction slow?

纵然是瞬间 · submitted on 2019-11-27 18:45:49
Since I started programming, I have read everywhere that wasteful branches should be avoided at all costs. That's fine, although none of the articles explained why I should do this. What exactly happens when the CPU decodes a branch instruction and decides to take the jump? And what is the "thing" that makes it slower than other instructions (like addition)? A branch instruction is not inherently slower than any other instruction. However, the reason you have heard that branches should be avoided is that modern CPUs follow a pipeline architecture. This means that there are multiple sequential instructions being
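A back-of-the-envelope model may help show where the cost comes from: a correctly predicted branch flows through the pipeline like any other instruction, but a misprediction flushes the instructions fetched behind it, wasting roughly one pipeline-depth of cycles. The toy Python arithmetic below is not a model of any real CPU; the depth of 5 and the instruction counts are assumptions for illustration:

    PIPELINE_DEPTH = 5

    def cycles(instructions, mispredicted_branches):
        # Ideal pipelined throughput: one instruction retires per cycle once
        # the pipeline is full, plus a flush penalty per mispredicted branch.
        fill = PIPELINE_DEPTH - 1
        flush_penalty = (PIPELINE_DEPTH - 1) * mispredicted_branches
        return fill + instructions + flush_penalty

    print(cycles(1000, 0))    # 1004 cycles: every branch predicted correctly
    print(cycles(1000, 100))  # 1404 cycles: 100 of the 1000 are mispredicted branches

With correct prediction a branch costs about the same as an addition; the slowdown only shows up when the prediction is wrong.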

CPUID implementations in C++

不打扰是莪最后的温柔 · submitted on 2019-11-27 18:23:17
I would like to know if somebody around here has some good examples of a C++ CPUID implementation that can be referenced from any of the managed .NET languages. Also, should this not be the case, should I be aware of implementation differences between x86 and x64? I would like to use CPUID to get info on the machine my software is running on (crash reporting etc.) and I want to keep everything as widely compatible as possible. The primary reason I ask is that I am a total noob when it comes to writing what will probably be all machine instructions, though I have basic knowledge about

User CPU time vs System CPU time?

烂漫一生 · submitted on 2019-11-27 17:56:24
Could you explain more about "user CPU time" and "system CPU time"? I have read a lot, but I couldn't understand it well. The difference is whether the time is spent in user space or kernel space. User CPU time is time spent on the processor running your program's code (or code in libraries); system CPU time is the time spent running code in the operating system kernel on behalf of your program. The term ‘user CPU time’ can be a bit misleading at first. To be clear, the total time (real CPU time) is the combination of the amount of time the CPU spends performing some action for a program and
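A quick way to see the split for your own process, sketched in Python for a Unix-like system (os.times() is standard; /dev/zero is an assumption that holds on Linux and macOS): the first loop accumulates user CPU time with pure computation, while the second accumulates system CPU time by issuing many small read() system calls.

    import os

    def mostly_user():
        # Pure computation in user space: no system calls inside the loop.
        return sum(i * i for i in range(2_000_000))

    def mostly_system():
        # Unbuffered 1-byte reads: each one is a system call handled by the kernel.
        with open('/dev/zero', 'rb', buffering=0) as f:
            for _ in range(200_000):
                f.read(1)

    mostly_user()
    mostly_system()
    t = os.times()  # cumulative CPU times for this process
    print('user CPU time:  ', round(t.user, 2), 's')
    print('system CPU time:', round(t.system, 2), 's')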

Threading vs single thread

风格不统一 · submitted on 2019-11-27 17:37:57
Is it always guaranteed that a multi-threaded application will run faster than a single-threaded one? I have two threads that populate data from a data source, but for different entities (e.g. a database, from two different tables), yet the single-threaded version of the application seems to run faster than the version with two threads. What could the reason be? When I look at the performance monitor, both CPUs are very spiky. Is this due to context switching? What are the best practices for fully utilizing the CPU? I hope this is not ambiguous. An analogy might help. You have a

CPU Privilege Rings: Why aren't rings 1 and 2 used?

谁说胖子不能爱 · submitted on 2019-11-27 16:54:27
A couple of questions regarding the x86 CPU privilege rings: Why aren't rings 1 and 2 used by most operating systems? Is it just to maintain code compatibility with other architectures, or is there a better reason? Are there any operating systems that actually use those rings, or are they completely unused? As a hobbyist operating-system writer, I found that because paging (a major part of the modern protection model) only has a concept of privileged (rings 0, 1, 2) and unprivileged, the benefit of rings 1 and 2 was greatly diminished. The intent by Intel in having rings 1 and 2 is for the OS