latency

How to prefetch data using a custom Python function in TensorFlow

戏子无情 submitted on 2019-11-26 12:51:33
Question: I am trying to prefetch training data to hide I/O latency. I would like to write custom Python code that loads data from disk and preprocesses it (e.g. by adding a context window). In other words, one thread does data preprocessing and the other does training. Is this possible in TensorFlow? Update: I have a working example based on @mrry's example.

import numpy as np
import tensorflow as tf
import threading

BATCH_SIZE = 5
TRAINING_ITERS = 4100
feature_input = tf.placeholder(tf.float32
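For reference, a minimal sketch of the same idea using the modern tf.data pipeline (TF 2.x), where tf.py_function wraps arbitrary Python preprocessing and prefetch overlaps it with the training loop. The file names and the body of load_and_preprocess are illustrative stand-ins, not part of the original question:

import numpy as np
import tensorflow as tf

def load_and_preprocess(path):
    # Stand-in for custom Python I/O plus preprocessing (e.g. adding a context window).
    return np.random.rand(5, 10).astype(np.float32)

paths = tf.data.Dataset.from_tensor_slices(["a.npy", "b.npy", "c.npy"])
dataset = (
    paths
    .map(lambda p: tf.py_function(load_and_preprocess, inp=[p], Tout=tf.float32),
         num_parallel_calls=tf.data.AUTOTUNE)   # preprocess in background threads
    .prefetch(tf.data.AUTOTUNE)                 # keep examples ready ahead of training
)

for batch in dataset:
    pass  # the training step would consume `batch` here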

What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?

六眼飞鱼酱① submitted on 2019-11-26 11:25:40
I want to be able to predict, by hand, exactly how long arbitrary arithmetical (i.e. no branching or memory, though that would be nice too) x86-64 assembly code will take given a particular architecture, taking into account instruction reordering, superscalarity, latencies, CPIs, etc. What rules must be followed to achieve this? I think I've got some preliminary rules figured out, but I haven't been able to find any references on breaking down any example code to this level of detail, so I've had to take some guesses. (For example, the Intel optimization manual barely even
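A first-order model (assuming no memory traffic, no branch misses, and a warmed-up out-of-order core): execution time is bounded below by whichever is largest of (a) the latency of the longest dependency chain, (b) total uops divided by the issue width, and (c) the uop count on the busiest execution port divided by that port's throughput. Worked example with illustrative Skylake-like numbers from Agner Fog's tables (approximate, not authoritative): eight dependent mulsd instructions at 4-cycle latency form a 32-cycle chain, while eight independent mulsd instructions at a throughput of two per cycle finish in roughly 8 / 2 = 4 cycles of issue plus one trailing 4-cycle latency.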

Fastest (low latency) method for inter-process communication between Java and C/C++

两盒软妹~` submitted on 2019-11-26 10:05:26
Question: I have a Java app connecting through a TCP socket to a "server" developed in C/C++. Both app and server run on the same machine, a Solaris box (but we're considering migrating to Linux eventually). The data exchanged consists of simple messages (login, login ACK, then the client asks for something, the server replies). Each message is around 300 bytes long. Currently we're using sockets and all is OK; however, I'm looking for a faster way to exchange data (lower latency) using IPC methods. I
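The usual answer for same-host, latency-critical IPC is shared memory rather than loopback TCP. Below is a minimal single-file sketch of the idea in Python, using mmap with a length prefix acting as a publish flag; in the question's setting the Java side would map the same file with a MappedByteBuffer and the C side with plain mmap. The path, message, and busy-wait are illustrative assumptions:

import mmap
import os
import struct

PATH = "/tmp/ipc_demo"   # hypothetical file backing the shared region
SIZE = 4096

# Writer side: map the file, write the payload, then publish by writing the length.
fd = os.open(PATH, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, SIZE)
buf = mmap.mmap(fd, SIZE)
buf[0:4] = struct.pack("<I", 0)           # clear the "ready" length field
msg = b"login ACK".ljust(300, b"\x00")    # ~300-byte message, as in the question
buf[4:4 + len(msg)] = msg
buf[0:4] = struct.pack("<I", len(msg))    # writing the length last publishes the message

# Reader side (normally a separate process mapping the same file):
while (length := struct.unpack("<I", buf[0:4])[0]) == 0:
    pass  # busy-wait; real cross-process code needs proper fencing or a semaphore
print(buf[4:4 + length])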

Asynchronous IO io_submit latency in Ubuntu Linux

五迷三道 submitted on 2019-11-26 09:49:06
Question: I am looking for advice on how to get efficient, high-performance asynchronous IO working for my application, which runs on Ubuntu Linux 14.04. My app processes transactions and creates a file on disk/flash. As the app progresses through transactions, additional blocks are created that must be appended to the file on disk/flash. The app also needs to frequently read blocks of this file as it processes new transactions. Each transaction might need to read a different block from this
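Python's standard library does not expose io_submit directly (that requires libaio or io_uring bindings), so as a rough, hedged approximation of the pattern the question describes, overlapping appends with positional reads, here is a thread-pool sketch; the file name and block size are illustrative:

import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096
fd = os.open("txn.log", os.O_CREAT | os.O_RDWR | os.O_APPEND)  # hypothetical transaction file
pool = ThreadPoolExecutor(max_workers=4)

def append_block(data: bytes) -> int:
    return os.write(fd, data)             # O_APPEND: each write lands at the end of the file

def read_block(n: int) -> bytes:
    return os.pread(fd, BLOCK, n * BLOCK) # positional read; no shared file offset involved

w = pool.submit(append_block, b"x" * BLOCK)   # queue an append...
r = pool.submit(read_block, 0)                # ...and a read, both overlapping the caller
w.result()
print(len(r.result()))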

How does LMAX's disruptor pattern work?

混江龙づ霸主 submitted on 2019-11-26 08:38:44
Question: I am trying to understand the disruptor pattern. I have watched the InfoQ video and tried to read their paper. I understand there is a ring buffer involved, and that it is initialized as an extremely large array to take advantage of cache locality and eliminate allocation of new memory. It sounds like there are one or more atomic integers which keep track of positions. Each 'event' seems to get a unique id, and its position in the ring is found by taking its modulus with respect to the size of
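To make the ring-plus-sequence-counters idea concrete, here is a minimal single-producer/single-consumer sketch in Python. It is only illustrative: the real Disruptor is Java, pads its counters to avoid false sharing, and uses proper memory barriers rather than relying on the interpreter lock as this sketch does:

import threading

class Ring:
    def __init__(self, size=1024):
        assert size & (size - 1) == 0   # power of two, so modulus is a cheap bit-mask
        self.size = size
        self.ring = [None] * size
        self.cursor = -1                # highest sequence published by the producer
        self.gating = -1                # highest sequence consumed by the consumer

    def publish(self, event):           # single producer only
        seq = self.cursor + 1
        while seq - self.gating > self.size:
            pass                        # ring full: spin until the consumer catches up
        self.ring[seq & (self.size - 1)] = event
        self.cursor = seq               # advancing the cursor makes the slot visible

    def consume(self):                  # single consumer only
        seq = self.gating + 1
        while self.cursor < seq:
            pass                        # nothing published yet: spin
        event = self.ring[seq & (self.size - 1)]
        self.gating = seq
        return event

r = Ring()
threading.Thread(target=lambda: [r.publish(i) for i in range(5)], daemon=True).start()
print([r.consume() for _ in range(5)])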

How do I simulate a low bandwidth, high latency environment?

限于喜欢 submitted on 2019-11-26 06:53:01
Question: I need to simulate a low-bandwidth, high-latency connection to a server in order to emulate the conditions of a VPN at a remote site. The bandwidth and latency should be tweakable so I can discover the best combination for running our software package. Answer 1: For macOS, there is the Network Link Conditioner, which simulates configurable bandwidth, latency, and packet loss. It is contained in the Additional Tools for Xcode package. Answer 2: There's an excellent writeup of setting up a FreeBSD
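Another DIY option, offered as a crude sketch rather than a substitute for the tools above: a small Python TCP proxy that injects a fixed delay per chunk and caps throughput, so clients connect to the proxy port instead of the real server. The addresses, rate, and delay below are illustrative assumptions:

import asyncio

DELAY_S = 0.200     # added one-way latency (crudely applied per chunk)
RATE_BPS = 64_000   # throughput cap, bytes per second
CHUNK = 1024

async def pipe(reader, writer):
    while data := await reader.read(CHUNK):
        await asyncio.sleep(DELAY_S + len(data) / RATE_BPS)  # latency + serialization delay
        writer.write(data)
        await writer.drain()
    writer.close()

async def handle(client_r, client_w):
    # Forward to the real server (address assumed); shape traffic in both directions.
    server_r, server_w = await asyncio.open_connection("127.0.0.1", 9000)
    await asyncio.gather(pipe(client_r, server_w), pipe(server_r, client_w))

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 9001)
    async with server:
        await server.serve_forever()

asyncio.run(main())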

Approximate cost to access various caches and main memory?

◇◆丶佛笑我妖孽 submitted on 2019-11-26 00:49:30
Question: Can anyone give me the approximate time (in nanoseconds) to access L1, L2 and L3 caches, as well as main memory, on Intel i7 processors? While this isn't specifically a programming question, knowing these kinds of speed details is necessary for some low-latency programming challenges. Answer 1: Here is a Performance Analysis Guide for the i7 and Xeon range of processors. I should stress that it has what you need and more (check page 22 for some timings and cycle counts, for example).
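For rough orientation only (figures vary by microarchitecture and clock speed): an L1 hit costs about 4-5 cycles (~1 ns), L2 about 12 cycles (~3-4 ns), L3 about 30-50 cycles (~10-20 ns), and a main-memory access about 60-100 ns. These are widely quoted ballparks, not measurements for any specific i7.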

Loading and displaying large text files

為{幸葍}努か submitted on 2019-11-25 22:08:11
Question: In a Swing application, I sometimes need to support read-only access to large, line-oriented text files that are slow to load: logs, dumps, traces, etc. For small amounts of data, a suitable Document and JTextComponent are fine, as shown here. I understand the human limitations of browsing large amounts of data, but the problematic stuff seems like it's always in the biggest file. Is there any practical alternative for larger amounts of text in the 10-100 megabyte, million-line range? Answer 1: I
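The usual escape hatch for the 10-100 MB case is to index the byte offset of every line once, then fetch only the lines currently visible. Here is a minimal sketch of the indexing idea in Python (a Swing version would do the same work behind a custom read-only Document); path and helper names are illustrative:

import mmap

def index_lines(path):
    # One pass over a memory-mapped file, recording where every line starts.
    offsets = [0]
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(b"\n")
        while pos != -1:
            offsets.append(pos + 1)
            pos = mm.find(b"\n", pos + 1)
    return offsets

def read_line(path, offsets, n):
    # Random access: seek straight to line n instead of loading the whole file.
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().decode(errors="replace")

A viewer then renders only the slice of lines in the viewport, so startup cost is one indexing pass rather than loading the entire file into a single Document.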