latency

How to prefetch data using a custom Python function in TensorFlow

戏子无情 submitted on 2019-11-26 12:51:33
Question: I am trying to prefetch training data to hide I/O latency. I would like to write custom Python code that loads data from disk and preprocesses it (e.g. by adding a context window). In other words, one thread does data preprocessing and the other does training. Is this possible in TensorFlow? Update: I have a working example based on @mrry's example.

import numpy as np
import tensorflow as tf
import threading

BATCH_SIZE = 5
TRAINING_ITERS = 4100
feature_input = tf.placeholder(tf.float32
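For reference, a minimal sketch of the same idea using the modern tf.data pipeline (TF 2.x), where tf.py_function wraps arbitrary Python preprocessing and prefetch overlaps it with the training loop. The file names and the body of load_and_preprocess are illustrative stand-ins, not part of the original question:

import numpy as np
import tensorflow as tf

def load_and_preprocess(path):
    # Stand-in for custom Python I/O plus preprocessing (e.g. adding a context window).
    return np.random.rand(5, 10).astype(np.float32)

paths = tf.data.Dataset.from_tensor_slices(["a.npy", "b.npy", "c.npy"])
dataset = (
    paths
    .map(lambda p: tf.py_function(load_and_preprocess, inp=[p], Tout=tf.float32),
         num_parallel_calls=tf.data.AUTOTUNE)   # preprocess in background threads
    .prefetch(tf.data.AUTOTUNE)                 # keep examples ready ahead of training
)

for batch in dataset:
    pass  # the training step would consume `batch` here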

What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?

六眼飞鱼酱① submitted on 2019-11-26 11:25:40
I want to be able to predict, by hand, exactly how long arbitrary arithmetical (i.e. no branching or memory, though that would be nice too) x86-64 assembly code will take given a particular architecture, taking into account instruction reordering, superscalarity, latencies, CPIs, etc. What rules must be followed to achieve this? I think I've got some preliminary rules figured out, but I haven't been able to find any references on breaking down any example code to this level of detail, so I've had to take some guesses. (For example, the Intel optimization manual barely even
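A first-order model (assuming no memory traffic, no branch misses, and a warmed-up out-of-order core): execution time is bounded below by whichever is largest of (a) the latency of the longest dependency chain, (b) total uops divided by the issue width, and (c) the uop count on the busiest execution port divided by that port's throughput. Worked example with illustrative Skylake-like numbers from Agner Fog's tables (approximate, not authoritative): eight dependent mulsd instructions at 4-cycle latency form a 32-cycle chain, while eight independent mulsd instructions at a throughput of two per cycle finish in roughly 8 / 2 = 4 cycles of issue plus one trailing 4-cycle latency.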

Fastest (low latency) method for inter-process communication between Java and C/C++

两盒软妹~` submitted on 2019-11-26 10:05:26
Question: I have a Java app connecting through a TCP socket to a "server" developed in C/C++. Both app and server run on the same machine, a Solaris box (but we're considering migrating to Linux eventually). The data exchanged consists of simple messages (login, login ACK, then the client asks for something, the server replies). Each message is around 300 bytes long. Currently we're using sockets and all is OK; however, I'm looking for a faster way to exchange data (lower latency) using IPC methods. I
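The usual answer for same-host, latency-critical IPC is shared memory rather than loopback TCP. Below is a minimal single-file sketch of the idea in Python, using mmap with a length prefix acting as a publish flag; in the question's setting the Java side would map the same file with a MappedByteBuffer and the C side with plain mmap. The path, message, and busy-wait are illustrative assumptions:

import mmap
import os
import struct

PATH = "/tmp/ipc_demo"   # hypothetical file backing the shared region
SIZE = 4096

# Writer side: map the file, write the payload, then publish by writing the length.
fd = os.open(PATH, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, SIZE)
buf = mmap.mmap(fd, SIZE)
buf[0:4] = struct.pack("<I", 0)           # clear the "ready" length field
msg = b"login ACK".ljust(300, b"\x00")    # ~300-byte message, as in the question
buf[4:4 + len(msg)] = msg
buf[0:4] = struct.pack("<I", len(msg))    # writing the length last publishes the message

# Reader side (normally a separate process mapping the same file):
while (length := struct.unpack("<I", buf[0:4])[0]) == 0:
    pass  # busy-wait; real cross-process code needs proper fencing or a semaphore
print(buf[4:4 + length])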

Asynchronous IO io_submit latency in Ubuntu Linux

五迷三道 submitted on 2019-11-26 09:49:06
Question: I am looking for advice on how to get efficient, high-performance asynchronous IO working for my application, which runs on Ubuntu Linux 14.04. My app processes transactions and creates a file on disk/flash. As the app progresses through transactions, additional blocks are created that must be appended to the file on disk/flash. The app also needs to frequently read blocks of this file as it processes new transactions. Each transaction might need to read a different block from this
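Python's standard library does not expose io_submit directly (that requires libaio or io_uring bindings), so as a rough, hedged approximation of the pattern the question describes, overlapping appends with positional reads, here is a thread-pool sketch; the file name and block size are illustrative:

import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096
fd = os.open("txn.log", os.O_CREAT | os.O_RDWR | os.O_APPEND)  # hypothetical transaction file
pool = ThreadPoolExecutor(max_workers=4)

def append_block(data: bytes) -> int:
    return os.write(fd, data)             # O_APPEND: each write lands at the end of the file

def read_block(n: int) -> bytes:
    return os.pread(fd, BLOCK, n * BLOCK) # positional read; no shared file offset involved

w = pool.submit(append_block, b"x" * BLOCK)   # queue an append...
r = pool.submit(read_block, 0)                # ...and a read, both overlapping the caller
w.result()
print(len(r.result()))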

How does LMAX's disruptor pattern work?

混江龙づ霸主 submitted on 2019-11-26 08:38:44
Question: I am trying to understand the disruptor pattern. I have watched the InfoQ video and tried to read their paper. I understand there is a ring buffer involved, and that it is initialized as an extremely large array to take advantage of cache locality and eliminate allocation of new memory. It sounds like there are one or more atomic integers which keep track of positions. Each 'event' seems to get a unique id, and its position in the ring is found by taking its modulus with respect to the size of
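To make the ring-plus-sequence-counters idea concrete, here is a minimal single-producer/single-consumer sketch in Python. It is only illustrative: the real Disruptor is Java, pads its counters to avoid false sharing, and uses proper memory barriers rather than relying on the interpreter lock as this sketch does:

import threading

class Ring:
    def __init__(self, size=1024):
        assert size & (size - 1) == 0   # power of two, so modulus is a cheap bit-mask
        self.size = size
        self.ring = [None] * size
        self.cursor = -1                # highest sequence published by the producer
        self.gating = -1                # highest sequence consumed by the consumer

    def publish(self, event):           # single producer only
        seq = self.cursor + 1
        while seq - self.gating > self.size:
            pass                        # ring full: spin until the consumer catches up
        self.ring[seq & (self.size - 1)] = event
        self.cursor = seq               # advancing the cursor makes the slot visible

    def consume(self):                  # single consumer only
        seq = self.gating + 1
        while self.cursor < seq:
            pass                        # nothing published yet: spin
        event = self.ring[seq & (self.size - 1)]
        self.gating = seq
        return event

r = Ring()
threading.Thread(target=lambda: [r.publish(i) for i in range(5)], daemon=True).start()
print([r.consume() for _ in range(5)])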

How do I simulate a low bandwidth, high latency environment?

限于喜欢 submitted on 2019-11-26 06:53:01
Question: I need to simulate a low-bandwidth, high-latency connection to a server in order to emulate the conditions of a VPN at a remote site. The bandwidth and latency should be tweakable so I can discover the best combination for running our software package. Answer 1: For macOS, there is the Network Link Conditioner, which simulates configurable bandwidth, latency, and packet loss. It is contained in the Additional Tools for Xcode package. Answer 2: There's an excellent writeup of setting up a FreeBSD
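Another DIY option, offered as a crude sketch rather than a substitute for the tools above: a small Python TCP proxy that injects a fixed delay per chunk and caps throughput, so clients connect to the proxy port instead of the real server. The addresses, rate, and delay below are illustrative assumptions:

import asyncio

DELAY_S = 0.200     # added one-way latency (crudely applied per chunk)
RATE_BPS = 64_000   # throughput cap, bytes per second
CHUNK = 1024

async def pipe(reader, writer):
    while data := await reader.read(CHUNK):
        await asyncio.sleep(DELAY_S + len(data) / RATE_BPS)  # latency + serialization delay
        writer.write(data)
        await writer.drain()
    writer.close()

async def handle(client_r, client_w):
    # Forward to the real server (address assumed); shape traffic in both directions.
    server_r, server_w = await asyncio.open_connection("127.0.0.1", 9000)
    await asyncio.gather(pipe(client_r, server_w), pipe(server_r, client_w))

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 9001)
    async with server:
        await server.serve_forever()

asyncio.run(main())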

Approximate cost to access various caches and main memory?

◇◆丶佛笑我妖孽 submitted on 2019-11-26 00:49:30
Question: Can anyone give me the approximate time (in nanoseconds) to access L1, L2 and L3 caches, as well as main memory, on Intel i7 processors? While this isn't specifically a programming question, knowing these kinds of speed details is necessary for some low-latency programming challenges. Answer 1: Here is a Performance Analysis Guide for the i7 and Xeon range of processors. I should stress that it has what you need and more (check page 22 for some timings and cycle counts, for example).
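For rough orientation only (figures vary by microarchitecture and clock speed): an L1 hit costs about 4-5 cycles (~1 ns), L2 about 12 cycles (~3-4 ns), L3 about 30-50 cycles (~10-20 ns), and a main-memory access about 60-100 ns. These are widely quoted ballparks, not measurements for any specific i7.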

Loading and displaying large text files

為{幸葍}努か submitted on 2019-11-25 22:08:11
Question: In a Swing application, I sometimes need to support read-only access to large, line-oriented text files that are slow to load: logs, dumps, traces, etc. For small amounts of data, a suitable Document and JTextComponent are fine, as shown here. I understand the human limitations of browsing large amounts of data, but the problematic stuff seems like it's always in the biggest file. Is there any practical alternative for larger amounts of text in the 10-100 megabyte, million-line range? Answer 1: I
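The usual escape hatch for the 10-100 MB case is to index the byte offset of every line once, then fetch only the lines currently visible. Here is a minimal sketch of the indexing idea in Python (a Swing version would do the same work behind a custom read-only Document); path and helper names are illustrative:

import mmap

def index_lines(path):
    # One pass over a memory-mapped file, recording where every line starts.
    offsets = [0]
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(b"\n")
        while pos != -1:
            offsets.append(pos + 1)
            pos = mm.find(b"\n", pos + 1)
    return offsets

def read_line(path, offsets, n):
    # Random access: seek straight to line n instead of loading the whole file.
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().decode(errors="replace")

A viewer then renders only the slice of lines in the viewport, so startup cost is one indexing pass rather than loading the entire file into a single Document.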