sampling

Profilers Instrumenting Vs Sampling

Submitted by て烟熏妆下的殇ゞ on 2019-12-03 12:23:45
I am doing a study comparing profilers, mainly instrumenting versus sampling profilers. I have come up with the following: sampling: stop the execution of the program, take the PC, and from that deduce where the program is. instrumenting: add some overhead code to the program so that it increments some counters, letting you know where the program spends its time. If the above is wrong, correct me. After this I was looking at execution time, and some said that instrumenting takes more time than sampling. Is this correct? If yes, why is that? In sampling you have to pay the price of context switching between processes, while in the latter your…
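A minimal sketch of the sampling idea in Python, added here as an illustration rather than as one of the answers; it assumes a Unix system, since signal.SIGPROF and ITIMER_PROF are not available on Windows:

    import collections
    import signal

    hits = collections.Counter()

    def _sample(signum, frame):
        # The interrupted frame tells us "where the program is" (file:line),
        # which is what a sampling profiler deduces from the PC / call stack.
        hits[(frame.f_code.co_filename, frame.f_lineno)] += 1

    signal.signal(signal.SIGPROF, _sample)
    signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)   # sample every ~10 ms of CPU time

    def busy():
        total = 0
        for i in range(10_000_000):
            total += i * i
        return total

    busy()
    signal.setitimer(signal.ITIMER_PROF, 0)            # stop sampling
    for (filename, lineno), count in hits.most_common(5):
        print(f"{count:6d}  {filename}:{lineno}")

An instrumenting profiler would instead insert counter updates at every function entry and exit, which is where its extra overhead comes from.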

How to keep a random subset of a stream of data?

Submitted by 感情迁移 on 2019-12-03 05:56:49
问题 I have a stream of events flowing through my servers. It is not feasible for me to store all of them, but I would like to periodically be able to process some of them in aggregate. So, I want to keep a subset of the stream that is a random sampling of everything I've seen, but is capped to a max size. So, for each new item, I need an algorithm to decide if I should add it to the stored set, or if I should discard it. If I add it, and I'm already at my limit, I need an algorithm to evict one
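This is the classic reservoir sampling problem (Vitter's Algorithm R). A minimal Python sketch, included here as an illustration of the standard approach rather than a quote from any answer:

    import random

    def reservoir_sample(stream, k):
        """Keep a uniform random sample of at most k items from a stream of unknown length."""
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)      # fill the reservoir first
            else:
                j = random.randint(0, i)    # each of the i+1 items seen so far ends up
                if j < k:                   # in the reservoir with probability k / (i + 1)
                    reservoir[j] = item     # evict a uniformly chosen resident
        return reservoir

Each new item is accepted with probability k / (items seen so far), and acceptance evicts a uniformly chosen current resident, which keeps the reservoir a uniform sample at every point in the stream.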

Is there an algorithm for weighted reservoir sampling? [closed]

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-03 03:53:09
Question (closed as needing more focus; it is not currently accepting answers): Is there an algorithm for how to perform reservoir sampling when the points in the data stream have associated weights? Answer 1: The algorithm by Pavlos Efraimidis and Paul Spirakis solves exactly this problem. The original paper with complete proofs is published under the title "Weighted random sampling with a reservoir" (Information Processing Letters, 2006)…
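A short Python sketch of the Efraimidis-Spirakis scheme (often called A-Res), added as an illustration of the algorithm the answer cites; the stream is assumed to yield (item, positive weight) pairs:

    import heapq
    import random

    def weighted_reservoir_sample(stream, k):
        """Keep the k items whose random key u ** (1 / weight) is largest."""
        heap = []                                   # min-heap of (key, item)
        for item, weight in stream:
            key = random.random() ** (1.0 / weight)
            if len(heap) < k:
                heapq.heappush(heap, (key, item))
            elif key > heap[0][0]:
                heapq.heapreplace(heap, (key, item))
        return [item for _, item in heap]

Each item's chance of surviving is proportional to its weight, and the heap keeps the per-item work at O(log k).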

Profiling a (possibly I/O-bound) process to reduce latency

Submitted by 喜夏-厌秋 on 2019-12-03 02:51:52
I want to improve the performance of a specific method inside a larger application. The goal is improving latency (wall-clock time spent in a specific function), not (necessarily) system load. Requirements: As I expect a lot of the latency to be due to I/O, take into account time spent waiting/blocked (in other words: look at wall-clock time instead of CPU time). As the program does much more than the fragment I'm trying to optimize, there needs to be a way to either start/stop profiling programmatically, or to filter the output to only show the time between entering and exiting the function I…
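One way to satisfy both requirements in Python is to drive cProfile programmatically with an explicit wall-clock timer and only enable it around the call of interest. This is a sketch under those assumptions; handle_request and do_work are placeholder names, not from the question:

    import cProfile
    import pstats
    import time

    profiler = cProfile.Profile(time.perf_counter)   # wall clock, so blocked I/O time counts

    def handle_request(request):
        profiler.enable()
        try:
            result = do_work(request)                # the fragment we actually care about
        finally:
            profiler.disable()
        return result

    # ...later, after enough requests have been handled:
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)

Cumulative time with a wall-clock timer includes time the function spent waiting on I/O, which is exactly the latency figure being asked about.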

Audio samples per second?

Submitted by 微笑、不失礼 on 2019-12-03 01:27:09
I am wondering about the relationship between a block of samples and its time equivalent. My rough idea so far: number of samples played per second = total file size / duration. So say I have a 1.02 MB file and a duration of 12 sec (avg); that gives about 89,300 samples played per second. Is this right? Are there other ways to compute this? For example, how can I know how much time a byte[1024] array is equivalent to? Generally speaking, for PCM samples you can divide the total length (in bytes) by the duration (in seconds) to get the number of bytes per second (for WAV files there…
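Back-of-the-envelope arithmetic for uncompressed PCM, sketched in Python. The concrete format parameters below (44.1 kHz, 16-bit, stereo) are assumptions for illustration, not taken from the question:

    sample_rate      = 44_100          # samples per second, per channel
    bytes_per_sample = 2               # 16-bit PCM
    channels         = 2               # stereo

    bytes_per_second = sample_rate * bytes_per_sample * channels   # 176,400 B/s

    buffer = bytearray(1024)
    buffer_seconds = len(buffer) / bytes_per_second                # ~0.0058 s of audio

    total_bytes = 1_069_547            # roughly a 1.02 MB payload
    duration = total_bytes / bytes_per_second                      # ~6.06 s if it were raw PCM at these settings
    print(buffer_seconds, duration)

Note that file size / duration gives bytes per second, not samples per second; you still have to divide by bytes per sample and by the channel count to get a sample rate.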

Efficiently picking a random element from a chained hash table?

Submitted by 北城以北 on 2019-12-02 18:47:42
Just for practice (and not as a homework assignment) I have been trying to solve this problem (CLRS, 3rd edition, exercise 11.2-6): Suppose we have stored n keys in a hash table of size m, with collisions resolved by chaining, and that we know the length of each chain, including the length L of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time O(L * (1 + m/n)). What I have thought of so far is that the probability of each key being returned should be 1/n. If we try to get a random value x between 1 and n, and…
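The standard answer to this exercise is rejection sampling over (slot, position) pairs. A Python sketch, assuming the table is represented as a list of m chains; in CLRS the chains are linked lists, so only the final walk down the accepted chain costs O(L):

    import random

    def random_key(table, m, L):
        """Return a uniformly random key from a chained hash table (list of m chains)."""
        while True:
            slot = random.randrange(m)      # uniform slot
            pos = random.randrange(L)       # uniform position 0..L-1
            chain = table[slot]
            if pos < len(chain):            # O(1), since chain lengths are known
                return chain[pos]           # every stored key is hit with prob. 1/(m*L)

Each trial succeeds with probability n/(m*L), so the expected number of O(1) rejected trials is m*L/n, and the one successful trial pays at most O(L) to reach position pos, giving the required O(L*(1 + m/n)) bound.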

Is there an algorithm for weighted reservoir sampling? [closed]

Submitted by 最后都变了- on 2019-12-02 18:11:17
Is there an algorithm for how to perform reservoir sampling when the points in the data stream have associated weights? The algorithm by Pavlos Efraimidis and Paul Spirakis solves exactly this problem. The original paper with complete proofs is published under the title "Weighted random sampling with a reservoir" in Information Processing Letters, 2006, but you can find a simple summary here. The algorithm works as follows. First observe that another way to solve the unweighted reservoir sampling problem is to assign each element a random id R between 0 and 1 and incrementally (say with a heap) keep…
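A sketch of that unweighted reformulation, for concreteness (illustration only; swapping random.random() for random.random() ** (1.0 / weight) turns it into the weighted Efraimidis-Spirakis scheme shown earlier in this page):

    import heapq
    import random

    def reservoir_via_random_ids(stream, k):
        """Unweighted reservoir sampling as 'keep the k items with the largest random ids'."""
        heap = []                                  # min-heap of (id, item)
        for item in stream:
            r = random.random()
            if len(heap) < k:
                heapq.heappush(heap, (r, item))
            elif r > heap[0][0]:
                heapq.heapreplace(heap, (r, item))
        return [item for _, item in heap]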

Update values of a matrix variable in tensorflow, advanced indexing

Submitted by 喜欢而已 on 2019-12-02 08:08:35
I would like to create a function that, for every line of given data X, applies the softmax function only over some sampled classes, say 2 out of K total classes. In plain Python the code looks like this:

    def softy(X, W, num_samples):
        N = X.shape[0]
        K = W.shape[0]
        S = np.zeros((N, K))
        ar_to_sof = np.zeros(num_samples)
        sampled_ind = np.zeros(num_samples, dtype=int)
        for line in range(N):
            for samp in range(num_samples):
                sampled_ind[samp] = randint(0, K - 1)
                ar_to_sof[samp] = np.dot(X[line], np.transpose(W[sampled_ind[samp]]))
            ar_to_sof = softmax(ar_to_sof)
            S[line][sampled_ind] = ar_to_sof
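One way the scattered update could be expressed on the TensorFlow side is with tf.scatter_nd, which builds a dense tensor with given values at given (row, column) positions and zeros elsewhere. The shapes and values below are made up for illustration, not taken from the question:

    import tensorflow as tf

    # (row, sampled_class) pairs and the softmax values that should land at those positions.
    indices = tf.constant([[0, 3], [0, 7],
                           [1, 2], [1, 5]])        # shape [N * num_samples, 2]
    updates = tf.constant([0.6, 0.4, 0.1, 0.9])    # shape [N * num_samples]
    S = tf.scatter_nd(indices, updates, shape=[2, 10])   # dense N x K result

If the target is an existing variable rather than a fresh tensor, the scatter-update family of ops (e.g. tf.tensor_scatter_nd_update in newer TensorFlow) is the analogous tool; which one applies depends on the TensorFlow version in use.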

Fast Poisson Disk Sampling [Robert Bridson] in Python

Submitted by 风流意气都作罢 on 2019-12-02 06:49:00
First of all, I implemented the ordinary, slow Poisson disk sampling algorithm in the 2D plane, and it works just fine. This slow version calculates the distances between all points and checks that the point you wish to place is at least R away from all the others. The fast version by Robert Bridson, available here: https://www.cs.ubc.ca/~rbridson/docs/bridson-siggraph07-poissondisk.pdf , suggests discretizing your 2D plane into square cells with side length = R/sqrt(2), since that way each cell can contain at most a single valid point, and the number of cells you need to check for distance…
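A sketch of the background-grid bookkeeping that makes the check O(1), assuming a rectangular domain [0, width] x [0, height]; this illustrates the cell layout from the paper, not a full implementation of the algorithm:

    import math

    def make_grid(width, height, r):
        cell = r / math.sqrt(2)                    # one accepted point per cell at most
        cols = int(math.ceil(width / cell))
        rows = int(math.ceil(height / cell))
        return [[None] * cols for _ in range(rows)], cell, cols, rows

    def fits(p, r, grid, cell, cols, rows):
        """True if candidate p = (x, y) is at least r away from every accepted point."""
        gx, gy = int(p[0] / cell), int(p[1] / cell)
        # Any point closer than r must lie within two cells of p's cell, so a 5x5
        # neighbourhood is enough to check.
        for y in range(max(gy - 2, 0), min(gy + 3, rows)):
            for x in range(max(gx - 2, 0), min(gx + 3, cols)):
                q = grid[y][x]
                if q is not None and (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 < r * r:
                    return False
        return True

    def insert(p, grid, cell):
        grid[int(p[1] / cell)][int(p[0] / cell)] = p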

Sampling small data frame from a big dataframe

Submitted by 这一生的挚爱 on 2019-12-01 20:37:35
I am trying to sample a data frame from a given data frame such that there are enough samples from each of the levels of a variable. This can be achieved by splitting the data frame by the levels and sampling from each of those, so I thought ddply (data frame to data frame) would do it for me. Taking a minimal example:

    set.seed(1)
    data1 <- data.frame(a=sample(c('B0','B1','B2'),100,replace=TRUE), b=rnorm(100), c=runif(100))

    > summary(data1$a)
    B0 B1 B2
    30 32 38

The following commands perform the sampling. When I enter

    data2 <- ddply(data1, c('a'), function(x) sample(x, 20, replace=FALSE))

I get the…
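For comparison, the same per-level (stratified) sampling idea sketched in Python with pandas; the question itself is about R's ddply, and the names and seed below are illustrative, not from the question:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    data1 = pd.DataFrame({
        "a": rng.choice(["B0", "B1", "B2"], size=100),
        "b": rng.normal(size=100),
        "c": rng.uniform(size=100),
    })

    # sample 20 rows from each level of `a`
    data2 = data1.groupby("a", group_keys=False).apply(lambda g: g.sample(n=20, replace=False))
    print(data2["a"].value_counts())

The key point in either language is that the sampling function handed to the group-wise apply must sample rows of each group, not some other dimension of the data frame.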