Performance

Why is my for loop code slower than an iterator?

Submitted by 我只是一个虾纸丫 on 2021-02-08 20:01:27

Question: I am trying to solve the LeetCode problem distribute-candies. It is easy: just find the minimum of the number of candy kinds and half the total number of candies. Here is my solution (runs in 48 ms): use std::collections::HashSet; pub fn distribute_candies(candies: Vec<i32>) -> i32 { let sister_candies = (candies.len() / 2) as i32; let mut kind = 0; let mut candies_kinds = HashSet::new(); for candy in candies.into_iter() { if candies_kinds.insert(candy) { kind += 1; if kind > sister_candies { return sister…
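The counting-with-early-exit idea above can be sketched much more compactly. The following is an illustrative Python analogue of the approach, not the original Rust submission: since the answer is simply capped at half the number of candies, the set size and the cap are all that matter.

```python
def distribute_candies(candies):
    # The sister receives exactly half the candies, so the number of
    # distinct kinds she can get is capped at len(candies) // 2.
    return min(len(set(candies)), len(candies) // 2)
```

The explicit loop in the question exists only to stop building the set early once the cap is reached, which is a constant-factor optimisation over this one-liner.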

Why is vectorized numpy code slower than for loops?

Submitted by 让人想犯罪 __ on 2021-02-08 19:54:46

Question: I have two NumPy arrays, X and Y, with shapes (n,d) and (m,d), respectively. Assume we want to compute the Euclidean distance between each row of X and each row of Y and store the results in an array Z with shape (n,m). I have two implementations. The first uses two for loops: for i in range(n): for j in range(m): Z[i,j] = np.sqrt(np.sum(np.square(X[i] - Y[j]))) The second uses only one loop, via vectorization: for i in range(n): Z[i] = np…
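For reference, the two approaches from the question can be written out in full, together with a fully vectorized third version that uses broadcasting and no Python-level loops at all. The function names here are illustrative, not from the original post:

```python
import numpy as np

def pairwise_dist_loops(X, Y):
    # Baseline: the double-loop version from the question.
    n, m = X.shape[0], Y.shape[0]
    Z = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            Z[i, j] = np.sqrt(np.sum(np.square(X[i] - Y[j])))
    return Z

def pairwise_dist_vectorized(X, Y):
    # Fully vectorized via broadcasting: (n,1,d) - (1,m,d) -> (n,m,d),
    # then reduce over the last axis.
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt(np.sum(diff * diff, axis=-1))
```

Note that the broadcasted version materialises an (n,m,d) intermediate, so for large inputs it can lose to loopier variants on memory traffic, which is one common answer to the question's surprise.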

Performance optimisations of x86-64 assembly - Alignment and branch prediction

Submitted by 故事扮演 on 2021-02-08 19:50:37

Question: I am currently coding highly optimised versions of some C99 standard-library string functions, such as strlen(), memset(), etc., in x86-64 assembly with SSE-2 instructions. So far I have managed to get excellent performance results, but I sometimes see strange behaviour when I try to optimise further. For instance, adding or even removing some simple instructions, or simply reorganising some local labels used with jumps, completely degrades overall performance. And there is absolutely…

Is there a SQL server performance counter for average execution time?

Submitted by 我们两清 on 2021-02-08 15:46:52

Question: I want to tune a production SQL Server instance. After making adjustments (such as changing the degree of parallelism) I want to know whether they helped or hurt query execution times. This seems like an obvious performance counter, but for the last half hour I have been searching Google and the counter list in perfmon, and I have not been able to find a SQL Server performance counter that gives the average execution time for all queries hitting the server. The SQL Server equivalent of the ASP.NET Request…

Are memory orderings: consume, acq_rel and seq_cst ever needed on Intel x86?

Submitted by 北战南征 on 2021-02-08 14:36:43

Question: C++11 specifies six memory orderings: typedef enum memory_order { memory_order_relaxed, memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel, memory_order_seq_cst } memory_order; (https://en.cppreference.com/w/cpp/atomic/memory_order), where the default is seq_cst. Performance gains can be had by relaxing the memory ordering of operations, but this depends on what protections the architecture provides. For example, Intel x86 has a strong memory model and…

overloaded array subscript [] operator slow

Submitted by 寵の児 on 2021-02-08 14:02:23

Question: I have written my own Array class in C++ and overloaded the array subscript [] operator: inline dtype &operator[](const size_t i) { return _data[i]; } inline dtype operator[](const size_t i) const { return _data[i]; } where _data is a pointer to the memory block containing the array. Profiling shows that this overloaded operator alone takes about 10% of the overall computation time (in a long Monte Carlo simulation, compiled with g++ at maximum optimisation). This seems…

Dictionary with tuple key slower than nested dictionary. Why?

Submitted by 我只是一个虾纸丫 on 2021-02-08 13:45:25

Question: I have tested the speed of retrieving, updating and removing values in a dictionary keyed by an (int, int, string) tuple versus the same thing with a nested dictionary: Dictionary<int, Dictionary<int, Dictionary<string, …>>>. My tests show the tuple dictionary to be much slower (58% slower for retrieving, 69% for updating and 200% for removing). I did not expect that. The nested dictionary needs to do more lookups, so why is the tuple dictionary so much slower? My test code: public static object TupleDic_RemoveValue(object[] param) { var…
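The two data layouts being compared can be sketched as follows. This is a hypothetical Python analogue of the C# benchmark, not the original test code; the names and sizes are illustrative. A flat dictionary hashes the whole composite key once, while the nested layout performs three smaller lookups:

```python
# Build the same (int, int, str) key space in both layouts.
keys = [(a, b, f"s{c}") for a in range(20) for b in range(20) for c in range(5)]

flat = {key: 0 for key in keys}          # one dict keyed by the full tuple
nested = {}                              # dict of dicts of dicts
for a, b, c in keys:
    nested.setdefault(a, {}).setdefault(b, {})[c] = 0

def lookup_flat(a, b, c):
    # One hash computed over the whole tuple (which itself hashes 3 fields).
    return flat[(a, b, c)]

def lookup_nested(a, b, c):
    # Three separate, smaller hash lookups.
    return nested[a][b][c]
```

The relative cost of "hash one composite key" versus "chase three small tables" depends heavily on the runtime's tuple-hashing and equality paths, which is the crux of the C# question; `timeit` can be used to compare the two lookups on a given interpreter.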

C++ signed and unsigned int vs long long speed

Submitted by 隐身守侯 on 2021-02-08 13:42:29

Question: Today I noticed that the speed of several simple bitwise and arithmetic operations differs significantly between int, unsigned, long long and unsigned long long on my 64-bit PC. In particular, the following loop is about twice as fast for unsigned as for long long, which I did not expect. int k = 15; int N = 30; int mask = (1 << k) - 1; while (!(mask & 1 << N)) { int lo = mask & ~(mask - 1); int lz = (mask + lo) & ~mask; mask |= lz; mask &= ~(lz - 1); mask |= (lz / lo / 2) - 1; } (full…
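The loop in the question is Gosper's hack: it steps from one mask with k set bits to the next larger one, enumerating every k-element subset of bit positions below N. A Python sketch of what it computes (function name is illustrative; Python integers are arbitrary-precision, so the int-versus-long-long distinction the question is about disappears here, but the bit manipulation is the same):

```python
def subsets_of_size_k(k, N):
    # Enumerate all bit masks with exactly k bits set and value < 2**N,
    # in increasing order, using Gosper's hack from the question.
    mask = (1 << k) - 1          # smallest k-bit mask: k low bits set
    masks = []
    while not (mask & (1 << N)):
        masks.append(mask)
        lo = mask & -mask                # lowest set bit (== mask & ~(mask - 1))
        lz = (mask + lo) & ~mask         # lowest zero above the low run of ones
        mask |= lz                       # set that zero bit...
        mask &= ~(lz - 1)                # ...clear everything below it...
        mask |= (lz // lo // 2) - 1      # ...and refill the leftover ones at the bottom
    return masks
```

On the division `lz / lo / 2`: both operands are powers of two, and replacing the divisions with shifts is one of the standard answers to why the signed/unsigned and 32/64-bit variants of this loop differ in speed, since division is where the compiler's codegen diverges most between the types.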
