optimization

MATLAB fast (componentwise) vector operations are…really fast

前提是你 submitted on 2021-02-19 02:53:01
Question: I have been writing MATLAB scripts for some time and I still do not understand how it works "under the hood". Consider the following script, which does some computation using (big) vectors in three different ways: 1. MATLAB vector operations; 2. a simple for loop that does the same computation component-wise; 3. an optimized loop that is supposed to be faster than 2., since it avoids some allocation and some assignment. Here is the code: N = 10000000; A = linspace(0,100,N); B = linspace(-100,100,N); C = linspace(0

Recursion, memoization and mutable default arguments in Python

人走茶凉 submitted on 2021-02-19 02:48:52
Question: "Base" meaning without just using lru_cache. All of these are "fast enough" -- I'm not looking for the fastest algorithm -- but the timings surprised me, so I was hoping I could learn something about how Python "works". Simple loop (/tail recursion): def fibonacci(n): a, b = 0, 1 if n in (a, b): return n for _ in range(n - 1): a, b = b, a + b return b Simple memoized: def fibonacci(n, memo={0:0, 1:1}): if len(memo) <= n: memo[n] = fibonacci(n - 1) + fibonacci(n - 2) return memo[n] Using a
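
For readers who want to try the two variants quoted in the excerpt, here is a minimal sketch that lays them out as runnable Python. The names `fibonacci_loop` and `fibonacci_memo` are renamed here for illustration only (the excerpt calls both `fibonacci`), and the third variant, which is cut off in the excerpt, is not reconstructed.

```python
def fibonacci_loop(n):
    """Iterative version, as quoted in the excerpt."""
    a, b = 0, 1
    if n in (a, b):
        return n
    for _ in range(n - 1):
        a, b = b, a + b
    return b


def fibonacci_memo(n, memo={0: 0, 1: 1}):
    """Memoized version, as quoted in the excerpt.

    The default dict is created once, when the function is defined,
    so it is shared by every call and behaves like a persistent cache.
    """
    if len(memo) <= n:
        memo[n] = fibonacci_memo(n - 1) + fibonacci_memo(n - 2)
    return memo[n]


if __name__ == "__main__":
    print(fibonacci_loop(10), fibonacci_memo(10))
    # The shared default dict is reachable through __defaults__.
    print(len(fibonacci_memo.__defaults__[0]))
```

Because the cache lives in the function's default argument, a second call with the same or a smaller `n` is a single dictionary lookup, which is one plausible reason the timings of such variants can look surprising.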

optimizing array loop in c

孤街醉人 submitted on 2021-02-19 02:38:19
Question: I have looked online and in my books but I can't seem to get this. I was asked to optimize a small part of a program. Specifically, to take an array and add its contents within a small amount of time, with vi and gcc, without using the built-in optimizer. I have tried loop unrolling and a couple of other optimizations meant for products. Can you please help? int length = ARRAY_SIZE; int limit = length-4; for (j=0; j < limit; j+=5) { sum += array[j] + array[j+1] + array[j+2] + array[j+3] +

How can I improve this square root method?

£可爱£侵袭症+ submitted on 2021-02-19 02:09:25
Question: I know this sounds like a homework assignment, but it isn't. Lately I've been interested in algorithms used to perform certain mathematical operations, such as sine, square root, etc. At the moment, I'm trying to write the Babylonian method of computing square roots in C#. So far, I have this: public static double SquareRoot(double x) { if (x == 0) return 0; double r = x / 2; // this is inefficient, but I can't find a better way // to get a close estimate for the starting value of r double
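
The excerpt's C# code is cut off, so the following is only a rough Python sketch of the Babylonian (Heron) iteration r ← (r + x / r) / 2, not the asker's implementation. It keeps the excerpt's starting guess `x / 2`; the function name `babylonian_sqrt`, the relative tolerance and the iteration cap are assumptions made for illustration.

```python
import math


def babylonian_sqrt(x, rel_tol=1e-12, max_iter=1000):
    """Approximate sqrt(x) with the Babylonian iteration.

    rel_tol and max_iter are illustrative choices, not from the excerpt.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    r = x / 2  # same crude starting guess as in the excerpt
    for _ in range(max_iter):
        new_r = 0.5 * (r + x / r)  # average the guess with x / guess
        if abs(new_r - r) <= rel_tol * new_r:
            return new_r
        r = new_r
    return r  # best estimate if the cap was reached


if __name__ == "__main__":
    for value in (2.0, 144.0, 1e-6):
        print(value, babylonian_sqrt(value), math.sqrt(value))
```

A better starting value mainly saves the early iterations in which the guess is roughly halved each step; once the guess is close, the number of correct digits about doubles per iteration.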

Will a good C++ compiler optimize a reference away?

家住魔仙堡 submitted on 2021-02-18 20:08:48
Question: I want to write a template function that does something with a std::stack<T> and an instance of T, e.g.: template<class StackType> inline bool some_func( StackType const &s, typename StackType::value_type const &v ) { // ... } The reason I pass v by reference is of course to optimize for the case where StackType::value_type is a struct or class and not copy an entire object by value. However, if StackType::value_type is a "simple" type like int, then it's of course better simply to pass it

Will a good C++ compiler optimize a reference away?

梦想的初衷 submitted on 2021-02-18 20:06:36
Question: I want to write a template function that does something with a std::stack<T> and an instance of T, e.g.: template<class StackType> inline bool some_func( StackType const &s, typename StackType::value_type const &v ) { // ... } The reason I pass v by reference is of course to optimize for the case where StackType::value_type is a struct or class and not copy an entire object by value. However, if StackType::value_type is a "simple" type like int, then it's of course better simply to pass it

Optimisation tips to find in which triangle a point belongs

和自甴很熟 submitted on 2021-02-18 17:49:03
Question: I'm having some trouble optimising my algorithm: I have a disk (centered at 0, with radius 1) filled with triangles (not necessarily of the same area/length). There could be a HUGE number of triangles (let's say from 1k to 300k triangles). My goal is to find as quickly as possible which triangle a point belongs to. The operation has to be repeated a large number of times (around 10k times). For now the algorithm I'm using is: I'm computing the barycentric coordinates of the point in each
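
The excerpt stops mid-sentence, so the sketch below only illustrates the brute-force idea it appears to describe: compute the barycentric coordinates of the point with respect to each triangle and stop at the first triangle where all three are non-negative. The function names, the tuple-based triangle layout and the tolerance `eps` are assumptions for illustration.

```python
def barycentric(p, a, b, c):
    """Return barycentric coordinates (u, v, w) of p in triangle abc,
    or None if the triangle is degenerate."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        return None
    u = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    v = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return u, v, 1.0 - u - v


def find_triangle(p, triangles, eps=1e-12):
    """Linear scan: index of the first triangle containing p, else None."""
    for index, (a, b, c) in enumerate(triangles):
        coords = barycentric(p, a, b, c)
        if coords is not None and all(t >= -eps for t in coords):
            return index
    return None


if __name__ == "__main__":
    # Two triangles that together tile the unit square.
    triangles = [
        ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)),
        ((1.0, 0.0), (1.0, 1.0), (0.0, 1.0)),
    ]
    print(find_triangle((0.25, 0.25), triangles))
    print(find_triangle((0.75, 0.75), triangles))
    print(find_triangle((2.0, 2.0), triangles))
```

Each query costs time proportional to the number of triangles, so with about 300k triangles and 10k queries a spatial index (for example a uniform grid over the disk) is the usual next step; that is a general suggestion and not something stated in the excerpt.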

run `perf stat` on the output of `perf record`?

江枫思渺然 submitted on 2021-02-18 16:58:30
Question: With perf (the Linux profiler), (v4.15.18), I can run perf stat $COMMAND to get some simple stats on the command. If I run perf record, it saves lots of data to a perf.data file. Can I run perf stat on the output of perf record, so that I can look at the perf recorded data, but also get a simple overview? Answer 1: perf stat uses the hardware performance monitoring unit in counting mode, and perf record / perf report with a perf.data file uses the same unit in overflow mode. In both modes hardware

run `perf stat` on the output of `perf record`?

不羁的心 submitted on 2021-02-18 16:58:05
Question: With perf (the Linux profiler), (v4.15.18), I can run perf stat $COMMAND to get some simple stats on the command. If I run perf record, it saves lots of data to a perf.data file. Can I run perf stat on the output of perf record, so that I can look at the perf recorded data, but also get a simple overview? Answer 1: perf stat uses the hardware performance monitoring unit in counting mode, and perf record / perf report with a perf.data file uses the same unit in overflow mode. In both modes hardware

Optimization techniques used by std::regex_constants::optimize

六月ゝ 毕业季﹏ submitted on 2021-02-18 10:58:43
Question: I am working with std::regex, and whilst reading about the various constants defined in std::regex_constants, I came across std::optimize. From what I have read, it sounds useful in my application (I only need one instance of the regex, initialized at the beginning, but it is used multiple times throughout the loading process). According to the working paper n3126 (pg. 1077), std::regex_constants::optimize: Specifies that the regular expression engine should pay more attention to