optimization

Improve PNG optimization Gulp task

Submitted by 夙愿已清 on 2020-01-01 05:18:13
Question: This is the source PNG with transparency: http://i.imgur.com/7m0zIBp.png (13.3 kB). Optimized using compresspng.com: http://i.imgur.com/DHUiLuO.png (5.4 kB). Optimized using tinypng.com: http://i.imgur.com/rEE2hzg.png (5.6 kB). Optimized with gulp-imagemin + imagemin-pngquant: http://i.imgur.com/OTqI6lK.png (6.6 kB). As you can see, the online tools do better than Gulp. Is there a way to improve PNG optimization with Gulp? Just in case, here's my gulp task: gulp.task('images', function() { return gulp.src(
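A minimal sketch of one way to close the gap, assuming gulp 4 with a CommonJS-era gulp-imagemin and imagemin-pngquant v7+ (where quality is a [min, max] fraction range); the globs and paths are placeholders, not the asker's. The online tools win mostly because they quantize more aggressively, which pngquant can also be told to do:

```js
// Hypothetical task, not the asker's original: push pngquant harder.
const gulp = require('gulp');
const imagemin = require('gulp-imagemin');
const pngquant = require('imagemin-pngquant');

gulp.task('images', function () {
  return gulp.src('src/images/*.png')            // placeholder glob
    .pipe(imagemin([
      pngquant({
        quality: [0.5, 0.7], // allow stronger lossy quantization
        speed: 1,            // slowest setting, best compression
        strip: true          // drop ancillary metadata chunks
      })
    ]))
    .pipe(gulp.dest('dist/images'));             // placeholder output dir
});
```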

How to enable optimization in G++ with #pragma

Submitted by 妖精的绣舞 on 2020-01-01 05:08:50
Question: I want to enable optimization in g++ without a command-line parameter. I know GCC can do this when I write #pragma GCC optimize (2) in my code, but it doesn't seem to work in g++. This page may help: http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html My compiler version: $ g++ --version g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1 <suppressed copyright message> $ gcc --version gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1 <suppressed copyright message> I wrote some code like this: #pragma
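For reference, a hedged sketch of the documented pragma syntax from the page linked above. Whether it takes effect in C++ depends on the compiler release; the asker's observation suggests 4.6-era g++ ignores it, while newer g++ honors it:

```cpp
// Enable -O2 for the functions between push/pop, independent of the
// command line. Not guaranteed on g++ 4.6; works on later releases.
#include <cstdio>

#pragma GCC push_options
#pragma GCC optimize ("O2")

long sum(long n) {
    long s = 0;
    for (long i = 0; i < n; ++i) s += i;  // loop the optimizer can tighten
    return s;
}

#pragma GCC pop_options  // restore the command-line optimization level

int main() {
    std::printf("%ld\n", sum(1000000));
}
```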

Bilinear filter with SSE4.1 intrinsics

Submitted by 孤街浪徒 on 2020-01-01 05:03:11
Question: As an exercise in getting used to intrinsics, I am trying to work out a reasonably fast bilinear filtering function, for just one filtered sample at a time for now; anything up to SSE4.1 is fine. So far I have the following: inline __m128i DivideBy255_8xUint16(const __m128i value) { // Blinn 16-bit divide-by-255 trick, but across 8 packed 16-bit values const __m128i plus128 = _mm_add_epi16(value, _mm_set1_epi16(128)); const __m128i plus128ThenDivideBy256 = _mm_srli_epi16(plus128, 8); // TODO: Should
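The excerpt cuts off mid-trick, so here is a hedged, self-contained completion of the Blinn divide-by-255 step (x/255 == (x + 128 + ((x + 128) >> 8)) >> 8 for products in [0, 255*255]); only SSE2 intrinsics are needed for this helper, and main() is just an illustrative check:

```cpp
#include <emmintrin.h>  // SSE2 covers every intrinsic used here
#include <cstdio>

inline __m128i DivideBy255_8xUint16(const __m128i value) {
    // Blinn 16-bit divide-by-255 trick across 8 packed 16-bit values.
    const __m128i plus128 = _mm_add_epi16(value, _mm_set1_epi16(128));
    const __m128i plus128ThenDivideBy256 = _mm_srli_epi16(plus128, 8);
    // Folding the >>8 term back in turns the cheap /256 into an exact /255
    // for all inputs up to 255*255; no lane ever overflows 16 bits.
    return _mm_srli_epi16(_mm_add_epi16(plus128, plus128ThenDivideBy256), 8);
}

int main() {
    const __m128i worst = _mm_set1_epi16((short)65025);  // bit pattern of 255*255
    const __m128i r = DivideBy255_8xUint16(worst);
    std::printf("%d\n", _mm_extract_epi16(r, 0));        // prints 255
}
```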

C++ stack and scope

Submitted by 倖福魔咒の on 2020-01-01 04:52:06
Question: I tried this code in Visual C++ 2008, and it shows that A and B do not have the same address: int main() { { int A; printf("%p\n", &A); } int B; printf("%p\n", &B); } But since A no longer exists by the time B is defined, it seems to me that the same stack location could be reused. I don't understand why the compiler doesn't perform what looks like a very simple optimization (one that could matter with larger variables and recursive functions, for example). And it doesn't seem
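A hedged repro of the experiment: the standard only says A's lifetime has ended, so whether the slot is recycled is unspecified. In practice the result often flips with optimization level, since unoptimized builds tend to give every local its own slot to keep debugging simple:

```cpp
// Compile at /O2 (MSVC) or -O2 (gcc/clang) and the two pointers often
// match; in a debug build they usually differ. Neither outcome is required.
#include <cstdio>

int main() {
    {
        int A = 1;
        std::printf("%p\n", static_cast<const void*>(&A));
    }  // A's lifetime ends here, so its stack slot may be recycled
    int B = 2;
    std::printf("%p\n", static_cast<const void*>(&B));
}
```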

memcpy vs assignment in C — should be memmove?

Submitted by 大城市里の小女人 on 2020-01-01 04:41:07
Question: As pointed out in an answer to this question, the compiler (in this case gcc-4.1.2; yes it's old, no I can't change it) can replace struct assignments with memcpy where it thinks that is appropriate. I'm running some code under valgrind and got a warning about overlapping memcpy source and destination. When I look at the code, I see this (paraphrasing): struct outer { struct inner i; // lots of other stuff }; struct inner { int x; // lots of other stuff }; void frob(struct inner* i, struct outer* o) {
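A hedged reconstruction of the overlap case (field names from the question, the bodies are mine, and inner is defined first so the by-value member compiles): when frob is called with i == &o->i, the struct assignment is a self-copy that gcc may lower to memcpy with identical source and destination, which is exactly what valgrind flags. Guarding the copy, or copying with memmove, avoids it:

```c
#include <stdio.h>
#include <string.h>

struct inner { int x; };                  /* "lots of other stuff" elided */
struct outer { struct inner i; };

void frob(struct inner *i, struct outer *o) {
    if (i != &o->i)                       /* skip the no-op self copy...   */
        o->i = *i;                        /* ...that gcc may emit as memcpy */
    /* alternatively: memmove(&o->i, i, sizeof *i); overlap is then fine  */
}

int main(void) {
    struct outer o = { { 42 } };
    frob(&o.i, &o);                       /* the overlapping call */
    printf("%d\n", o.i.x);
    return 0;
}
```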

Can GCC optimize things better when I compile everything in one step?

Submitted by 瘦欲@ on 2020-01-01 04:26:10
Question: gcc optimizes code when I pass it the -O2 flag, but I'm wondering how well it can actually do that if I compile all source files to object files and then link them afterwards. Here's an example: // in a.h int foo(int n); // in foo.cpp int foo(int n) { return n; } // in main.cpp #include "a.h" int main(void) { return foo(5); } // commands used to compile it all gcc -c -O2 foo.cpp -o foo.o gcc -c -O2 main.cpp -o main.o gcc -O2 foo.o main.o -o executable Normally, gcc should inline foo because it's
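For what it's worth, a hedged sketch of the standard fix: link-time optimization (-flto, available since GCC 4.5) stores the compiler's intermediate representation in the object files, so inlining decisions can still happen at the final link. g++ is used below instead of gcc so the C++ runtime gets linked:

```sh
# Same three-step build, but cross-TU inlining is deferred to link time.
g++ -c -O2 -flto foo.cpp -o foo.o
g++ -c -O2 -flto main.cpp -o main.o
g++ -O2 -flto foo.o main.o -o executable   # foo() can be inlined into main() here
```

Without -flto, each object file is already machine code by link time, so foo() cannot be inlined across translation units no matter which -O level is used.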

EASTL versus STL, how can there be such a performance difference in std::vector<uint64_t>::operator[]

Submitted by ☆樱花仙子☆ on 2020-01-01 04:18:05
Question: According to http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html, vector<uint64>::operator[] is between 2% and 70% faster in EASTL than in a "commonly used commercial version of STL". Unless the commercial version of STL uses range checking, which would make the comparison unfair, how can there possibly be such a speed difference for such a simple operation? Update: it seems the answer is that the EA engineers are simply cheating by comparing against a version that uses range checking...
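A hedged microbenchmark sketch of why the comparison hinges on range checking: once both sides do unchecked indexing, operator[] is a single load either way; vector::at() stands in here for a checked ("debug"/secure) STL configuration. Timings are illustrative, not reproduced from the paper:

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    std::vector<std::uint64_t> v(1 << 20);
    std::iota(v.begin(), v.end(), 0);              // fill with 0, 1, 2, ...

    auto time_sum = [&](const char* label, auto index) {
        const auto t0 = std::chrono::steady_clock::now();
        std::uint64_t sum = 0;
        for (std::size_t i = 0; i < v.size(); ++i) sum += index(i);
        const auto t1 = std::chrono::steady_clock::now();
        const auto us =
            std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        std::printf("%s: sum=%llu, %lld us\n", label,
                    (unsigned long long)sum, (long long)us);
    };

    time_sum("operator[] (unchecked)", [&](std::size_t i) { return v[i]; });
    time_sum("at() (range-checked)", [&](std::size_t i) { return v.at(i); });
}
```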

Expression templates are not being inlined fully

Submitted by 浪子不回头ぞ on 2020-01-01 04:12:10
Question: I have the first version of a math library completed, and as the next step I'd like to turn to expression templates to improve the performance of the code. However, my initial results are different from what I expected. I am compiling in MSVC 2010, in vanilla Release mode (and I'm okay with C++0x). Apologies in advance for the large amount of code I'll be showing you; it's as minimal as I can make it while letting people see what I'm doing. Profiling framework: #include <algorithm> #include
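Since the excerpt is cut off, here is an independent minimal expression-template sketch (not the question's library) of the mechanism whose inlining is at stake: operator+ only builds a lightweight node, and the templated operator= runs one fused loop, so performance lives or dies on the compiler flattening the whole operator[] call chain:

```cpp
#include <array>
#include <cstddef>
#include <cstdio>

template <class L, class R>
struct Add {                       // expression node: holds refs, does no work
    const L& l; const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

struct Vec {
    std::array<double, 4> d{};
    double operator[](std::size_t i) const { return d[i]; }
    template <class E>
    Vec& operator=(const E& e) {   // single fused loop, no temporaries
        for (std::size_t i = 0; i < d.size(); ++i) d[i] = e[i];
        return *this;
    }
};

inline Add<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

template <class L, class R>
Add<Add<L, R>, Vec> operator+(const Add<L, R>& a, const Vec& b) { return {a, b}; }

int main() {
    Vec a, b, c;
    a.d = {1, 2, 3, 4};
    b.d = {5, 6, 7, 8};
    c = a + b + a;                 // one loop over i, run at assignment time
    std::printf("%g\n", c[2]);     // 3 + 7 + 3 = 13
}
```

If any operator[] in that chain fails to inline (plausible in MSVC 2010 without __forceinline or /Ob2), the expression-template version can easily lose to a plain hand-written loop, which may be what the question is observing.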

Why does the JVM show more latency for the same block of code after a busy spin pause?

Submitted by 久未见 on 2020-01-01 02:38:45
Question: The code below demonstrates the problem unequivocally: the exact same block of code becomes slower after a busy-spin pause. Note that of course I'm not using Thread.sleep. Also note that there are no conditionals leading to a HotSpot/JIT de-optimization, since I change the pause using a math operation, not an if. There is a block of math operations that I want to time. First I time the block, pausing 1 nanosecond before I start my measurement. I do that 20,000 times. Then I change
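A hedged Java sketch of the methodology being described (all names and constants here are mine, not the asker's): time the same math block after a short versus a long busy spin, keeping the best of many runs. On HotSpot a longer pause really can make the subsequent timed block slower, for reasons such as cold branch predictors and caches after the spin, rather than any conditional-driven de-optimization:

```java
public class SpinPauseTiming {

    // Busy spin: burns CPU until `nanos` have elapsed; no Thread.sleep.
    static long busySpin(long nanos) {
        long end = System.nanoTime() + nanos;
        long x = 0;
        while (System.nanoTime() < end) x++;
        return x;
    }

    // The block of math operations being timed.
    static double mathBlock(double seed) {
        double v = seed;
        for (int i = 0; i < 1_000; i++) v += Math.sqrt(v) * 0.5;
        return v;
    }

    static long timeOnce(long pauseNanos) {
        busySpin(pauseNanos);               // pause, then measure
        long t0 = System.nanoTime();
        double r = mathBlock(42.0);
        long t1 = System.nanoTime();
        if (r == 0) System.out.println(r);  // keep the result observable
        return t1 - t0;
    }

    public static void main(String[] args) {
        for (long pause : new long[] {1L, 5_000_000L}) {   // 1 ns vs 5 ms
            long best = Long.MAX_VALUE;
            for (int i = 0; i < 20_000; i++) {
                best = Math.min(best, timeOnce(pause));
            }
            System.out.println("pause=" + pause + "ns best=" + best + "ns");
        }
    }
}
```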