multiplication

How to multiply two quaternions with minimal instructions?

核能气质少年 submitted on 2019-11-29 03:05:41

Question: After some thought, I came up with the following code for multiplying two quaternions using SSE:

```c
#include <pmmintrin.h> /* SSE3 intrinsics */

/* multiplication of two quaternions (x, y, z, w) x (a, b, c, d) */
__m128 _mm_cross4_ps(__m128 xyzw, __m128 abcd)
{
    /* The product of two quaternions is: */
    /* (X,Y,Z,W) = (xd+yc-zb+wa, -xc+yd+za+wb, xb-ya+zd+wc, -xa-yb-zc+wd) */
    __m128 wzyx = _mm_shuffle_ps(xyzw, xyzw, _MM_SHUFFLE(0,1,2,3));
    __m128 baba = _mm_shuffle_ps(abcd, abcd, _MM_SHUFFLE(0,1,0,1
```

What's the best C++ way to multiply unsigned integers modularly safely?

和自甴很熟 submitted on 2019-11-28 17:09:41

Question: Let's say that you are using <cstdint> and types like std::uint8_t and std::uint16_t, and want to do operations like += and *= on them. You'd like arithmetic on these numbers to wrap around modularly, as is typical in C/C++. This ordinarily works, and you find experimentally that it works with std::uint8_t, std::uint32_t and std::uint64_t, but not std::uint16_t. Specifically, multiplication with std::uint16_t sometimes fails spectacularly, with optimized builds producing all kinds of weird results

Is integer multiplication really done at the same speed as addition on a modern CPU?

不打扰是莪最后的温柔 submitted on 2019-11-28 16:42:21

Question: I hear this statement quite often, that multiplication on modern hardware is so optimized that it is effectively the same speed as addition. Is that true? I can never get any authoritative confirmation. My own research only adds questions. The speed tests usually show data that confuses me. Here is an example:

```cpp
#include <stdio.h>
#include <sys/time.h>

unsigned int time1000() {
    timeval val;
    gettimeofday(&val, 0);
    val.tv_sec &= 0xffff;
    return val.tv_sec * 1000 + val.tv_usec / 1000;
}

int main()
```

Strassen's algorithm for matrix multiplication

房东的猫 submitted on 2019-11-28 16:13:38

Can someone please explain Strassen's algorithm for matrix multiplication in an intuitive way? I've gone through (well, tried to go through) the explanation in the book and the wiki, but it's not clicking upstairs. Any links on the web that use a lot of English rather than formal notation would be helpful, too. Are there any analogies which might help me build this algorithm from scratch without having to memorize it? Consider multiplying two 2x2 matrices, as follows:

    A B   E F   AE+BG AF+BH
    C D * G H = CE+DG CF+DH

The obvious way to compute the right side is just to do the 8 multiplies and 4

Multiplication of three numbers in C gives wrong results?

情到浓时终转凉″ submitted on 2019-11-28 14:21:59

Question: I can't believe what happens in my program:

```c
double den = 180*3600*10000;
```

In debugging I got the value -2109934592.0000000. Any help please? You can try this simple code:

```c
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    double denominator = 10000*180*3600;
    printf("%f \n", denominator);
    return 0;
}
```

Answer 1: With the full code in the question we can now see it's an integer overflow. 10000 * 180 * 3600 = 6,480,000,000. This is greater than 2,147,483,648 which is the max

Matlab - Multiplying a matrix with every matrix of a 3d matrix

送分小仙女□ submitted on 2019-11-28 14:11:38

I have two Matlab questions that seem closely related. I want to find the most efficient way (no loop?) to multiply an (A x A) matrix with every single matrix of a 3d matrix (A x A x N). Also, I would like to take the trace of each of those products. http://en.wikipedia.org/wiki/Matrix_multiplication#Frobenius_product This is the inner Frobenius product. In the crappy code I have below I'm using its secondary definition, which is more efficient. I also want to multiply each element of a vector (N x 1) with its "corresponding" matrix of a 3d matrix (A x A x N).

```matlab
function Y_returned = problem_1(X_matrix
```

Translation from Complex-FFT to Finite-Field-FFT

泪湿孤枕 submitted on 2019-11-28 12:43:37

Good afternoon! I am trying to develop an NTT algorithm based on the naive recursive FFT implementation I already have. Consider the following code (the length of coefficients, let it be m, is an exact power of two):

```csharp
/// <summary>
/// Calculates the result of the recursive Number Theoretic Transform.
/// </summary>
/// <param name="coefficients"></param>
/// <returns></returns>
private static BigInteger[] Recursive_NTT_Skeleton(
    IList<BigInteger> coefficients,
    IList<BigInteger> rootsOfUnity,
    int step,
    int offset)
{
    // Calculate the length of vectors at the current step of recursion.
    int n =
```

x86_64: is IMUL faster than 2x SHL + 2x ADD?

喜夏-厌秋 submitted on 2019-11-28 12:30:00

When looking at the assembly produced by Visual Studio (2015U2) in /O2 (release) mode, I saw that this 'hand-optimized' piece of C code is translated back into a multiplication:

```c
int64_t calc(int64_t a)
{
    return (a << 6) + (a << 16) - a;
}
```

Assembly:

```asm
imul rdx,qword ptr [a],1003Fh
```

So I was wondering if that is really faster than doing it the way it is written, something like:

```asm
mov rbx,qword ptr [a]
mov rax,rbx
shl rax,6
mov rcx,rbx
shl rcx,10h
add rax,rcx
sub rax,rbx
```

I was always under the impression that multiplication is always slower than a few shifts/adds? Is that no longer the case with modern

Understanding Modified Baugh-Wooley multiplication algorithm

最后都变了- submitted on 2019-11-28 10:45:51

Question: For the Modified Baugh-Wooley multiplication algorithm, why is it !(A0*B5) instead of just (A0*B5)? Same question for !(A1*B5), !(A2*B5), !(A3*B5), !(A4*B5), !(A5*B4), !(A5*B3), !(A5*B2), !(A5*B1) and !(A5*B0). Besides, why are there two extra '1's?

Answer 1: In signed 6-bit 2's complement notation, the place values of the bits are:

    -32 16 8 4 2 1

Notice that the top bit has a negative value. When addition, subtraction, and multiplication are performed mod 64, however, that minus sign makes absolutely

long long vs int multiplication

喜夏-厌秋 submitted on 2019-11-28 10:16:29

Given the following snippet:

```c
#include <stdio.h>

typedef signed long long int64;
typedef signed int int32;
typedef signed char int8;

int main()
{
    printf("%i\n", sizeof(int8));
    printf("%i\n", sizeof(int32));
    printf("%i\n", sizeof(int64));

    int8 a = 100;
    int8 b = 100;
    int32 c = a * b;
    printf("%i\n", c);

    int32 d = 1000000000;
    int32 e = 1000000000;
    int64 f = d * e;
    printf("%I64d\n", f);
}
```

The output with MinGW GCC 3.4.5 is (-O0):

    1
    4
    8
    10000
    -1486618624

The first multiplication is cast to an int32 internally (according to the assembler output). The second multiplication is not cast. I'm not sure