I\'m trying to multiply two 128-bit integers in C.
Here is my algorithm:
Split the two 128-bit sequences into S1 and S2.
Then split S1 into S11 (fron
You are stuck in an infinite loop because i += 1/32 is the same as i += 0.  
Also: note:memcpy(&d[3l/2-i], dstM1, 1/8); is memcpy(&d[1-i], dstM1, 0);
If you're on gcc or clang you can use __int128 and unsigned __int128 directly.
Summarizing your question: How can you add two arrays of (unsigned) integers propagating the carry.
uint16_t foo[4];  // 0000 aaaa FFFF cccc
uint16_t bar[4];  // ffffdd eeee FFFF 0000
The good point is that 'FFFF+FFFF+1' is simply (1)FFFF. Thus the carry can always be added in each word without producing an extra carry (as if the sum could be 20000).
Making a temporary sum: sum = foo[3] + bar[3] + carry; with carry being initially 0, either this sum produces a new carry, or not.
(A+B) < A((A + c) < A) || (((A + c) + B) < B)Another possibility is to calculate "multi-bit carry" by summing up several terms in columns, which occurs often in bignum multiplications:
            AAAA BBBB CCCC
       DDDD EEEE FFFF ....
  GGGG HHHH IIII .... ....
--------------------------
  col4 col3 col2 col1 col0
Now each column produces 32-bit or 64-bit result and a carry that doesn't necessarily fit a single bit.
uint32_t sum_low = carry_in_from_previous_column;
uint32_t sum_high = 0;
for (i = 0; i < N; i++) {
     sum_low += matrix[i][column] & 0xffff;
     sum_high += matrix[i][column] >> 16;
}
sum_high += sum_low >> 16;    // add the multibit half carry
result = (sum_low & 0xffff) | (sum_high << 16);
carry_out = sum_high >> 16;