Can you please explain Reed Solomon encoding part's Identity matrix?

问题

I am working on a object storage project where I need to understand Reed Solomon error correction algorithm, I have gone through this Doc as a starter and also some thesis paper.
1. content.sakai.rutgers.edu
2. theseus.fi
but I can't seem to understand the lower part of the identity matrix (red box), where it is coming from. How this calculation is done?

Can anyone please explain this.

回答1:

The encoding matrix is a 6 x 4 Vandermonde matrix using the evaluation points {0 1 2 3 4 5} modified so that the upper 4 x 4 portion of the matrix is the identity matrix. To create the matrix, a 6 x 4 Vandermonde matrix is generated (where matrix[r][c] = pow(r,c) ), then multiplied with the inverse of the upper 4 x 4 portion of the matrix to produce the encoding matrix. This is the equivalent of "systematic encoding" with Reed Solomon's "original view" as mentioned in the Wikipedia article you linked to above, which is different than Reed Solomon's "BCH view", which links 1. and 2. refer to. The Wikipedia's example systematic encoding matrix is a transposed version of the encoding matrix used in the question.

https://en.wikipedia.org/wiki/Vandermonde_matrix

https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Systematic_encoding_procedure:_The_message_as_an_initial_sequence_of_values

The code to generate the encoding matrix is near the bottom of this github source file:

https://github.com/Backblaze/JavaReedSolomon/blob/master/src/main/java/com/backblaze/erasure/ReedSolomon.java

Vandermonde     inverse upper   encoding
matrix          part of matrix  matrix

01 00 00 00                     01 00 00 00
01 01 01 01     01 00 00 00     00 01 00 00
01 02 04 08  x  7b 01 8e f4  =  00 00 01 00
01 03 05 0f     00 7a f4 8e     00 00 00 01
01 04 10 40     7a 7a 7a 7a     1b 1c 12 14
01 05 11 55                     1c 1b 14 12

Any 2 rows can be removed from the encoding matrix, resulting in one of 15 possible (6 things taken 4 at a time) 4 x 4 matrices, and all 15 possible matrices are invertible.

If only a single row of the 6 x 4 encoded data is bad, than any other row can be chosen arbitrarily, resulting in one of the 15 4 x 4 invertible matrices.

The erasure code document then describes the decoding process as a multiplication of the 4 x 4 subset of the encoded data by the inverse of the 4 x 4 subset of the encoding matrix to generate the original data.

Based on the data shown, the math is based on finite field GF(2^8) modulo 0x11D. For example, encoding using the last row of the encoding matrix with the last column of the data matrix is (0x1c·0x44)+(0x1b·0x48)+(0x14·0x4c)+(0x12·0x50) = 0x25 (using finite field math).

The question example doesn't make it clear that the 6 x 4 encode matrix can encode a 4 x n matrix, where n is the number of bytes per row. Example where n == 8:

encode           data                        encoded data

01 00 00 00                                  31 32 33 34 35 36 37 38
00 01 00 00      31 32 33 34 35 36 37 38     41 42 43 44 45 46 47 48
00 00 01 00  x   41 42 43 44 45 46 47 48  =  49 4a 4b 4c 4d 4e 4f 50
00 00 00 01      49 4a 4b 4c 4d 4e 4f 50     51 52 53 54 55 56 57 58
1b 1c 12 14      51 52 53 54 55 56 57 58     e8 eb ea ed ec ef ee dc
1c 1b 14 12                                  f5 f6 f7 f0 f1 f2 f3 a1

assume rows 0 and 4 are erasures and deleted from the matrices:

00 01 00 00                                  41 42 43 44 45 46 47 48
00 00 01 00                                  49 4a 4b 4c 4d 4e 4f 50
00 00 00 01                                  51 52 53 54 55 56 57 58
1c 1b 14 12                                  f5 f6 f7 f0 f1 f2 f3 a1

invert encode sub-matrix:

inverse         encode          identity

46 68 8f a0     00 01 00 00     01 00 00 00
01 00 00 00  x  00 00 01 00  =  00 01 00 00
00 01 00 00     00 00 00 01     00 00 01 00
00 00 01 00     1c 1b 14 12     00 00 00 01

reconstruct data using sub-matrices:

inverse         encoded data                restored data

46 68 8f a0     41 42 43 44 45 46 47 48     31 32 33 34 35 36 37 38
01 00 00 00  x  49 4a 4b 4c 4d 4e 4f 50  =  41 42 43 44 45 46 47 48
00 01 00 00     51 52 53 54 55 56 57 58     49 4a 4b 4c 4d 4e 4f 50
00 00 01 00     f5 f6 f7 f0 f1 f2 f3 a1     51 52 53 54 55 56 57 58

The actual process only uses the rows of the matrices that correspond
to the erased rows that need to be reconstructed.
First data is reconstructed:

sub-inverse     encoded data                reconstructed data

                41 42 43 44 45 46 47 48
46 68 8f a0  x  49 4a 4b 4c 4d 4e 4f 50  =  31 32 33 34 35 36 37 38
                51 52 53 54 55 56 57 58
                f5 f6 f7 f0 f1 f2 f3 a1

Once data is reconstructed, reconstruct parity

sub-encode      data                        reconstruted parity

                31 32 33 34 35 36 37 38
1b 1c 12 14  x  41 42 43 44 45 46 47 48  =  e8 eb ea ed ec ef ee dc
                49 4a 4b 4c 4d 4e 4f 50
                51 52 53 54 55 56 57 58

An alternate method is to use BCH view Reed Solomon (for single error, correction done by xor, for two errors, by multiplying by 2x6 matrix (on 6 shards (including erased shards, which can contain garbage and still be corrected) instead of 4x4 matrix on 4 shards). Optionally, instead of generating parities, syndromes are generated, as described in this Raid 6 pdf file:

http://alamos.math.arizona.edu/RTG16/ECC/raid6.pdf

Note that this pdf file's alternate method uses the same finite field as the method above, GF(2^8) mod 0x11D, which may make it easier to compare the methods.

来源：https://stackoverflow.com/questions/59929677/can-you-please-explain-reed-solomon-encoding-parts-identity-matrix

标签

math

theory

reed-solomon