Sparse matrix storage in C

Posted by 邮差的信 on 2019-12-13 16:12:40

Question


I have a sparse matrix that is not symmetric, i.e. the sparsity is somewhat random, and I can't count on all the values being a set distance away from the diagonal.

However, it is still sparse, and I want to reduce the storage requirement on the matrix. Therefore, I am trying to figure out how to store each row starting at the first non-zero, in order, until I get to the last non-zero.

That is, if the first non-zero of row m occurs at column 2, and the last non-zero is at column 89, I want to store columns 2 through 89 in A[m].

Since each row does not have the same number of non-zeros, I will make all the rows of A have the same number of elements, and pad with zeros at the end of any row that has a shorter span.

How do I do this translation in C? I do not actually have an original, full matrix to just copy the values from (the original matrix is coming to me in CSR form). If I were doing this in Fortran, I could just define my array to be two-dimensional and have each row be variable length by tracking the start/stop values of the non-zero columns, and store it like that.

I will try to demonstrate below:

This is a matrix representation of the values I know. For each value, I know the row and column location.

  [1    2    3    4                   ]
  [   5    6    7    8                ]
  [       10    11    12    13        ]
 m[   14    15    16    17       18   ]
  [         19    20    21         22 ]

Here, row m has the largest "span" between its first non-zero and its last non-zero, so my new matrix is going to be 5 x [span of row m]:

  [1     2     3     4          ]
  [5     6     7     8          ]
  [10    11    12    13         ]
 m[14    15    16    17       18]
  [19    20    21       22      ] 

As you can see, row m needs no zero padding, since it has the longest "span" anyway.

The other rows now all have their first non-zero in column zero, and maintain the spacing of zero columns between each non-zero.
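For reference, the translation described above can be sketched directly from the CSR arrays. This is only a minimal sketch under a few assumptions that are not in the question: the values are `double`, the column indices are sorted in ascending order within each row, and the names `SpanMatrix`, `span_from_csr`, `span_get` and `span_free` are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container: every row stores `width` consecutive entries,
   beginning at column start[i]; unused trailing slots hold 0. */
typedef struct {
    int     nrows;
    int     width;  /* largest (last - first + 1) over all rows, at least 1 */
    int    *start;  /* first non-zero column of each row (0 for empty rows) */
    double *data;   /* nrows * width values, row-major */
} SpanMatrix;

void span_free(SpanMatrix *m)
{
    if (!m) return;
    free(m->data);
    free(m->start);
    free(m);
}

/* Build from CSR arrays. row_ptr has nrows + 1 entries; col_idx is assumed
   to be sorted within each row. Returns NULL if an allocation fails. */
SpanMatrix *span_from_csr(int nrows, const int *row_ptr,
                          const int *col_idx, const double *val)
{
    assert(nrows > 0 && row_ptr && col_idx && val);

    SpanMatrix *m = calloc(1, sizeof *m);
    if (!m) return NULL;
    m->nrows = nrows;
    m->width = 1;  /* avoids a zero-sized allocation if every row is empty */
    m->start = malloc((size_t)nrows * sizeof *m->start);
    if (!m->start) { span_free(m); return NULL; }

    /* Pass 1: record each row's first column and find the widest span. */
    for (int i = 0; i < nrows; i++) {
        int lo = row_ptr[i], hi = row_ptr[i + 1];
        m->start[i] = (hi > lo) ? col_idx[lo] : 0;
        if (hi > lo) {
            int span = col_idx[hi - 1] - col_idx[lo] + 1;
            if (span > m->width) m->width = span;
        }
    }

    /* calloc zero-fills, which provides both the gaps and the padding. */
    m->data = calloc((size_t)nrows * (size_t)m->width, sizeof *m->data);
    if (!m->data) { span_free(m); return NULL; }

    /* Pass 2: scatter each non-zero to (column - row start). */
    for (int i = 0; i < nrows; i++)
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            m->data[(size_t)i * m->width + (col_idx[k] - m->start[i])] = val[k];

    return m;
}

/* Read element (i, j); anything outside the stored window is zero. */
double span_get(const SpanMatrix *m, int i, int j)
{
    int s = j - m->start[i];
    if (s < 0 || s >= m->width) return 0.0;
    return m->data[(size_t)i * m->width + s];
}
```

Two passes are needed because the common row width is not known until every row's span has been seen.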


Answer 1:


I would implement this as a ragged array, with A[n][0] always returning the element on the diagonal. A[n][1] will return the item just to the right of the diagonal, A[n][2] will return the item just to the left of the diagonal, and so on. Then you just need a function that maps matrix index [i,j] to ragged array index [r][s].

This has the advantage of sparsity, and if your values stay close to the diagonal the arrays are not very long.
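The index mapping could look roughly like the following. It assumes the pattern keeps alternating right, left, right, left as the distance from the diagonal grows (the answer only spells out the first three slots), and the function names `diag_slot` and `diag_column` are invented for this sketch. The row index maps to itself (r = i), so only the column needs translating.

```c
#include <assert.h>

/* Map column j of row i to a slot in that row's ragged array:
   slot 0 = diagonal, 1 = one to the right, 2 = one to the left,
   3 = two to the right, 4 = two to the left, ... */
int diag_slot(int i, int j)
{
    assert(i >= 0 && j >= 0);
    if (j == i) return 0;
    return (j > i) ? 2 * (j - i) - 1   /* odd slots lie right of the diagonal */
                   : 2 * (i - j);      /* even slots lie left of the diagonal */
}

/* Inverse mapping: which column does slot s of row i hold?
   The result can be negative for left-hand slots beyond column 0. */
int diag_column(int i, int s)
{
    assert(i >= 0 && s >= 0);
    return (s % 2) ? i + (s + 1) / 2 : i - s / 2;
}
```

With this layout a row only needs as many slots as its farthest non-zero requires, but a row whose non-zeros all sit on one side of the diagonal wastes every other slot.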


Alternatively, you could have this definition:

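struct Row
{
    int InitialOffset;  // column of the first stored element
    int NumElements;    // number of stored elements
    int *Values;        // NumElements values, allocated separately
};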

Then you would have a Row[]. Retrieving a value based on matrix index would look like this:

//matrix is merely an array of rows; NumElements counts the stored values
int GetValue(const struct Row *matrix, int i, int j)
{
    const struct Row *CurrentRow = &matrix[i];
    if (j < CurrentRow->InitialOffset)
        return 0;
    else if (j >= CurrentRow->InitialOffset + CurrentRow->NumElements)
        return 0;
    return CurrentRow->Values[j - CurrentRow->InitialOffset];
}

My C syntax is a little hazy, but you should get the idea.


Based on your demonstration, I would recommend this:

struct Matrix
{
    int *Data;            // NumRows * RowWidth values, row-major, zero padded
    int RowWidth;         // span of the longest row
    int *StartOffset;     // first stored column of each row
    int *NumberElements;  // last - first + 1 for each row
};

Used as follows...

int GetValue(const struct Matrix *this, int i, int j)
{
    if (j < this->StartOffset[i])
        return 0;
    else if (j >= this->StartOffset[i] + this->NumberElements[i])
        return 0;
    return this->Data[i * this->RowWidth + (j - this->StartOffset[i])];
}

Your initialization procedure would look something like this:

//Data is a struct that holds row index, col index, and value
struct Matrix *InitMatrix(const struct Data *values, int numVals)
{
    //loop through values to find the longest row span and the number of rows
    //create the new matrix: malloc RowWidth * numRows elements for Data
    //malloc numRows elements each for StartOffset and NumberElements
    //for each row, find min() and max() - min() + 1 of the column indices
    //and store them in StartOffset and NumberElements
}

You need to do some processing, but data compression isn't cheap.
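One way the initialization outline might be filled in is sketched below. It is not a definitive implementation: the `Triplet` record, the `NumRows` and `RowWidth` fields, the `FreeMatrix` helper and the choice to take the row count as "largest row index + 1" are all assumptions added for the example, and the input triplets may arrive in any order.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int row, col, value; } Triplet;  /* hypothetical input record */

typedef struct {
    int  NumRows, RowWidth;
    int *Data;            /* NumRows * RowWidth values, row-major, zero padded */
    int *StartOffset;     /* first stored column of each row */
    int *NumberElements;  /* last - first + 1 for each row, 0 if the row is empty */
} Matrix;

void FreeMatrix(Matrix *m)
{
    if (!m) return;
    free(m->Data);
    free(m->StartOffset);
    free(m->NumberElements);
    free(m);
}

Matrix *InitMatrix(const Triplet *values, int numVals)
{
    assert(values && numVals > 0);

    /* Number of rows = largest row index + 1. */
    int numRows = 0;
    for (int k = 0; k < numVals; k++) {
        assert(values[k].row >= 0 && values[k].col >= 0);
        if (values[k].row + 1 > numRows) numRows = values[k].row + 1;
    }

    Matrix *m = calloc(1, sizeof *m);
    int *last = malloc((size_t)numRows * sizeof *last);  /* max column per row */
    if (!m || !last) { free(last); FreeMatrix(m); return NULL; }
    m->NumRows = numRows;
    m->RowWidth = 1;
    m->StartOffset = calloc((size_t)numRows, sizeof *m->StartOffset);
    m->NumberElements = calloc((size_t)numRows, sizeof *m->NumberElements);
    if (!m->StartOffset || !m->NumberElements) { free(last); FreeMatrix(m); return NULL; }

    /* Find min() and max() of the column indices in every row. */
    for (int i = 0; i < numRows; i++) last[i] = -1;  /* -1 marks "row not seen yet" */
    for (int k = 0; k < numVals; k++) {
        int r = values[k].row, c = values[k].col;
        if (last[r] < 0 || c < m->StartOffset[r]) m->StartOffset[r] = c;
        if (c > last[r]) last[r] = c;
    }
    for (int i = 0; i < numRows; i++) {
        if (last[i] >= 0) m->NumberElements[i] = last[i] - m->StartOffset[i] + 1;
        if (m->NumberElements[i] > m->RowWidth) m->RowWidth = m->NumberElements[i];
    }
    free(last);

    /* Zero-filled storage, then drop every value into its shifted column. */
    m->Data = calloc((size_t)numRows * (size_t)m->RowWidth, sizeof *m->Data);
    if (!m->Data) { FreeMatrix(m); return NULL; }
    for (int k = 0; k < numVals; k++) {
        int r = values[k].row;
        m->Data[(size_t)r * m->RowWidth + (values[k].col - m->StartOffset[r])] = values[k].value;
    }
    return m;
}

int GetValue(const Matrix *m, int i, int j)
{
    int s = j - m->StartOffset[i];
    if (s < 0 || s >= m->NumberElements[i]) return 0;
    return m->Data[(size_t)i * m->RowWidth + s];
}
```

The sketch makes three passes over the triplets (row count, per-row bounds, scatter), which is the processing cost the answer refers to.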




Answer 2:


An alternate approach is to use a linked structure (very efficient if the matrix is very sparse, not so good as it gets more filled). I hinted at the implementation in an earlier answer.

If you are going to go with the continuous-run implementation, I'm not sure that you really want or need to use equal-length rows. Why not use a ragged array?
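A ragged layout does not have to mean one allocation per row. The sketch below packs every row's run back to back in a single buffer and keeps a per-row offset table. It assumes CSR input with `double` values and column indices sorted within each row; `RaggedMatrix`, `ragged_from_csr`, `ragged_get` and `ragged_free` are illustrative names, not anything from the thread.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int     nrows;
    int    *start;   /* first stored column of each row */
    size_t *offset;  /* nrows + 1 entries; row i lives in data[offset[i] .. offset[i+1]) */
    double *data;    /* all runs packed back to back, no padding between rows */
} RaggedMatrix;

void ragged_free(RaggedMatrix *m)
{
    if (!m) return;
    free(m->data);
    free(m->offset);
    free(m->start);
    free(m);
}

RaggedMatrix *ragged_from_csr(int nrows, const int *row_ptr,
                              const int *col_idx, const double *val)
{
    assert(nrows > 0 && row_ptr && col_idx && val);

    RaggedMatrix *m = calloc(1, sizeof *m);
    if (!m) return NULL;
    m->nrows = nrows;
    m->start = malloc((size_t)nrows * sizeof *m->start);
    m->offset = malloc(((size_t)nrows + 1) * sizeof *m->offset);
    if (!m->start || !m->offset) { ragged_free(m); return NULL; }

    /* Each row gets exactly its own span (first to last non-zero). */
    m->offset[0] = 0;
    for (int i = 0; i < nrows; i++) {
        int lo = row_ptr[i], hi = row_ptr[i + 1];
        int span = (hi > lo) ? col_idx[hi - 1] - col_idx[lo] + 1 : 0;
        m->start[i] = (hi > lo) ? col_idx[lo] : 0;
        m->offset[i + 1] = m->offset[i] + (size_t)span;
    }

    size_t total = m->offset[nrows];
    m->data = calloc(total ? total : 1, sizeof *m->data);  /* zero-filled gaps */
    if (!m->data) { ragged_free(m); return NULL; }

    for (int i = 0; i < nrows; i++)
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            m->data[m->offset[i] + (size_t)(col_idx[k] - m->start[i])] = val[k];
    return m;
}

double ragged_get(const RaggedMatrix *m, int i, int j)
{
    size_t len = m->offset[i + 1] - m->offset[i];
    if (j < m->start[i] || (size_t)(j - m->start[i]) >= len) return 0.0;
    return m->data[m->offset[i] + (size_t)(j - m->start[i])];
}
```

Compared with equal-length rows, this stores the sum of the row spans rather than the row count times the longest span, at the price of one extra offset lookup per access.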




Answer 3:


Derek, you mentioned in one of the comments that you want to use a single malloc. That means that you know how many non-empty elements you have. Given this, it is possible to store the sparse matrix in an array which holds, per element, the value of the matrix element and the "location delta" to the next element. Something like:

struct melem {
    int value; // value of data
    int offset; // offset to next element
};

struct melem matrix[num_nonempty_elements];

...

// Note: this is pseudocode! k runs over the non-empty elements in row-major
// order, and pos[k] = row*COLS + col is the flattened position of the k-th one.
matrix[k].value = a[row][col];
matrix[k].offset = pos[k+1] - pos[k];

EDIT: Thinking about it, this is pretty similar to the linked list approach, but requires 1 allocation. OTOH, it may require more calculation to access the required cell.
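To make the access cost concrete, here is a small sketch of filling and reading such an array. It assumes the elements are stored in row-major order, that `offset` is the distance in flattened `row*cols + col` positions to the next stored element (0 for the last one), and that the caller keeps the flattened position of the first element. The helper names `melem_fill` and `melem_get` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

struct melem {
    int value;   /* value of data */
    int offset;  /* flattened distance to the next stored element, 0 for the last */
};

/* Fill matrix[] from n values whose flattened positions pos[] are strictly
   increasing (i.e. the elements are listed in row-major order). */
void melem_fill(struct melem *matrix, const int *pos, const int *values, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        assert(k == 0 || pos[k] > pos[k - 1]);
        matrix[k].value = values[k];
        matrix[k].offset = (k + 1 < n) ? pos[k + 1] - pos[k] : 0;
    }
}

/* Look up (row, col) by walking from the first element and accumulating
   the deltas. first_pos is the flattened position of matrix[0]. */
int melem_get(const struct melem *matrix, size_t n, int first_pos,
              int cols, int row, int col)
{
    int target = row * cols + col;
    int p = first_pos;
    for (size_t k = 0; k < n; k++) {
        if (p == target) return matrix[k].value;
        if (p > target) break;  /* walked past it: the cell is an implicit zero */
        p += matrix[k].offset;
    }
    return 0;
}
```

A lookup is linear in the number of stored elements, which is the extra calculation mentioned in the edit above; a per-row table of starting indices would be one way to shorten the walk.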



Source: https://stackoverflow.com/questions/3470910/sparse-matrix-storage-in-c
