Making a 2d array from a text file

问题

Im working with a sparse matrix and im given a text file like this:

Basically the way it works is the number 1.2 goes on the position [0] [3] and the elements of the matriz that are not mentioned stay at 0, so in this case it should look like this:

5.2 0 0 1.2 0 0
0 0 0 0 0 0
0 0 0 0 0 3.2
2.1 0 0 0 0 4.2
0 0 0 0 0 2.2

回答1:

OP wrote this in the comments:

Im so sorry but my teacher clarified everything just now... it turns out that, for each line, the first number is the row, the second the column and the third the element.with the example i have above, 1.2 has to go in the position [0][3]. The matrix does not have to be square.

This makes every thing different. If you don't know the dimensions of the matrix, then you have to read everything first, then calculate the matrix dimensions, allocate space for the matrix and then fill it with the values.

I'd do this:

#include <stdio.h>
#include <stdlib.h>

#define BLOCK 1024

struct matrix_info {
    int col;
    int row;
    double val;
};

void free_matrix(double **matrix, size_t rows)
{
    if(matrix == NULL)
        return;

    for(size_t i = 0; i < rows; ++i)
        free(matrix[i]);
    free(matrix);
}

double **readmatrix(const char *fname, size_t *rows, size_t *cols)
{
    if(fname == NULL || rows == NULL || cols == NULL)
        return NULL;

    double **matrix = NULL;
    struct matrix_info *info = NULL;
    size_t mi_idx = 0; // matrix info index
    size_t mi_size = 0;

    FILE *fp = fopen(fname, "r");
    if(fp == NULL)
    {
        fprintf(stderr, "Cannot open %s\n", fname);
        return NULL;
    }

    *rows = 0;
    *cols = 0;

    for(;;)
    {
        if(mi_idx >= mi_size)
        {
            struct matrix_info *tmp = realloc(info, (mi_size + BLOCK) * sizeof *info);
            if(tmp == NULL)
            {
                fprintf(stderr, "not enough memory\n");
                free(info);
                fclose(fp);
                return NULL;
            }

            info = tmp;
            mi_size += BLOCK;
        }

        int ret = fscanf(fp, "%d %d %lf", &info[mi_idx].row, &info[mi_idx].col,
                    &info[mi_idx].val);

        if(ret == EOF)
            break; // end of file reached

        if(ret != 3)
        {
            fprintf(stderr, "Error parsing matrix\n");
            free(info);
            fclose(fp);
            return NULL;
        }

        if(*rows < info[mi_idx].row)
            *rows = info[mi_idx].row;

        if(*cols < info[mi_idx].col)
            *cols = info[mi_idx].col;

        mi_idx++;
    }

    fclose(fp);

    // mi_idx is now the length of info
    // *cols and *rows have the largest index
    // for the matrix, hence the dimension is (rows + 1) x (cols + 1)
    (*cols)++;
    (*rows)++;

    // allocating memory

    matrix = calloc(*rows, sizeof *matrix);
    if(matrix == NULL)
    {
        fprintf(stderr, "Not enough memory\n");
        free(info);
        return NULL;
    }

    for(size_t i = 0; i < *rows; ++i)
    {
        matrix[i] = calloc(*cols, sizeof **matrix);
        if(matrix[i] == NULL)
        {
            fprintf(stderr, "Not enough memory\n");
            free(info);
            free_matrix(matrix, *rows);
            return NULL;
        }
    }

    // populating matrix

    for(size_t i = 0; i < mi_idx; ++i)
    {
        int r,c;
        r = info[i].row;
        c = info[i].col;
        matrix[r][c] = info[i].val;
    }

    free(info);
    return matrix;
}

int main(void)
{
    const char *fn = "/tmp/matrix.txt";

    size_t rows, cols;

    double **matrix = readmatrix(fn, &rows, &cols);

    if(matrix == NULL)
        return 1;

    for(size_t i = 0; i < rows; ++i)
    {
        for(size_t j = 0; j < cols; ++j)
            printf("%0.3f ", matrix[i][j]);

        puts("");
    }

    free_matrix(matrix, rows);
    return 0;
}

The output is (for a file with your sample data)

5.200 0.000 0.000 1.200 0.000 0.000 
0.000 0.000 0.000 0.000 0.000 0.000 
0.000 0.000 0.000 0.000 0.000 3.200 
2.100 0.000 0.000 0.000 0.000 4.200 
0.000 0.000 0.000 0.000 0.000 2.200

So a quick explanation of what I'm doing:

I read the file and store in an dynamically allocated array the information about the column, the row and the value. This information is stored in the struct matrix_info *info.

The idea is that I read every line and extract the three values. While I read the file, I also store the largest index for the column and the row

    ...

    if(*rows < info[mi_idx].row)
        *rows = info[mi_idx].row;

    if(*cols < info[mi_idx].col)
        *cols = info[mi_idx].col;

    ...

so when the file is read, I know the dimensions of the matrix. Now all values with their row & column are stored in the info array, so the next step is to allocate memory for the matrix and fill the values based based on the info[i] entries.

for(size_t i = 0; i < mi_idx; ++i)
{
    int r,c;
    r = info[i].row;
    c = info[i].col;
    matrix[r][c] = info[i].val;
}

At the end I free the memory for info and return the matrix.

Another interesting part is this:

    if(mi_idx >= mi_size)
    {
        struct matrix_info *tmp = realloc(info, (mi_size + BLOCK) * sizeof *info);
        if(tmp == NULL)
        {
            fprintf(stderr, "not enough memory\n");
            free(info);
            fclose(fp);
            return NULL;
        }

        info = tmp;
        mi_size += BLOCK;
    }

Because you mentioned that the only thing you know about the matrix is that it might contain up to 10000 elements, then the input file might be very big. Instead of reallocating memory for the info elements on every loop, I allocate chunks of 1024 (BLOCK) info elements at a time. Thus once a block is full, the next block is allocated and so on. So I call realloc only every 1024 iterations.

回答2:

you need use in first :

float* sparseMatrix = malloc(sizeof(float) * 10000);

You start read the file and after the first line read you know the nomber of colums and the number of row is the number of line read. After you can reduce the matrix if you want.

free(sparseMatrix );
sparseMatrix = malloc(sizeof(float) * nbRow*nbColum);

回答3:

You simply don't have enough information to construct an appropriate matrix. In your cited case, you know you have AT LEAST 5 rows and AT LEAST 6 columns, but you don't know exactly how many rows m and columns n are in your matrix. So for your given input:

You could have a 5x6 matrix as:

5.2  0.0  0.0  1.2  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  3.2
2.1  0.0  0.0  0.0  0.0  4.2
0.0  0.0  0.0  0.0  0.0  2.2

Or you could have a 10x6 matrix as:

5.2  0.0  0.0  1.2  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  3.2
2.1  0.0  0.0  0.0  0.0  4.2
0.0  0.0  0.0  0.0  0.0  2.2
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0  0.0  0.0  0.0

This ambiguity is a problem as the first matrix is vastly different that the second.

Also, the point of a sparse matrix is to be efficient with memory and/or processing time. If you allocate a full array of m rows and n columns then you have a dense matrix representation instead.

回答4:

There's no real way of knowing what's in a file unless you completely read the file yourself first.

Since you know that there will be max 10k elements, you can statically allocate an array of that size first, and then load the numbers into the until you parse an EOF.

float sparseMatrix[10000];

It just means that your program will always allocate the space for 10k elements regardless of how many elements are actually in the array. If you assume that each element takes up 4 bytes, then you'll only be using ~40kB of memory. This is probably the easiest solution.

The other option would be to read the file fully, figure out the required size, and dynamically allocate that size, then read through the whole file again whilst populating the elements.

// Assume numberOfElements was determined by reading through the file first
float* sparseMatrix = malloc(sizeof(float) * numberOfElements);

Though this method will use less memory, it requires two full reads of the file + the overhead of calling malloc.

回答5:

You can measure the number of rows and columns and then define your 2d array. Of course, this will increase the time complexity! If you don't care about the size of the memory, you can define your array with the max columns and rows! So you should choose between big time complexity and big memory size.

来源：https://stackoverflow.com/questions/49909416/making-a-2d-array-from-a-text-file

标签

multidimensional-array

text-files