Why does my Levenshtein distance calculator fail with PDF files?

Submitted by 时光怂恿深爱的人放手 on 2021-01-29 09:44:14

Question


I'm trying to write a program that calculates the edit distance between two files. I read them with the function fread, opening the files in binary mode ("rb"). I passed two PDF files as input, and while debugging I found that when I fill the matrix for the Levenshtein distance algorithm I get a "SIGSEGV (Segmentation fault)" at char n° 1354 of the first file, and the program exits with:

Process finished with exit code -1073741819 (0xC0000005)

I checked, and char n° 1354 is \n.

The code that I use to read the files is:

long getFileSize(FILE *file) {
    long int size;
    fseek(file, 0, SEEK_END);
    size = ftell(file);
    fseek(file, 0, SEEK_SET);
    return size;
}

char *readFromBinary(char *path) {
    FILE *file;
    file = fopen(path, "rb");
    if (file == NULL)
        printf("Error!\n");

    long fileSize = getFileSize(file);
    char *buffer = malloc((fileSize + 1) * sizeof(char));

    fread(buffer, sizeof(char), fileSize, file);
    return buffer;
}

This is the code that I use to calculate the edit distance:

int calculateDistance(char *pathFile1, char *pathFile2, int choice, char *path) {
    FILE *f1 = fopen(pathFile1, "rb");
    FILE *f2 = fopen(pathFile2, "rb");
    char *contentFile1 = readFromBinary(pathFile1);
    char *contentFile2 = readFromBinary(pathFile2);

    int distance = 0;
    int dim1 = getFileSize(f1);
    int dim2 = getFileSize(f2);

    int **matrix = constructMatrix(dim1, dim2);
    fillMatrix(matrix, dim1, dim2, contentFile1, contentFile2);

    distance = matrix[dim1][dim2];
    struct Instruction instruction[distance + 1];

    int initActions = initInstructions(matrix, pathFile1, &dim1, pathFile2, &dim2, instruction);
    endInstructions(pathFile1, &dim1, pathFile2, &dim2, instruction, initActions);

    if (choice == 1)
        printOnFile(instruction, distance, path);

    for (int i = 0; i <= dim1; i++)
        free(matrix[i]);
    free(matrix);

    if (numberOfDivisions > 0)
        numberOfDivisions--;

    return distance;
}

And this is the code that I use to create and fill the matrix:

int **constructMatrix(int dim1, int dim2) {
    // array of row pointers
    int **matrice = (int **) malloc((dim1 + 1) * sizeof(int *));

    // allocate each row
    for (int i = 0; i <= dim1; i++)
        matrice[i] = (int *) malloc((dim2 + 1) * sizeof(int));

    return matrice;
}

void fillMatrix(int **matrix, int dim1, int dim2, char *file1, char *file2) {
    for (int i = 0; i <= dim1; i++)
        matrix[i][0] = i;
    for (int j = 1; j <= dim2; j++)
        matrix[0][j] = j;
    for (int i = 1; i <= dim1; i++) {
        for (int j = 1; j <= dim2; j++) {
            if (file1[i - 1] != file2[j - 1]) {
                int k = minimum(matrix[i][j - 1], matrix[i - 1][j], matrix[i - 1][j - 1]);
                matrix[i][j] = k + 1;
            } else
                matrix[i][j] = matrix[i - 1][j - 1];
        }
    }
}

In particular, the debugger stops at this line of calculateDistance: fillMatrix(matrix, dim1, dim2, contentFile1, contentFile2); and, inside fillMatrix, at the line matrix[i][0] = i; when i = 1354.

Information about PDF:

The PDF file is 188671 bytes

It has 1355 lines

PS. My program works with txt files.


Answer 1:


You're allocating at least 188671 * 1355 * 4 bytes = 1022596820 bytes. You really need to check the return value from malloc to be sure that it was able to allocate successfully.
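As a rough illustration of both points, here is a minimal sketch in C. The names matrixBytes and constructMatrixChecked are hypothetical and not part of the original program. The first function estimates how many bytes the full (dim1 + 1) x (dim2 + 1) table of int would need, and the second is a variant of constructMatrix that tests every malloc result. If the second file were about as large as the first (an assumption, since only one size is given) and int is 4 bytes, the table alone would need roughly 142 GB.

```c
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed for a (dim1 + 1) x (dim2 + 1) table of int, ignoring the
   array of row pointers and any allocator overhead. */
uint64_t matrixBytes(uint64_t dim1, uint64_t dim2) {
    return (dim1 + 1) * (dim2 + 1) * (uint64_t) sizeof(int);
}

/* Hypothetical checked variant of constructMatrix: returns NULL on failure
   and releases any rows that were already allocated. */
int **constructMatrixChecked(size_t dim1, size_t dim2) {
    int **m = malloc((dim1 + 1) * sizeof *m);
    if (m == NULL)
        return NULL;

    for (size_t i = 0; i <= dim1; i++) {
        m[i] = malloc((dim2 + 1) * sizeof **m);
        if (m[i] == NULL) {
            /* Roll back the rows allocated so far. */
            while (i > 0)
                free(m[--i]);
            free(m);
            return NULL;
        }
    }
    return m;
}
```

With a function like this, the caller can test the result for NULL and report the problem, so a failed row allocation no longer turns into a segmentation fault inside fillMatrix.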




Answer 2:


When any of the memory allocation functions (malloc, calloc, or realloc) requests memory, it returns NULL unless a single contiguous block of the requested size can be obtained. Since you are asking for an incredible amount of memory, the allocation is likely to fail.

It is always recommended that the return value of any of these functions be tested before attempting to use it:

char *buffer = malloc((fileSize + 1) * sizeof(char));
if(!buffer)
{
    //handle error
}

And in this case, it would be good to re-evaluate your algorithm.
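One common way to re-evaluate the algorithm is to keep only two rows of the table instead of all of them, since each cell depends only on the current and the previous row. The sketch below is an illustration under that assumption, not the asker's code. The function name levenshteinTwoRows is hypothetical, and it uses the same recurrence as fillMatrix in the question.

```c
#include <stdlib.h>

/* Edit distance between byte buffers a (length n) and b (length m), keeping
   only two rows of the table in memory. Returns -1 if allocation fails. */
long levenshteinTwoRows(const char *a, size_t n, const char *b, size_t m) {
    long *prev = malloc((m + 1) * sizeof *prev);
    long *curr = malloc((m + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    /* Row 0: turning an empty prefix of a into b[0..j) takes j insertions. */
    for (size_t j = 0; j <= m; j++)
        prev[j] = (long) j;

    for (size_t i = 1; i <= n; i++) {
        curr[0] = (long) i;
        for (size_t j = 1; j <= m; j++) {
            long best = prev[j - 1];               /* diagonal cell */
            if (a[i - 1] != b[j - 1]) {
                if (prev[j] < best)
                    best = prev[j];                /* cell above */
                if (curr[j - 1] < best)
                    best = curr[j - 1];            /* cell to the left */
                best += 1;
            }
            curr[j] = best;
        }
        /* The row just computed becomes the previous row. */
        long *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    long distance = prev[m];
    free(prev);
    free(curr);
    return distance;
}
```

This reduces memory to two rows of m + 1 entries, but the running time is still proportional to n * m, and the full table is no longer available for reconstructing the list of edit instructions. Recovering the edit script in linear space needs a different technique, such as Hirschberg's algorithm.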



Source: https://stackoverflow.com/questions/63710629/why-my-levenshtein-distance-calculator-fails-with-pdf-file
