C - Counting words, characters and lines in file. Character count

Submitted by 烂漫一生 on 2021-01-28 00:57:48

Question


I have to write a C program that outputs the number of characters, lines and words in a given file. The task seems simple, but I'm really not sure what went wrong.

So, here's the code:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main()
{
    FILE *file;
    char filename[256];
    char ch;
    char prevch;

    int lines=0;
    int words=0;
    int characters=0;

    printf("Enter your filename (don't forget about extension!):\n");
    scanf("%s", filename);

    file=fopen(filename, "r");
    if(file == NULL)
    {
        printf("Cannot open file %s \n", filename);
        exit(0);
    }
    else
    {

        while((ch=fgetc(file))!=EOF)
        {
            if(ch==' ' || ch=='\n' || ch=='\t')
            {
                if(isspace(prevch)==0)
                {
                    words++;
                }
            }
            if(ch=='\n')
            {
                lines++;
            }

            prevch=ch;
            characters++;
        }
    }

    fclose(file);

    if(isspace(prevch)==0)
    {
        words++;
    } 

    printf("Number of characters: %d\n", characters);
    printf("Number of words: %d\n", words);
    printf("Number of lines: %d\n", lines);

    return 0;
}

The idea of the task is that the output should match the output of the wc command on Linux. But I have absolutely no idea why my loop is skipping some characters. The way I've written the code, it should count every single character, including whitespace. Why, then, does my program report that a sample file contains 65 characters when wc reports 68? I thought that fgetc might be skipping some characters, but that seems impossible: I used the same function earlier in a program that copies the contents of one text file to another, and everything worked properly.

By the way, is my word-count logic correct? The condition after the loop should make sure that the last word before EOF is counted. I used isspace there so that trailing whitespace at the end of the file is not counted as a word.

Thanks!


Answer 1:


"My program shows sample file contains 65 characters, when wc shows 68"

Are you working on Windows, and does your file have just three lines? If so, the problem is that the C runtime on Windows translates each CRLF line ending to a single newline when a file is opened in text mode, so 3 CRLF pairs are read as 3 LF-only newlines, which accounts for the 3-character discrepancy. To fix this, open the file in binary mode ("rb" instead of "r").
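
As a rough illustration of the binary-mode suggestion, here is a minimal sketch of a byte counter. The function name count_bytes is made up for this example and is not part of the original code; it assumes the goal is to match the byte count that wc reports.

```c
#include <stdio.h>

/* Hypothetical helper: returns the number of bytes in the file at `path`,
   or -1 if the file cannot be opened. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* "rb": no CRLF -> LF translation */
    long n = 0;
    int ch;                         /* int, so EOF is distinguishable from data */

    if (fp == NULL)
        return -1;
    while ((ch = fgetc(fp)) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

With a file containing "a\r\nb\r\n", this should count 6 bytes on any platform, whereas a text-mode read on Windows would see only 4 characters.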

Without having run your code, I think your word-counting logic is OK. You could instead use an 'in-word' flag, initially set to 0 (false): when you see a non-whitespace character while the flag is false, set it to true and count a new word; when you see whitespace, reset it to false. Both approaches work; they're slightly different.
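
The 'in-word' flag idea could look roughly like the sketch below. To keep it easy to try, it works on an in-memory string rather than a file; the name count_words is an assumption for illustration, and the same loop body would apply to characters read with fgetc.

```c
#include <ctype.h>

/* Hypothetical helper: counts whitespace-separated words in a
   NUL-terminated string using an in-word flag. */
long count_words(const char *s)
{
    long words = 0;
    int in_word = 0;                        /* 0 = between words, 1 = inside a word */

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {   /* cast avoids passing negative values to isspace */
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;                    /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

Because a word is counted when it starts rather than when it ends, no extra check is needed after the loop for a final word that is not followed by whitespace.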

Also, remember that fgetc() and relatives return an int, not a char. You cannot reliably detect EOF if you save the return value in a char, though the nature of the problem depends on whether plain char is signed or unsigned and the code set in use.

If plain char is an unsigned type, you can never detect EOF (because EOF, typically -1, becomes 0xFF when stored in the char, and when that is converted back to int for comparison with EOF, it is positive). If plain char is signed and the input contains the byte 0xFF (in ISO 8859-1 and related code sets, that's ÿ, LATIN SMALL LETTER Y WITH DIAERESIS in Unicode terminology), you detect EOF early. However, valid UTF-8 can never contain the byte 0xFF (nor 0xC0, 0xC1, or 0xF5..0xFF), so you shouldn't run into that misinterpretation problem with UTF-8 input; but then your code is counting bytes, not characters.
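
The two failure modes described above can be checked with a small sketch. The helper names are invented for this example, and the signed char case relies on implementation-defined conversion, so the behaviour noted in the comments is what typical two's-complement platforms with EOF equal to -1 would show, not a guarantee.

```c
#include <stdio.h>

/* Returns 1 if a data byte (0..255) stored in a signed char compares equal to EOF. */
int collides_as_signed_char(int byte)
{
    signed char c = (signed char)byte;  /* out-of-range conversion is implementation-defined */
    return c == EOF;
}

/* Returns 1 if the same byte kept in an int compares equal to EOF. */
int collides_as_int(int byte)
{
    int c = byte;
    return c == EOF;
}

/* Returns 1 if EOF still compares equal to EOF after being stored in an unsigned char. */
int eof_survives_unsigned_char(void)
{
    unsigned char c = (unsigned char)EOF;
    return c == EOF;    /* c promotes to a non-negative int; EOF is negative */
}
```

On such platforms, the byte 0xFF is mistaken for EOF only in the signed char version, and EOF is never recognised once it has passed through an unsigned char, which is why the result of fgetc should be kept in an int.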




Answer 2:


You could do it like this:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main()
{
    FILE *file;
    char filename[256];
    int ch;             /* int, not char: fgetc() returns an int so EOF stays distinguishable */
    int prevch = '\0';

    int lines = 0;
    int words = 0;
    int characters = 0;

    printf("Enter your filename (don't forget about extension!):\n");
    if (scanf("%255s", filename) != 1)   /* limit input to the buffer size */
        return EXIT_FAILURE;

    file = fopen(filename, "rb");   /* binary mode, so CRLF is not collapsed on Windows */
    if(file == NULL)
    {
        fprintf(stderr, "Cannot open file %s \n", filename);
        exit(EXIT_FAILURE);
    }

    while((ch = fgetc(file)) != EOF)
    {
        if(isspace(ch))
        {
            if (ch == '\n')
                lines++;
        }else {
            if (prevch == '\0' || isspace(prevch)) 
                words++;
        }

        characters++;
        prevch = ch;  
    }

    fclose(file);

    printf("Number of characters: %d\n", characters);
    printf("Number of words: %d\n", words);
    printf("Number of lines: %d\n", lines);

    return 0;
}


Source: https://stackoverflow.com/questions/47902157/c-counting-words-characters-and-lines-in-file-character-count
