Calculate the number of times each letter appears in a string

Submitted by 两盒软妹~` on 2021-01-29 06:06:51

Question


I've been playing around with some old code, and I came across a function that I made a while ago that calculates the number of times each alphabetical letter appears in a given string. In my initial function, I would loop through the string 26 times counting the number of times each letter appears as it loops through. However, I knew that was really inefficient, so instead I tried to do this:

int *frequency_table(char *string) { 
    int i;
    char c;
    int *freqCount = NULL;
    freqCount = mallocPtr(freqCount, 26, sizeof(int), "freqCount"); /* mallocs and checks for out of memory */

    for (i = 0; string[i] != '\0'; i++) {
        c = string[i];
        if (isalpha(c)) {
            isupper(c) ? freqCount[c - 65]++ : freqCount[c - 97]++;
        }
    }

    return (freqCount);
}

The code above loops through a string and checks each character. If the character is an alphabetic letter (a-z or A-Z), then I increment the frequency count at a specific index in the freqCount array (where index 0 = a\A, 1 = b\B, ... , 25 = z\Z).

The code seems to be counting fine, but when I print the array, I get the following output:

String: "abcdefghijklmnopqrstuvwxyziii"

a/A     -1276558703
b/B     32754
c/C     -1276558703
d/D     32754
e/E     862570673
f/F     21987
g/G     862570673
h/H     21987
i/I     4
j/J     1
k/K     1
l/L     1
m/M     1
n/N     1
o/O     1
p/P     1
q/Q     1
r/R     1
s/S     1
t/T     1
u/U     1
v/V     1
w/W     1
x/X     1
y/Y     1
z/Z     1

For reference, I'm printing the array in the following manner:

for (i = 0; i < 26; i++) {
     printf("%c/%c     %d\n", i + 97, i + 65, freqCount[i]);
}

I checked that the pointer was allocated properly, and I know for sure that I didn't overwrite this memory location. Maybe I'm missing something, but I really can't figure out why it's printing garbage memory values for a\A through h\H.

Also, if there is a more efficient way to do what I'm trying to do, I'd love to hear it.

Thanks


Answer 1:


  • As many have mentioned, you have to initialize the counts to 0.
  • You can also use the trick below to speed up letter counting: if the character is a letter, clear bit 0x20 (decimal 32), which in ASCII is the only bit that differs between an uppercase letter and its lowercase counterpart. Subtracting 'A' then gives the correct index.
  • Lastly, you can use a short array unless you expect a LOT of letters (a short is only guaranteed to hold counts up to 32767).
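#include <ctype.h>   /* isalpha */
#include <stdio.h>
#include <stdlib.h>

short *frequency_table(const char *string) {
    unsigned char c;   /* unsigned, so isalpha() never sees a negative value */
    short *freqCount;

    /* calloc zero-initializes all 26 counters */
    if (!(freqCount = calloc(26, sizeof(short))))
        return NULL;

    for (int i = 0; (c = (unsigned char)string[i]) != '\0'; i++) {
        /* clearing bit 32 folds 'a'..'z' onto 'A'..'Z' (assumes ASCII) */
        if (isalpha(c))
            freqCount[(c & ~32) - 'A']++;
    }

    return freqCount;
}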

Main Test:

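int main(void) {
    short *n = frequency_table("helloiHEllo6456gdrgd#%#^#$^#_thirde");

    if (n == NULL)
        return 1;   /* allocation failed */

    for (char c = 'a'; c <= 'z'; c++)
         printf("%c: %d\n", c, n[c - 'a']);

    free(n);   /* the table was allocated by frequency_table */
    return 0;
}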
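The case-folding trick in this answer can also be checked in isolation. The sketch below assumes ASCII, and the helper name letter_index is hypothetical (it is not part of the answer). It clears the case bit first and then does a plain range check, so it does not depend on isalpha or the current locale:

```c
#include <assert.h>

/* Hypothetical helper: returns 0..25 for an ASCII letter, -1 otherwise. */
int letter_index(char ch)
{
    /* Assumption check: in ASCII, 'A' is 0x41 and 'a' is 0x61,
       so the two cases differ only in bit 0x20 (decimal 32). */
    assert(('a' ^ 'A') == 0x20);

    /* Clear the case bit: 'a'..'z' fold onto 'A'..'Z'. */
    int folded = (unsigned char)ch & ~0x20;

    /* Only values that were letters before folding land in 'A'..'Z'. */
    if (folded < 'A' || folded > 'Z')
        return -1;

    return folded - 'A';
}
```

With this helper, the counting loop body reduces to looking up the index and incrementing the counter whenever the index is not -1.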



Answer 2:


There are 2 problems in your code:

  • the array freqCount is uninitialized.
  • you should avoid passing char values to isalpha because it would cause undefined behavior if string contains negative char values on systems where char is signed by default.

Instead of a ternary operator or an if statement, you can use toupper() to convert lowercase characters to uppercase, and it is more readable to write 'A' or 'a' instead of their hard coded ASCII values 65 and 97.

Here is a corrected version:

int *frequency_table(const char *string) { 
    size_t i;

    /* allocate the array with malloc and check for out of memory;
       the pointer is set to NULL first so that it is not read uninitialized */
    int *freqCount = NULL;
    freqCount = mallocPtr(freqCount, 26, sizeof(int), "freqCount");

    for (i = 0; i < 26; i++) {
        freqCount[i] = 0;
    }
    for (i = 0; string[i] != '\0'; i++) {
        unsigned char c = string[i];
        if (isalpha(c)) {
            /* this code assumes ASCII, so 'Z'-'A' == 25 */
            freqCount[toupper(c) - 'A']++;
        }
    }
    return freqCount;
}
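Because mallocPtr is the asker's own helper and its definition is not shown, the version above cannot be compiled on its own. A minimal self-contained sketch follows. It swaps in calloc, which returns zero-filled memory, in place of mallocPtr and the clearing loop. Like the code above, it assumes ASCII and the default "C" locale:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Returns a heap-allocated table of 26 counts (index 0 = a/A ... 25 = z/Z),
   or NULL if allocation fails. The caller must free() the result. */
int *frequency_table(const char *string)
{
    assert(string != NULL);

    /* calloc stands in for mallocPtr here; it zero-fills the block,
       so every counter starts at 0 without an explicit loop. */
    int *freqCount = calloc(26, sizeof *freqCount);
    if (freqCount == NULL)
        return NULL;

    for (size_t i = 0; string[i] != '\0'; i++) {
        /* Convert to unsigned char before calling <ctype.h> functions,
           so a negative char value is never passed to them. */
        unsigned char c = (unsigned char)string[i];
        if (isalpha(c))
            freqCount[toupper(c) - 'A']++;
    }
    return freqCount;
}
```

Called with the asker's test string "abcdefghijklmnopqrstuvwxyziii", this yields a count of 4 for i/I and 1 for every other letter.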



Answer 3:


The following proposed code:

  1. avoids malloc(), calloc(), etc.
  2. keeps the definition of data, etc. inside the main() function
  3. performs the desired functionality
  4. cleanly compiles
  5. uses simple character literals rather than 'magic' numbers
  6. expects the ASCII character set

And now, the proposed code:

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_ALPHA  26

void charCounter( char *,  int * );

int main( void )
{
    char string[] = "abcdefghijklmnopqrstuvwxyziii";    
    int  freqCount[ MAX_ALPHA ] = {0};

    charCounter(  string, freqCount );


    for( size_t i = 0; i < MAX_ALPHA; i++)
    {
        printf("%c/%c     %d\n", (char)(i + 'A'), (char)(i + 'a'), freqCount[i]);
    }
}


void charCounter( char *string, int freqCount[] )
{
    for( size_t i=0; string[i]; i++ )
    {
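        /* cast to unsigned char: negative values are undefined behavior for <ctype.h> functions */
        if( isalpha( (unsigned char)string[i] ) )
        {
            freqCount[ toupper( (unsigned char)string[i] ) - 'A' ]++;
        }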
    }
}

A run of the code results in:

A/a     1
B/b     1
C/c     1
D/d     1
E/e     1
F/f     1
G/g     1
H/h     1
I/i     4
J/j     1
K/k     1
L/l     1
M/m     1
N/n     1
O/o     1
P/p     1
Q/q     1
R/r     1
S/s     1
T/t     1
U/u     1
V/v     1
W/w     1
X/x     1
Y/y     1
Z/z     1
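All of the answers rely on the letters having consecutive character codes, which holds for ASCII but not for every character set (EBCDIC, for example, has gaps between letters). If that assumption is a concern, one possible sketch is to look each character up in an explicit alphabet string instead of subtracting 'A'. The function name charCounterPortable is hypothetical, and the lookup is slower than the arithmetic used above:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_ALPHA 26

/* Hypothetical variant of charCounter that does not assume the letters
   have consecutive character codes. freqCount must be zeroed by the caller. */
void charCounterPortable(const char *string, int freqCount[MAX_ALPHA])
{
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";

    assert(string != NULL && freqCount != NULL);

    for (size_t i = 0; string[i] != '\0'; i++) {
        /* Fold to lowercase, then find the letter's position in the alphabet. */
        int lower = tolower((unsigned char)string[i]);
        const char *pos = strchr(alphabet, lower);

        /* strchr returns NULL for anything that is not one of the 26 letters. */
        if (pos != NULL)
            freqCount[pos - alphabet]++;
    }
}
```

It is called the same way as charCounter, with a zero-initialized array of MAX_ALPHA ints.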


Source: https://stackoverflow.com/questions/61148630/calculate-the-number-of-times-each-letter-appears-in-a-string
