Question
I need a very efficient way to count how often each letter occurs in a very long file of words, using C/C++. Case is ignored, so both lowercase and uppercase letters should be counted. I already know one approach, shown below. Here I read the input from the user at the terminal, but in my real program I will read from a file, so please ignore the gets() call and focus on my main objective: finding a more optimized approach than this one, if one exists.
```c
#include <ctype.h>
#include <stdio.h>

int main(void)
{
    char string[100];
    int c = 0, count[26] = {0};

    printf("Enter a string\n");
    /* gets() is unsafe and was removed in C11; fgets() is the bounded equivalent */
    if (fgets(string, sizeof string, stdin) == NULL)
        return 1;

    while (string[c] != '\0')
    {
        /** Fold uppercase to lowercase, then consider characters
            from 'a' to 'z' only and ignore the others */
        int ch = tolower((unsigned char)string[c]);
        if (ch >= 'a' && ch <= 'z')
            count[ch - 'a']++;
        c++;
    }

    for (c = 0; c < 26; c++)
    {
        /** Print only those letters whose count is at least 1 */
        if (count[c] != 0)
            printf("%c occurs %d times in the entered string.\n", c + 'a', count[c]);
    }
    return 0;
}
```
But I want to optimize it further, because it has to handle a very long file containing a lot of words. Could someone please give me any suggestions or ideas? Thanks.
Answer 1:
The asymptotic complexity can't get any better: every character has to be examined at least once, so the algorithm (linear in the input size) is already essentially at the bare minimum.
The single most important change you can make is to call IO functions less frequently (and you are not going to call `gets` for real): use `fread` and read into a big buffer (say, 4 KB); larger sizes are usually not beneficial.
Depending on the CPU and cache, if the whole string were already in memory, you might gain something by making `count` 256 elements long and dropping the `if` that tests for alphabetic characters (trading one fewer branch to predict for a larger cache footprint). But I doubt the difference would even be measurable: your code should now be completely IO-bound, with the CPU time needed for processing negligible compared to the time spent waiting for disk reads.
Source: https://stackoverflow.com/questions/33007156/the-best-optimal-way-to-find-the-frequency-in-a-very-very-long-string