The best optimal way to find the frequency in a very very long string

雨燕双飞 submitted on 2019-12-13 08:37:07

Question


I need an efficient way to count the frequency of each character in a very long file containing words, using C/C++. Case should be ignored, so lower-case and upper-case letters are counted together. I already know the approach below. Here I am reading input from the user at the terminal, but in my case I will be reading from a file, so please do not dwell on the gets() function; please focus on my main objective, which is to get a more optimized way than this (if one is possible):

#include <stdio.h>

int main(void)
{
   char string[100];
   int c = 0, count[26] = {0};

   printf("Enter a string\n");
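   /* Stand-in for the real file read; fgets() bounds the input to the buffer, unlike gets() */
   if (fgets(string, sizeof string, stdin) == NULL)
      return 1;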

   while (string[c] != '\0')
   {
      /** Considering letters only, counting upper case
          together with lower case, and ignoring others */

      if (string[c] >= 'a' && string[c] <= 'z')
         count[string[c]-'a']++;
      else if (string[c] >= 'A' && string[c] <= 'Z')
         count[string[c]-'A']++;

      c++;
   }

   for (c = 0; c < 26; c++)
   {
      /** Printing only those characters 
          whose count is at least 1 */

      if (count[c] != 0)
         printf("%c occurs %d times in the entered string.\n", c + 'a', count[c]);
   }

   return 0;
}

But I want to optimize it further, because it has to work on a very long file containing a lot of words. Could someone please give me any suggestions or ideas? Thanks.


Answer 1:


The asymptotic complexity cannot get any better: every character has to be read at least once, so the work is linear in the size of the input, and in general the algorithm is already close to the bare minimum.

The single most important change you can make is to call the I/O functions less often (and you would not call gets in real code anyway): use fread to read into a big buffer (say, 4 KB); larger sizes are usually not beneficial.
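As a rough sketch of that suggestion, the loop below reads the file in 4 KB chunks with fread and counts letters from the buffer. The helper name count_letters, the CHUNK_SIZE constant, and the use of unsigned long long counters are assumptions made for illustration, not something from the question, and the letter tests assume an ASCII-compatible character set.

```c
#include <stdio.h>

#define CHUNK_SIZE 4096  /* assumed buffer size, following the "say, 4 KB" suggestion */

/* Hypothetical helper: adds the letter counts of everything readable from fp
   into counts[0..25], where index 0 holds 'a' and 'A' together.
   Returns 0 on success, -1 if a read error occurred. */
int count_letters(FILE *fp, unsigned long long counts[26])
{
    unsigned char buf[CHUNK_SIZE];
    size_t n;

    /* One fread call per chunk instead of one I/O call per character or line */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            unsigned char ch = buf[i];
            if (ch >= 'a' && ch <= 'z')
                counts[ch - 'a']++;
            else if (ch >= 'A' && ch <= 'Z')
                counts[ch - 'A']++;
        }
    }
    return ferror(fp) ? -1 : 0;
}
```

The caller would open the file with fopen in "rb" mode, pass a zero-initialized array of 26 counters, and print the non-zero entries afterwards, much as the loop in the question does.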

Depending on the CPU and cache, if you already had the whole string in memory, you might gain something by making count 256 elements long and dropping the if that checks for alphabetic characters (trading one fewer branch to predict for a larger cache footprint). But I doubt this would even be measurable: your code should now be completely I/O-bound, with the CPU time needed for processing negligible compared to the wait for the disk reads.
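To make the 256-element idea concrete, here is a minimal sketch that assumes the data is already in memory. The names count_bytes and fold_letters are hypothetical, and the folding step assumes an ASCII-compatible character set. The hot loop has no per-character test; upper and lower case are merged once at the end.

```c
#include <stddef.h>

/* Hypothetical helper: tallies every byte value, with no branch per character.
   The caller is expected to pass a zero-initialized table. */
void count_bytes(const unsigned char *data, size_t len,
                 unsigned long long table[256])
{
    for (size_t i = 0; i < len; i++)
        table[data[i]]++;
}

/* Hypothetical helper: folds the 256-entry table into 26 case-insensitive
   letter counts (assumes 'a'..'z' and 'A'..'Z' are contiguous, as in ASCII). */
void fold_letters(const unsigned long long table[256],
                  unsigned long long letters[26])
{
    for (int i = 0; i < 26; i++)
        letters[i] = table['a' + i] + table['A' + i];
}
```

Whether this beats the branching version depends on the machine, so it is worth measuring before adopting it.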



Source: https://stackoverflow.com/questions/33007156/the-best-optimal-way-to-find-the-frequency-in-a-very-very-long-string
