Use strtok to read a CSV file

Submitted by 匆匆过客 on 2020-01-03 06:40:12

Question


I am trying to use strtok in C to read a CSV file and store its contents in an array of struct Game. My code is shown below:

    FILE *fp;
    int i = 0;
    if ((fp = fopen("Games.csv", "r")) == NULL) {
        printf("Can't open file.\n");
        exit(1);
    }
    rewind(fp);
    char buff[1024];
    fgets(buff, 1024, fp);              /* read (and discard) the first line */
    char *delimiter = ",";

    while (fgets(buff, 1024, (FILE *)fp) != NULL && i < 5) {
        Game[i].ProductID   = strtok(buff, ",");
        Game[i].ProductName = strtok(NULL, delimiter);
        Game[i].Publisher   = strtok(NULL, delimiter);
        Game[i].Genre       = strtok(NULL, delimiter);
        Game[i].Taxable     = atoi(strtok(NULL, delimiter));
        Game[i].price       = strtok(NULL, delimiter);
        Game[i].Quantity    = atoi(strtok(NULL, delimiter));

        printf("%s\n", Game[i].ProductID);

        i++;
    }

    i = 0;
    for (i = 0; i < 5; i++) {
        printf("%s", Game[i].ProductID);
    }

The output is shown below:

DS_25ROGVOIRY
DS_25MMD4N2BL
DS_258KADVNLH
DS_25UR7M375D
DS_25FP45CJFZ
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2

The first five lines (printed inside the while loop) are correct. However, the last five lines (printed after the while loop) are wrong: each one prints the whole line content.

I am very confused about this. When does the array get changed, and how can I still print the correct values after the while loop?


Answer 1:


First, a primer on how strtok() works. The function will give you back a pointer to somewhere in the original string, said string having been modified to make it look like you only have a single token (a).

For example, the first strtok of "A,B,C" would turn it into "A\0B,C" and give you back the address of the A character. Using it at that point would then give you "A".

Similarly, the second call would turn it into "A\0B\0C" and give you back the address of the B character.

The fact that it's giving you pointers into the original string is paramount here because that original string is located in buff.

And, you're actually overwriting buff every time you read a line from the file. So, for all those five lines, Game[i].ProductID will simply be the address of the first character of buff. After you have processed the fifth line, the line:

while (fgets(buff, 1024, fp) != NULL && i < 5)

will first read in the sixth line before exiting the loop.

This is why the final lines you see are actually not the same as any of the first five. You're printing out all the C strings for ProductID, at the (identical) addresses of buff, so you only see the sixth one, and you see the full line because you didn't tokenise that one after reading it in.

What you need to do is to make a copy of the tokens before overwriting the line. That can be done with something like (it's a little complex but correctly handles the case where strtok returns NULL):

if ((Game[i].ProductID = strtok(buff, ",")) != NULL)
    Game[i].ProductID = strdup(Game[i].ProductID);

remembering that you should free those memory allocations at some point.

In the incredibly unlikely event your environment doesn't have a strdup (it's POSIX rather than ISO), see here.


And, just as an aside, most CSV implementations allow for embedded commas, such as by enclosing them in quotes or escaping them (the latter is rare, but I have seen it):

name,"diablo, pax",awesome
name,diablo\, pax,awesome

Both of those would normally be expected to yield three fields: "name", "diablo, pax" and "awesome".

Simplified processing with strtok will not allow for such complexities but, assuming your fields do not contain embedded commas, it may be okay. If your input is more complex, you may be better off using a third-party CSV library (with a suitable licence of course).


(a) For the language lawyers among us, this is covered in the ISO C standard, C11 7.24.5.8 The strtok function, /3 and /4:

3/ The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

4/ The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.



Source: https://stackoverflow.com/questions/45449905/use-strtok-read-csv-file
