Obtaining zero-length string from strtok()

时光毁灭记忆、已成空白 submitted on 2019-12-01 17:33:02

Question


I have a CSV file containing data such as

value;name;test;etc

which I'm trying to split using strtok(string, ";"). However, this file can contain zero-length fields, like this:

value;;test;etc

which strtok() skips. Is there a way to keep strtok() from skipping zero-length fields like this?


Answer 1:


A possible alternative is to use the BSD function strsep() instead of strtok(), if available. From the man page:

The strsep() function is intended as a replacement for the strtok() function. While the strtok() function should be preferred for portability reasons (it conforms to ISO/IEC 9899:1990 ("ISO C90")) it is unable to handle empty fields, i.e., detect fields delimited by two adjacent delimiter characters, or to be used for more than a single string at a time. The strsep() function first appeared in 4.4BSD.

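A simple example (adapted from that man page, with the includes and a main() added):

#define _DEFAULT_SOURCE /* exposes strsep() and strdup() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *token, *string, *tofree;
    tofree = string = strdup("value;;test;etc");
    if (tofree == NULL) return 1;   /* strdup() failed */
    while ((token = strsep(&string, ";")) != NULL)
        printf("token=%s\n", token);
    free(tofree); /* strsep() advances string, so free the saved pointer */
    return 0;
}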

Output:

token=value
token=
token=test
token=etc

so empty fields are handled correctly.
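If strsep() is not available (it is a BSD/glibc extension, not part of ISO C), its behavior can be approximated with the standard strcspn(). The following is a minimal sketch; split_next is a hypothetical name chosen for illustration, not a library function, and it is meant to mirror the strsep() semantics shown above for a single delimiter string.

```c
#include <string.h>

/* Hypothetical helper, not a standard or BSD API: a portable stand-in for
   strsep(). Returns the next field (possibly empty) and advances *stringp,
   or returns NULL once *stringp has been set to NULL. Modifies the buffer. */
char *split_next(char **stringp, const char *delim)
{
    char *start = *stringp;
    if (start == NULL)
        return NULL;                            /* no more fields */

    char *end = start + strcspn(start, delim);  /* first delimiter or the '\0' */
    if (*end != '\0') {
        *end = '\0';                            /* terminate this field in place */
        *stringp = end + 1;                     /* next field starts after it */
    } else {
        *stringp = NULL;                        /* that was the last field */
    }
    return start;
}
```

It is used exactly like strsep() in the loop above; on a writable copy of "value;;test;etc" it returns value, an empty string, test and etc, then NULL. Like strsep(), it also reports a trailing empty field for input such as "a;".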

Of course, as others have already said, none of these simple tokenizer functions handles delimiters inside quotation marks correctly, so if that is an issue, you should use a proper CSV parsing library.




Answer 2:


There is no way to change this behavior of strtok(). From the man page:

A sequence of two or more contiguous delimiter bytes in the parsed string is considered to be a single delimiter. Delimiter bytes at the start or end of the string are ignored. Put another way: the tokens returned by strtok() are always nonempty strings.
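To see the behavior the man page describes, here is a small sketch that gathers every token strtok() returns; collect_tokens is a hypothetical helper written only for this illustration.

```c
#include <string.h>

/* Hypothetical helper for illustration: stores the tokens strtok() returns
   from buf (which is modified in place) in out[0..max-1] and returns how
   many were found. Empty fields never show up, because strtok() treats a
   run of delimiters as one. */
size_t collect_tokens(char *buf, const char *delim, char **out, size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(buf, delim); tok != NULL && n < max;
         tok = strtok(NULL, delim))
        out[n++] = tok;
    return n;
}
```

On a writable copy of "value;;test;etc" this yields three tokens (value, test and etc); the empty field between the two semicolons never appears. On ";;a;;" it yields only a.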

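But what you can do is look at what lies between consecutive tokens. strtok() overwrites only the delimiter that ends a token with '\0'; any additional delimiters it skips over are left untouched. So the distance from the end of the previous token to the start of the current one tells you how many empty fields were skipped. As the documentation puts it:

This end of the token is automatically replaced by a null-character, and the beginning of the token is returned by the function.

And here is a code sample to show what I mean:

#include <stdio.h>
#include <string.h>

int main(void) {
    char aStr[] = "value;;test;etc"; /* strtok() needs a writable buffer */
    char *prevEnd = NULL;            /* end ('\0') of the previous token */

    for (char *ptr = strtok(aStr, ";"); ptr != NULL; ptr = strtok(NULL, ";")) {
        /* Only the first delimiter after a token became '\0'; every extra
           byte in the gap is an untouched ';', i.e. one skipped empty field.
           Trailing empty fields (e.g. "a;;") are still not detected. */
        size_t skipped = prevEnd ? (size_t)(ptr - prevEnd - 1) : (size_t)(ptr - aStr);
        for (size_t i = 0; i < skipped; i++)
            printf("token=\n");
        printf("token=%s\n", ptr);
        prevEnd = ptr + strlen(ptr);
    }
    return 0;
}

(Written without an IDE or thorough testing; treat it as a sketch, but the idea stands.)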




Answer 3:


No, you can't. From "man strtok":

A sequence of two or more contiguous delimiter characters in the parsed string is considered to be a single delimiter. Delimiter characters at the start or end of the string are ignored. Put another way: the tokens returned by strtok() are always nonempty strings.

You could also run into problems if your data contains the delimiter inside quotes or any other "escape".

I think the best solution is to get a CSV parsing library or write your own parsing function.
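If you decide to write your own parsing function, its core is a small state machine that treats the separator as special only outside quotes. Below is a minimal sketch that assumes RFC 4180-style quoting (a field wrapped in double quotes may contain the separator, and "" inside it stands for one literal quote) and processes one line at a time; next_csv_field is a hypothetical name, and the sketch neither handles quoted fields spanning several lines nor reports malformed input.

```c
#include <string.h> /* NULL */

/* Hypothetical helper (not from any library): returns the next field of a
   single CSV line and advances *cursor, or NULL after the last field.
   Assumes RFC 4180-style quoting. The line is rewritten in place (quotes
   removed, "" collapsed to "); empty fields come back as "". */
char *next_csv_field(char **cursor, char sep)
{
    char *r = *cursor;              /* read position */
    if (r == NULL)
        return NULL;                /* previous call returned the last field */

    char *field = r;                /* unquoted text is written from here */
    char *w = r;                    /* write position, never ahead of r */
    int quoted = 0;

    if (*r == '"') {                /* quoted field: skip the opening quote */
        quoted = 1;
        r++;
    }
    for (;;) {
        if (*r == '\0') {           /* end of line: this is the last field */
            *w = '\0';
            *cursor = NULL;
            return field;
        }
        if (quoted && *r == '"') {
            if (r[1] == '"') {      /* "" inside quotes -> one literal quote */
                *w++ = '"';
                r += 2;
                continue;
            }
            quoted = 0;             /* closing quote */
            r++;
            continue;
        }
        if (!quoted && *r == sep) { /* unquoted separator ends the field */
            *w = '\0';
            *cursor = r + 1;
            return field;
        }
        *w++ = *r++;                /* ordinary character */
    }
}
```

For example, on the line value;"a;b";;"say ""hi""";etc it returns value, a;b, an empty field, say "hi" and etc, then NULL.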



Source: https://stackoverflow.com/questions/18827546/obtaining-zero-length-string-from-strtok
