Need to know when no data appears between two token separators using strtok()

Submitted by 早过忘川 on 2019-11-27 02:10:41

7.21.5.8 The strtok function

The standard says the following regarding strtok:

[#3] The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

As the quote above shows, you cannot use strtok to solve your specific problem: it treats any run of consecutive characters found in delims as a single delimiter, so an empty field between two separators is never reported.
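To see the effect, here is a minimal sketch that counts what strtok actually hands back. The helper name count_strtok_tokens is made up for this illustration and is not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints and counts the tokens strtok() reports.
   buf must be writable, because strtok() overwrites delimiters with '\0'. */
static int count_strtok_tokens(char *buf, const char *delims)
{
    int n = 0;

    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims)) {
        printf("token %d: %s\n", n, tok);
        n++;
    }
    return n;
}
```

Called on a writable copy of "foo,bar,,baz,biz" with "," as the delimiter string, it reports four tokens rather than five; the empty field between bar and baz is skipped.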


Am I doomed to weep in silence, or can somebody help me out?

You can easily implement your own version of strtok that does what you want; see the snippets below.

strtok_single makes use of strpbrk (char const* src, const char* delims), which returns a pointer to the first character in the null-terminated string src that matches any character in delims.

If no matching character is found the function will return NULL.
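A small sketch of that behaviour on its own, assuming nothing beyond the standard strpbrk. The wrapper first_delim_offset is a hypothetical name used only here; it turns the returned pointer into an offset so the result is easy to inspect.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: offset of the first character of src that appears
   in delims, or -1 when strpbrk() returns NULL (no match). */
static long first_delim_offset(const char *src, const char *delims)
{
    const char *hit = strpbrk(src, delims);

    return hit ? (long)(hit - src) : -1L;
}
```

For "foo,bar" with delims "," the offset is 3; for "foo" there is no match and the wrapper returns -1.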


strtok_single

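#include <string.h>   /* strpbrk */

/* Like strtok, but every delimiter ends a field, so adjacent delimiters
   yield an empty string. Not reentrant: state is kept in a static pointer. */
char *
strtok_single (char * str, char const * delims)
{
  static char  * src = NULL;
  char  *  p,  * ret = NULL;

  if (str != NULL)
    src = str;                  /* new string: restart scanning */

  if (src == NULL)
    return NULL;                /* previous string is exhausted */

  if ((p = strpbrk (src, delims)) != NULL) {
    *p  = '\0';                 /* terminate the field at the delimiter */
    ret = src;
    src = p + 1;                /* resume right after the delimiter */

  } else if (*src) {
    ret = src;                  /* last field, no delimiter follows */
    src = NULL;
  }

  return ret;                   /* NULL if only an empty tail was left */
}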

sample use

  char delims[] = ",";
  char data  [] = "foo,bar,,baz,biz";

  char * p    = strtok_single (data, delims);

  while (p) {
    printf ("%s\n", *p ? p : "<empty>");

    p = strtok_single (NULL, delims);
  }

output

foo
bar
<empty>
baz
biz

You can't use strtok() if that's what you want. From the man page:

A sequence of two or more contiguous delimiter characters in the parsed string is considered to be a single delimiter. Delimiter characters at the start or end of the string are ignored. Put another way: the tokens returned by strtok() are always nonempty strings.

Therefore it is just going to jump from c to d in your example.

You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.

Lately I was looking for a solution to the same problem and found this thread.

You can use strsep(). From the manual:

The strsep() function was introduced as a replacement for strtok(3), since the latter cannot handle empty fields.
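Note that strsep() is a BSD/glibc extension rather than part of standard C, so the sketch below assumes a platform that provides it. The helper name print_fields_strsep is made up for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* asks glibc to declare strsep() */
#endif
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every field of buf, including empty ones,
   and returns the number of fields. buf is modified in place. */
static int print_fields_strsep(char *buf, const char *delims)
{
    int n = 0;
    char *rest = buf;     /* strsep() advances this past each field */
    char *field;

    while ((field = strsep(&rest, delims)) != NULL) {
        printf("%s\n", *field ? field : "<empty>");
        n++;
    }
    return n;
}
```

On a writable copy of "foo,bar,,baz,biz" this reports five fields, the third one empty. Unlike strtok_single above, strsep() also reports an empty field after a trailing delimiter.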

MSN

As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:

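#include <stdio.h>
#include <string.h>

#define NUM_FIELDS 5    /* number of fields expected per line */
#define FIELD_LEN  64   /* capacity of each stored field */

/* line is assumed to be the NUL-terminated input string */
char arr_fields[NUM_FIELDS][FIELD_LEN];
char delim[] = ",\n";
const char *cur = line;

for (int i = 0; i < NUM_FIELDS; i++)
{
    /* length of the run of non-delimiter characters starting at cur */
    size_t token_length = strcspn(cur, delim);

    if (token_length)
        snprintf(arr_fields[i], FIELD_LEN, "%.*s", (int)token_length, cur);
    else
        snprintf(arr_fields[i], FIELD_LEN, "%s", "-");   /* empty field */

    cur += token_length;
    if (*cur)           /* step over the delimiter, but never past the NUL */
        cur++;
}

  1. Parse (for example, strtok)
  2. Sort
  3. Insert
  4. Rinse and repeat as needed :)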

You could try using strchr to find the locations of the , characters. Copy the string manually up to the separator you found (using memcpy or strncpy), then call strchr again starting just past it. This way you can tell when two or more commas are next to each other (two consecutive pointers returned by strchr will differ by exactly 1), and you can write an if statement to handle that case.
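A minimal sketch of that strchr approach, under a few assumptions: a single separator character (not '\0'), a fixed 64-byte scratch buffer, and a made-up helper name print_fields_strchr. It leaves the input string untouched.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: walks s with strchr(), prints each field, and returns
   how many fields were empty. Fields longer than the buffer are truncated. */
static int print_fields_strchr(const char *s, char sep)
{
    char field[64];
    int empty = 0;
    const char *start = s;

    for (;;) {
        const char *end = strchr(start, sep);
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (len == 0)
            empty++;                    /* separators adjacent, or at an end of s */
        if (len >= sizeof field)
            len = sizeof field - 1;     /* truncate oversized fields */
        memcpy(field, start, len);
        field[len] = '\0';
        printf("%s\n", len ? field : "<empty>");

        if (end == NULL)
            break;                      /* no more separators */
        start = end + 1;                /* continue after the separator */
    }
    return empty;
}
```

For "foo,bar,,baz,biz" with ',' as the separator, it prints five lines and returns 1, the count of empty fields.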
