I want to split a char * string based on a multiple-character delimiter. I know that strtok() is used to split a string, but it works only with single-character delimiters.
EDIT: I considered the suggestions from Alan and Sourav and wrote some basic code for this:
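#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "This is abc test abc string";
    char *in = str;
    const char *delim = "abc";
    char *token;

    do {
        token = strstr(in, delim);       /* next occurrence of the delimiter */
        if (token)
            *token = '\0';               /* terminate the current piece */
        printf("%s\n", in);
        if (token)
            in = token + strlen(delim);  /* advance only when a delimiter was found */
    } while (token != NULL);

    return 0;
}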
Finding the point at which the desired sequence occurs is pretty easy: strstr supports that:
char str[] = "this is abc a big abc input string abc to split up";
char *pos = strstr(str, "abc");
So, at that point, pos points to the first location of abc in the larger string. Here's where things get a little ugly. strtok has a nasty design where it 1) modifies the original string, and 2) stores a pointer to the "current" location in the string internally.
If we didn't mind doing roughly the same, we could do something like this:
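#include <stdio.h>
#include <string.h>

char *multi_tok(char *input, const char *delimiter) {
    static char *string;              /* current position, kept between calls */

    if (input != NULL)
        string = input;               /* start tokenizing a new string */

    if (string == NULL)
        return string;                /* nothing left */

    char *end = strstr(string, delimiter);

    if (end == NULL) {                /* no more delimiters: return the final token */
        char *temp = string;
        string = NULL;
        return temp;
    }

    char *temp = string;
    *end = '\0';                      /* terminate the token in place */
    string = end + strlen(delimiter); /* resume after the delimiter */
    return temp;
}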
This does work. For example:
int main(void) {
    char input[] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, "abc");
    }
    return 0;
}
produces roughly the expected output:
this is
a big
input string
to split up
Nonetheless, it's clumsy, difficult to make thread-safe (you have to make its internal string variable thread-local), and generally just a crappy design. Using (for one example) an interface something like strtok_r, we can fix at least the thread-safety issue:
#include <stdio.h>
#include <string.h>

typedef char *multi_tok_t;            /* caller-owned tokenizer state */

char *multi_tok(char *input, multi_tok_t *string, const char *delimiter) {
    if (input != NULL)
        *string = input;              /* start tokenizing a new string */

    if (*string == NULL)
        return *string;               /* nothing left */

    char *end = strstr(*string, delimiter);

    if (end == NULL) {                /* no more delimiters: return the final token */
        char *temp = *string;
        *string = NULL;
        return temp;
    }

    char *temp = *string;
    *end = '\0';                      /* terminate the token in place */
    *string = end + strlen(delimiter);
    return temp;
}

multi_tok_t init(void) { return NULL; }

int main(void) {
    multi_tok_t s = init();

    char input[] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, &s, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, &s, "abc");
    }
    return 0;
}
I guess I'll leave it at that for now though--to get a really clean interface, we really want to reinvent something like coroutines, and that's probably a bit much to post here.
You can easily write your own parser using strstr() to achieve the same. The basic algorithm may look like this:
Use strstr() to find the first occurrence of the whole delimiter string