If I have a string variable whose value is "john is 17 years old", how do I tokenize it using spaces as the delimiter? Would I use awk?
With GNU sed and extended regex syntax (note that \W is a GNU extension, not part of POSIX ERE):
$ str='a b c d'
$ echo "$str" | sed -E 's/\W+/\n/g' | hexdump -C
00000000 61 0a 62 0a 63 0a 64 0a |a.b.c.d.|
00000008
This is like Python's re.split(r'\W+', str).
\W matches any non-word character,
including space, tab, and newline (the characters bash word splitting uses by default) as well as carriage return,
but also symbols such as quotes, brackets, signs, ...
... except the underscore _,
so snake_case is one word, but kebab-case is two words.
Leading or trailing whitespace in the input produces an empty line in the output.
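
Because \W also splits on punctuation, it may be more than the question needs. If the goal is to split strictly on spaces, one option is tr rather than sed or awk. This is a minimal sketch, not the only way to do it: the variable names str and tokens are placeholders chosen for illustration, and it assumes the tokens themselves contain no newlines.

```shell
# Example input taken from the question.
str='john is 17 years old'

# Translate every space to a newline; -s squeezes runs of
# newlines, so repeated spaces do not yield empty tokens.
tokens=$(printf '%s\n' "$str" | tr -s ' ' '\n')

# One token per line.
printf '%s\n' "$tokens"
# john
# is
# 17
# years
# old
```

Unlike \W+, this leaves punctuation inside a token, so kebab-case stays one word. A leading space in the input would still produce an empty first line.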