I haven't tried this yet, but I need to do the same thing. My thinking is to first split your speech text into an array of words.
Then write a recursive function that speaks the next word once the current word has finished, while keeping a counter that holds the index of the current word.
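
Here is a rough, untested-in-production sketch of that idea in TypeScript. It assumes you can wrap your speech engine in a function that speaks one word and calls a callback when it is done; the names `speakWords`, `SpeakWord`, `onProgress` and `onFinished` are all made up for illustration and are not part of any library.

```typescript
// Hypothetical wrapper type: speaks a single word, then calls onDone
// once that word has finished playing.
type SpeakWord = (word: string, onDone: () => void) => void;

function speakWords(
  text: string,
  speakWord: SpeakWord,
  onProgress: (index: number, word: string) => void = () => {},
  onFinished: () => void = () => {},
): void {
  // Split the speech text into an array of words, ignoring extra whitespace.
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);

  // Recursive step: index is the counter for the current word.
  const playNext = (index: number): void => {
    const word = words[index];
    if (word === undefined) {
      // Ran past the last word, so we are done.
      onFinished();
      return;
    }
    onProgress(index, word);
    // Only move on to the next word after the current one has finished.
    speakWord(word, () => playNext(index + 1));
  };

  playNext(0);
}
```

If you are in a browser, one way to supply `speakWord` (an assumption on my part about your setup) would be to create a `SpeechSynthesisUtterance` for the word, set its `onend` handler to `onDone`, and pass it to `speechSynthesis.speak`. The `onProgress` callback is where you could highlight the current word or store the counter so playback can be resumed later.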