I'm looking for a regular expression that can correctly match valid pinyin (e.g. "sheng", "sou") while ignoring invalid pinyin (e.g. "shong", "sei"). Most of the re
I would use a combined approach rather than relying solely on a regex.
To check whether a word is valid pinyin:
1. Take the word.
2. Take letters from the beginning of the word for as long as they are consonants. This separates the initial from the final.
3. Check that the initial and the final are each valid.
4. If they are, check whether their combination is allowed, using a lookup table like the standard pinyin initial/final chart, but with entries that are simply 1s and 0s.
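The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions. `INITIALS`, `FINALS` and `TABLE` are a hypothetical, deliberately tiny excerpt covering only the example words from the question, and `split_syllable` / `is_valid_pinyin` are illustrative names. A real checker would need the full set of initials and finals, plus a row for syllables with no initial (e.g. "ou").

```python
VOWELS = set("aeiouü")

# Hypothetical excerpt of the initial/final combination table.
# A real checker needs every initial and final of standard pinyin.
INITIALS = ["s", "sh"]
FINALS = ["ei", "eng", "ong", "ou"]

# Rows follow INITIALS, columns follow FINALS; 1 = allowed, 0 = not allowed.
TABLE = [
    # ei eng ong ou
    [0, 1, 1, 1],  # s:  seng, song, sou (no "sei")
    [1, 1, 0, 1],  # sh: shei, sheng, shou (no "shong")
]


def split_syllable(word: str) -> tuple[str, str]:
    """Split the leading run of consonants (the initial) from the rest (the final)."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[:i], word[i:]


def is_valid_pinyin(word: str) -> bool:
    initial, final = split_syllable(word.lower())
    # Both halves must be known initials/finals on their own.
    if initial not in INITIALS or final not in FINALS:
        return False
    # The pair must be marked 1 in the combination table.
    return TABLE[INITIALS.index(initial)][FINALS.index(final)] == 1


if __name__ == "__main__":
    for w in ["sheng", "sou", "shong", "sei"]:
        print(w, is_valid_pinyin(w))
    # prints: sheng True / sou True / shong False / sei False
```

The 0/1 matrix mirrors the table described above. A set of allowed `(initial, final)` pairs would work equally well and avoids the `index` lookups.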