I have a predefined vocab built from the 3,500 most commonly used Chinese characters. Now I want to tokenize the Dataset with this vocab so that each character maps to a fixed id. Is there any mature solution for this?
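For context, this is roughly the kind of character-level mapping I mean, sketched over a Hugging Face `datasets.Dataset` (the toy `chars` list, the `[PAD]`/`[UNK]` ids, and the `text` column are just placeholders standing in for my real 3,500-character vocab and data):

```python
from datasets import Dataset

# In practice the 3,500-character list would be loaded from a file;
# a tiny sample stands in here.
chars = list("今天气很好你世界")

# Reserve ids 0/1 for [PAD]/[UNK]; the characters themselves start at id 2.
vocab = {"[PAD]": 0, "[UNK]": 1}
vocab.update({ch: i + 2 for i, ch in enumerate(chars)})
unk_id = vocab["[UNK]"]

def encode(batch):
    # One token per character; anything outside the vocab maps to [UNK].
    batch["input_ids"] = [
        [vocab.get(ch, unk_id) for ch in text] for text in batch["text"]
    ]
    return batch

ds = Dataset.from_dict({"text": ["今天天气很好", "你好世界！"]})
ds = ds.map(encode, batched=True)
print(ds[0]["input_ids"])
```

This works, but I'm wondering whether an existing tokenizer library already handles this (fixed char-level vocab, unknown fallback, padding, etc.) so I don't have to maintain it by hand.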