I am currently having a problem with some of the text in my dataset: it contains strings that mix Chinese characters and numbers, which cannot be easily converted and handled for later data pro