Importing foreign languages from csv file to Stata

浪子不回头ぞ posted on 2019-12-05 22:04:08

As @Nick Cox commented earlier, the problem is that Stata doesn't support Unicode/UTF-8 encoding (this applies to versions before Stata 14, which added Unicode support). No, Stat/Transfer wouldn't resolve the problem (please refer to this explanation).

You can work around this by converting the text to a single-byte encoding with iconv, an online converter, or MS Word. Let's do it with one language first, say, Russian as in your screenshots. For Croatian, Turkish, and the other languages you have, look up the matching encoding for each and repeat the same steps.
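
To check which encoding fits which language, one option is to test whether a sample of the text can be encoded in a candidate code page. The sketch below is only an illustration: the language-to-code-page table (CP1251 for Russian, CP1250 for Croatian, CP1254 for Turkish) and the helper name `fits_code_page` are assumptions made for this example, so verify them against your own data.

```python
# Assumed mapping from language to Windows code page (verify for your data).
CODE_PAGES = {
    "Russian": "cp1251",   # Cyrillic (Windows)
    "Croatian": "cp1250",  # Central European (Windows)
    "Turkish": "cp1254",   # Turkish (Windows)
}


def fits_code_page(text, code_page):
    """Return True if every character in text exists in code_page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample strings, one per language.
samples = {
    "Russian": "МЯСОКОМБИНАТ",
    "Croatian": "Šibenik, Đakovo, Čakovec",
    "Turkish": "İstanbul, Iğdır",
}

for language, sample in samples.items():
    code_page = CODE_PAGES[language]
    print(language, code_page, fits_code_page(sample, code_page))
```

If a sample does not fit the code page you expected, the file probably mixes languages and has to be split before conversion.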

  1. Save the string variable from your .csv file as plain text (.txt), choosing the UTF-8 encoding option.
  2. Encoding conversion:
    • Use iconv, suggested by @Dimitriy V. Masterov, or
    • Use an online tool, such as this: upload the .txt file, set the source encoding to UTF-8 and the output encoding according to the language of interest (for Russian, it must be CP1251), click the "convert" button, and save the output file, or
    • If you have MS Office, you can also use MS Word for the same purpose. Right-click the .txt file, choose "Open with...", and open it with MS Word. In the dialog that appears, confirm that the file encoding is "Unicode (UTF-8)" and open the file. Then click "Save as..." and save as plain text. In the next dialog, choose "Cyrillic (Windows)" and tick "Insert line breaks". Save.
  3. Open your new .txt file: it should still show some strange characters (like ÌßÑÎÊÎÌÁÈÍÀÒ), but Stata can now display them properly.
  4. Copy and paste the new string variable into the Stata Data Editor, right-click the variable, choose "Font...", and then set the script to "Cyrillic". You should see the correct names on screen both in the Data Editor and in the Results window (even though the stored string itself is unchanged).
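
The conversion in step 2 can also be scripted. The following is a minimal sketch in Python, not the method used in the answer above: the function name `convert_encoding` and the file names are hypothetical, and CP1251 is assumed as the target because the example is Russian. It also shows why the converted file looks odd in step 3: the CP1251 bytes, read as a Western (Latin-1) encoding, come out as accented Latin letters.

```python
import tempfile
from pathlib import Path


def convert_encoding(src, dst, target_encoding="cp1251",
                     source_encoding="utf-8-sig"):
    """Re-encode a text file from source_encoding to target_encoding.

    "utf-8-sig" reads UTF-8 with or without a byte order mark.
    Raises UnicodeEncodeError if a character has no equivalent in
    the target code page, instead of silently replacing it.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_bytes(text.encode(target_encoding))


# Demonstration with a throwaway file (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "names_utf8.txt"
    dst = Path(tmp) / "names_cp1251.txt"
    src.write_text("МЯСОКОМБИНАТ\n", encoding="utf-8")
    convert_encoding(src, dst)
    raw = dst.read_bytes()

# The same bytes viewed as Latin-1 look like the "strange characters".
as_western = raw.decode("latin-1").strip()
print(as_western)  # ÌßÑÎÊÎÌÁÈÍÀÒ

# Decoding with the right code page restores the Cyrillic text.
roundtrip = raw.decode("cp1251").strip()
```

For other languages, pass the matching code page as `target_encoding`; the strict error handling makes it obvious when the chosen code page cannot represent the text.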

Depending on your OS, you might need to install all appropriate languages first.
Hope it helps.
