Importing foreign languages from csv file to Stata

Posted by 生来就可爱ヽ(ⅴ<●) on 2019-12-22 10:15:51

Question


I am using Stata 12 and have run into the following problem. I am importing a number of .csv files into Stata using the insheet command. The datasets may include Russian, Croatian, Turkish, and other languages. I think they are encoded in UTF-8. In the .csv files the text displays correctly, but after I import them into Stata the original strings turn into strange characters. Could you please help me with that? Can Stat/Transfer solve the problem? Does it support the .csv format?

For example, the original file looks like this:

My code is:

    insheet using name.csv, c n
    save name.dta, replace

The result looks like this:

I have also tried changing the script in the font options, which did not work.


Answer 1:


As @Nick Cox commented earlier, the problem is that Stata 12 simply does not support Unicode/UTF-8 encoding (Unicode support only arrived in Stata 14). No, Stat/Transfer would not resolve the problem (please refer to this explanation).

You can work around this by converting the encoding with an online decoder or MS Word. Let's do it with one language first, say Russian, as in your screenshots. Then look up the correct encodings for Croatian, Turkish, and the other languages you have.

  1. Save the string variable from your .csv file as plain text (.txt), choosing the UTF-8 encoding option.
  2. Convert the encoding in one of these ways:
    • Use iconv, as suggested by @Dimitriy V. Masterov, or
    • Use an online tool, such as this: upload the .txt file, choose UTF-8 as the source encoding and an output encoding that matches the language of interest (for Russian, it must be CP1251), click the "convert" button, and save the output file, or
    • If you have MS Office, you can also use MS Word for the same purpose. Right-click the .txt file, choose "Open with...", and open it with MS Word. In the dialog that appears, confirm that the file encoding is "Unicode (UTF-8)" and open the file. Then click "Save as..." and save as plain text. In the next dialog, choose "Cyrillic (Windows)" and tick "Insert line breaks". Save.
  3. Check your new .txt file. It should still show some strange characters (like ÌßÑÎÊÎÌÁÈÍÀÒ), but Stata can now display them properly.
  4. Copy and paste the new string variable into the Stata Data Editor, right-click the variable, choose "Font...", and then set the script to "Cyrillic". You should see the correct names on screen in both the Data Editor and the Results window (even though the stored string itself is unchanged).
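If you would rather script step 2 than click through a web tool or MS Word, here is a minimal Python sketch of the same conversion. It is only an illustration, not something from the original thread: the function names, the file names, and the language-to-code-page table are my own assumptions (CP1251 for Russian comes from the steps above; CP1250 for Croatian and CP1254 for Turkish are the usual Windows code pages, but check them against your data). The second function shows why the converted text looks like gibberish in step 3 until the right script is selected.

```python
from pathlib import Path

# Assumed mapping from language to a legacy Windows code page.
# Only the Russian entry comes from the steps above; verify the others.
CODE_PAGES = {
    "russian": "cp1251",
    "croatian": "cp1250",
    "turkish": "cp1254",
}


def convert_file(src, dst, language):
    """Re-encode a UTF-8 text file into the code page assumed for `language`."""
    target = CODE_PAGES[language]
    # "utf-8-sig" also accepts files that start with a byte-order mark.
    text = Path(src).read_text(encoding="utf-8-sig")
    # Characters missing from the target code page are written as "?".
    Path(dst).write_bytes(text.encode(target, errors="replace"))
    return target


def preview_as_latin1(text, code_page):
    """Show how converted bytes look when read as a Western (Latin-1) script."""
    return text.encode(code_page).decode("latin-1")


if __name__ == "__main__":
    # The Russian word below turns into the "strange characters" from step 3.
    print(preview_as_latin1("МЯСОКОМБИНАТ", "cp1251"))  # prints ÌßÑÎÊÎÌÁÈÍÀÒ
```

To convert a real file you would call something like convert_file("name_utf8.csv", "name_cp1251.csv", "russian") (both file names are hypothetical) and then run insheet on the converted file.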

Depending on your OS, you might need to install support for the relevant languages first.
Hope it helps.



Source: https://stackoverflow.com/questions/19231311/importing-foreign-languages-from-csv-file-to-stata
