Convert .CSV files to .DTA files in Python

依然范特西╮ 提交于 2020-01-11 11:58:12

问题


I'm looking to automate the process of converting many .CSV files into .DTA files via Python. .DTA files is the filetype that is handled by the Stata Statistics language.

I have not been able to find a way to go about doing this, however.

The R language has write(.dta) which allows a dataFrame in R to be converted to a .dta file, and there is a port to the R language from Python via RPy, but I can't figure out how to use RPy to access the write(.dta) function in R.

Any ideas?


回答1:


You need rpy2 for Python and also the foreign package installed in R. You do that by starting R and typing install.packages("foreign"). You can then quit R and go back to Python.

Then this:

import rpy2.robjects as robjects
robjects.r("require(foreign)")
robjects.r('x=read.csv("test.csv")')
robjects.r('write.dta(x,"test.dta")')

You can construct the string passed to robjects.r from Python variables if you want, something like:

robjects.r('x=read.csv("%s")' % fileName)



回答2:


(copypasting from my answer to a previous question)

pandas DataFrame objects now have a "to_stata" method. So you can do for instance

import pandas as pd
df = pd.read_stata('my_data_in.dta')
df.to_stata('my_data_out.dta')

DISCLAIMER: the first step is quite slow (in my test, around 1 minute for reading a 51 MB dta - also see this question), and the second produces a file which can be way larger than the original one (in my test, the size goes from 51 MB to 111MB). Spacedman's answer may look less elegant, but it is probably more efficient.



来源:https://stackoverflow.com/questions/19295832/convert-csv-files-to-dta-files-in-python

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!