Save .dta files in python

谁说胖子不能爱 提交于 2020-01-20 17:53:06

问题


I'm wondering if anyone knows a Python package that allows you to save numpy arrays/recarrays in the .dta format of the statistical data analysis software Stata. This would really speed up a few steps in a system I have.


回答1:


pandas DataFrame objects now have a "to_stata" method. So you can do for instance

import pandas as pd
df = pd.read_stata('my_data_in.dta')
df.to_stata('my_data_out.dta')

DISCLAIMER: the first step is quite slow (in my test, around 1 minute for reading a 51 MB dta - also see this question), and the second produces a file which can be way larger than the original one (in my test, the size goes from 51 MB to 111MB). This answer may look less elegant, but it is probably more efficient.




回答2:


The scikits.statsmodels package includes a reader for Stata data files, which relies in part on PyDTA as pointed out by @Sven. In particular, genfromdta() will return an ndarray, e.g. from Python 2.7/statsmodels 0.3.1:

>>> import scikits.statsmodels.api as sm
>>> arr = sm.iolib.genfromdta('/Applications/Stata12/auto.dta')
>>> type(arr)
<type 'numpy.ndarray'>

The savetxt() function can be used in turn to save an array as a text file, which can be imported in Stata. For example, we can export the above as

>>> sm.iolib.savetxt('auto.txt', arr, fmt='%2s', delimiter=",")

and read it in Stata without a dictionary file as follows:

. insheet using auto.txt, clear

I believe a *.dta reader should be added in the near future.




回答3:


The only Python library for STATA interoperability I could find merely provides read-only access to .dta files. The R foreign library however provides a function write.dta, and RPy provides a Python interface to R. Maybe the combination of these tools can help you.



来源:https://stackoverflow.com/questions/7503487/save-dta-files-in-python

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!