Python Panel Data

Posted by 妖精的绣舞 on 2019-12-13 00:34:10

Question


I usually use Stata but now want to use Python, and I am struggling to create a panel data set. I tried pandas.Panel but could not get it to work. I have the following dataset:

  date  id1   id2
  2000  100   50
  2001  101   48

Now I want to make it look like this:

    date  id   variable
    2000   1    100
    2000   2    50
    2001   1    101
    2001   2    48

Next, I want to declare a time variable and an id variable so that I can run panel functions. I also tried DataFrame.stack(), but the result is not sorted by id. How do I do this, or am I missing a convenient time-series function in pandas?

Sorry for the question. I am sure this has been answered somewhere, but I have been trying for several hours and cannot figure it out.


Answer 1:


Given input data:

data = [
    {"date": 2000, "id1": 100, "id2": 50},
    {"date": 2001, "id1": 101, "id2": 48}
]

or

data = {
    "date": [2000, 2001],
    "id1": [100, 101],
    "id2": [50, 48],
}

such that

import pandas as pd

df = pd.DataFrame(data)
df

"melt" the pandas DataFrame:

melted = pd.melt(df, id_vars="date", var_name="id", value_name="variable")

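# Optional amendments: turn "id1"/"id2" into the integers 1/2,
# then order rows by date and id and renumber the index
melted["id"] = melted["id"].str.replace("id", "", regex=False).astype(int)
melted = melted.sort_values(by=["date", "id"]).reset_index(drop=True)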

melted

[Figure: melted output]
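
For the second part of the question (declaring a time and an id variable, roughly what xtset does in Stata), pandas has no direct equivalent that I know of. One common convention, which is an assumption here and not part of the original answer, is to move the id and time columns into a MultiIndex. The sketch below rebuilds the melted frame so that it runs on its own; the names panel and change are only illustrative.

```python
import pandas as pd

# Same wide data as in the answer above.
df = pd.DataFrame({"date": [2000, 2001], "id1": [100, 101], "id2": [50, 48]})

# Reshape to long form, as in the answer.
melted = pd.melt(df, id_vars="date", var_name="id", value_name="variable")
melted["id"] = melted["id"].str.replace("id", "", regex=False).astype(int)

# Assumption: (id, date) identifies each observation, so use it as the index.
panel = melted.set_index(["id", "date"]).sort_index()

# Per-id operations then go through groupby on the "id" index level,
# for example the change from one date to the next within each id.
panel["change"] = panel.groupby(level="id")["variable"].diff()
print(panel)
```

With the index sorted by id and then date, within-id operations such as diff or shift see the dates in order, and the first date of each id has no predecessor, so its change is NaN.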

Additional reference: Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10).
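
As an alternative to melt, pandas also provides pd.wide_to_long, which is roughly analogous to Stata's reshape long. The following is a sketch, assuming the wide columns share the stub "id" followed by a numeric suffix, as they do in the question. The level name unit is a hypothetical placeholder, needed because the suffix level cannot reuse the stub name; it is renamed afterwards to match the layout the question asks for.

```python
import pandas as pd

df = pd.DataFrame({"date": [2000, 2001], "id1": [100, 101], "id2": [50, 48]})

# stubnames="id" matches the columns id1 and id2; the numeric suffix
# becomes the second index level, here called "unit" (a placeholder name).
long_df = pd.wide_to_long(df, stubnames="id", i="date", j="unit")

# The value column is named after the stub. Sort by (date, unit), flatten
# the index, and rename the columns to the layout in the question.
long_df = (
    long_df.sort_index()
    .reset_index()
    .rename(columns={"id": "variable", "unit": "id"})
)
print(long_df)
```

Unlike the melt version, the numeric suffix is parsed for you, so no string replacement is needed to get integer ids.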



Source: https://stackoverflow.com/questions/43848730/python-panel-data
