What variables to include in fixed effect model (Panel data)

Submitted by 微笑、不失礼 on 2021-01-07 02:38:05

Question


I am fitting a fixed effects model to study whether support reduces the number of injured employees. I have a company-level dataset covering 2012-2020:

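| Company ID | Year | Average age | Total salary | Total number of employees | Segment | Industry | Risk Index | support | Total number of injured employees |
|---|---|---|---|---|---|---|---|---|---|
| company A ID | 2012 | 45 | 5 Million | 55 | S | IT | 1 | 0 | 1 |
| company B ID | 2012 | 48 | 40M | 500 | B | Service | 3 | 0 | 20 |

Data clarification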
- Industry, Segment, Risk Index, and support are coded as factors
- support: 0 = did not receive support in 2014, 1 = received support in 2014
  * support is set to 0 for all entries with year <= 2014, so it varies over time within a supported company and is not dropped from the fixed effects model
- Dependent variable: total number of injured employees
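
The note on support can be illustrated with a minimal base R sketch of the one-way (within-company) transformation, which subtracts each company's own mean from every variable. The vectors below are hypothetical values for a single supported company, not rows from the real data:

```r
# Hypothetical single company observed in five years.
years <- c(2012, 2013, 2014, 2016, 2017)

# support is 0 up to and including 2014 and 1 afterwards for a supported company.
support <- as.numeric(years > 2014)

# A characteristic that never changes for this company (for example its risk index).
risk_index <- rep(3, length(years))

# Within transformation: subtract the company's own mean.
support_demeaned <- support - mean(support)
risk_demeaned <- risk_index - mean(risk_index)

print(support_demeaned)  # still varies, so a coefficient can be estimated
print(risk_demeaned)     # all zeros, so the variable carries no within-company information
```

A variable that is constant within a company is wiped out by this transformation, while the 0-then-1 coding of support survives it.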

Since the goal is to measure the impact of support on the number of injured employees after the support was received, I have divided the data into 5 subsets:

  • Support impact for 2016-2017 => data from years 2012, 2013, 2014, 2016, 2017
  • Support impact for 2017-2018 => data from years 2012, 2013, 2014, 2017, 2018
  • Support impact for 2018-2019 => data from years 2012, 2013, 2014, 2018, 2019
  • Support impact for 2019-2020 => data from years 2012, 2013, 2014, 2019, 2020
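
A minimal base R sketch of how the subsets listed above could be built. The data frame here is a hypothetical stand-in with only `Customer` and `year` columns, and the names `pre_years`, `post_windows`, and `subsets` are illustrative:

```r
# Hypothetical stand-in for the real data: one row per company and year.
data <- expand.grid(Customer = c("A", "B"), year = 2012:2020)

# Pre-support years shared by every subset.
pre_years <- c(2012, 2013, 2014)

# Post-support windows, named after the period they cover.
post_windows <- list(
  "2016-2017" = c(2016, 2017),
  "2017-2018" = c(2017, 2018),
  "2018-2019" = c(2018, 2019),
  "2019-2020" = c(2019, 2020)
)

# One data frame per window: the pre-support years plus that window's years.
subsets <- lapply(post_windows, function(post) {
  data[data$year %in% c(pre_years, post), ]
})

# Years contained in each subset, as a quick check.
print(sapply(subsets, function(d) paste(sort(unique(d$year)), collapse = " ")))
```

Each element of `subsets` can then be passed as the `data` argument of the models.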

For each subset, I have fitted fixed effects models in two ways:

  1. Fixed effect (within industry, with time effect):

plm(number_of_injured_employees~salary+employee_number+avg_age+segment+industry+index+support, index=c('industry','year'),model='within', effect = "twoways", data=data)

  2. Fixed effect (within customer, with time effect), one model per risk index:

plm(number_of_injured_employees~salary+employee_number+ support+avg_age, index=c('Customer','year'),model='within', effect = "twoways", data=data_index1)

plm(number_of_injured_employees~salary+employee_number+ support+avg_age, index=c('Customer','year'),model='within', effect = "twoways", data=data_index2) …
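
For reference, the within-customer specification with time effects corresponds to the standard two-way fixed effects model. The symbols below are illustrative notation and are not taken from the plm output:

```latex
y_{it} = \beta \,\mathrm{support}_{it}
       + \gamma_1 \,\mathrm{salary}_{it}
       + \gamma_2 \,\mathrm{employee\_number}_{it}
       + \gamma_3 \,\mathrm{avg\_age}_{it}
       + \alpha_i + \lambda_t + \varepsilon_{it}
```

Here $y_{it}$ is the number of injured employees of customer $i$ in year $t$, $\alpha_i$ is the customer fixed effect, $\lambda_t$ is the year effect, and $\beta$ is the coefficient of interest.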

I would like to ask:

  • Can I interpret the within-industry result as the general impact of support (the within-industry, within-segment, and within-index models all give the same result), and the within-customer result as the impact of support at the customer level? If not, what is the correct way to interpret these models?

  • Is it OK to divide the data into 5 subsets so that I can investigate the impact in different years after the support was received in 2014?

  • I got an alternative opinion: since we are investigating a causal effect, the model should have support as the only independent variable, and the dependent variable should be changed to number_of_injured_employees/total_number_of_employees:

    Fixed effect (within customer, without time effect): plm(number_of_injured_employees/total_number_of_employees~support, index='customer', model='within', data=data)

  • I still think that salary, total number of employees, avg_age, etc. should be included in the model: they are related to the dependent variable, so leaving them out would change the coefficient on support. I also think the dependent variable should stay as number_of_injured_employees, since employee_number is already among the independent variables, and simply dividing by employee_number could make the DV behave strangely: a customer with a lot of employees would end up with a very small value.

Is my thinking correct?
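
To make the alternative dependent variable concrete, here is a minimal base R sketch that computes the injury rate for the two sample rows from the data table near the top. The data frame name `sample_rows` and the column name `injury_rate` are hypothetical:

```r
# The two sample rows from the table (2012 values for company A and company B).
sample_rows <- data.frame(
  company = c("A", "B"),
  employee_number = c(55, 500),
  number_of_injured_employees = c(1, 20)
)

# Injured employees as a share of all employees.
sample_rows$injury_rate <-
  sample_rows$number_of_injured_employees / sample_rows$employee_number

print(round(sample_rows$injury_rate, 4))  # 0.0182 0.0400
```

In these two rows the rate depends on the ratio of injuries to headcount rather than on headcount alone.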

Source: https://stackoverflow.com/questions/65528497/what-variables-to-include-in-fixed-effect-model-panel-data
