Difference-in-Difference Estimation: Garbage Incinerators and Home Prices
What is Difference-in-Difference Estimation?
- A linear regression technique used in policy analysis when there are a treatment group and a control group, each observed in two time periods: before and after the policy.
- A more reliable way of checking whether the average differences between treatment and control groups across time are really meaningful.
- A way of eliminating unobserved heterogeneity; in other words, it removes fixed factors that differ between the treatment and control groups and could otherwise be mistaken for an effect of the policy.
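As a minimal sketch of the idea in the bullets above, the difference-in-difference estimate can be computed directly from four group averages. The numbers below are made-up illustrative values, not the incinerator data, and the function name `did_estimate` is hypothetical.

```python
from statistics import mean

def did_estimate(control_pre, control_post, treated_pre, treated_post):
    """Return (change in treated mean) minus (change in control mean)."""
    control_change = mean(control_post) - mean(control_pre)
    treated_change = mean(treated_post) - mean(treated_pre)
    return treated_change - control_change

# Hypothetical log prices; illustrative values only, not real data.
control_pre = [11.2, 11.3, 11.4]
control_post = [11.4, 11.5, 11.6]
treated_pre = [10.9, 11.0, 11.1]
treated_post = [11.0, 11.1, 11.2]

estimate = did_estimate(control_pre, control_post, treated_pre, treated_post)
print(f"difference-in-difference estimate: {estimate:.3f}")
```

In this toy example the treated group starts at a lower level than the control group, but that fixed gap appears in both of its averages and cancels out; only the difference in the two changes remains.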
The regression for the average difference in Massachusetts home prices before and after the incinerator demonstrates that the incinerator did not affect home prices in any significant way. Home prices near the incinerator are probably lower not because the incinerator was built; more likely, the incinerator was built there because home prices were already lower.
Figure 2: This figure shows the difference-in-difference estimation for the treatment group post-policy. In other words, it shows the average treatment effect on the prices of homes near the incinerator after the policy.
The interpretation of these coefficients is a little tricky, but one thing to keep in mind is that the numbers are in natural logarithmic form, so the coefficients can be read approximately as percentage changes.
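For reference, a regression of this kind is usually written as shown here. The variable names are the ones that appear in the regression output; writing the dependent variable as \log(price) and the choice of Greek symbols are assumptions made for illustration.

```latex
\log(price) = \beta_0 + \delta_0 \, y81 + \beta_1 \, nearinc + \delta_1 \, (y81 \cdot nearinc) + u
```

In this notation \beta_0 corresponds to _cons, and \delta_1, the coefficient on the interaction y81nrinc, is the difference-in-difference estimate.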
Coefficient Explanation - All coefficients are in natural-log units; values are converted back to dollars with the exponential function (e raised to the value).
y81 - The change in the average price of homes far from the incinerator between 1978 and 1981.
nearinc - The effect of being near the incinerator in 1978.
y81nrinc - The additional change in price between 1978 and 1981 from being near the incinerator, relative to homes far from it.
_cons - The value of a house far from the incinerator in 1978, in natural logarithm; converted to a regular price, exp(11.28) = $79,221.
- The coefficient we are interested in is the one on y81nrinc: -6.26%, with a p-value of 45.3 percent under the null hypothesis that the coefficient is zero.
- One cannot reject the hypothesis that living near the newly built incinerator did not cause a decrease in home prices.
- There appear to be other factors that are much more important in determining home prices than the presence of an incinerator.
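The log conversions used above can be checked with a few lines of Python. This is only a sketch that uses the two figures quoted in the text (11.28 for _cons and -0.0626 for y81nrinc); the helper names are hypothetical.

```python
import math

def log_to_level(log_value):
    """Convert a natural-log value back to its original units."""
    return math.exp(log_value)

def log_coef_to_percent(coef):
    """Exact percentage change implied by a coefficient when the outcome is in logs."""
    return (math.exp(coef) - 1.0) * 100.0

baseline_price = log_to_level(11.28)         # _cons, as quoted in the text
did_percent = log_coef_to_percent(-0.0626)   # y81nrinc, as quoted in the text

print(f"1978 price far from the incinerator: ${baseline_price:,.0f}")
print(f"exact percentage effect of y81nrinc: {did_percent:.2f}%")
```

For small coefficients the exact figure stays close to the coefficient multiplied by 100, which is why -0.0626 is read as roughly -6.26%.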
The following regression shows how other factors are much more significant in determining the change in home prices than whether or not there is a new garbage incinerator nearby.
Figure 3: After controlling for other factors that are important in determining home prices, y81nrinc is still statistically insignificant.
Interpretation of Regression Coefficient Changes and Control Variables:
- y81 – the time trend in home prices for the control group is much less pronounced: a 14% increase, as opposed to a 19% increase when relevant variables are not controlled for.
- y81nrinc – is slightly larger and still insignificant at the 10% level, indicating that the new incinerator probably did not have any effect on home prices within a 3-mile radius.
- nearinc – goes from highly statistically significant to statistically insignificant, and is much smaller in this new regression.
- Significant control variables – (bath) an additional bathroom adds 12% to a home's value, (area) an extra 100 square feet in area adds about 1% to a home's price, and (age) every 10 years of aging reduces a home's value by about 8.2%.
6 Comments on “Difference-in-Difference Estimation: Garbage Incinerators and Home Prices ”
- This is a good and accessible blog. The explanations are clear and suitable for both undergraduates and graduate students. I have a question which I hope you will be able to help me with.
I am a graduate student at the University of Reading in England. I am carrying out a policy evaluation and need help with running the regression for the difference-in-differences estimator (the effect of the treatment). I will try as best I can to describe the policy, what I am doing, and what I am struggling to understand. The policy, let's call it X, is the treatment and was enacted in 2001, and the outcome Y is a binary variable. I have decided to look at the effect of X by comparing the 'treated' (those affected by X) with the 'control' (those not affected by X). In addition, I have controlled for differences between the control and treatment groups over time by controlling for fixed effects such as sex, earnings, marital status, job, etc. My research has two parts. First, I am looking at the pre-policy period of 2000 and comparing it to the post-reform period of 2002, bearing in mind that the treatment was enacted in 2001. Second, I am considering pooling the observations from 2000 and 2001 and treating this as the pre-policy period, identifying the treatment and control groups for this period, and then comparing it to 2002 and 2003 pooled together as the post-policy period.
So my question is: how do I carry out these two aspects of the analysis in a regression?
What I have done so far is regress Y on a set of variables; a time dummy (in the first analysis, t = 0 if 2000 and 1 if 2002); another dummy variable equal to 1 for the treated and 0 for the control, which gives the difference between the control and treated groups; and an interaction of the two, which I generated by multiplying the treatment dummy by the time dummy, so that it equals 1 only when both equal 1. Its coefficient is the estimator, i.e. the treatment effect. However, when I tried to do this in Stata, the interaction term was dropped due to collinearity. As such, I am unable to get the difference-in-differences estimator (the coefficient on the interaction) by carrying out the regression this way. Is there a way I can solve this problem?
And since I am using a binary dependent variable, can I still use OLS, or should I use a non-OLS estimator such as probit?
I appreciate any insights you can offer on this!
- Hi Becky,
Hope all is well with you. If you have multiple years of data, you might want to use a fixed effects estimation, which you can run in Stata, or a random effects model if you are familiar with them. This might eliminate the multicollinearity that is common when you are using a lot of binary terms in a regression.
I think it would be quite interesting to use a probit estimate since you have a binary dependent variable. I haven't done such a thing when testing policy questions, but I think it could be done.
Depending on how your data vary, the non-linear aspects of this estimator might eliminate the multicollinearity problem.
Last but not least, take a good look at the interaction term. You can export the data to Excel and then compare the values of the interaction term with both the treatment binary and the time binary variable. Enter something like =a2=b2 and Excel will respond TRUE if a2=b2; then sort to see whether you have TRUE for every row when you compare the treatment and time variables to the interaction term. If all are true, there is your source of multicollinearity.
Good luck, and please let me know what you find out. You have sparked my curiosity.
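The row-by-row comparison described above does not have to be done in Excel. As a rough sketch, assuming the three dummies are available as plain lists of 0/1 values (the variable names, the function name, and the data here are all made up for illustration), the same check might look like this in Python:

```python
def identical_columns(a, b):
    """True when two dummy columns match in every row (perfectly collinear)."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))

# Made-up example: the treated observations appear only in the post period.
time = [0, 0, 1, 1, 1, 1]
treated = [0, 0, 0, 0, 1, 1]
interaction = [t * d for t, d in zip(time, treated)]

print("interaction matches treated:", identical_columns(interaction, treated))
print("interaction matches time:", identical_columns(interaction, time))
```

If either comparison comes back true, the interaction carries no information beyond that dummy, which would explain why the software drops one of the two terms.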