How to remove heteroscedasticity in R
NCV Test
car::ncvTest(lmMod) # Breusch-Pagan test
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 4.650233 Df = 1 p = 0.03104933
The p-value is less than the significance level of 0.05, so we reject the null hypothesis that the variance of the residuals is constant and infer that heteroscedasticity is indeed present, confirming our graphical inference.
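To make the test statistic less of a black box, here is a minimal base-R sketch of how a score test of this kind can be computed by hand. It assumes the default variance formula (~ fitted.values) and, to the best of my understanding, mirrors what car::ncvTest does. The data set (mtcars, which ships with R) and the names fit, aux, u and reg_ss are stand-ins chosen for illustration; they are not the data or objects used in this post.

```r
# Stand-in data: mtcars ships with base R; it is NOT the data set used in this post.
fit <- lm(mpg ~ wt + hp, data = mtcars)

e <- residuals(fit)
n <- length(e)

# Scale the squared residuals by the ML variance estimate (RSS / n, not RSS / (n - p))
u <- e^2 / (sum(e^2) / n)

# Auxiliary regression of the scaled squared residuals on the fitted values
fv <- fitted(fit)
aux <- lm(u ~ fv)

# Score statistic: half the regression sum of squares of the auxiliary model
reg_ss <- sum((fitted(aux) - mean(u))^2)
chisq <- reg_ss / 2

# One variance predictor (the fitted values), so 1 degree of freedom
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

cat("Chisquare =", chisq, " Df = 1  p =", p_value, "\n")
```

A small p-value here is read the same way as above: evidence against constant residual variance.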
Treatment for heteroscedasticity
Box-Cox transformation
A Box-Cox transformation is a power transformation of a variable that brings its distribution closer to normal. Because it also tends to stabilize the variance, applying a Box-Cox transformation to the Y variable often solves the issue, which is exactly what I am going to do now.
library(caret) # provides BoxCoxTrans()
> distBCMod=BoxCoxTrans(r$Crime)
> distBCMod
Box-Cox Transformation
47 data points used to estimate Lambda
Input data summary:
Min. 1st Qu. Median Mean 3rd Qu. Max.
342.0 658.5 831.0 905.1 1058.0 1993.0
Largest/Smallest: 5.83
Sample Skewness: 1.05
Estimated Lambda: -0.1
With fudge factor, Lambda = 0 will be used for transformations
> r <- cbind(r, Crime_new=predict(distBCMod, r$Crime)) # append the transformed variable to r
> head(r) # view the top 6 rows
Crime Crime_new
1 791 6.673298
2 1635 7.399398
3 578 6.359574
4 1969 7.585281
5 1234 7.118016
6 682 6.525030
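Since Lambda = 0 is used, the transformed values above are simply natural logarithms of Crime. The sketch below spells out the one-parameter Box-Cox formula to show this. The helper box_cox is a hypothetical name written for illustration and is not part of caret; the six input values are copied from the Crime column shown above.

```r
# Hypothetical helper (not part of caret): one-parameter Box-Cox transform
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))        # Box-Cox is only defined for positive values
  if (abs(lambda) < 1e-8) {
    log(y)                     # limiting case as lambda approaches 0
  } else {
    (y^lambda - 1) / lambda    # general power-transform case
  }
}

# First six Crime values, copied from the head(r) output above
crime <- c(791, 1635, 578, 1969, 1234, 682)

# With lambda = 0 this should reproduce the Crime_new column
round(box_cox(crime, 0), 6)
```

For other values of lambda the same helper applies the power form, for example box_cox(crime, 0.5).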
> lmMod_bc <- lm(Crime_new ~ Wealth+Ineq, data=r)
>
> ncvTest(lmMod_bc)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 0.003153686 Df = 1 p = 0.9552162
With a p-value of 0.9552162, we fail to reject the null hypothesis (that the variance of the residuals is constant) and therefore infer that the residuals are homoscedastic. Let's check this graphically as well.
plot(lmMod_bc)
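Of the diagnostic plots that plot() draws for an lm object, the two most relevant for judging constant variance are the residuals-vs-fitted plot and the scale-location plot. A minimal sketch that shows only those two, side by side, is given below. It uses mtcars (shipped with R) and a log-transformed response as a stand-in; the model fit here is an assumption for illustration and not the lmMod_bc model from this post.

```r
# Stand-in model on mtcars (ships with base R), not the post's data
fit <- lm(log(mpg) ~ wt + hp, data = mtcars)

op <- par(mfrow = c(1, 2))  # two panels side by side
plot(fit, which = 1)        # residuals vs fitted values
plot(fit, which = 3)        # scale-location: sqrt(|standardized residuals|) vs fitted
par(op)                     # restore the previous graphics settings
```

If the residuals are homoscedastic, the points in both panels should scatter evenly around a roughly flat smoothing line, with no funnel shape as the fitted values increase.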