Sunday, 5 February 2017

How to remove heteroscedasticity in R

NCV Test
car::ncvTest(lmMod)  # Breusch-Pagan test
Non-constant Variance Score Test 
Variance formula: ~ fitted.values 
Chisquare = 4.650233    Df = 1     p = 0.03104933 
The p-value is less than the significance level of 0.05, so we reject the null hypothesis that the variance of the residuals is constant and infer that heteroscedasticity is indeed present, confirming our graphical inference.
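To make the test statistic less of a black box, here is a minimal base-R sketch of a score test for non-constant variance against the fitted values. It is an illustration of the idea, not car's own code: `ncv_score_test` is a hypothetical helper name, and the built-in `cars` data with the model `dist ~ speed` is only an assumed stand-in for `lmMod`.

```r
# Score test for non-constant variance against the fitted values.
# `ncv_score_test` is a hypothetical helper, not a function from car.
ncv_score_test <- function(model) {
  res <- residuals(model)
  sigma2_ml <- sum(res^2) / length(res)      # ML estimate of the error variance
  u <- res^2 / sigma2_ml                     # scaled squared residuals
  aux <- lm(u ~ fitted(model))               # auxiliary regression on fitted values
  reg_ss <- sum((fitted(aux) - mean(u))^2)   # regression sum of squares
  chisq <- reg_ss / 2                        # score statistic, 1 degree of freedom
  list(chisq = chisq, df = 1,
       p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

# Assumed example model on built-in data (stand-in for lmMod)
lmMod_demo <- lm(dist ~ speed, data = cars)
ncv_score_test(lmMod_demo)
```

A small p-value from this helper is read the same way as the ncvTest output above: the spread of the residuals changes with the fitted values.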

Treatment for heteroscedasticity

Box-Cox transformation

A Box-Cox transformation is a power transformation of a variable that brings its distribution closer to normal. Applying it to the Y variable often removes the heteroscedasticity as well, which is exactly what I am going to do now.
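For reference, the transformation itself is simple enough to write by hand. The sketch below assumes the usual one-parameter form, (y^lambda - 1) / lambda for a non-zero lambda and log(y) when lambda is 0; `box_cox` is a hypothetical helper name, not part of caret.

```r
# One-parameter Box-Cox transform (hypothetical helper, not from caret).
# Defined for strictly positive y only.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) {
    log(y)                       # limiting case as lambda approaches 0
  } else {
    (y^lambda - 1) / lambda      # power transform
  }
}

# With lambda = 0 the transform is just the natural log
box_cox(c(791, 1635, 578), lambda = 0)
```

With lambda = 0, a Crime value of 791 maps to about 6.6733, which is the natural log. My understanding is that caret's BoxCoxTrans() rounds an estimated lambda that lies close to 0 down to 0 through its `fudge` tolerance, but check the package documentation for the exact default.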
> library(caret)  # provides BoxCoxTrans()

> distBCMod=BoxCoxTrans(r$Crime)
> distBCMod
Box-Cox Transformation

47 data points used to estimate Lambda

Input data summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  342.0   658.5   831.0   905.1  1058.0  1993.0 

Largest/Smallest: 5.83 
Sample Skewness: 1.05 

Estimated Lambda: -0.1 
With fudge factor, Lambda = 0 will be used for transformations

> r <- cbind(r, Crime_new=predict(distBCMod, r$Crime)) # append the transformed variable to r
> head(r) # view the top 6 rows
  Crime Crime_new
1   791  6.673298
2  1635  7.399398
3   578  6.359574
4  1969  7.585281
5  1234  7.118016
6   682  6.525030
> lmMod_bc <- lm(Crime_new ~ Wealth+Ineq, data=r)
> ncvTest(lmMod_bc)
Non-constant Variance Score Test 
Variance formula: ~ fitted.values 
Chisquare = 0.003153686    Df = 1     p = 0.9552162 

With a p-value of 0.9552162, we fail to reject the null hypothesis (that the variance of the residuals is constant), so there is no longer evidence of heteroscedasticity in the residuals. Let's check this graphically as well.
plot(lmMod_bc)
Here is the plot:
[Figure: Transformed-model-no-heteroscedasticity]
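If you only want the two panels that matter most for heteroscedasticity, something along these lines should work. This is a sketch on assumed data: it uses the built-in `cars` data and a log-transformed response as a stand-in for `lmMod_bc`, and `mod_log` is a name made up for the example.

```r
# Illustrative model on built-in data (stand-in for lmMod_bc)
mod_log <- lm(log(dist) ~ speed, data = cars)

op <- par(mfrow = c(1, 2))   # two panels side by side
plot(mod_log, which = 1)     # residuals vs fitted
plot(mod_log, which = 3)     # scale-location
par(op)                      # restore the previous plot layout
```

A roughly flat trend line in both panels, with no funnel shape in the points, is what you would expect when the residual variance is constant.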
