Sunday, August 16, 2015

8.16 Coursera - Multiple R-squared and Adjusted R-squared

This is part of a discussion from the Regression Models course forum. I am pasting the response here in case the forum is closed after the class ends.

Michele's response:

Multiple R-squared (sometimes just called R-squared) *always* goes up when you add more variables; strictly speaking, it can never decrease. The model with the added variables will fit the training data better, and multiple R-squared measures how well the fit matches the training data. But your training data probably contains spurious signals and coincidences that are not true features of the underlying population of interest.

Since multiple R-squared always gets higher when you include a new variable, you cannot use it to decide whether including that new variable is a good idea. If the new variable isn't actually related to your outcome, you are just adding noise and "overfitting" your model.

The adjusted R-squared attempts to compensate for this by adding a "penalty" for including the extra variable. If the adjusted R-squared goes down when you include a new variable, that suggests the variable is not worth keeping in your model.
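
A quick way to see both behaviors is to fit the same outcome with and without a pure-noise regressor. Below is a minimal sketch in Python with statsmodels (the course itself uses R; the simulated data and variable names here are my own illustration, not from the thread):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100

# Outcome truly depends on a single predictor x.
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)

# z is pure noise, unrelated to y.
z = rng.normal(size=n)

fit1 = sm.OLS(y, sm.add_constant(x)).fit()
fit2 = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

# Multiple R-squared never decreases when a regressor is added...
print(fit1.rsquared, fit2.rsquared)          # second >= first

# ...but adjusted R-squared typically drops for a useless one.
print(fit1.rsquared_adj, fit2.rsquared_adj)  # second usually < first
```

On a typical run the second model's multiple R-squared is slightly higher even though z carries no information about y, while its adjusted R-squared is lower.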

I am not a statistics expert, so I will not try to explain how the adjusted R-squared accomplishes this magic; my explanation would probably not be wholly correct.
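
For reference, the usual definition behind that penalty is

$$ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} $$

where $n$ is the number of observations and $p$ is the number of predictors (not counting the intercept). Adding a variable raises $p$ and makes the fraction larger, so the new variable must improve $R^2$ by more than that penalty, or the adjusted value falls.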
