This is part of a discussion from the regression models course forum. I am pasting the response here in case the forum is closed once the class ends.
Michele's response:
Multiple R-squared (sometimes just called R-squared) will *always* go up when you add more variables! The model with the added variables will match the training data better. Multiple R-squared measures how good the fit is on the training data, but your training data probably contains some spurious signals and coincidences that are not true features of the underlying population of interest.

Since multiple R-squared always gets higher when you include a new variable, you cannot use it to decide whether including that new variable is a good idea. If the new variable isn't actually related to your outcome, you are just adding noise and "overfitting" your model.
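Here is a minimal sketch of this point (my addition, not part of Michele's response) in Python with numpy; the data and variable names are made up for illustration. One real predictor drives the outcome, yet R-squared still creeps upward as columns of pure noise are appended:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x_real = rng.normal(size=n)
y = 2.0 * x_real + rng.normal(size=n)  # the outcome depends only on x_real

def r2(X, y):
    """Ordinary least squares R^2, with an intercept column added."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

X = x_real.reshape(-1, 1)
for _ in range(5):
    print(f"{X.shape[1]} predictor(s): R^2 = {r2(X, y):.4f}")
    X = np.column_stack([X, rng.normal(size=n)])  # append a pure-noise column
```

Each added column can only shrink (never grow) the residual sum of squares, so R-squared never decreases even though the new predictors are meaningless.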
The adjusted R-squared attempts to compensate for this by adding a "penalty" for including the extra variable. If the adjusted R-squared goes down when you include a new variable, that suggests the variable is not valuable to keep in your model.
I am not a statistics expert, so I will not try to explain how the adjusted R-squared accomplishes this magic; my explanation would probably not be wholly correct.
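For reference, and as my own addition rather than part of the quoted response: the "magic" is the standard adjustment below (the same form R reports in summary.lm), where n is the number of observations and p the number of predictors excluding the intercept:

```latex
% Adjusted R^2: penalizes each extra predictor, for n observations and p predictors
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Because the factor (n - 1)/(n - p - 1) grows as p grows, a new variable raises the adjusted R-squared only if its improvement in fit outweighs the penalty for the extra parameter.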