(From lecture notes of Regression Modeling) The residual variance estimate is biased if we underfit the model (omit necessary covariates). It is unbiased if we fit the correct model or overfit it, i.e. include all necessary covariates with or without unnecessary ones. However, when unnecessary variables are included (overfitting), the variance of the variance estimate is larger.
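A minimal simulation sketch of this claim (my own illustration, not from the notes): the true model below has two covariates, and we compare the usual estimator RSS / (n − p) under an underfit, correct, and overfit design matrix. The specific coefficients, sample size, and number of junk covariates are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, n_sims = 50, 4.0, 5000   # illustrative values, not from the notes

def sigma2_hat(X, y):
    # Residual variance estimator: RSS / (n - p), p = number of columns in X.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return rss / (len(y) - X.shape[1])

under, correct, over = [], [], []
for _ in range(n_sims):
    x1, x2 = rng.normal(size=n), rng.normal(size=n)
    junk = rng.normal(size=(n, 15))           # unnecessary covariates
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=np.sqrt(sigma2), size=n)
    ones = np.ones(n)
    under.append(sigma2_hat(np.column_stack([ones, x1]), y))           # omits x2
    correct.append(sigma2_hat(np.column_stack([ones, x1, x2]), y))
    over.append(sigma2_hat(np.column_stack([ones, x1, x2, junk]), y))  # extra junk

for name, est in [("underfit", under), ("correct", correct), ("overfit", over)]:
    est = np.asarray(est)
    print(f"{name:8s} mean = {est.mean():.2f} (true {sigma2}), var = {est.var():.3f}")
```

Running this, the underfit mean is well above the true value of 4 (biased upward, since the omitted covariate's signal inflates the residuals), while the correct and overfit means are both close to 4; but the overfit estimator's variance across simulations is visibly larger than the correct model's.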
(From lecture notes of Practical Machine Learning) Overfitting leads to an out-of-sample error rate that is higher than the in-sample error rate.
Data have two parts: signal and noise. The goal of a predictor is to find the signal. A perfect in-sample predictor can always be designed, but such a predictor won't perform as well on new samples, because it captures the noise along with the signal.
An overfit model does worse at predicting unseen data: it can fit the training sample perfectly, but it generalizes poorly and is not robust when taken out into the real world. A small demonstration follows below.
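A minimal sketch of the in-sample vs. out-of-sample gap (again my own illustration, not from the lectures): a high-degree polynomial fit to a small, noisy sample drives training error toward zero by chasing noise, but its error on fresh data from the same process is worse than a simpler fit's. The sine signal, noise level, and polynomial degrees are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
signal = lambda x: np.sin(2 * np.pi * x)     # the "signal" part of the data

def sample(n):
    x = rng.uniform(0, 1, n)
    return x, signal(x) + rng.normal(scale=0.3, size=n)   # signal + noise

x_train, y_train = sample(15)
x_test, y_test = sample(200)                 # new samples from the same process

for degree in (3, 10):
    coefs = np.polyfit(x_train, y_train, degree)
    mse = lambda x, y: np.mean((np.polyval(coefs, x) - y) ** 2)
    print(f"degree {degree:2d}: in-sample MSE = {mse(x_train, y_train):.3f}, "
          f"out-of-sample MSE = {mse(x_test, y_test):.3f}")
```

The degree-10 fit has near-zero in-sample MSE but a much larger out-of-sample MSE than the degree-3 fit, which is exactly the pattern described above.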
Adding an extra variable amounts to constraining the fit along a new dimension.