R-squared might not be the go-to value we should base decisions on.
R-squared says nothing about prediction error: with σ² held exactly the same and no change in the coefficients, R-squared can land anywhere between 0 and 1 just by changing the range of X.
As you can see below, simply including more values can drastically increase (or decrease) R-squared. This is a concern, because someone preparing summary statistics could manipulate the number of observations to make the results look better. All the while, more reliable measures such as mean squared error stay the same (given the same underlying regression) no matter the number of observations.
Setting Up The Regression
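The post's original setup code isn't shown here, so the following is only a sketch: I'm assuming `RS10` was an ordinary `lm()` fit of a simple linear model over a wide X range. The data-generating process (`y = 2x + noise`, 100 points over X from 1 to 10) is my invention for illustration, not the author's data.

```r
# Hypothetical data: the wide X range makes the signal variance large
# relative to the noise, which inflates R-squared.
set.seed(1)                          # for reproducibility
x10 <- seq(1, 10, length.out = 100)  # wide X range (assumption)
y10 <- 2 * x10 + rnorm(100)          # linear signal plus sd = 1 noise
RS10 <- lm(y10 ~ x10)                # ordinary least-squares fit
```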
R-Squared
summary(RS10)$r.squared
[1] 0.9383379
Mean Squared Error
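The MSE computation itself is also missing from the scrape. A minimal way to get it in base R, assuming the same hypothetical `RS10` fit as above, is to average the squared residuals; the result should sit near the noise variance regardless of the X range.

```r
# Re-create the hypothetical wide-range fit (assumed, not the author's data)
set.seed(1)
x10 <- seq(1, 10, length.out = 100)
y10 <- 2 * x10 + rnorm(100)
RS10 <- lm(y10 ~ x10)

mse10 <- mean(residuals(RS10)^2)  # average squared residual = in-sample MSE
mse10
```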
Setting Up The Regression
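Again, the original code for `RS2` isn't shown. As a sketch under the same assumed data-generating process, I fit the identical model over a much narrower X range: the noise is unchanged, but the signal variance shrinks, so R-squared collapses.

```r
# Hypothetical narrow-range version of the same regression (assumption):
# same slope, same noise sd, but X confined to a short interval.
set.seed(1)
x2 <- seq(4.5, 5.5, length.out = 100)  # narrow X range
y2 <- 2 * x2 + rnorm(100)
RS2 <- lm(y2 ~ x2)
```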
R-Squared
summary(RS2)$r.squared
[1] 0.1502448
Mean Squared Error
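Computing the MSE the same way for the narrow-range sketch shows the point of the post: even though R-squared dropped sharply, the average squared residual stays close to the noise variance, just as it did for the wide-range fit.

```r
# Re-create the hypothetical narrow-range fit (assumed, not the author's data)
set.seed(1)
x2 <- seq(4.5, 5.5, length.out = 100)
y2 <- 2 * x2 + rnorm(100)
RS2 <- lm(y2 ~ x2)

mse2 <- mean(residuals(RS2)^2)  # roughly the same MSE as the wide-range fit
mse2
```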
As you can see, changing the number of observations alone makes R-squared an unreliable measure, while a factor as simple as the number of observations does not affect the mean squared error. For that reason, I suggest taking some TNT to R-squared and using the advanced machine learning options available for free in R to get predictions that'd blow R-squared out of the water! Browse my 'Projects' tab to see more on some of these models, such as Support Vector Machines and Neural Networks; though I think good 'ole Google might be a better resource than I.