- Why? Some variables may be redundant; shrinking the model removes them.
- A small constraint $t$ causes some of the coefficients to shrink exactly to 0: this performs variable selection and produces a sparse model.
- Convex optimization.
Why does the lasso lead to exactly-0 coefficients?
The reason becomes clear once you plot the constraint region together with the RSS contours; see Fig. 3.11.
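Algebraically, the same effect shows up in coordinate descent: each coefficient update is a soft-thresholding step, which can land a coefficient exactly at zero. Below is a minimal sketch on synthetic data (the data, `lam=0.5`, and the helper `lasso_cd` are my own illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
# only the first two predictors matter; the rest are redundant
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

def lasso_cd(X, y, lam, n_iter=200):
    """Naive coordinate descent for the lasso: each update
    soft-thresholds one coefficient, so it can hit exactly zero."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]      # partial residual
            z = X[:, j] @ r / n
            denom = X[:, j] @ X[:, j] / n
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / denom
    return beta

beta_hat = lasso_cd(X, y, lam=0.5)
print(beta_hat)  # redundant coefficients end up exactly 0
```

With a large enough penalty, the redundant coefficients are not merely small, they are identically zero, which is the sparsity the notes describe.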
Comparing the lasso and ridge:
When the true model is sparse, the lasso tends to do better; when most coefficients are nonzero, it can fit worse than ridge.
There is no general rule of thumb for choosing between them.
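One place the contrast is exact: with an orthonormal design, ridge rescales every OLS coefficient by the same factor, while the lasso soft-thresholds each one. A small sketch (the OLS values and `lam` are made up for illustration):

```python
import numpy as np

ols = np.array([2.5, -0.8, 0.3, -0.1])  # hypothetical OLS estimates
lam = 0.5

# ridge: proportional shrinkage -- coefficients never reach zero
ridge = ols / (1.0 + lam)
# lasso: soft-thresholding -- small coefficients hit exactly zero
lasso = np.sign(ols) * np.maximum(np.abs(ols) - lam, 0.0)

print(np.count_nonzero(ridge))  # 4: all survive, just smaller
print(np.count_nonzero(lasso))  # 2: the two small ones are zeroed
```

This is why ridge shrinks but never selects, while the lasso does both.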
Ridge and the lasso can be generalized by replacing the penalty with $\sum_j \lvert \beta_j \rvert^q$ for other values of $q$:
- $q=0$: subset selection
- $q=1$: lasso
- $q=2$: ridge
Smaller $q$ leads to more aggressive selection, though the constraint region is no longer convex for $q < 1$.
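The family of penalties is easy to compute directly; note that $q = 0$ has to be treated as a limit that counts nonzero coefficients (the helper `penalty` and the example vector are my own, for illustration):

```python
import numpy as np

def penalty(beta, q):
    """Generalized penalty sum_j |beta_j|^q.
    q=0 counts nonzeros (subset selection), q=1 is the lasso, q=2 is ridge."""
    beta = np.asarray(beta, dtype=float)
    if q == 0:
        # limit q -> 0: |b|^q -> 1 for b != 0, so the sum counts nonzeros
        return np.count_nonzero(beta)
    return np.sum(np.abs(beta) ** q)

b = [2.0, -0.5, 0.0]
print(penalty(b, 0))  # 2    (two nonzero coefficients)
print(penalty(b, 1))  # 2.5  (L1 norm)
print(penalty(b, 2))  # 4.25 (squared L2 norm)
```

The special case matters: naively evaluating `0.0 ** 0` in Python gives `1.0`, which would wrongly charge for zero coefficients.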