We want to change $\beta$ so that the prediction $X\beta$ moves closer to the data $y$, i.e., the update should shrink the residual $r = y - X\beta$. Since the gradient of $\frac{1}{2}\|y - X\beta\|^2$ with respect to $\beta$ is $-X^T r$, the change should be $\propto X^T r$.

Why does this work? Moving $\beta$ along $X^T r$ is a gradient-descent step on the squared error, so it reduces the MSE.
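A minimal numpy sketch of this idea (the data, step size, and iteration count are illustrative assumptions, not from the text): repeatedly updating $\beta$ by a small multiple of $X^T r$ drives the MSE down.

```python
import numpy as np

# Synthetic data (assumption: made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

def mse(beta):
    r = y - X @ beta
    return np.mean(r ** 2)

beta = np.zeros(3)
# Step size 1 / lambda_max guarantees the gradient step decreases the error.
eta = 1.0 / np.linalg.norm(X, 2) ** 2

mse_before = mse(beta)
for _ in range(200):
    r = y - X @ beta       # current residual
    beta += eta * X.T @ r  # update proportional to X^T r
mse_after = mse(beta)

print(mse_before, mse_after)  # MSE decreases
```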

LAR (least angle regression) is closely related to the lasso.

The modified LAR procedure (Algorithm 3.2a) yields the entire lasso solution path.

The LAR implementation of the lasso is efficient: LAR takes $\min(p, N-1)$ steps, whereas the lasso path itself may require more than $p$ steps, since dropped variables can re-enter the active set.

Geometrically, LAR and the lasso are almost identical: their coefficient paths coincide until some nonzero coefficient crosses 0. At that point the lasso modification drops that variable from the active set and recomputes the direction, and the two paths diverge.
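This difference can be seen directly, assuming scikit-learn is available: its `lars_path` computes both the pure LAR path and the lasso-modified path, and on the diabetes data the lasso path takes extra steps because of a coefficient crossing 0.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import lars_path

X, y = load_diabetes(return_X_y=True)

# Pure LAR: one variable enters per step, at most min(p, N-1) steps.
alphas_lar, active_lar, coefs_lar = lars_path(X, y, method="lar")

# Lasso modification (Algorithm 3.2a): when a nonzero coefficient
# crosses 0, the variable is dropped and the direction recomputed,
# so the path can have more breakpoints than the LAR path.
alphas_lasso, active_lasso, coefs_lasso = lars_path(X, y, method="lasso")

# coefs_* has shape (n_features, n_breakpoints).
print(coefs_lar.shape, coefs_lasso.shape)
```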