<aside> 💡 The L1 norm is often added to the objective function as a regularization term to encourage sparsity in the model's parameter vector. Sparsity means that many components of the parameter vector are exactly zero, which yields simpler, more interpretable models.

Let's focus on L1 regularization, since it is what produces sparsity. The absolute value function has a sharp corner at zero, so its gradient does not fade as a coefficient shrinks: the penalty keeps pushing each coefficient toward zero with the same strength no matter how small it already is (unlike the L2 penalty, whose pull weakens near zero). As a result, when minimizing the regularized loss it is often "cheaper" for the model to drive some coefficients to exactly zero than to keep many coefficients at small, non-zero values. In other words, the model learns to ignore some features entirely (coefficient = 0) and concentrate weight on the more informative ones (coefficient ≠ 0).

This is how L1 regularization leads to sparsity: it encourages models to use fewer features. Here's a simple example:
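A minimal sketch of this effect (the synthetic data, the λ value, and the ISTA solver below are illustrative assumptions, not something prescribed by the text): we solve the lasso problem with proximal gradient descent, whose soft-thresholding step is what produces exact zeros, and check which coefficients survive.

```python
import numpy as np

# Illustrative setup (assumption): 100 samples, 20 features,
# but only the first 3 features actually influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -3.0, 1.5]
y = X @ true_w + 0.1 * rng.normal(size=100)

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrinks each component toward
    # zero and sets components with |v_i| <= t exactly to zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    # ISTA: alternate a gradient step on the squared loss with a
    # soft-thresholding step -- the thresholding creates exact zeros.
    n, d = X.shape
    w = np.zeros(d)
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L for the (1/2n)||Xw - y||^2 loss
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

w = lasso_ista(X, y, lam=0.1)
print("nonzero coefficients at indices:", np.flatnonzero(w))
print("number of exact zeros:", int(np.sum(w == 0)))
```

With this setup the three informative coefficients stay non-zero (slightly shrunk toward zero, which is the bias L1 introduces), while most of the 17 irrelevant coefficients land on exactly zero — not merely small values, as an L2 penalty would give.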

</aside>