Why is L1 norm harder to optimize than L2 norm?

In summary, the conversation is about the differences between optimising with the L1 and L2 norms. The L2 norm is easier to optimise because it has a derivative everywhere and often admits a closed-form solution. The L1 norm has a derivative everywhere except at 0, which makes it more challenging for optimisation, even though it is convex and continuous and has only that single non-differentiable point. The conversation also notes that, for linear optimisation problems, minimising the L1 norm is a single linear program, while minimising the L2 norm amounts to a sequence of linear programs.
  • #1
pamparana
Hi all,

I have a basic optimisation question. I keep reading that the L2 norm is easier to optimise than the L1 norm. I can see why the L2 norm is easy: it has a derivative everywhere, so minimisation problems often have a closed-form solution.

For the L1 norm, there is a derivative everywhere except at 0, right? Why is this such a problem for optimisation? I mean, there is a valid gradient everywhere else.

I am really having trouble convincing myself why L1 minimisation is so much harder than L2 minimisation. The L1 norm is convex and continuous as well, and it has only one point where the derivative does not exist.

Any explanation would be greatly appreciated!

Thanks,

Luca
 
  • #2
If the problem is a linear program, then minimising the L1 norm is a single linear program, while minimising the L2 norm amounts to a sequence of linear programs that traces out the efficient frontier.
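The kink at 0 that the question asks about also has a concrete algorithmic consequence, which a minimal one-dimensional sketch (illustrative only, not from the thread) makes visible: a plain fixed-step gradient method converges on a smooth L2-style objective but keeps overshooting the minimum of an L1-style one.

```python
# Fixed-step gradient descent on f(x) = x^2 (smooth, like a squared
# L2 term) versus f(x) = |x| (kink at 0, like an L1 term).
# The gradient of x^2 shrinks as x approaches 0, so the iterates
# converge; the subgradient of |x| is sign(x), whose magnitude never
# shrinks, so a fixed step keeps jumping back and forth across 0.

def descend(grad, x0, step=0.1, iters=50):
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

x_l2 = descend(lambda x: 2.0 * x, x0=0.95)            # gradient of x^2
x_l1 = descend(lambda x: (x > 0) - (x < 0), x0=0.95)  # sign(x)

# x_l2 ends up essentially at 0, while x_l1 is still bouncing
# around the kink at a distance on the order of step/2.
```

This is why practical L1 solvers use subgradient methods with shrinking steps, proximal operators, or a linear-programming reformulation rather than naive gradient descent.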
 

Related to Why is L1 norm harder to optimize than L2 norm?

1. What are L1 and L2 norm minimisation?

L1 and L2 norm minimisation are techniques used in machine learning and optimization to reduce the complexity of a model by penalizing large coefficients. As regression penalties they are known as Lasso and Ridge regression, respectively.

2. What is the difference between L1 and L2 norm minimisation?

The main difference between L1 and L2 norm minimisation is the penalty term used. L1 norm minimisation uses the absolute values of the coefficients, while L2 norm minimisation uses the squared values of the coefficients. This leads to different regularization effects and can result in different models.

3. When should I use L1 or L2 norm minimisation?

L1 norm minimisation is useful when the data has many irrelevant features, as it can shrink the coefficients of these features to zero, effectively removing them from the model. L2 norm minimisation is more suitable when the data has correlated features, as it can reduce the impact of these features on the model.
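The sparsity claim above can be made concrete in one dimension. For a single coefficient a, both penalised problems have standard closed forms (a textbook identity, not from this thread): the L1 penalty soft-thresholds, setting the coefficient exactly to 0 when |a| ≤ λ, while the L2 penalty only shrinks it toward 0.

```python
def lasso_1d(a, lam):
    # argmin over x of (x - a)^2 / 2 + lam * |x|  (soft-thresholding)
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

def ridge_1d(a, lam):
    # argmin over x of (x - a)^2 / 2 + lam * x^2 / 2  (pure shrinkage)
    return a / (1.0 + lam)

# A small coefficient is removed entirely by L1 but only shrunk by L2:
# lasso_1d(0.3, 0.5) is exactly 0.0, ridge_1d(0.3, 0.5) is about 0.2.
```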

4. How do I choose the optimal penalty term for L1 or L2 norm minimisation?

The optimal penalty term for L1 or L2 norm minimisation can be chosen through techniques such as cross-validation or grid search. These methods test different penalty terms and select the one that results in the best performance on a validation dataset.
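A minimal sketch of what a grid search over the penalty looks like, using ridge regression because it has the closed form w = (XᵀX + λI)⁻¹Xᵀy; the synthetic data, the train/validation split, and the penalty grid here are all invented for illustration.

```python
import numpy as np

# Synthetic linear data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=100)

# Hold out the last 20 rows as a validation set.
X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]

def ridge_fit(X, y, lam):
    # Closed form: w = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_error(lam):
    w = ridge_fit(X_tr, y_tr, lam)
    return float(np.mean((X_va @ w - y_va) ** 2))

# Grid search: keep the penalty with the lowest validation error.
grid = [0.01, 0.1, 1.0, 10.0]
best_lam = min(grid, key=val_error)
```

Cross-validation repeats the same idea over several different train/validation splits and averages the errors before picking the penalty.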

5. Can L1 or L2 norm minimisation be used for any type of model?

L1 and L2 norm minimisation can be used for linear models such as linear regression and logistic regression. However, they can also be applied to non-linear models by transforming the features into a higher-dimensional space, making them suitable for a wider range of models.
