Kishan Chakraborty
Research Scholar

Lagrange Multipliers

Consider the unconstrained optimization problem

$$\min_{x \in \mathbb{R}^n} ~~ f(x)$$

The first-order condition for $x^*$ to be a local minimum is that the gradient $\nabla f(x^*)$, which points in the direction of steepest ascent, vanishes. In an unrestricted space (absence of constraints), where the point $x$ can move in any direction in $\mathbb{R}^n$ to decrease the function, this means $x^*$ has nowhere left to go. How do we interpret the same idea in a constrained optimization problem? In this article I will try to build an intuition for this question with the help of the widely used Lagrange multipliers.

Consider the following optimization problem with equality constraint

$$\begin{aligned} &\min_{x \in \mathbb{R}^n} ~~ f(x)\\ &\text{s.t.} ~~~~~ h_i(x) = 0, \quad i = 1, 2, \cdots, p \end{aligned}$$

Now the decision set is restricted: the point $x$ cannot move freely in $\mathbb{R}^n$.

1. One Equality Constraint

For simplicity, let us consider an example with only one constraint $g$.

$$\begin{aligned} &\min_{(x, y) \in \mathbb{R}^2} ~~ x^2 + y^2 + xy\\ &\text{s.t.} ~~~~~ g(x, y) = x + y - 1 = 0 \end{aligned}$$

The solution is attained at $A = \left(\frac{1}{2}, \frac{1}{2}\right)$ (DIY), as shown in Figure 1.

Figure 1: Contour visualization
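The claimed optimum can be sanity-checked directly: on the line $x + y = 1$ we can substitute $y = 1 - x$, which reduces the problem to a one-variable minimization. A minimal sketch (the grid range and step size are arbitrary choices):

```python
# Verify the claimed optimum A = (1/2, 1/2) by eliminating the constraint:
# on x + y = 1, substituting y = 1 - x gives the one-variable function
#   f(x, 1 - x) = x^2 + (1 - x)^2 + x(1 - x) = x^2 - x + 1.

def f(x, y):
    return x**2 + y**2 + x * y

# Coarse scan over the feasible line x + y = 1.
candidates = [(t / 1000, 1 - t / 1000) for t in range(-2000, 3001)]
best = min(candidates, key=lambda p: f(*p))

print(best)        # → (0.5, 0.5)
print(f(*best))    # → 0.75
```

The strictly convex reduced function $x^2 - x + 1$ has its unique minimum exactly at $x = \frac{1}{2}$, so the grid scan lands on it exactly.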

One important thing to note in the figure above: at the optimal point $A$, the constraint line is tangent to the level curve. If the constraint were non-linear, $f$ and $g$ would still share a common tangent at the optimum. The intuition is that at a different feasible point, say $C$, where the level curve intersects the constraint, $f$ can still reduce its value by moving along the constraint towards $A$. Therefore, $C$ cannot be an optimal point.

This fact becomes clearer when we look at the normal (ray $AB$) to both curves at the point $A$. The normals to the level curve and to the constraint are parallel, i.e., $\nabla_{x}f(A) = \mu \nabla_{x}g(A)$. Notice that the gradient of $f$ at $A$ is not zero, but the direction of steepest ascent (take the negative and it becomes descent) is parallel to the normal to the constraint. Therefore, any step in that direction leads to constraint violation. Based on this equation, we define the Lagrangian function as follows

$$L(x, \mu) \doteq f(x) + \mu g(x)$$

where $\mu$ is called a Lagrange multiplier. In $L$, $\mu$ acts as a penalty term for violating $g$. We wish to stay on the constraint set $g = 0$, and $\mu$ acts as a pushback, ensuring that any movement away from it becomes undesirable in $L$. We solve the Lagrangian with respect to $x$ and $\mu$, and the resultant $x^*$ is also the optimal point of the original problem.
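For this particular example the stationarity conditions of the Lagrangian are linear, so they can be solved directly. A small numerical sketch (note the sign convention: with $L = f + \mu g$, setting $\nabla_x L = 0$ gives $\nabla f = -\mu \nabla g$, so the multiplier comes out with the opposite sign to the one in $\nabla f = \mu \nabla g$):

```python
import numpy as np

# Stationarity of L(x, y, mu) = x^2 + y^2 + xy + mu * (x + y - 1):
#   dL/dx  = 2x + y + mu = 0
#   dL/dy  = x + 2y + mu = 0
#   dL/dmu = x + y - 1   = 0
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x, y, mu = np.linalg.solve(A, b)   # recovers x = y = 1/2, mu = -3/2

# grad f(A) = (3/2, 3/2) is parallel to grad g(A) = (1, 1),
# with proportionality factor -mu = 3/2 under this sign convention.
grad_f = np.array([2 * x + y, x + 2 * y])
grad_g = np.array([1.0, 1.0])
print(np.allclose(grad_f, -mu * grad_g))   # → True
```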

2. Many Equality Constraints

What happens when there are two equality constraints $g_1, g_2$ in the problem? A point of optima needs to satisfy both constraints. As in the one-constraint case, do the level curve and the two constraints share the same tangent at the point of optima? What about the normal to the level curve at an optimal point? Let us take an example by adding a new constraint $g_2: x^2 + y^2 = 4$ to our previous problem. The solution is attained at $[1.82, -0.82]^T$, as shown in the figure below.

Figure 2: Contour visualization with two constraints

One can see that the gradient of the objective function is not zero at $A$. Also, the tangents to the level curve, $g_1$ and $g_2$ are no longer parallel. The observation about the normal to the level curve is not the same as in the one-constraint case, but it extends naturally with an increasing number of constraints.

The idea is that at the point of optima, no local step along any vector should be able to decrease $f$ further. If such a (non-zero) decreasing direction exists, it must be perpendicular to the constraints, so that any step along it takes $f$ out of the feasible region. Therefore, the gradient is spanned by the normals to the constraints, i.e., $\nabla_{x}f(A) = \mu_1 \nabla_{x}g_1(A) + \mu_2 \nabla_{x}g_2(A)$.
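This span condition can be checked numerically for our example. Solving the two constraints simultaneously gives the exact optimum $A = \left(\frac{1+\sqrt{7}}{2}, \frac{1-\sqrt{7}}{2}\right) \approx (1.82, -0.82)$; a minimal sketch of the check:

```python
import numpy as np

# Optimum of  min x^2 + y^2 + xy  s.t.  x + y = 1  and  x^2 + y^2 = 4.
x = (1 + np.sqrt(7)) / 2   # ≈ 1.82
y = (1 - np.sqrt(7)) / 2   # ≈ -0.82

grad_f = np.array([2 * x + y, x + 2 * y])
grad_g1 = np.array([1.0, 1.0])        # normal to x + y - 1 = 0
grad_g2 = np.array([2 * x, 2 * y])    # normal to x^2 + y^2 - 4 = 0

# Solve grad_f = mu1 * grad_g1 + mu2 * grad_g2 for the multipliers.
M = np.column_stack([grad_g1, grad_g2])
mu1, mu2 = np.linalg.solve(M, grad_f)
print(mu1, mu2)   # mu1 ≈ 1.0, mu2 ≈ 0.5
```

The gradient of $f$ at $A$ is indeed an exact linear combination of the two constraint normals, with $\mu_1 = 1$ and $\mu_2 = \frac{1}{2}$.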

Therefore, extending the same argument, the Lagrangian for a problem with $p$ equality constraints can be written as

$$L(x, \mu) \doteq f(x) + \sum_{i=1}^p \mu_i g_i(x)$$

3. Inequality Constraints

An inequality constraint $g_i(x) \le 0$ is called inactive at $x$ if $g_i(x) < 0$. It is called inactive because it allows local movement in every direction around $x$ without violating the constraint. Since it allows such free local movement, it does not influence the optimality of the function, so inactive constraints are 'ignored' while solving an optimization problem. When $g_i(x) = 0$, the constraint behaves much like an equality constraint, with the additional condition that the corresponding Lagrange multiplier $\lambda_i$ is non-negative. This is because, at an optimal point $x^*$, $g(x^*) = 0$ and from the Lagrangian

$$\begin{aligned} \nabla_x f(x^*) + \lambda \nabla g(x^*) &= 0 \\ \Rightarrow \nabla_x f(x^*) &= -\lambda \nabla g(x^*) \end{aligned}$$

$\nabla g$ points outside the feasible region; therefore, if $\lambda < 0$, it encourages $f$ to violate the constraint, which is not the expected behaviour. The job of a constraint is to push the function back, not pull it outside the feasible region. For details on the role of Lagrange multipliers in the optimization of an NLP (Non-Linear Program), see the KKT conditions.
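The sign condition can be seen on a small hypothetical one-dimensional example (not from the text above): $\min x^2$ subject to $g(x) = 1 - x \le 0$, i.e., $x \ge 1$, where the constraint is active at the optimum $x^* = 1$:

```python
# Toy 1-D example:  min x^2  s.t.  g(x) = 1 - x <= 0  (i.e. x >= 1).
# L(x, lam) = x^2 + lam * (1 - x);  stationarity: dL/dx = 2x - lam = 0.
x_star = 1.0                 # constrained minimizer of x^2 on x >= 1
lam = 2 * x_star             # multiplier from 2x - lam = 0 at x*

# grad g = -1 points out of the feasible region {x >= 1}, and
# grad f(x*) = 2 = -lam * grad g, so lam = 2 >= 0 as required:
# the constraint pushes back rather than pulling f outside.
grad_f = 2 * x_star
grad_g = -1.0
print(lam, grad_f == -lam * grad_g)   # → 2.0 True
```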

4. Interpreting Lagrange Multipliers

Lagrange multipliers can be interpreted as the amount of force a constraint applies at the optimum. A multiplier tells us about the sensitivity of its constraint: how much the objective value changes with a unit change in the corresponding constraint level. As a mental model, if a constraint is seen as a blockade, the corresponding Lagrange multiplier tells how strongly the constraint is blocking the objective function. It also represents the importance of a constraint in shaping the gradient vector $\nabla f$.
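This sensitivity reading can be verified on the first example. If the constraint level is made variable, $x + y = c$, the minimizer is $x = y = c/2$ and the optimal value is $f^*(c) = 3c^2/4$; its derivative at $c = 1$ should match the multiplier $\mu = \frac{3}{2}$ obtained from the tangency condition $\nabla f = \mu \nabla g$. A quick finite-difference sketch:

```python
# Optimal value of  min x^2 + y^2 + xy  s.t.  x + y = c,
# obtained by substituting the minimizer x = y = c/2:
#   f*(c) = (c/2)^2 + (c/2)^2 + (c/2)^2 = 3c^2/4.
def f_star(c):
    return 3 * c**2 / 4

# Central difference of f* at c = 1 should recover mu = 3/2.
eps = 1e-6
sensitivity = (f_star(1 + eps) - f_star(1 - eps)) / (2 * eps)
print(abs(sensitivity - 1.5) < 1e-6)   # → True
```

So relaxing the constraint by one unit changes the optimal objective value at a rate equal to the multiplier, which is exactly the "force" interpretation above.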