is a lower bound on the primal problem, so we want to maximize this lower bound
For a given λ and μ, we can solve the relaxation for an optimal x
For this x ∈ X, the lower bound takes the value
$$
L(\lambda,\mu) = (g(x)-b)\,\lambda + (h(x)-d)\,\mu + f(x)
$$
It is a non-differentiable, concave function of (λ, μ)
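To see why (a short derivation, assuming a minimization primal as in (5), so the relaxation takes a minimum over x ∈ X): for each fixed x, the expression above is affine in (λ, μ), and the dual function is the pointwise minimum of these affine pieces:

$$
L(\lambda,\mu) \;=\; \min_{x\in X}\;\big[\,f(x) + (g(x)-b)\,\lambda + (h(x)-d)\,\mu\,\big]
$$

A pointwise minimum of affine functions is concave and piecewise linear, hence non-differentiable at the kinks where the minimizing x changes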
It is impossible to enumerate all x ∈ X, so we cannot see the overall shape of L(λ,μ); to find its "peak", the subgradient method is a good way to improve the lower bound
Then we update λ and μ (see the update rule below) and re-optimize (5) to update x
Loop until a stopping criterion is met
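Concretely, at iteration k the constraint violations of the current minimizer x^k form a subgradient of L at (λ^k, μ^k). A typical projected-subgradient ascent step (a sketch; it assumes λ multiplies an inequality constraint g(x) ≤ b and is therefore kept non-negative, while μ multiplies an equality constraint h(x) = d and is left free) is:

$$
\lambda^{k+1} = \max\!\big(0,\ \lambda^{k} + t_k\,(g(x^{k})-b)\big), \qquad
\mu^{k+1} = \mu^{k} + t_k\,(h(x^{k})-d)
$$

where t_k > 0 is the step size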
❗It is crucial to select a good step size (learning rate); a bad step size can lead to divergence
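As a minimal runnable sketch of the whole loop, the toy knapsack-style problem below (all data and names are hypothetical, chosen only for illustration) relaxes a single inequality constraint a·x ≤ b with multiplier λ and uses a diminishing step size t_k = t_0/(k+1), one standard way to guard against the divergence warned about above:

```python
import numpy as np

# Toy primal (hypothetical data):  min c @ x  s.t.  a @ x <= b,  x in {0,1}^n
# Relaxing the constraint with multiplier lam >= 0 gives the lower bound
#   L(lam) = min_{x in {0,1}^n}  c @ x + lam * (a @ x - b)
c = np.array([-4.0, -3.0, -5.0, -2.0])  # objective coefficients
a = np.array([2.0, 1.0, 3.0, 1.0])      # constraint coefficients
b = 4.0                                  # right-hand side

def solve_relaxation(lam):
    """Minimize the Lagrangian over x in {0,1}^n (separable: decide each item alone)."""
    x = (c + lam * a < 0).astype(float)  # take item i iff its "reduced cost" is negative
    return x, c @ x + lam * (a @ x - b)  # optimal x and the lower bound L(lam)

lam, t0, best_bound = 0.0, 1.0, -np.inf
for k in range(100):
    x, L = solve_relaxation(lam)     # inner minimization, cf. re-optimizing (5)
    best_bound = max(best_bound, L)  # keep the best lower bound seen so far
    g = a @ x - b                    # constraint violation = subgradient of L at lam
    t = t0 / (k + 1)                 # diminishing step size guards against divergence
    lam = max(0.0, lam + t * g)      # projected subgradient ascent (lam >= 0)

print(f"best lower bound: {best_bound:.3f}, final lam: {lam:.3f}")
```

Here the inner minimization has a closed form because the Lagrangian separates per item; in a real application that step is whatever solver handles problem (5)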