Convex and Non-Convex Optimisation
Mathematical Background
A vector norm on $\mathbb{R}^n$ is a function $\|\cdot\|$ from $\mathbb{R}^n$ to $\mathbb{R}$ such that:
- $\|\mathbf{x}\| \ge 0,\, \forall \mathbf{x} \in \mathbb{R}^n$ and $\|\mathbf{x}\| = 0 \iff \mathbf{x} = \mathbf{0}$
- $\|\mathbf{x}+\mathbf{y}\| \le \|\mathbf{x}\|+\|\mathbf{y}\|$ $\forall \mathbf{x},\mathbf{y} \in \mathbb{R}^n$ (Triangle Inequality)
- $\|\alpha \mathbf{x}\| = |\alpha| \|\mathbf{x}\|$ $\forall \alpha \in \mathbb{R}, \mathbf{x} \in \mathbb{R}^n$
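For example, the standard $\ell_1$, $\ell_2$ and $\ell_\infty$ norms all satisfy these axioms: \[ \|\mathbf{x}\|_1 = \sum_{i=1}^n |x_i|, \quad \|\mathbf{x}\|_2 = \Big( \sum_{i=1}^n x_i^2 \Big)^{1/2}, \quad \|\mathbf{x}\|_\infty = \max_{1 \le i \le n} |x_i|. \]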
Smoothness classes:
- $f \in C^1 (\mathbb{R}^n)$ means each $\frac{\partial f}{\partial x_i}$ exists and $\nabla f(\mathbf{x})$ is continuous on $\mathbb{R}^n$.
- $f \in C^2 (\mathbb{R}^n)$ means each $\frac{\partial^2 f}{\partial x_i \partial x_j}$ exists and the Hessian $\nabla^2 f(\mathbf{x})$ is continuous on $\mathbb{R}^n$.
If $f_0 \in \mathbb{R}, \mathbf{g} \in \mathbb{R}^n, G \in \mathbb{R}^{n \times n}$:
- Linear: $f(\mathbf{x}) = \mathbf{g}^T \mathbf{x}$
- Affine: $f(\mathbf{x}) = \mathbf{g}^T \mathbf{x} + f_0$
- Quadratic: \[f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T G \mathbf{x} + \mathbf{g}^T \mathbf{x} + f_0\]
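Differentiating the quadratic term by term (assuming $G$ symmetric, as in the spectral results below) gives \[ \nabla f(\mathbf{x}) = G\mathbf{x} + \mathbf{g}, \qquad \nabla^2 f(\mathbf{x}) = G, \] which is the model function used repeatedly in the numerical methods section.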
Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. Then:
- $A$ has $n$ real eigenvalues.
- There exists an orthogonal matrix $Q$ such that $A= Q D Q^\top$ where $D=\mathrm{diag}(\lambda_1,\dots,\lambda_n)$ and $Q=[v_1 \, \dots \, v_n]$ with $v_i$ an eigenvector of $A$ corresponding to eigenvalue $\lambda_i$.
- $\mathrm{det}(A)=\prod_{i=1}^n \lambda_i$ and \[\mathrm{tr}(A)=\sum_{i=1}^n \lambda_i=\sum_{i=1}^n A_{ii}\]
- $A$ is positive definite $\iff \lambda_i>0$ for all $i=1,\dots,n$.
- $A$ is positive semi-definite $\iff \lambda_i \geq 0$ for all $i=1,\dots,n$.
- $A$ is indefinite $\iff$ there exist $i,j$ with $\lambda_i>0$ and $\lambda_j<0$.
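As a worked example, take the symmetric matrix \[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}. \] Its eigenvalues solve $\det(A - \lambda I) = (2-\lambda)^2 - 1 = 0$, giving $\lambda_1 = 1$, $\lambda_2 = 3$. Both are positive, so $A$ is positive definite; as a check, $\det(A) = 3 = \lambda_1 \lambda_2$ and $\mathrm{tr}(A) = 4 = \lambda_1 + \lambda_2$.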
A symmetric matrix $A$ is positive definite if and only if all leading principal minors of $A$ are positive. The $i$th leading principal minor $\Delta_i$ of $A$ is the determinant of the leading $i \times i$ submatrix of $A$.
If $\Delta_i$ has the sign of $(-1)^i$ for $i = 1, 2, \dots, n$, that is, the values of $\Delta_i$ alternate in sign starting with $\Delta_1 < 0$, then $A$ is negative definite.
Note that the test for positive semi-definiteness requires all principal minors to be non-negative, of which there are $2^n - 1$, as opposed to just checking the $n$ leading submatrices here.
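For instance, for \[ A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \] the leading principal minors are $\Delta_1 = 2$, $\Delta_2 = 3$, $\Delta_3 = 4$, all positive, so $A$ is positive definite.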
Convexity
Let $\Omega \subseteq \mathbb{R}^n$ be a convex set. $\overline{\mathbf{x}} \in \Omega$ is an extreme point of $\Omega$ if and only if \[ \mathbf{x}, \mathbf{y} \in \Omega, \ \theta \in (0,1), \ \overline{\mathbf{x}} = \theta \mathbf{x} + (1 - \theta) \mathbf{y} \implies \mathbf{x} = \mathbf{y}. \]
In other words, a point is an extreme point if it does not lie strictly between two distinct points of $\Omega$; for example, the extreme points of a closed disc are its boundary points, while a closed square has exactly four (its corners).
A function $f: \Omega \to \mathbb{R}$ is:
- convex if $f(\theta \mathbf{x} + (1-\theta)\mathbf{y}) \leq \theta f(\mathbf{x}) + (1-\theta)f(\mathbf{y})$ for all $\mathbf{x},\mathbf{y} \in \Omega$ and $\theta \in [0,1]$;
- strictly convex if strict inequality holds whenever $\mathbf{x} \neq \mathbf{y}$;
- concave if $-f$ is convex.
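As a quick check of the definition, $f(x) = x^2$ is strictly convex on $\mathbb{R}$: for $x \neq y$ and $\theta \in (0,1)$, \[ \theta x^2 + (1-\theta) y^2 - \big( \theta x + (1-\theta) y \big)^2 = \theta (1-\theta) (x - y)^2 > 0. \]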
Let $\Omega$ be a convex set and let $f: \Omega \to \mathbb{R}$ be continuously differentiable on $\Omega$. Then $f$ is convex over $\Omega \iff$, $\forall \mathbf{x}, \mathbf{y} \in \Omega$,
\begin{aligned} f(\mathbf{y}) - f(\mathbf{x}) &\geq (\mathbf{y} - \mathbf{x})^\top \nabla f(\mathbf{x}) \\ &= \nabla f(\mathbf{x})^\top (\mathbf{y}-\mathbf{x}) \end{aligned}
Unconstrained Optimisation
If $f \in C^2(\mathbb{R}^n)$ then
- Local minimiser $\implies \nabla f(\mathbf{x}^*)=0$ and $\nabla^2 f(\mathbf{x}^*)$ positive semi-definite.
- Local maximiser $\implies \nabla f(\mathbf{x}^*)=0$ and $\nabla^2 f(\mathbf{x}^*)$ negative semi-definite.
If $\nabla f(\mathbf{x}^*)=0$ then
- $\nabla^2 f(\mathbf{x}^*)$ positive definite $\implies \mathbf{x}^*$ is a strict local minimiser.
- $\nabla^2 f(\mathbf{x}^*)$ negative definite $\implies \mathbf{x}^*$ is a strict local maximiser.
- $\nabla^2 f(\mathbf{x}^*)$ indefinite $\implies \mathbf{x}^*$ is a saddle point.
- $\nabla^2 f(\mathbf{x}^*)$ positive semi-definite $\implies \mathbf{x}^*$ is either a local minimiser or a saddle point!
- $\nabla^2 f(\mathbf{x}^*)$ negative semi-definite $\implies \mathbf{x}^*$ is either a local maximiser or a saddle point! Be careful with these.
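Worked example: $f(x,y) = x^3 - 3x + y^2$ has $\nabla f = (3x^2 - 3, \, 2y)^\top$, so the stationary points are $(\pm 1, 0)$, with \[ \nabla^2 f(x,y) = \begin{bmatrix} 6x & 0 \\ 0 & 2 \end{bmatrix}. \] At $(1,0)$ the Hessian is positive definite (strict local minimiser); at $(-1,0)$ it is indefinite (saddle point).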
Combining the sufficiency of stationarity as above with convexity / concavity of $f \in C^2 (\mathbb{R}^n)$ turns stationarity into a global sufficient condition:
- $f$ convex $\implies$ any stationary point is a global minimiser.
- $f$ strictly convex $\implies$ stationary point is the unique global minimiser.
- $f$ concave $\implies$ any stationary point is a global maximiser.
- $f$ strictly concave $\implies$ stationary point is the unique global maximiser.
Equality Constraints
Minimise $f(\mathbf{x})$ subject to the $m$ equality constraints $c_i(\mathbf{x}) = 0$, $i = 1, \dots, m$, with Lagrangian \[ \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \boldsymbol{\lambda}^\top \mathbf{c}(\mathbf{x}). \]
Let $\mathbf{x}^*$ be a constrained stationary point, so there exist Lagrange multipliers $\boldsymbol{\lambda}^*$ such that
\begin{aligned} \nabla_\mathbf{x} \mathcal{L}(\mathbf{x}^*, \boldsymbol{\lambda}^*) &= \nabla f(\mathbf{x}^*) + A(\mathbf{x}^*) \boldsymbol{\lambda}^* = \mathbf{0} \\ \nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}^*, \boldsymbol{\lambda}^*) &= \mathbf{c}(\mathbf{x}^*) = \mathbf{0} \end{aligned}
Here, \[ A(\mathbf{x}^*) = \begin{bmatrix} \nabla c_1(\mathbf{x}^*), \dots, \nabla c_{m}(\mathbf{x}^*) \end{bmatrix}, \qquad W_Z^* := (Z^*)^\top \nabla^{2}_\mathbf{x} \mathcal{L}(\mathbf{x}^*,\boldsymbol{\lambda}^*) Z^*, \] where $Z^* \in \mathbb{R}^{n \times (n-t^*)}$ with $t^* = \mathrm{rank}(A(\mathbf{x}^*))$ satisfies $(Z^*)^\top A(\mathbf{x}^*) = \mathbf{0}$, i.e. the columns of $Z^*$ span the null space of $A(\mathbf{x}^*)^\top$. If $W_Z^*$ is positive definite, then $\mathbf{x}^*$ is a strict local minimiser.
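Worked example: minimise $f(x,y) = x^2 + y^2$ subject to $c(x,y) = x + y - 1 = 0$. Stationarity of $\mathcal{L} = f + \lambda c$ gives $2x + \lambda = 0$, $2y + \lambda = 0$ and $x + y = 1$, so $x^* = y^* = \tfrac{1}{2}$, $\lambda^* = -1$. Then $A(\mathbf{x}^*) = (1, 1)^\top$ has rank $t^* = 1$, we may take $Z^* = (1, -1)^\top$, and \[ W_Z^* = (Z^*)^\top \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} Z^* = 4 > 0, \] so $(\tfrac{1}{2}, \tfrac{1}{2})$ is a strict local (in fact global) minimiser.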
Inequality Constraints
Numerical Methods (unconstrained)
Say $\mathbf{x}^{(k)} \to \mathbf{x}^*$ with order $\alpha$ and rate constants $\beta^{(k)}$ if $\|\mathbf{x}^{(k+1)} - \mathbf{x}^*\| \le \beta^{(k)} \|\mathbf{x}^{(k)} - \mathbf{x}^*\|^{\alpha}$. The main cases are:
- $\alpha=1$ with $\beta^{(k)} \le \beta < 1$ (linear),
- $\alpha=1$ with $\beta^{(k)} \to 0$ (superlinear),
- $\alpha=2$ (quadratic), which roughly doubles the number of correct digits per iteration.
Properties of a generic line-search method $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{s}^{(k)}$, with gradient $\mathbf{g}^{(k)} = \nabla f(\mathbf{x}^{(k)})$ and search direction $\mathbf{s}^{(k)}$:
- Descent direction: $(\mathbf{g}^{(k)})^\top \mathbf{s}^{(k)} < 0$
- Exact line search: $\nabla f(\mathbf{x}^{(k+1)})^\top \mathbf{s}^{(k)} = 0$
- Global convergence: converges to stationary point from any $\mathbf{x}^{(1)}$
- Quadratic termination: finds minimiser of strictly convex quadratic in finite iterations
Steepest descent ($\mathbf{s}^{(k)} = -\mathbf{g}^{(k)}$) properties (see the sketch after this list):
- Descent direction: Yes
- Global convergence: Yes
- Quadratic termination: No
- Rate: Linear. For a strictly convex quadratic: $\|\mathbf{x}^{(k+1)} - \mathbf{x}^*\| \le \left(\frac{\kappa-1}{\kappa+1}\right)^k \|\mathbf{x}^{(1)} - \mathbf{x}^*\|$, where $\kappa$ is the condition number of $\nabla^2 f$
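A minimal sketch of steepest descent with exact line search on the quadratic model $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top G\mathbf{x} + \mathbf{g}^\top\mathbf{x}$ from earlier, for which the exact step length has a closed form (the helper name and tolerance are illustrative, not from the notes):

```python
import numpy as np

def steepest_descent(G, g, x, tol=1e-8, max_iter=500):
    """Minimise 0.5 x^T G x + g^T x (G symmetric positive definite)
    by steepest descent with exact line search."""
    for k in range(max_iter):
        grad = G @ x + g                  # g^(k) = gradient at x^(k)
        if np.linalg.norm(grad) < tol:    # stationary point reached
            break
        s = -grad                         # steepest-descent direction
        # Exact line search for a quadratic: minimise f(x + a s) over a,
        # giving a = -(grad . s) / (s . G s).
        a = -(grad @ s) / (s @ (G @ s))
        x = x + a * s
    return x

G = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
g = np.array([-1.0, 0.0])
x_star = steepest_descent(G, g, np.zeros(2))
print(x_star, np.linalg.solve(G, -g))    # iterate vs exact minimiser
```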
Newton's method ($\mathbf{s}^{(k)} = -(G^{(k)})^{-1} \mathbf{g}^{(k)}$, where $G^{(k)} = \nabla^2 f(\mathbf{x}^{(k)})$) properties (sketch after this list):
- Descent direction: Yes, if $G^{(k)}$ positive definite
- Global convergence: No (Hessian may be singular)
- Quadratic termination: Yes (one iteration for strictly convex quadratics)
- Rate: Quadratic if $G^*$ positive definite
- Usage: When Hessian can be evaluated and is positive definite
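A matching sketch of the Newton iteration (again an illustrative helper, not code from the notes); on a strictly convex quadratic it terminates in one step since $\mathbf{x} - G^{-1}(G\mathbf{x} + \mathbf{g}) = -G^{-1}\mathbf{g}$:

```python
import numpy as np

def newton(grad, hess, x, tol=1e-10, max_iter=50):
    """Newton's method: solve hess(x) s = -grad(x) each iteration."""
    for k in range(max_iter):
        g_k = grad(x)
        if np.linalg.norm(g_k) < tol:
            break
        s = np.linalg.solve(hess(x), -g_k)   # Newton direction
        x = x + s                            # unit step length
    return x

# f(x, y) = x^3 - 3x + y^2 from the earlier worked example:
grad = lambda v: np.array([3 * v[0]**2 - 3, 2 * v[1]])
hess = lambda v: np.array([[6 * v[0], 0.0], [0.0, 2.0]])
print(newton(grad, hess, np.array([2.0, 1.0])))  # converges to (1, 0)
```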
Conjugate gradient properties (sketch after this list):
- Descent direction: Yes
- Quadratic termination: Yes with exact line searches
- Usage: Large $n$; stores only vectors, avoids solving linear systems
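A compact sketch of linear conjugate gradients (Fletcher–Reeves form) on the same quadratic model, terminating in at most $n$ iterations; the helper is illustrative:

```python
import numpy as np

def conjugate_gradient(G, g, x, tol=1e-10):
    """Minimise 0.5 x^T G x + g^T x by linear CG; at most n iterations."""
    r = G @ x + g          # gradient (residual)
    s = -r                 # first direction: steepest descent
    for _ in range(len(x)):
        if np.linalg.norm(r) < tol:
            break
        a = (r @ r) / (s @ (G @ s))       # exact line search step
        x = x + a * s
        r_new = r + a * (G @ s)           # updated gradient
        beta = (r_new @ r_new) / (r @ r)  # Fletcher-Reeves coefficient
        s = -r_new + beta * s             # new conjugate direction
        r = r_new
    return x

G = np.array([[3.0, 1.0], [1.0, 2.0]])
g = np.array([-1.0, 0.0])
print(conjugate_gradient(G, g, np.zeros(2)))  # exact minimiser in <= 2 steps
```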
Penalty Methods
- $c: \mathbb{R}^n \to \mathbb{R}$ is a convex function $\implies \max\{c(\mathbf{x}),0\}^2$ is a convex function
- \[\frac{\partial}{\partial x_i} [\max\{c(\mathbf{x}),0\}]^2 = 2 \max\{c(\mathbf{x}), 0\}\frac{\partial c}{\partial x_i}(\mathbf{x})\]
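A minimal sketch of a quadratic penalty method for $\min f(\mathbf{x})$ subject to $c(\mathbf{x}) \le 0$, built from the $\max\{c(\mathbf{x}),0\}^2$ term above with an increasing penalty parameter $\mu$; the schedule, test problem, and use of scipy's Nelder–Mead inner solver are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2)**2 + (x[1] - 2)**2   # objective
c = lambda x: x[0] + x[1] - 1                  # constraint c(x) <= 0

def penalty_method(x, mu=1.0, rho=10.0, outer=8):
    """Solve a sequence of unconstrained problems with growing mu."""
    for _ in range(outer):
        phi = lambda x: f(x) + mu * max(c(x), 0.0)**2  # penalised objective
        x = minimize(phi, x, method="Nelder-Mead").x   # inner solve
        mu *= rho                                      # tighten the penalty
    return x

print(penalty_method(np.zeros(2)))  # approaches (0.5, 0.5) as mu -> inf
```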
Optimal Control Theory
\[ H(\mathbf{x}, \mathbf{\hat{z}}, \mathbf{u}) = \mathbf{\hat{z}}^\top \dot{\mathbf{\hat{x}}} = \sum_{i = 0}^n z_i (t) f_i (\mathbf{x}(t), \mathbf{u}(t) ),\]
where $ \mathbf{\hat{z}}(t)=\begin{pmatrix} z_0(t), \dots, z_n(t) \end{pmatrix}^\top $, $ \mathbf{\hat{x}}(t)=\begin{pmatrix} x_0(t), \dots, x_n(t) \end{pmatrix}^\top $, and the extra state $x_0$ accumulates the cost: $ \dot{x}_0(t)=f_0(\mathbf{x}(t),\mathbf{u}(t)), \ x_0(t_0)=0 $.
Suppose $(\mathbf{x}^*,\mathbf{u}^*)$ is optimal. Then
- $z_0 = -1$ (normal case), so \[ H( \mathbf{x},\mathbf{\hat{z}},\mathbf{u} ) = -f_0( \mathbf{x}(t), \mathbf{u}(t) ) + \sum_{i=1}^n z_i(t) f_i ( \mathbf{x}(t),\mathbf{u}(t) ). \]
- Co-state equations admit a solution $\hat{\mathbf{z}}^*$.
- $\mathbf{u}^*$ maximises $H(\mathbf{x}^*,\mathbf{\hat{z}}^*,\mathbf{u})$ over $\mathbf{u} \in \mathcal{U}$.
- $\mathbf{x}^*$ satisfies the state equation with $\mathbf{x}^* (t_0)=\mathbf{x}^0$, $\mathbf{x}^* (t_1)=\mathbf{x}^1$.
- The Hamiltonian is constant along the optimal path and equals $0$ if $t_1$ is free.
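Worked example (minimum time to the origin): minimise $\int_{t_0}^{t_1} 1 \, dt$ subject to $\dot{x} = u$, $|u| \le 1$, $x(t_0) = x^0$, $x(t_1) = 0$. In the normal case $H = -1 + z u$; the co-state equation $\dot{z} = -\partial H / \partial x = 0$ makes $z$ constant, and maximising $H$ gives $u^* = \mathrm{sign}(z)$. Since $t_1$ is free, $H = -1 + |z| = 0$ along the optimal path, so $|z| = 1$: the optimal control is the constant $u^* = -\mathrm{sign}(x^0)$, with $t_1 - t_0 = |x^0|$.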
Problems that are non-autonomous, or whose targets are free or only partially specified, can still be reduced to the standard autonomous, fixed-target form:
Partially free targets: If the target is the intersection of $k$ surfaces $ g_i(\mathbf{x}^1)=0$, $i=1,\dots,k $, then the transversality condition is $ \mathbf{z}^1 = \sum_{i=1}^k c_i \nabla g_{i}(\mathbf{x}^1) $ for some constants $c_i$, where $\mathbf{z}^1 = \mathbf{z}(t_1)$.
Completely free target: If $\mathbf{x}(t_1)$ is unrestricted, then $\mathbf{z}(t_1)=\mathbf{0}$.
Non-autonomous problems: For state $\mathbf{\dot{x}} = \mathbf{f}(\mathbf{x},\mathbf{u},t)$ and cost $ J=\int_{t_0}^{t_1} f_0(\mathbf{x},\mathbf{u},t) \, dt $, introduce the extra state $x_{n+1}$ with \[ \dot{x}_{n+1} = 1, \ x_{n+1}(t_0)=t_0, \ x_{n+1}(t_1)=t_1, \] so that $t$ becomes the state variable $x_{n+1}$ and the enlarged system is autonomous.
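For instance (an illustrative two-state conversion), the non-autonomous system $\dot{x} = t u$ becomes autonomous by setting $x_2 = t$: \[ \dot{x}_1 = x_2 u, \qquad \dot{x}_2 = 1, \qquad x_2(t_0) = t_0, \] after which the maximum principle above applies directly.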