Convex and Non-Convex Optimisation

Mathematical Background

Definition (Axiom)
A foundational statement accepted without proof. All other results are built on top of it.

Definition (Proposition)
A proved statement that is less central than a theorem, but still of interest.

Definition (Lemma)
A "helper" proposition proved to assist in establishing a more important result.

Definition (Corollary)
A statement following from a theorem or proposition, requiring little to no extra proof.

Definition (Definition)
A precise specification of an object, concept or notation.

Definition (Theorem)
A non-trivial mathematical statement proved on the basis of axioms, definitions and earlier results.

Definition (Remark)
An explanatory or clarifying note that is not part of the formal logical chain but gives insight / context.

Definition (Claim / Conjecture)
A statement that is asserted but not yet proved; it still requires a proof.

Definition (Vector Norm)

A vector norm on $\mathbb{R}^n$ is a function $\|\cdot\|$ from $\mathbb{R}^n$ to $\mathbb{R}$ such that:

  • $\|\mathbf{x}\| \ge 0,\, \forall \mathbf{x} \in \mathbb{R}^n$ and $\|\mathbf{x}\| = 0 \iff \mathbf{x} = \mathbf{0}$
  • $\|\mathbf{x}+\mathbf{y}\| \le \|\mathbf{x}\|+\|\mathbf{y}\|$ $\forall \mathbf{x},\mathbf{y} \in \mathbb{R}^n$ (Triangle Inequality)
  • $\|\alpha \mathbf{x}\| = |\alpha| \|\mathbf{x}\|$ $\forall \alpha \in \mathbb{R}, \mathbf{x} \in \mathbb{R}^n$
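
Below is a minimal numerical sketch (assuming NumPy is available; the vectors, scalar and tolerances are purely illustrative) checking the three axioms for the Euclidean norm:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
alpha = -2.5

# Non-negativity, and ||x|| = 0 exactly for the zero vector
assert np.linalg.norm(x) >= 0 and np.isclose(np.linalg.norm(np.zeros(5)), 0.0)
# Triangle inequality
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
# Absolute homogeneity
assert np.isclose(np.linalg.norm(alpha * x), abs(alpha) * np.linalg.norm(x))
```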

Definition (Continuous Derivatives)
The notation $ f \in C^k ( \mathbb{R}^n) $ means that the function $f: \mathbb{R}^n \to \mathbb{R}$ possesses continuous derivatives up to order $k$ on $\mathbb{R}^n$.

Example
  • $f \in C^1 (\mathbb{R}^n)$ implies each $\frac{\partial f}{\partial x_i}$ exists, and $\nabla f(x)$ is continuous on $\mathbb{R}^n$
  • $f \in C^2 (\mathbb{R}^n)$ implies each $\frac{\partial^2 f}{\partial x_i \partial x_j}$ exists, and $\nabla^2 f(x)$ forms a continuous Hessian matrix.

Theorem (Cauchy-Schwarz Inequality)
\[\lvert\mathbf{x}^T \mathbf{y}\rvert \leq \|\mathbf{x}\|_{2} \|\mathbf{y}\|_{2}\]

Definition (Closed and Bounded Sets)
A set $\Omega \subset \mathbb{R}^n$ is closed if it contains all the limits of convergent sequences of points in $\Omega$. A set $\Omega \subset \mathbb{R}^n$ is bounded if $\exists K \in \mathbb{R}^+$ for which $\Omega \subset B[0,K]$, where $B[0,K] = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\| \leq K\}$ is the ball with centre 0.

Definition (Standard Vector Function Forms)

If $f_0 \in \mathbb{R}, \mathbf{g} \in \mathbb{R}^n, G \in \mathbb{R}^{n \times n}$:

  • Linear: $f(\mathbf{x}) = \mathbf{g}^T \mathbf{x}$
  • Affine: $f(\mathbf{x}) = \mathbf{g}^T \mathbf{x} + f_0$
  • Quadratic: \[f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T G \mathbf{x} + \mathbf{g}^T \mathbf{x} + f_0\]
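
A small sketch (NumPy; the particular $G$, $\mathbf{g}$, $f_0$ and $\mathbf{x}$ are illustrative) evaluating the three forms, together with the standard fact that for symmetric $G$ the quadratic has gradient $G\mathbf{x} + \mathbf{g}$ and Hessian $G$:

```python
import numpy as np

G = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # symmetric, illustrative
g = np.array([1.0, -1.0])
f0 = 3.0
x = np.array([0.3, -0.7])

linear    = g @ x                          # g^T x
affine    = g @ x + f0                     # g^T x + f0
quadratic = 0.5 * x @ G @ x + g @ x + f0   # 1/2 x^T G x + g^T x + f0

# For symmetric G: grad f(x) = G x + g (checked here by forward differences)
grad = G @ x + g
eps = 1e-6
f = lambda z: 0.5 * z @ G @ z + g @ z + f0
fd_grad = np.array([(f(x + eps * e) - f(x)) / eps for e in np.eye(2)])
assert np.allclose(grad, fd_grad, atol=1e-4)
```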

Definition (Symmetric)

Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. Then:

  • $A$ has $n$ real eigenvalues.
  • There exists an orthogonal matrix $Q$ such that $A= Q D Q^\top$ where $D=\mathrm{diag}(\lambda_1,\dots,\lambda_n)$ and $Q=[v_1 \, \dots \, v_n]$ with $v_i$ an eigenvector of $A$ corresponding to eigenvalue $\lambda_i$.
  • $\mathrm{det}(A)=\prod_{i=1}^n \lambda_i$ and \[\mathrm{tr}(A)=\sum_{i=1}^n \lambda_i=\sum_{i=1}^n A_{ii}\]
  • $A$ is positive definite $\iff \lambda_i>0$ for all $i=1,\dots,n$.
  • $A$ is positive semi-definite $\iff \lambda_i \geq 0$ for all $i=1,\dots,n$.
  • $A$ is indefinite $\iff$ there exist $i,j$ with $\lambda_i>0$ and $\lambda_j<0$.
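
A quick numerical check of these facts (NumPy; the symmetric matrix is illustrative):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])              # symmetric, illustrative

lam, Q = np.linalg.eigh(A)                   # real eigenvalues, orthogonal eigenvectors
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)          # A = Q D Q^T
assert np.isclose(np.prod(lam), np.linalg.det(A))      # det(A) = product of eigenvalues
assert np.isclose(np.sum(lam), np.trace(A))            # tr(A) = sum of eigenvalues
print("positive definite:", bool(np.all(lam > 0)))     # True for this A
```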

Definition (Leading Principal Minors / Sylvester's Criterion)

A symmetric matrix $A$ is positive definite if and only if all leading principal minors of A are positive. The $i$th principal minor $\Delta_i$ of A is the determinant of the leading $i \times i$ submatrix of A.

If the leading principal minors $\Delta_i$, $i = 1, 2, \dots, n$, have the sign of $(-1)^i$, that is, the values of $\Delta_i$ alternate negative, positive, negative, and so on, then $A$ is negative definite.

Note that to certify positive semi-definiteness you must check all principal minors, of which there are $2^n - 1$, as opposed to just the $n$ leading submatrices used here.
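
A short sketch of Sylvester's criterion (NumPy; the matrix is illustrative), cross-checked against the eigenvalues:

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the leading i x i submatrices, i = 1, ..., n."""
    return [np.linalg.det(A[:i, :i]) for i in range(1, A.shape[0] + 1)]

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])           # illustrative symmetric matrix

minors = leading_principal_minors(A)         # approximately [2, 3, 4]
print("Sylvester says positive definite:", all(m > 0 for m in minors))
print("eigenvalues agree:", bool(np.all(np.linalg.eigvalsh(A) > 0)))
```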

Convexity

Definition (Convex)
A set $\Omega \subseteq \mathbb{R}^n$ is convex $\iff \theta \mathbf{x} + (1-\theta) \mathbf{y} \in \Omega$ for all $\theta \in[0,1]$ and for all $\mathbf{x},\mathbf{y} \in \Omega$.

Proposition (Intersection of Convex Sets)
Let $\Omega_1, \dots, \Omega_k \subseteq \mathbb{R}^n$ be convex sets. Then their intersection $\Omega = \Omega_1 \cap \dots \cap \Omega_k$ is convex.

Definition (Extreme Points)

Let $\Omega \subseteq \mathbb{R}^n$ be a convex set. A point $\overline{\mathbf{x}} \in \Omega$ is an extreme point of $\Omega$ $\iff$ whenever $\overline{\mathbf{x}} = \theta \mathbf{x} + (1 - \theta) \mathbf{y}$ with $\mathbf{x}, \mathbf{y} \in \Omega$ and $\theta \in (0,1)$, it follows that $\mathbf{x} = \mathbf{y}$.

In other words, a point is an extreme point if it does not lie in the interior of any line segment contained in $\Omega$.

[Figure: Extreme points (extreme.svg)]

Definition (Convex Combination)
The convex combination of $\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(m)} \in \mathbb{R}^n$ is \[ \mathbf{x} = \sum_{i=1}^m \alpha_i \mathbf{x}^{(i)}, \text{ where } \sum_{i=1}^m \alpha_i = 1 \text{ and } \alpha_i \geq 0, i= 1, \dots, m \]

Definition (Convex Hull)
The convex hull $\mathrm{conv}(\Omega)$ of a set $\Omega$ is the set of all convex combinations of points in $\Omega$.

Theorem (Separating Hyperplane)
Let $\Omega \subseteq \mathbb{R}^n$ be a non-empty closed convex set and let $\mathbf{z} \notin \Omega$. There exists a hyperplane $H = \{\mathbf{u} \in \mathbb{R}^n: \mathbf{a}^T \mathbf{u} = \beta\}$ such that $\mathbf{a}^T \mathbf{z} < \beta$ and $\mathbf{a}^T \mathbf{x} \geq \beta$ for all $\mathbf{x} \in \Omega$.

Definition (Convex Function)

A function $f: \Omega \to \mathbb{R}$ is:

  • convex if $f(\theta \mathbf{x} + (1-\theta)\mathbf{y}) \leq \theta f(\mathbf{x}) + (1-\theta)f(\mathbf{y})$ for all $\mathbf{x},\mathbf{y} \in \Omega$ and $\theta \in [0,1]$;
  • strictly convex if strict inequality holds whenever $\mathbf{x} \neq \mathbf{y}$;
  • concave if $-f$ is convex.

Proposition (First-Order Characterisation of Convex Functions)

Let $\Omega$ be a convex set and let $f: \Omega \to \mathbb{R}$ be continuously differentiable on $\Omega$. Then $f$ is convex over $\Omega \iff$, for all $\mathbf{x}, \mathbf{y} \in \Omega$,

\begin{aligned} f(y) - f(x) &\geq (y - x)^\top \nabla f(x) \\ &= \nabla f(x)^\top (y-x) \end{aligned}
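
A randomised numerical check of this inequality (NumPy; log-sum-exp is used here purely as an example of a smooth convex function, whose gradient is the softmax):

```python
import numpy as np

def f(x):                       # log-sum-exp: a smooth convex function
    return np.log(np.sum(np.exp(x)))

def grad_f(x):                  # its gradient (the softmax of x)
    e = np.exp(x)
    return e / e.sum()

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    # f(y) - f(x) >= grad f(x)^T (y - x)   (first-order condition for convexity)
    assert f(y) - f(x) >= grad_f(x) @ (y - x) - 1e-10
```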

Unconstrained Optimisation

Definition (Standard Form)
\[ \operatorname*{minimise}_{\mathbf{x} \in \Omega} f(\mathbf{x}) \]

Remark
$ \max f(\mathbf{x}) = -\min\{-f(\mathbf{x})\} $

Definition (Hessian)
Let $f: \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable. The Hessian $\nabla^2 f : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ of $f$ at $x$ is \[ \nabla^2 f(x)=\begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \dots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{pmatrix} \]
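
In practice the Hessian can be approximated entry-wise by central differences; a small sketch (NumPy; the test function is an arbitrary illustrative choice):

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    n = x.size
    H = np.zeros((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + h*I[i] + h*I[j]) - f(x + h*I[i] - h*I[j])
                       - f(x - h*I[i] + h*I[j]) + f(x - h*I[i] - h*I[j])) / (4 * h * h)
    return H

# Illustrative check: f(x) = x1^2 x2 + x2^3 has Hessian [[2 x2, 2 x1], [2 x1, 6 x2]]
f = lambda x: x[0]**2 * x[1] + x[1]**3
x = np.array([1.0, 2.0])
exact = np.array([[2 * x[1], 2 * x[0]],
                  [2 * x[0], 6 * x[1]]])
assert np.allclose(numerical_hessian(f, x), exact, atol=1e-4)
```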

Theorem (First order necessary conditions)
If $x^*$ is a local minimizer and $f \in C^1(\mathbb{R}^n)$ then $\nabla f(x^*) = 0$.

Definition ((Unconstrained) Stationary point)
$x^*$ is an unconstrained stationary point $\iff \nabla f(x^*)=0$

Example
Stationary points include local minima, local maxima and saddle points.

Definition (Saddle point)
A stationary point $\mathbf{x}^* \in \mathbb{R}^n$ is a saddle point of $f$ if for any $\delta > 0$ there exist $\mathbf{x}, \mathbf{y}$ with $\|\mathbf{x}-\mathbf{x}^*\| < \delta$, $\|\mathbf{y}-\mathbf{x}^*\| < \delta$ such that: \[ f(\mathbf{x}) < f(\mathbf{x^*}) \text{ and } f(\mathbf{y}) > f(\mathbf{x^*})\]

Proposition (Second order necessary conditions)

If $f \in C^2(\mathbb{R}^n)$ then

  • Local minimiser $\implies \nabla f(\mathbf{x}^*)=0$ and $\nabla^2 f(\mathbf{x}^*)$ positive semi-definite.
  • Local maximiser $\implies \nabla f(\mathbf{x}^*)=0$ and $\nabla^2 f(\mathbf{x}^*)$ negative semi-definite.

Corollary (Local maximiser)
$\overline{\mathbf{x}}$ is a local maximiser $\implies \nabla f(\overline{\mathbf{x}}) = \mathbf{0}$ and $\nabla^2 f(\mathbf{\overline{x}})$ negative semi-definite.

Theorem (Second order sufficient conditions)

If $\nabla f(\mathbf{x}^*)=0$ then

  • $\nabla^2 f(\mathbf{x}^*)$ positive definite $\implies \mathbf{x}^*$ is a strict local minimiser.
  • $\nabla^2 f(\mathbf{x}^*)$ negative definite $\implies \mathbf{x}^*$ is a strict local maximiser.
  • $\nabla^2 f(\mathbf{x}^*)$ indefinite $\implies \mathbf{x}^*$ is a saddle point.
  • $\nabla^2 f(\mathbf{x}^*)$ positive semi-definite $\implies \mathbf{x}^*$ is either a local minimiser or a saddle point!
  • $\nabla^2 f(\mathbf{x}^*)$ negative semi-definite $\implies \mathbf{x}^*$ is either a local maximiser or a saddle point! Be careful with these.
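
These cases can be mechanised by inspecting the eigenvalues of the Hessian at a stationary point; a small sketch (NumPy; the example Hessians are illustrative):

```python
import numpy as np

def classify_stationary_point(hessian):
    """Classify a stationary point from the eigenvalues of the Hessian there."""
    lam = np.linalg.eigvalsh(hessian)
    if np.all(lam > 0):
        return "strict local minimiser"
    if np.all(lam < 0):
        return "strict local maximiser"
    if np.any(lam > 0) and np.any(lam < 0):
        return "saddle point"
    return "inconclusive: semi-definite Hessian (local extremum or saddle point)"

# f(x, y) = x^2 - y^2 is stationary at the origin with Hessian diag(2, -2): a saddle point
print(classify_stationary_point(np.diag([2.0, -2.0])))   # saddle point
print(classify_stationary_point(np.diag([2.0,  4.0])))   # strict local minimiser
```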

Corollary (Global Optima)

Under convexity or concavity of $f \in C^2 (\mathbb{R}^n)$, stationarity becomes sufficient for global optimality:

  • $f$ convex $\implies$ any stationary point is a global minimiser.
  • $f$ strictly convex $\implies$ stationary point is the unique global minimiser.
  • $f$ concave $\implies$ any stationary point is a global maximiser.
  • $f$ strictly concave $\implies$ stationary point is the unique global maximiser.

Equality Constraints

Definition (Standard Form)
\begin{aligned} \operatorname*{minimise}_{\mathbf{x} \in \Omega} \;&f(\mathbf{x}) \\ \text{subject to } &c_i (\mathbf{x}) = 0, \quad i = 1, \dots, m \end{aligned}

Definition (Lagrangian)
For $\mathbf{x} \in \mathbb{R}^n$, $\boldsymbol{\lambda} \in \mathbb{R}^m$, \[ \mathcal{L}(\mathbf{x},\boldsymbol{\lambda}) := f(\mathbf{x}) + \sum_{i=1}^m \lambda_i c_{i}(\mathbf{x}) \]


Definition (Regular Point)
A feasible point $\overline{\mathbf{x}}$ is regular $\iff$ the gradients $\nabla c_{i}(\overline{\mathbf{x}})$, $i=1,\dots,m$, are linearly independent.

Definition (Matrix of Constraint Gradients)
\[ A(\mathbf{x}) = \begin{bmatrix} \nabla c_1 (\mathbf{x}), \dots ,\nabla c_m (\mathbf{x}) \end{bmatrix} \]

Definition (Jacobian)
\begin{aligned} J(\mathbf{x}) &= A(\mathbf{x})^T \\ &= \begin{pmatrix} \nabla c_1 (\mathbf{x})^T \\ \vdots \\ \nabla c_m (\mathbf{x})^T \end{pmatrix} \end{aligned}

Proposition (First order necessary optimality conditions)
If $\mathbf{x}^*$ is a local minimiser and a regular point, then $\exists \, \boldsymbol{\lambda}^* \in \mathbb{R}^m$ such that \[ \nabla_\mathbf{x} \mathcal{L}(\mathbf{x}^*, \boldsymbol{\lambda}^*) = \mathbf{0}, \quad \nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}^* , \boldsymbol{\lambda}^*) = \mathbf{0} \]

Corollary (Constrained Stationary Point)
Any $\mathbf{x}^*$ for which $\exists \, \boldsymbol{\lambda}^*$ satisfying the first order conditions is called a constrained stationary point.

Proposition (Second order sufficient conditions)

Let $\mathbf{x}^*$ be a constrained stationary point, so there exist Lagrange multipliers $\boldsymbol{\lambda}^*$ such that

\begin{aligned} \nabla_\mathbf{x} \mathcal{L}(\mathbf{x}^*, \boldsymbol{\lambda}^*) &= \nabla f(\mathbf{x}^*) + A(\mathbf{x}^*) \boldsymbol{\lambda}^* = \mathbf{0} \\ \nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}^*, \boldsymbol{\lambda}^*) &= \mathbf{c}(\mathbf{x}^*) = \mathbf{0} \end{aligned}

If $W_Z^*$ is positive definite, then $\mathbf{x}^*$ is a strict local minimiser.

Here, \[ A(\mathbf{x}^*) = \begin{bmatrix} \nabla c_1(\mathbf{x}^*), \dots, \nabla c_{m}(\mathbf{x}^*) \end{bmatrix}\] \[ W_Z^* := (Z^*)^\top \nabla^{2}_\mathbf{x} \mathcal{L}(\mathbf{x}^*,\boldsymbol{\lambda}^*)Z^* \] \[ Z^* \in \mathbb{R}^{n \times (n-t^*)}, \quad t^* = \mathrm{rank}(A(\mathbf{x}^*)) \] \[ (Z^*)^\top A(\mathbf{x}^*) = \mathbf{0} \]

Remark
Here $W_Z^*$ is the reduced Hessian of the Lagrangian; it can be thought of as the projection of the Lagrangian's Hessian onto the tangent space of the constraints at the point $\mathbf{x}^*$.
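
A small numerical sketch of these conditions (NumPy/SciPy; the two-variable problem, minimising $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$, is an illustrative choice with known solution $\mathbf{x}^* = (1/2, 1/2)$, $\lambda^* = -1$):

```python
import numpy as np
from scipy.linalg import null_space

# Illustrative problem: minimise x1^2 + x2^2  subject to  c(x) = x1 + x2 - 1 = 0.
x_star, lam_star = np.array([0.5, 0.5]), -1.0

A = np.array([[1.0], [1.0]])                 # A(x*) = [grad c_1(x*)]
hess_L = 2.0 * np.eye(2)                     # Hessian of the Lagrangian (constant here)

# First-order conditions: grad f(x*) + A(x*) lambda* = 0 and c(x*) = 0
grad_f = 2.0 * x_star
assert np.allclose(grad_f + A[:, 0] * lam_star, 0.0)
assert np.isclose(x_star.sum() - 1.0, 0.0)

# Reduced Hessian W_Z* = (Z*)^T hess_L Z* with Z* spanning the null space of A(x*)^T
Z = null_space(A.T)                          # here a single unit vector
W_Z = Z.T @ hess_L @ Z
print("reduced Hessian:", W_Z)               # positive definite, so x* is a strict local minimiser
```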

Inequality Constraints

Definition (Standard Form)
\begin{aligned} \operatorname*{minimise}_{\mathbf{x} \in \Omega} \;&f(\mathbf{x}) \\ \text{subject to } &c_i (\mathbf{x}) = 0, \quad i \in \mathcal{E} = \{1, \dots, m_E\} \\ &c_i (\mathbf{x}) \leq 0, \quad i \in \mathcal{I} = \{m_E + 1, \dots, m\} \end{aligned}

Definition (Convex Problem)
The problem is a standard form convex optimisation problem if the objective function $f$ is convex on the feasible set, $\mathbf{c}_i$ is affine for each $i \in \mathcal{E}$, and $\mathbf{c}_i$ is convex for each $i \in \mathcal{I}$.

Definition (Active Set)
The set of active constraints at a feasible point $\mathbf{x}$ is \[ \mathcal{A}(\mathbf{x}) = \{ i \in \{1, \dots, m\} : \mathbf{c}_i (\mathbf{x}) = 0\} \]


Definition (Regular Point)
Feasible $\mathbf{x}^*$ such that $\{\nabla c_i(\mathbf{x}^*) : i \in \mathcal{A}(\mathbf{x}^*)\}$ are linearly independent.

Proposition (Constrained Stationary Point)
Feasible $\mathbf{x}^*$ for which $\exists \lambda_i^*$ for $i \in \mathcal{A}(\mathbf{x}^*)$ with \[ \nabla f(\mathbf{x}^*) + \sum_{i \in \mathcal{A}(\mathbf{x}^*)} \lambda_i^* \nabla c_{i}(\mathbf{x}^*) = \mathbf{0} \]

Theorem (Karush Kuhn Tucker (KKT) necessary optimality conditions)
If $\mathbf{x}^*$ is a local minimiser and a regular point, then $\exists \lambda_i^*$ ($i \in \mathcal{A}(\mathbf{x}^*)$) such that \[ \nabla f(\mathbf{x}^*) + \sum_{i \in \mathcal{A}(\mathbf{x}^*)} \lambda_i^* \nabla c_{i}(\mathbf{x}^*) = \mathbf{0}, \] with $c_{i}(\mathbf{x}^*)=0$ $(i \in \mathcal{E})$, $c_{i}(\mathbf{x}^*) \leq 0$ $(i \in \mathcal{I})$, $ \lambda_i^* \geq 0$ $(i \in \mathcal{I})$, and $\lambda_i^*=0$ for $i \notin \mathcal{A}(\mathbf{x}^*)$.

Remark
KKT generalises Lagrange Multipliers from just equality constraints, to both equality and inequality constraints.
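
A tiny worked check of the KKT conditions (NumPy; the convex problem, minimising $(x_1-1)^2 + (x_2-1)^2$ subject to $x_1 + x_2 \leq 1$, is chosen for illustration):

```python
import numpy as np

# Illustrative convex problem: minimise (x1-1)^2 + (x2-1)^2  s.t.  c(x) = x1 + x2 - 1 <= 0.
# Candidate point with the constraint active: x* = (1/2, 1/2).
x_star = np.array([0.5, 0.5])

grad_f = 2.0 * (x_star - 1.0)       # (-1, -1)
grad_c = np.array([1.0, 1.0])
lam = 1.0                           # multiplier solving the stationarity equation

assert np.allclose(grad_f + lam * grad_c, 0.0)    # stationarity
assert np.isclose(x_star.sum() - 1.0, 0.0)        # primal feasibility (constraint active)
assert lam >= 0                                   # dual feasibility for the inequality
# The objective is convex and the constraint affine, so x* is a global minimiser.
```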

Theorem (Second-order sufficient conditions for strict local minimum)
Let $t^*=|\mathcal{A}(\mathbf{x}^*)|$ and \[ A^*=[\nabla c_{i}(\mathbf{x}^*) \mid i \in \mathcal{A}(\mathbf{x}^*)]. \] Suppose $t^* < n$ and $A^*$ has full rank, and let $Z^* \in \mathbb{R}^{n \times (n-t^*)}$ satisfy $(Z^*)^\top A^*=0$. Define \[ W^* = \nabla^2 f(\mathbf{x}^*) + \sum_{i \in \mathcal{A}(\mathbf{x}^*)} \lambda_i^* \nabla^2 c_{i}(\mathbf{x}^*), \qquad W_Z^* = (Z^*)^\top W^* Z^*. \] If $\lambda_i^* > 0$ for all $i \in \mathcal{I} \cap \mathcal{A}(\mathbf{x}^*)$ and $W_Z^* \succ 0$ (positive definite), then $\mathbf{x}^*$ is a strict local minimiser.

Theorem (KKT sufficient conditions for global minimum)
If the problem is convex and $\mathbf{x}^*$ satisfies the KKT conditions with $\lambda_i^* \geq 0$ for all $i \in \mathcal{I} \cap \mathcal{A}(\mathbf{x}^*)$, then $\mathbf{x}^*$ is a global minimiser.

Definition (Wolfe Dual Problem)
\begin{aligned} \max_{\mathbf{y} \in \mathbb{R}^n, \, \boldsymbol{\lambda} \in \mathbb{R}^m} \;&f(\mathbf{y}) + \sum_{i=1}^m \lambda_i c_{i}(\mathbf{y}) \\ \text{s.t. } &\nabla f(\mathbf{y}) + \sum_{i=1}^m \lambda_i \nabla c_i(\mathbf{y}) = 0 \\ &\lambda_i \geq 0 \quad (i \in \mathcal{I}) \end{aligned}

Proposition (Weak Duality)
If $\mathbf{x}$ is primal feasible and $(\mathbf{y}, \boldsymbol{\lambda})$ is dual feasible, then: \[f(\mathbf{x}) \geq f(\mathbf{y}) + \sum_{i=1}^m \lambda_i c_{i}(\mathbf{y})\]

Theorem (Strong Duality)
Under suitable constraint qualifications (e.g., Slater's condition), there exist primal-dual feasible points $\mathbf{x}^*$ and $(\mathbf{y}^*, \boldsymbol{\lambda}^*)$ such that: \[ f(\mathbf{x}^*) = f(\mathbf{y}^*) + \sum_{i=1}^m \lambda_i^* c_{i}(\mathbf{y}^*) \]

Numerical Methods (unconstrained)

Definition (Rates of convergence of iterative methods)
If $\mathbf{x}_k \to \mathbf{x}^*$ and \[ \frac{\|\mathbf{x}_{k+1}-\mathbf{x}^*\|}{\|\mathbf{x}_k- \mathbf{x}^*\|^\alpha} \to \beta \]
as $k \to \infty$, the method has $\alpha$-th order convergence. Key cases:

  • $\alpha=1$ with $\beta \in (0,1)$ (linear),
  • $\alpha=1$ with $\beta=0$ (superlinear),
  • $\alpha=2$ (quadratic).
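
Given the iterates of a method, $\alpha$ can be estimated from successive errors via $\alpha \approx \log(e_{k+1}/e_k) / \log(e_k/e_{k-1})$; a sketch (NumPy; the one-dimensional Newton example is illustrative):

```python
import numpy as np

def estimate_order(errors):
    """Estimate alpha from successive errors e_k = ||x_k - x*||."""
    e = np.array(errors, dtype=float)
    return np.log(e[2:] / e[1:-1]) / np.log(e[1:-1] / e[:-2])

# Illustrative iterates: Newton's method on f(x) = x^4/4 - 2x, minimiser x* = 2^(1/3)
x_star = 2.0 ** (1.0 / 3.0)
x, errors = 1.5, []
for _ in range(5):
    x = x - (x**3 - 2.0) / (3.0 * x**2)      # Newton step on f'(x) = x^3 - 2
    errors.append(abs(x - x_star))

print(estimate_order(errors))                # entries approach 2 (quadratic convergence)
```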

Properties used to compare the iterative methods below (with search direction $\mathbf{s}^{(k)}$ and gradient $\mathbf{g}^{(k)} = \nabla f(\mathbf{x}^{(k)})$):

  • Descent direction: $(\mathbf{g}^{(k)})^\top \mathbf{s}^{(k)} < 0$
  • Exact line search: $\nabla f(\mathbf{x}^{(k+1)})^\top \mathbf{s}^{(k)} = 0$
  • Global convergence: converges to stationary point from any $\mathbf{x}^{(1)}$
  • Quadratic termination: finds minimiser of strictly convex quadratic in finite iterations

Steepest descent method ($\mathbf{s}^{(k)} = -\mathbf{g}^{(k)}$). Properties:

  • Descent direction: Yes
  • Global convergence: Yes
  • Quadratic termination: No
  • Rate: Linear. For strictly convex quadratic: $\|\mathbf{x}^{(k+1)} - \mathbf{x}^*\| \le \left(\frac{\kappa-1}{\kappa+1}\right)^k \|\mathbf{x}^{(1)} - \mathbf{x}^*\|$ where $\kappa$ is condition number of $\nabla^2 f$
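
A minimal sketch of steepest descent with exact line search on an illustrative strictly convex quadratic $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top G \mathbf{x}$ with $\kappa = 10$, so the error ratio per iteration is bounded by $(\kappa-1)/(\kappa+1) \approx 0.818$:

```python
import numpy as np

G = np.diag([1.0, 10.0])                 # condition number kappa = 10
x = np.array([10.0, 1.0])                # illustrative (worst-case style) starting point

errors = [np.linalg.norm(x)]             # minimiser is x* = 0, so the error is ||x||
for _ in range(20):
    g = G @ x                                        # gradient g^(k)
    s = -g                                           # steepest descent direction
    alpha = (g @ g) / (s @ G @ s)                    # exact line search step length
    x = x + alpha * s
    errors.append(np.linalg.norm(x))

ratios = [errors[k + 1] / errors[k] for k in range(5)]
print(ratios)                            # roughly 0.818 = (kappa - 1) / (kappa + 1)
```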

Newton's method ($G^{(k)} \mathbf{s}^{(k)} = -\mathbf{g}^{(k)}$ with $G^{(k)} = \nabla^2 f(\mathbf{x}^{(k)})$). Properties:

  • Descent direction: Yes, if $G^{(k)}$ positive definite
  • Global convergence: No (Hessian may be singular)
  • Quadratic termination: Yes (one iteration for strictly convex quadratics)
  • Rate: Quadratic if $G^*$ positive definite
  • Usage: When Hessian can be evaluated and is positive definite
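
A minimal sketch of the quadratic-termination property noted above (NumPy; the strictly convex quadratic and starting point are illustrative): one Newton step lands exactly on the minimiser.

```python
import numpy as np

def newton(grad, hess, x, iters):
    """Basic Newton iteration: solve hess(x) s = -grad(x), then set x <- x + s."""
    for _ in range(iters):
        x = x + np.linalg.solve(hess(x), -grad(x))
    return x

# Strictly convex quadratic f(x) = 1/2 x^T G x + g^T x, minimiser solves G x = -g
G = np.array([[3.0, 1.0],
              [1.0, 2.0]])               # positive definite, illustrative
g = np.array([-1.0, 2.0])

x_min = np.linalg.solve(G, -g)
x_newton = newton(lambda x: G @ x + g, lambda x: G, np.array([5.0, -7.0]), iters=1)
assert np.allclose(x_newton, x_min)      # one iteration suffices
```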

Conjugate gradient method. Properties:

  • Descent direction: Yes
  • Quadratic termination: Yes with exact line searches
  • Usage: Large $n$; stores only vectors, avoids solving linear systems
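
A minimal linear conjugate gradient sketch (NumPy; the random positive definite test matrix is illustrative), showing that only matrix-vector products and a few vectors are needed, and that with exact line searches the quadratic $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top G \mathbf{x} - \mathbf{b}^\top \mathbf{x}$ is minimised in at most $n$ steps:

```python
import numpy as np

def conjugate_gradient(G, b, x):
    """Linear CG for f(x) = 1/2 x^T G x - b^T x with G symmetric positive definite."""
    r = G @ x - b                      # gradient of f at x
    s = -r                             # initial search direction
    for _ in range(len(b)):            # at most n iterations
        alpha = (r @ r) / (s @ G @ s)          # exact line search
        x = x + alpha * s
        r_new = r + alpha * (G @ s)
        beta = (r_new @ r_new) / (r @ r)       # standard linear CG update
        s, r = -r_new + beta * s, r_new
    return x

rng = np.random.default_rng(2)
M = rng.normal(size=(6, 6))
G = M @ M.T + 6.0 * np.eye(6)          # random symmetric positive definite matrix
b = rng.normal(size=6)

x = conjugate_gradient(G, b, np.zeros(6))
assert np.allclose(G @ x, b, atol=1e-8)        # the minimiser satisfies G x = b
```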

Penalty Methods

Definition (Penalty function)
\[ \min_{\mathbf{x} \in \mathbb{R}^n} ( f(\mathbf{x}) + \mu P(\mathbf{x}) ) \] where \[ P(\mathbf{x}) = \sum_{i=1}^{m_E} [c_i(\mathbf{x})]^2 + \sum_{i=m_E + 1}^{m} [c_i(\mathbf{x})]^{2}_+ \] and $ [c_i(\mathbf{x})]_+ = \max\{c_i(\mathbf{x}),0\} $

Remark
  • $c: \mathbb{R}^n \to \mathbb{R}$ is a convex function $\implies [\max\{c(\mathbf{x}),0\}]^2$ is a convex function
  • \[\frac{\partial}{\partial x_i} [\max\{c(\mathbf{x}),0\}]^2 = 2 \max\{c(\mathbf{x}), 0\}\,\frac{\partial c(\mathbf{x})}{\partial x_i}\]

Theorem (Convergence Theorem)
For each $\mu>0$ let $\mathbf{x}_{\mu}$ minimise the penalty problem and set $\theta(\mu)=f(\mathbf{x}_{\mu})+\mu P(\mathbf{x}_{\mu})$. Suppose $\{\mathbf{x}_{\mu}\}$ lies in a closed bounded set. Then \[ \min_{\mathbf{x}} \{ f(\mathbf{x}) : c_{i}(\mathbf{x})=0, \, i \in \mathcal{E}, \; c_{i}(\mathbf{x}) \leq 0, i \in \mathcal{I} \} = \lim_{\mu \to \infty} \theta(\mu). \] Moreover, any cluster point $\mathbf{x}^*$ of $\{\mathbf{x}_{\mu}\}$ solves the original problem, and $\mu P(\mathbf{x}_{\mu}) \to 0$ as $\mu \to \infty$.
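
A minimal sketch of this scheme (SciPy's BFGS as the unconstrained solver; the equality-constrained test problem, minimising $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$ with solution $(-1,-1)$, and the $\mu$ schedule are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] + x[1]                    # objective
c = lambda x: x[0]**2 + x[1]**2 - 2.0        # single equality constraint c(x) = 0

x = np.array([0.0, 0.0])
for mu in [1.0, 10.0, 100.0, 1000.0]:
    penalised = lambda x, mu=mu: f(x) + mu * c(x)**2
    x = minimize(penalised, x, method="BFGS").x      # warm-start from the previous x_mu
    print(mu, x, mu * c(x)**2)                       # x_mu -> (-1, -1) and mu * P(x_mu) -> 0
```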

Optimal Control Theory

Definition (Standard Form)
\begin{aligned} \min_{\mathbf{u}(t) \in U} \int_{t_0}^{t_1} &f_0 (\mathbf{x}(t), \mathbf{u}(t) ) \, dt \\ \text{subject to} \quad \dot{\mathbf{x}}(t) &= \mathbf{f} (\mathbf{x}(t), \mathbf{u}(t) ) \\ \mathbf{x}(t_0) &= \mathbf{x}_0 \\ \mathbf{x}(t_1) &= \mathbf{x}_1 . \end{aligned}

Definition (Hamiltonian)

\[ H(\mathbf{x}, \mathbf{\hat{z}}, \mathbf{u}) = \mathbf{\hat{z}}^\top \dot{\mathbf{\hat{x}}} = \sum_{i = 0}^n z_i (t) f_i (\mathbf{x}(t), \mathbf{u}(t) ),\]

where $ \mathbf{\hat{z}}(t)=\begin{pmatrix} z_0(t), \dots, z_n(t) \end{pmatrix}^\top $, $ \mathbf{\hat{x}}(t)=\begin{pmatrix} x_0(t), \dots, x_n(t) \end{pmatrix}^\top $, and the augmented state $x_0$ satisfies $ \dot{x}_0(t)=f_0(\mathbf{x}(t),\mathbf{u}(t)), \ x_0(t_0)=0 $, so that $x_0(t_1)$ equals the cost.

Definition (Co-state Equations)
$ \dot{\hat{z}} = -\frac{\partial H}{\partial \hat{x}} $

Theorem (Autonomous, fixed targets (Pontryagin Maximum Principle))

Suppose $(\mathbf{x}^*,\mathbf{u}^*)$ is optimal. Then

  • $z_0 = -1$ (normal case), so \[ H( \mathbf{x},\mathbf{\hat{z}},\mathbf{u} ) = -f_0( \mathbf{x}(t), \mathbf{u}(t) ) + \sum_{i=1}^n z_i(t) f_i ( \mathbf{x}(t),\mathbf{u}(t) ). \]
  • Co-state equations admit a solution $\hat{\mathbf{z}}^*$.
  • $\mathbf{u}^*$ maximises $H(\mathbf{x}^*,\mathbf{\hat{z}}^*,\mathbf{u})$ over $\mathbf{u} \in U$.
  • $\mathbf{x}^*$ satisfies the state equation with $\mathbf{x}^* (t_0)=\mathbf{x}_0$, $\mathbf{x}^* (t_1)=\mathbf{x}_1$.
  • The Hamiltonian is constant along the optimal path and equals $0$ if $t_1$ is free.
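
A small worked illustration (the problem below is chosen here as an example, not taken from the course material): minimise $\int_0^1 \tfrac{1}{2} u^2 \, dt$ subject to $\dot{x} = u$, $x(0) = 0$, $x(1) = 1$.

\begin{aligned} H &= -\tfrac{1}{2}u^2 + z_1 u \quad (z_0 = -1, \text{ normal case}) \\ \dot{z}_1 &= -\frac{\partial H}{\partial x} = 0 \implies z_1 \text{ constant} \\ \frac{\partial H}{\partial u} &= -u + z_1 = 0 \implies u^* = z_1 \\ \dot{x}^* &= z_1 \implies x^*(t) = z_1 t, \quad x^*(1) = 1 \implies z_1 = 1, \ u^*(t) \equiv 1. \end{aligned}

Along this optimal path $H = -\tfrac{1}{2} + 1 = \tfrac{1}{2}$ is constant, as the theorem requires.
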
Remark

Even if a problem is not autonomous or does not have a fixed target, it can still be converted into an autonomous, fixed-target problem:

Partially free targets: If the target is the intersection of $k$ surfaces $ g_i(\mathbf{x}_1)=0$, $i=1,\dots,k $, then the transversality condition is $ \mathbf{z}_1 = \sum_{i=1}^k c_i \nabla g_{i}(\mathbf{x}_1) $ for some constants $c_i$, where $\mathbf{z}_1 = \mathbf{z}(t_1)$ and $\mathbf{x}_1 = \mathbf{x}(t_1)$.

Completely free target: If $\mathbf{x}(t_1)$ is unrestricted, then $\mathbf{z}(t_1)=\mathbf{0}$.

Non-autonomous problems: For state $\mathbf{\dot{x}} = \mathbf{f}(\mathbf{x},\mathbf{u},t)$ and cost $ J=\int_{t_0}^{t_1} f_0(\mathbf{x},\mathbf{u},t) \, dt $, introduce the extra state $x_{n+1}$ with \[ \dot{x}_{n+1} = 1, \ x_{n+1}(t_0)=t_0, \ x_{n+1}(t_1)=t_1, \] so that the augmented problem is autonomous.