Convex and Non-Convex Optimisation
Mathematical Background
- Axiom: A foundational statement accepted without proof. All other results are built on top.
- Proposition: A proved statement that is less central than a theorem, but still of interest.
- Lemma: A “helper” proposition proved to assist in establishing a more important result.
- Corollary: A statement following from a theorem or proposition, requiring little to no extra proof.
- Definition: A precise specification of an object, concept or notation.
- Theorem: A non-trivial mathematical statement proved on the basis of axioms, definitions and earlier results.
- Remark: An explanatory or clarifying note that is not part of the formal logical chain but gives insight / context.
- Conjecture: A statement asserted that requires a proof.
A vector norm on \(\mathbb{R}^n\) is a function \(\|\cdot\|\) from \(\mathbb{R}^n\) to \(\mathbb{R}\) such that:
- \(\|\mathbf{x}\| \ge 0,\, \forall \mathbf{x} \in \mathbb{R}^n\) and \(\|\mathbf{x}\| = 0 \iff \mathbf{x} = \mathbf{0}\)
- \(\|\mathbf{x}+\mathbf{y}\| \le \|\mathbf{x}\|+\|\mathbf{y}\|\) \(\forall \mathbf{x},\mathbf{y} \in \mathbb{R}^n\) (Triangle Inequality)
- \(\|\alpha \mathbf{x}\| = |\alpha| \|\mathbf{x}\|\) \(\forall \alpha \in \mathbb{R}, \mathbf{x} \in \mathbb{R}^n\)
The notation \(f \in C^k(\mathbb{R}^n)\) means that the function \(f: \mathbb{R}^n \to \mathbb{R}\) possesses continuous derivatives up to order \(k\) on \(\mathbb{R}^n\).
- \(f \in C^1 (\mathbb{R}^n)\) implies each \(\frac{\partial f}{\partial x_i}\) exists, and \(\nabla f(x)\) is continuous on \(\mathbb{R}^n\)
- \(f \in C^2 (\mathbb{R}^n)\) implies each \(\frac{\partial^2 f}{\partial x_i \partial x_j}\) exists, and \(\nabla^2 f(x)\) forms a continuous Hessian matrix.
For all \(\mathbf{x}, \mathbf{y} \in \mathbb{R}^n\) (the Cauchy–Schwarz inequality): \[\lvert\mathbf{x}^T \mathbf{y}\rvert \leq \|\mathbf{x}\|_{2} \|\mathbf{y}\|_{2}\]
A set \(\Omega \subset \mathbb{R}^n\) is closed if it contains all the limits of convergent sequences of points in \(\Omega\). A set \(\Omega \subset \mathbb{R}^n\) is bounded if \(\exists K \in \mathbb{R}^+\) for which \(\Omega \subset B[0,K]\), where \(B[0,K] = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\| \leq K\}\) is the closed ball of radius \(K\) centred at the origin.
If \(f_0 \in \mathbb{R}, \mathbf{g} \in \mathbb{R}^n, G \in \mathbb{R}^{n \times n}\):
- Linear: \(f(\mathbf{x}) = \mathbf{g}^T \mathbf{x}\)
- Affine: \(f(\mathbf{x}) = \mathbf{g}^T \mathbf{x} + f_0\)
- Quadratic: \[f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T G \mathbf{x} + \mathbf{g}^T \mathbf{x} + f_0\]
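For reference (assuming \(G\) is symmetric, which is the usual convention in what follows), the quadratic above has \[ \nabla f(\mathbf{x}) = G\mathbf{x} + \mathbf{g}, \qquad \nabla^2 f(\mathbf{x}) = G, \] which is used repeatedly when analysing stationary points and quadratic termination.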
Let \(A \in \mathbb{R}^{n \times n}\) be a symmetric matrix. Then:
- \(A\) has \(n\) real eigenvalues.
- There exists an orthogonal matrix \(Q\) such that \(A= Q D Q^\top\) where \(D=\mathrm{diag}(\lambda_1,\dots,\lambda_n)\) and \(Q=[v_1 \, \dots \, v_n]\) with \(v_i\) an eigenvector of \(A\) corresponding to eigenvalue \(\lambda_i\).
- \(\mathrm{det}(A)=\prod_{i=1}^n \lambda_i\) and \[\mathrm{tr}(A)=\sum_{i=1}^n \lambda_i=\sum_{i=1}^n A_{ii}\]
- \(A\) is positive definite \(\iff \lambda_i>0\) for all \(i=1,\dots,n\).
- \(A\) is positive semi-definite \(\iff \lambda_i \geq 0\) for all \(i=1,\dots,n\).
- \(A\) is indefinite \(\iff\) there exist \(i,j\) with \(\lambda_i>0\) and \(\lambda_j<0\).
A symmetric matrix \(A\) is positive definite if and only if all leading principal minors of \(A\) are positive. The \(i\)th leading principal minor \(\Delta_i\) of \(A\) is the determinant of the leading \(i \times i\) submatrix of \(A\).
If each \(\Delta_i\), \(i = 1, 2, \dots, n\), has the sign of \((-1)^i\), that is, the values of \(\Delta_i\) alternate negative, positive, negative, \(\dots\), then \(A\) is negative definite.
Note that the analogous test for positive semi-definiteness requires all principal minors (not just the leading ones) to be non-negative; there are \(2^n - 1\) of them, as opposed to the \(n\) leading minors checked here.
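As a quick illustration of the two tests, here is a minimal numpy sketch (the matrix is an arbitrary example, not taken from these notes):

```python
import numpy as np

# An arbitrary symmetric matrix chosen only for illustration.
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

# Eigenvalue test: A is positive definite iff all eigenvalues are > 0.
eigvals = np.linalg.eigvalsh(A)
print(eigvals, bool(np.all(eigvals > 0)))

# Sylvester's criterion: all n leading principal minors are > 0.
minors = [np.linalg.det(A[:i, :i]) for i in range(1, A.shape[0] + 1)]
print(minors, all(d > 0 for d in minors))
```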
Convexity
A set \(\Omega \subseteq \mathbb{R}^n\) is convex \(\iff \theta \mathbf{x} + (1-\theta) \mathbf{y} \in \Omega\) for all \(\theta \in[0,1]\) and for all \(\mathbf{x},\mathbf{y} \in \Omega\).
Let \(\Omega_1, \dots, \Omega_k \subseteq \mathbb{R}^n\) be convex sets; then their intersection \(\Omega = \Omega_1 \cap \dots \cap \Omega_k\) is convex.
Let \(\Omega \subseteq \mathbb{R}^n\) be a convex set. A point \(\overline{\mathbf{x}} \in \Omega\) is an extreme point of \(\Omega\) \(\iff\) for all \(\mathbf{x}, \mathbf{y} \in \Omega\) and \(\theta \in (0,1)\), \[ \overline{\mathbf{x}} = \theta \mathbf{x} + (1 - \theta) \mathbf{y} \implies \mathbf{x} = \mathbf{y}. \]
In other words, a point is an extreme point if it does not lie strictly between two distinct points of \(\Omega\).
A convex combination of \(\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(m)} \in \mathbb{R}^n\) is \[ \mathbf{x} = \sum_{i=1}^m \alpha_i \mathbf{x}^{(i)}, \text{ where } \sum_{i=1}^m \alpha_i = 1 \text{ and } \alpha_i \geq 0, \; i= 1, \dots, m \]
The convex hull \(\mathrm{conv}(\Omega)\) of a set \(\Omega\) is the set of all convex combinations of points in \(\Omega\).
Let \(\Omega \subseteq \mathbb{R}^n\) be a non-empty closed convex set and let \(\mathbf{z} \notin \Omega\). There exists a hyperplane \(H = \{\mathbf{u} \in \mathbb{R}^n: \mathbf{a}^T \mathbf{u} = \beta\}\) such that \(\mathbf{a}^T \mathbf{z} < \beta\) and \(\mathbf{a}^T \mathbf{x} \geq \beta\) for all \(\mathbf{x} \in \Omega\).
A function \(f: \Omega \to \mathbb{R}\) is:
- convex if \(f(\theta \mathbf{x} + (1-\theta)\mathbf{y}) \leq \theta f(\mathbf{x}) + (1-\theta)f(\mathbf{y})\) for all \(\mathbf{x},\mathbf{y} \in \Omega\) and \(\theta \in [0,1]\);
- strictly convex if strict inequality holds whenever \(\mathbf{x} \neq \mathbf{y}\);
- concave if \(-f\) is convex.
Let \(\Omega\) be a convex set and let \(f: \Omega \to \mathbb{R}\) be continuously differentiable on \(\Omega\). Then \(f\) is convex over \(\Omega\) \(\iff\), for all \(\mathbf{x}, \mathbf{y} \in \Omega\),
\begin{aligned} f(\mathbf{y}) - f(\mathbf{x}) &\geq (\mathbf{y} - \mathbf{x})^\top \nabla f(\mathbf{x}) \\ &= \nabla f(\mathbf{x})^\top (\mathbf{y}-\mathbf{x}) \end{aligned}
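As a one-dimensional sanity check (an example constructed here, not from the notes): for \(f(x) = x^2\), the condition reads \(y^2 - x^2 \geq 2x(y - x)\), which rearranges to \((y - x)^2 \geq 0\) and therefore holds for all \(x, y\), confirming convexity.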
Unconstrained Optimisation
\[ \operatorname*{minimise}_{\mathbf{x} \in \Omega} f(\mathbf{x}) \]
\[ \max_{\mathbf{x}} f(\mathbf{x}) = -\min_{\mathbf{x}} \{-f(\mathbf{x})\} \]
Let \(f: \mathbb{R}^n \to \mathbb{R}\) be twice continuously differentiable. The Hessian \(\nabla^2 f : \mathbb{R}^n \to \mathbb{R}^{n \times n}\) of \(f\) at \(x\) is \[ \nabla^2 f(x)=\begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \dots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{pmatrix} \]
If \(\mathbf{x}^*\) is a local minimiser and \(f \in C^1(\mathbb{R}^n)\) then \(\nabla f(\mathbf{x}^*) = \mathbf{0}\).
\(x^*\) is an unconstrained stationary point \(\iff \nabla f(x^*)=0\)
A stationary point is one of: a local minimum, a local maximum, or a saddle point.
A stationary point \(\mathbf{x}^* \in \mathbb{R}^n\) is a saddle point of \(f\) if for any \(\delta > 0\) there exist \(\mathbf{x}, \mathbf{y}\) with \(\|\mathbf{x}-\mathbf{x}^*\| < \delta\), \(\|\mathbf{y}-\mathbf{x}^*\| < \delta\) such that: \[ f(\mathbf{x}) < f(\mathbf{x^*}) \text{ and } f(\mathbf{y}) > f(\mathbf{x^*})\]
If \(f \in C^2(\mathbb{R}^n)\) then
- Local minimiser \(\implies \nabla f(\mathbf{x}^*)=0\) and \(\nabla^2 f(\mathbf{x}^*)\) positive semi-definite.
- Local maximiser \(\implies \nabla f(\mathbf{x}^*)=0\) and \(\nabla^2 f(\mathbf{x}^*)\) negative semi-definite.
If \(\nabla f(\mathbf{x}^*)=0\) then
- \(\nabla^2 f(\mathbf{x}^*)\) positive definite \(\implies \mathbf{x}^*\) is a strict local minimiser.
- \(\nabla^2 f(\mathbf{x}^*)\) negative definite \(\implies \mathbf{x}^*\) is a strict local maximiser.
- \(\nabla^2 f(\mathbf{x}^*)\) indefinite \(\implies \mathbf{x}^*\) is a saddle point.
- \(\nabla^2 f(\mathbf{x}^*)\) positive semi-definite \(\implies \mathbf{x}^*\) is either a local minimiser or a saddle point!
- \(\nabla^2 f(\mathbf{x}^*)\) negative semi-definite \(\implies \mathbf{x}^*\) is either a local maximiser or a saddle point! Be careful with these.
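A short worked example (constructed here for illustration): let \(f(x, y) = x^3 - 3x + y^2\). Then \(\nabla f = (3x^2 - 3, \; 2y)^\top\), so the stationary points are \((\pm 1, 0)\), and \(\nabla^2 f = \mathrm{diag}(6x, 2)\). At \((1, 0)\) the Hessian is \(\mathrm{diag}(6, 2)\), positive definite, so \((1, 0)\) is a strict local minimiser; at \((-1, 0)\) it is \(\mathrm{diag}(-6, 2)\), indefinite, so \((-1, 0)\) is a saddle point.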
For \(f \in C^2 (\mathbb{R}^n)\), combining stationarity with convexity or concavity of \(f\) gives global results:
- \(f\) convex \(\implies\) any stationary point is a global minimiser.
- \(f\) strictly convex \(\implies\) stationary point is the unique global minimiser.
- \(f\) concave \(\implies\) any stationary point is a global maximiser.
- \(f\) strictly concave \(\implies\) stationary point is the unique global maximiser.
Equality Constraints
\begin{aligned} \operatorname*{minimise}_{\mathbf{x} \in \Omega} \;&f(\mathbf{x}) \\ \text{subject to } &\mathbf{c}_i (\mathbf{x}) = 0, \quad i = 1, \dots, m \end{aligned}
For \(\mathbf{x} \in \mathbb{R}^n\), \(\boldsymbol{\lambda} \in \mathbb{R}^m\), \[ \mathcal{L}(\mathbf{x},\boldsymbol{\lambda}) := f(\mathbf{x}) + \sum_{i=1}^m \lambda_i c_{i}(\mathbf{x}) \]
A feasible point \(\overline{\mathbf{x}}\) is regular \(\iff\) the gradients \(\nabla c_{i}(\overline{\mathbf{x}})\), \(i=1,\dots,m\), are linearly independent.
\[ A(\mathbf{x}) = \begin{bmatrix} \nabla \mathbf{c}_1 (\mathbf{x}), \dots ,\nabla \mathbf{c}_m (\mathbf{x}) \end{bmatrix} \]
\begin{aligned} J(\mathbf{x}) &= A(\mathbf{x})^T \\ &= \begin{pmatrix} \nabla \mathbf{c}_1 (\mathbf{x})^T \\ \vdots \\ \nabla \mathbf{c}_m (\mathbf{x})^T \end{pmatrix} \end{aligned}
If \(\mathbf{x}^*\) is a local minimiser and a regular point, then \(\exists \, \boldsymbol{\lambda}^* \in \mathbb{R}^m\) such that \[ \nabla_\mathbf{x} \mathcal{L}(\mathbf{x}^*, \boldsymbol{\lambda}^*) = \mathbf{0}, \quad \nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}^* , \boldsymbol{\lambda}^*) = \mathbf{0} \]
Any \(\mathbf{x}^*\) for which \(\exists \, \boldsymbol{\lambda}^*\) satisfying the first order conditions is called a constrained stationary point.
Let \(\mathbf{x}^*\) be a constrained stationary point, so there exist Lagrange multipliers \(\boldsymbol{\lambda}^*\) such that
\begin{aligned} \nabla_\mathbf{x} \mathcal{L}(\mathbf{x}^*, \boldsymbol{\lambda}^*) &= \nabla f(\mathbf{x}^*) + A(\mathbf{x}^*) \boldsymbol{\lambda}^* = \mathbf{0} \\ \nabla_{\boldsymbol{\lambda}} \mathcal{L}(\mathbf{x}^*, \boldsymbol{\lambda}^*) &= \mathbf{c}(\mathbf{x}^*) = \mathbf{0} \end{aligned}
If \(W_Z^*\) is positive definite, then \(\mathbf{x}^*\) is a strict local minimiser.
Here, \[ A(\mathbf{x}^*) = \begin{bmatrix} \nabla c_1(\mathbf{x}^*), \dots, \nabla c_{m}(\mathbf{x}^*) \end{bmatrix}\] \[ W_Z^* := (Z^*)^\top \nabla^{2}_\mathbf{x} \mathcal{L}(\mathbf{x}^*,\boldsymbol{\lambda}^*)Z^* \] \[ Z^* \in \mathbb{R}^{n \times (n-t^*)}, \quad t^* = \mathrm{rank}(A(\mathbf{x}^*)) \] \[ (Z^*)^\top A(\mathbf{x}^*) = \mathbf{0} \]
where \(W_Z^*\) is the reduced Hessian of the Lagrangian; it can be thought of as the projection of the Lagrangian’s Hessian onto the tangent space of the constraints at the point \(\mathbf{x}^*\).
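A minimal worked example (constructed for illustration): minimise \(f(x,y) = x^2 + y^2\) subject to \(c(x,y) = x + y - 1 = 0\). The Lagrangian is \(\mathcal{L} = x^2 + y^2 + \lambda(x + y - 1)\), so \(\nabla_\mathbf{x} \mathcal{L} = (2x + \lambda, \; 2y + \lambda)^\top = \mathbf{0}\) gives \(x = y = -\lambda/2\), and the constraint then forces \(\lambda^* = -1\), \(\mathbf{x}^* = (1/2, 1/2)^\top\). Here \(A(\mathbf{x}^*) = (1, 1)^\top\), one may take \(Z^* = (1, -1)^\top\), and \(W_Z^* = (Z^*)^\top (2I) Z^* = 4 > 0\), so \(\mathbf{x}^*\) is a strict local (in fact global) minimiser.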
Inequality Constraints
\begin{aligned} \operatorname*{minimise}_{\mathbf{x} \in \Omega} \;&f(\mathbf{x}) \\ \text{subject to } &\mathbf{c}_i (\mathbf{x}) = 0, \quad i = 1, \dots, m_E \\ &\mathbf{c}_i (\mathbf{x}) \leq 0, \quad i = m_E + 1, \dots, m \end{aligned}
where \(\mathcal{E} = \{1, \dots, m_E\}\) indexes the equality constraints and \(\mathcal{I} = \{m_E + 1, \dots, m\}\) the inequality constraints.
The problem is a standard form convex optimisation problem if the objective function \(f\) is convex on the feasible set, \(\mathbf{c}_i\) is affine for each \(i \in \mathcal{E}\), and \(\mathbf{c}_i\) is convex for each \(i \in \mathcal{I}\).
The set of active constraints at a feasible point \(\mathbf{x}\) is \[ \mathcal{A}(\mathbf{x}) = \{ i \in \{1, \dots, m\} : \mathbf{c}_i (\mathbf{x}) = 0\} \]
A feasible point \(\mathbf{x}^*\) is regular if the active-constraint gradients \(\{\nabla c_i(\mathbf{x}^*) : i \in \mathcal{A}(\mathbf{x}^*)\}\) are linearly independent.
A feasible point \(\mathbf{x}^*\) is a constrained stationary point if \(\exists \lambda_i^*\) for \(i \in \mathcal{A}(\mathbf{x}^*)\) with \[ \nabla f(\mathbf{x}^*) + \sum_{i \in \mathcal{A}(\mathbf{x}^*)} \lambda_i^* \nabla c_{i}(\mathbf{x}^*) = \mathbf{0} \]
If \(\mathbf{x}^*\) is a local minimiser and a regular point, then \(\exists \lambda_i^*\) (\(i \in \mathcal{A}(\mathbf{x}^*)\)) such that \[ \nabla f(\mathbf{x}^*) + \sum_{i \in \mathcal{A}(\mathbf{x}^*)} \lambda_i^* \nabla c_{i}(\mathbf{x}^*) = \mathbf{0}, \] with \(c_{i}(\mathbf{x}^*)=0\) \((i \in \mathcal{E})\), \(c_{i}(\mathbf{x}^*) \leq 0\) \((i \in \mathcal{I})\), \(\lambda_i^* \geq 0\) \((i \in \mathcal{I})\), and \(\lambda_i^*=0\) for \(i \notin \mathcal{A}(\mathbf{x}^*)\).
KKT generalises Lagrange Multipliers from just equality constraints, to both equality and inequality constraints.
Let \(t^*=\lvert\mathcal{A}(\mathbf{x}^*)\rvert\) and \[ \mathcal{A}^*=[\nabla c_{i}(\mathbf{x}^*) \mid i \in \mathcal{A}(\mathbf{x}^*)] \] If \(t^* < n\) and \(\mathcal{A}^*\) has full rank, let \(Z^* \in \mathbb{R}^{n \times (n-t^*)}\) with \((Z^*)^\top \mathcal{A}^*=0\).
Define \[ W^* = \nabla^2 f(\mathbf{x}^*) + \sum_{i \in \mathcal{A}(\mathbf{x}^*)} \lambda_i^* \nabla^2 c_{i}(\mathbf{x}^*), \quad W_Z^* = (Z^*)^\top W^* Z^* \] If \(\lambda_i^* > 0\) for all \(i \in \mathcal{I} \cap \mathcal{A}(\mathbf{x}^*)\) and \(W_Z^*\) is positive definite, then \(\mathbf{x}^*\) is a strict local minimiser.
If the problem is convex and \(\mathbf{x}^*\) satisfies the KKT conditions with \(\lambda_i^* \geq 0\) for all \(i \in \mathcal{I} \cap \mathcal{A}(\mathbf{x}^*)\), then \(\mathbf{x}^*\) is a global minimiser.
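To see the KKT conditions numerically, here is a minimal SciPy sketch on a toy convex problem (the problem data and the use of scipy.optimize.minimize are illustrative assumptions, not part of these notes):

```python
import numpy as np
from scipy.optimize import minimize

# Toy convex problem: minimise x^2 + y^2 subject to c(x, y) = 1 - x - y <= 0.
# Analytically, the KKT point is (1/2, 1/2) with multiplier lambda* = 1 >= 0,
# so by convexity it is the global minimiser.
f = lambda v: v[0]**2 + v[1]**2
c = lambda v: 1 - v[0] - v[1]              # notes' convention: c(x) <= 0

# SciPy's "ineq" constraints expect g(x) >= 0, so we pass -c.
res = minimize(f, x0=np.zeros(2),
               constraints=[{"type": "ineq", "fun": lambda v: -c(v)}])
print(res.x)                               # approx [0.5, 0.5]
```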
The dual problem is
\begin{aligned} \max_{\mathbf{y} \in \mathbb{R}^n, \, \boldsymbol{\lambda} \in \mathbb{R}^m} \; &f(\mathbf{y}) + \sum_{i=1}^m \lambda_i c_{i}(\mathbf{y}) \\ \text{s.t. } \; &\nabla f(\mathbf{y}) + \sum_{i=1}^m \lambda_i \nabla c_i(\mathbf{y}) = 0 \\ &\lambda_i \geq 0 \quad (i \in \mathcal{I}) \end{aligned}
If \(\mathbf{x}\) is primal feasible and \((\mathbf{y}, \boldsymbol{\lambda})\) is dual feasible, then: \[f(\mathbf{x}) \geq f(\mathbf{y}) + \sum_{i=1}^m \lambda_i c_{i}(\mathbf{y})\]
Under suitable constraint qualifications (e.g., Slater’s condition), there exist primal-dual feasible points \(\mathbf{x}^*\) and \((\mathbf{y}^*, \boldsymbol{\lambda}^*)\) such that: \[ f(\mathbf{x}^*) = f(\mathbf{y}^*) + \sum_{i=1}^m \lambda_i^* c_{i}(\mathbf{y}^*) \]
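A small worked example of this duality (constructed for illustration): take \(f(x) = x^2\) with the single constraint \(c(x) = 1 - x \leq 0\). The primal minimiser is \(x^* = 1\) with \(f(x^*) = 1\). The dual maximises \(y^2 + \lambda(1 - y)\) subject to \(2y - \lambda = 0\), \(\lambda \geq 0\); eliminating \(\lambda = 2y\) gives the dual objective \(2y - y^2 = 1 - (1-y)^2\), which never exceeds \(1\) (weak duality) and attains \(1\) at \(y^* = 1\), \(\lambda^* = 2\) (strong duality).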
Numerical Methods (unconstrained)
If \(\mathbf{x}_k \to \mathbf{x}^*\) and \[ \frac{\|\mathbf{x}_{k+1}-\mathbf{x}^*\|}{\|\mathbf{x}_k- \mathbf{x}^*\|^\alpha} \to \beta \]
as \(k \to \infty\), the method has \(\alpha\)-th order convergence. Key cases:
- \(\alpha=1\) with \(\beta \in (0,1)\) (linear),
- \(\alpha=1\) with \(\beta=0\) (superlinear),
- \(\alpha=2\) (quadratic).
A general line-search method generates iterates \(\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{s}^{(k)}\), where \(\mathbf{s}^{(k)}\) is the search direction, \(\alpha^{(k)} > 0\) the step length, and \(\mathbf{g}^{(k)} = \nabla f(\mathbf{x}^{(k)})\). Properties used to compare methods:
- Descent direction: \((\mathbf{g}^{(k)})^\top \mathbf{s}^{(k)} < 0\)
- Exact line search: \(\nabla f(\mathbf{x}^{(k+1)})^\top \mathbf{s}^{(k)} = 0\)
- Global convergence: converges to stationary point from any \(\mathbf{x}^{(1)}\)
- Quadratic termination: finds minimiser of strictly convex quadratic in finite iterations
Steepest descent (\(\mathbf{s}^{(k)} = -\mathbf{g}^{(k)}\)):
- Descent direction: Yes
- Global convergence: Yes
- Quadratic termination: No
- Rate: Linear. For strictly convex quadratic: \(\|\mathbf{x}^{(k+1)} - \mathbf{x}^*\| \le \left(\frac{\kappa-1}{\kappa+1}\right)^k \|\mathbf{x}^{(1)} - \mathbf{x}^*\|\) where \(\kappa\) is condition number of \(\nabla^2 f\)
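A minimal numpy sketch of steepest descent with exact line search on a toy strictly convex quadratic (all data below are illustrative assumptions):

```python
import numpy as np

# f(x) = 0.5 x^T G x + g^T x with an ill-conditioned diagonal G (kappa = 10).
G = np.array([[10.0, 0.0], [0.0, 1.0]])
g = np.array([-1.0, -1.0])

x = np.zeros(2)
for k in range(200):
    grad = G @ x + g                       # g^(k)
    if np.linalg.norm(grad) < 1e-10:
        break
    s = -grad                              # steepest-descent direction
    alpha = -(grad @ s) / (s @ G @ s)      # exact line search for a quadratic
    x = x + alpha * s

print(k, x, np.linalg.solve(G, -g))        # iterate vs exact minimiser -G^{-1} g
```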
Newton's method (\(\mathbf{s}^{(k)} = -(G^{(k)})^{-1} \mathbf{g}^{(k)}\), where \(G^{(k)} = \nabla^2 f(\mathbf{x}^{(k)})\)):
- Descent direction: Yes, if \(G^{(k)}\) positive definite
- Global convergence: No (Hessian may be singular)
- Quadratic termination: Yes (one iteration for strictly convex quadratics)
- Rate: Quadratic if \(G^*\) positive definite
- Usage: When Hessian can be evaluated and is positive definite
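A minimal Newton sketch on a toy strictly convex function (the function and all names are chosen here for illustration):

```python
import numpy as np

# f(x, y) = cosh(x) + y^2: strictly convex, global minimiser at the origin.
def grad(v):
    return np.array([np.sinh(v[0]), 2.0 * v[1]])

def hess(v):
    return np.array([[np.cosh(v[0]), 0.0],
                     [0.0,           2.0]])

v = np.array([1.0, 1.0])
for k in range(20):
    g_k = grad(v)
    if np.linalg.norm(g_k) < 1e-12:
        break
    s = np.linalg.solve(hess(v), -g_k)   # Newton direction: solve G s = -g
    v = v + s                            # full step; no line search
print(k, v)                              # converges to (0, 0) in a few steps
```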
Conjugate gradient method:
- Descent direction: Yes
- Quadratic termination: Yes with exact line searches
- Usage: Large \(n\); stores only vectors, avoids solving linear systems
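A minimal linear conjugate-gradient sketch on the same kind of toy quadratic (again, the data are illustrative assumptions; for an \(n\)-dimensional quadratic, CG terminates in at most \(n\) steps in exact arithmetic):

```python
import numpy as np

# Minimise f(x) = 0.5 x^T G x + g^T x, i.e. solve G x = -g, by linear CG.
G = np.array([[10.0, 0.0], [0.0, 1.0]])
g = np.array([-1.0, -1.0])

x = np.zeros(2)
r = G @ x + g                  # current gradient
s = -r                         # first direction: steepest descent
for k in range(len(g)):        # at most n iterations
    alpha = (r @ r) / (s @ G @ s)          # exact line search
    x = x + alpha * s
    r_new = r + alpha * (G @ s)            # updated gradient
    if np.linalg.norm(r_new) < 1e-12:
        break
    beta = (r_new @ r_new) / (r @ r)       # beta as in Fletcher-Reeves
    s = -r_new + beta * s                  # new conjugate direction
    r = r_new

print(x, np.linalg.solve(G, -g))           # matches the exact minimiser
```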
Penalty Methods
\[ \min_{\mathbf{x} \in \mathbb{R}^n} ( f(\mathbf{x}) + \mu P(\mathbf{x}) ) \] where \[ P(\mathbf{x}) = \sum_{i=1}^{m_E} [c_i(\mathbf{x})]^2 + \sum_{i=m_E + 1}^{m} [c_i(\mathbf{x})]^{2}_+ \] and \([c_i(\mathbf{x})]_+ = \max\{c_i(\mathbf{x}),0\}\).
- \(c: \mathbb{R}^n \to \mathbb{R}\) is a convex function \(\implies [\max\{c(\mathbf{x}),0\}]^2\) is a convex function
- \[\frac{\partial}{\partial x_i} [\max\{c(\mathbf{x}),0\}]^2 = 2 \max\{c(\mathbf{x}), 0\}\frac{\partial c(\mathbf{x})}{\partial x_i}\]
For each \(\mu>0\) let \(\mathbf{x}_{\mu}\) minimise the penalty problem and set \(\theta(\mu)=f(\mathbf{x}_{\mu})+\mu P(\mathbf{x}_{\mu})\). Suppose \(\{\mathbf{x}_{\mu}\}\) lies in a closed bounded set. Then \[ \min_{\mathbf{x}} \{ f(\mathbf{x}) : c_{i}(\mathbf{x})=0, \, i \in \mathcal{E}, \; c_{i}(\mathbf{x}) \leq 0, i \in \mathcal{I} \} = \lim_{\mu \to \infty} \theta(\mu). \] Moreover, any cluster point \(\mathbf{x}^*\) of \(\{\mathbf{x}_{\mu}\}\) solves the original problem, and \(\mu P(\mathbf{x}_{\mu}) \to 0\) as \(\mu \to \infty\).
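A minimal sketch of the quadratic penalty approach on a toy equality-constrained problem (the problem data and the use of scipy.optimize.minimize are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimise x^2 + y^2 subject to x + y - 1 = 0 (solution (0.5, 0.5)).
f = lambda v: v[0]**2 + v[1]**2
c = lambda v: v[0] + v[1] - 1.0

v = np.zeros(2)
for mu in [1.0, 10.0, 100.0, 1000.0]:
    penalised = lambda v, mu=mu: f(v) + mu * c(v)**2   # f + mu * P
    v = minimize(penalised, v).x                       # warm start from previous x_mu
    print(mu, v, c(v))                                 # constraint violation -> 0
```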
Optimal Control Theory
\begin{aligned} \min_{\mathbf{u}(t) \in U} \int_{t_0}^{t_1} &f_0 (\mathbf{x}(t), \mathbf{u}(t) ) \, dt \\ \text{subject to} \quad \dot{\mathbf{x}}(t) &= \mathbf{f} (\mathbf{x}(t), \mathbf{u}(t) ) \\ \mathbf{x}(t_0) &= \mathbf{x}_0 \\ \mathbf{x}(t_1) &= \mathbf{x}_1 . \end{aligned}
\[ H(\mathbf{x}, \mathbf{\hat{z}}, \mathbf{u}) = \mathbf{\hat{z}}^\top \dot{\mathbf{\hat{x}}} = \sum_{i = 0}^n z_i (t) f_i (\mathbf{x}(t), \mathbf{u}(t) ),\]
where \(\mathbf{\hat{z}}(t)=\begin{pmatrix} z_0(t) & \dots & z_n(t) \end{pmatrix}^\top\), \(\mathbf{\hat{x}}(t)=\begin{pmatrix} x_0(t) & \dots & x_n(t) \end{pmatrix}^\top\), and \(x_0\) is the extra state defined by \(\dot{x}_0(t)=f_0(\mathbf{x}(t),\mathbf{u}(t))\), \(x_0(t_0)=0\).
The co-state equations are \[ \dot{\mathbf{\hat{z}}} = -\frac{\partial H}{\partial \mathbf{\hat{x}}} \]
Suppose \((\mathbf{x}^*,\mathbf{u}^*)\) is optimal. Then
- \(z_0 = -1\) (normal case), so \[ H( \mathbf{x},\mathbf{\hat{z}},\mathbf{u} ) = -f_0( \mathbf{x}(t), \mathbf{u}(t) ) + \sum_{i=1}^n z_i(t) f_i ( \mathbf{x}(t),\mathbf{u}(t) ). \]
- Co-state equations admit a solution \(\hat{\mathbf{z}}^*\).
- \(\mathbf{u}^*\) maximises \(H(\mathbf{x}^*,\mathbf{\hat{z}}^*,\mathbf{u})\) over \(\mathbf{u} \in U\).
- \(\mathbf{x}^*\) satisfies state equation with \(\mathbf{x}^* (t_0)=x_0\), \(\mathbf{x}^* (t_1)=x_1\).
- The Hamiltonian is constant along the optimal path and equals \(0\) if \(t_1\) is free.
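A minimal worked example (constructed for illustration): minimise \(\int_0^1 \tfrac{1}{2} u(t)^2 \, dt\) subject to \(\dot{x} = u\), \(x(0) = 0\), \(x(1) = 1\). In the normal case the Hamiltonian is \(H = -\tfrac{1}{2}u^2 + z u\). The co-state equation gives \(\dot{z} = -\partial H / \partial x = 0\), so \(z\) is constant; maximising \(H\) over \(u\) gives \(u^* = z\). The state equation then gives \(x^*(t) = z t\) with \(x^*(1) = 1\), so \(z = 1\), \(u^*(t) \equiv 1\) and \(x^*(t) = t\). Along this path \(H = -\tfrac{1}{2} + 1 = \tfrac{1}{2}\), constant as required (it need not vanish since \(t_1\) is fixed).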
Even if a problem is non-autonomous or its target is not completely fixed, it can still be converted into an autonomous, fixed-target problem:
Partially free target: if the target is the intersection of \(k\) surfaces \(g_i(\mathbf{x}^1)=0\), \(i=1,\dots,k\), then the transversality condition is \(\mathbf{z}^1 = \sum_{i=1}^k c_i \nabla g_{i}(\mathbf{x}^1)\) for some constants \(c_i\), where \(\mathbf{z}^1 = \mathbf{z}(t_1)\).
Completely free target: if \(\mathbf{x}(t_1)\) is unrestricted, then \(\mathbf{z}(t_1)=\mathbf{0}\).
Non-autonomous problems: For state \(\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x},\mathbf{u},t)\) and cost \(J=\int_{t_0}^{t_1} f_0(\mathbf{x},\mathbf{u},t) \, dt\) introduce extra state \(x_{n+1}\) with \[ \dot{x}_{n+1} = 1, \quad x_{n+1}(t_0)=t_0, \quad x_{n+1}(t_1)=t_1 \] so that \(x_{n+1}(t) = t\), the explicit time dependence is absorbed into the state, and the problem becomes autonomous.