Computer Vision
I think taking a course in a subject you are interested in is never a bad thing.
This page will exclusively contain what I was taught in COMP9517 for a short while. That is just how it will be.
However, eventually we will morph out of this into the textbook, and subsequently out of that too.
For now, I just want a document to type in that I enjoy typing in!
Notes
In each of the topics below, there are a number of algorithms and facts that I must know for the final exam.
Image Formation
- pinhole camera model
- projective geometry
- vanishing lines and points
- mathematics: \[x' = -x \frac{f}{z} \] \[ y' = -y \frac{f}{z} \]
- colour spaces:
- RGB: red, green, blue
- strongly correlated channels
- HSV: hue, saturation, value
- confounded channels
- YCbCr: luminance (Y) and blue/red chroma differences (Cb, Cr)
- fast to compute, good for compression. the modern day standard
- Y = luminance
- Cb = blue colour difference
- Cr = red colour difference
- L*a*b*: differences in luminance are more perceptually uniform
- L* = lightness
- quantisation: digitises the image intensity / amplitude values
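As a quick sanity check of the projection equations \(x' = -x \frac{f}{z}\), \(y' = -y \frac{f}{z}\), a minimal sketch (the function name is mine):

```python
import numpy as np

def pinhole_project(points, f):
    """Project 3D points onto the image plane of an ideal pinhole camera.

    points: (N, 3) array of (x, y, z) camera coordinates with z > 0.
    f: focal length (distance from the pinhole to the image plane).
    The negative sign reflects the inversion of the image formed behind
    the pinhole.
    """
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([-x * f / z, -y * f / z], axis=1)

# Perspective foreshortening: a point twice as far away projects half as large.
near = pinhole_project([[1.0, 2.0, 10.0]], f=5.0)  # [[-0.5, -1.0]]
far = pinhole_project([[1.0, 2.0, 20.0]], f=5.0)   # [[-0.25, -0.5]]
```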
Image Processing
spatial domain
point operations
\[T: \mathbb{R} \rightarrow \mathbb{R}\quad g(x,y) = T(f(x,y))\]
- contrast stretching
- intensity thresholding
- automatic: otsu, isodata, multilevel
- intensity inversion
- log transformation
- power transformation
- piecewise linear transformation
- piecewise contrast stretching
- gray-level slicing
- bit-plane slicing
- histogram of pixel values
- histogram-based thresholding "triangle"
- histogram equalisation
- continuous; discrete
- constrained
- histogram matching
- continuous, discrete
- arithmetic and logical operations
- averaging
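A minimal sketch of discrete histogram equalisation, assuming 8-bit images (the helper name is mine):

```python
import numpy as np

def equalise(img, levels=256):
    """Discrete histogram equalisation for an 8-bit image.

    Maps each gray level through the scaled cumulative histogram so that
    output intensities are spread more uniformly over [0, levels-1].
    """
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / img.size                  # cumulative distribution in [0, 1]
    lut = np.round((levels - 1) * cdf).astype(np.uint8)
    return lut[img]                                   # apply lookup table per pixel

# A low-contrast image occupying only levels 100..120 gets stretched out.
rng = np.random.default_rng(0)
img = rng.integers(100, 121, size=(64, 64)).astype(np.uint8)
out = equalise(img)
```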
neighbourhood operations
\[T: \mathbb{R}^n \rightarrow \mathbb{R}\quad g(x,y) = T(f(x,y),f(x+1,y),f(x-1,y),\ldots)\]
- convolution
- linear, shift-invariant
- properties:
- commutativity \[f_1 * f_2 = f_2 * f_1 \]
- associativity \[f_1 * (f_2 * f_3) = (f_1 * f_2) * f_3 \]
- distributivity \[f_1 * (f_2 + f_3) = f_1 * f_2 + f_1 * f_3\]
- multiplicativity \[a(f_1*f_2) = (a f_1) * f_2 = f_1 * (a f_2) \]
- derivation \[(f_1 * f_2)' = f_1'*f_2 = f_1*f_2' \]
- convolution theorem \[f_1 * f_2 \leftrightarrow \hat{f_1} \hat{f_2}\] convolution in the spatial domain amounts to multiplication in the spectral domain
- spatial filtering
- linear shift-invariant operations
- border problem
- padding: add more pixels with value 0
- clamping: repeat all border pixel values indefinitely
- wrapping: copy pixel values from opposite sides
- mirroring: reflect pixel values across borders
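The four border strategies above map directly onto the `mode` argument of `scipy.ndimage.convolve`; a small sketch assuming scipy is available (scipy's mode names differ from the notes'):

```python
import numpy as np
from scipy.ndimage import convolve

# 3x3 averaging (uniform) kernel; convolution is linear and shift-invariant.
kernel = np.full((3, 3), 1 / 9)
img = np.arange(25, dtype=float).reshape(5, 5)

padded = convolve(img, kernel, mode='constant', cval=0.0)  # padding with 0
clamped = convolve(img, kernel, mode='nearest')            # repeat border pixels
wrapped = convolve(img, kernel, mode='wrap')               # copy from opposite side
mirrored = convolve(img, kernel, mode='mirror')            # reflect across border
```

The border choice only matters where the kernel hangs over the edge: for the interior pixel (2, 2) the kernel lies fully inside the image and every mode gives the same 3x3 average (12.0 here).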
filtering methods
- uniform filter
- smoothing
- gaussian filter
- separable and circularly symmetric; the only such filter
- optimal joint localisation in spatial and frequency domain
- fourier transform (ft henceforth) is also a Gaussian
- n-fold convolution of any low-pass filter converges to a Gaussian
- infinitely smooth, so infinite derivatives exist
- good at keeping small objects (better than median). it is a smoothing filter.
- median filter
- order-statistic filter
- sorts, then takes median
- can eliminate salt and pepper noise (which are just isolated intensity spikes)
- nonlinear filter
- better than gaussian at removing small objects
- smoothing
- image blurring, noise reduction
- differentiation
- forward, backward, central difference (finite differences because images are discrete)
- separability
- improves computation efficiency
- examples: uniform, prewitt, sobel, gauss
- pooling
- max / min / average
- makes image smaller
- combines filtering and downsampling in one operation
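A small experiment contrasting the Gaussian and median filters on an isolated intensity spike (salt noise), assuming scipy:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter

# Constant image with a single "salt" spike: an isolated intensity outlier.
img = np.full((9, 9), 10.0)
img[4, 4] = 255.0

med = median_filter(img, size=3)      # nonlinear order-statistic filter
gau = gaussian_filter(img, sigma=1)   # linear smoothing filter

# The median removes the spike entirely (8 of the 9 neighbourhood values
# are 10, so the median is 10); the Gaussian only spreads it out.
```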
image enhancement
- sharpening
- subtract the Gaussian-filtered image from the original, then add the produced "high frequencies" back into the image
- can also use the Laplacian \(\nabla^2 f = f_{xx} + f_{yy} \) by subtracting it from the original image: \[f(x,y) - \nabla^2 f(x,y) \]
- unsharp masking
- subtract a blurred ("unsharp") copy of the image from the original and add the scaled difference back
- gradient vector & magnitude
- \[\nabla f(x,y) = [f_x(x,y), f_y(x,y)]^T \]
- \[||\nabla f(x,y) || = \sqrt{f_x^2(x,y) + f_y^2(x,y)} \]
- edge detection
- use the Laplacian or the intensity gradient
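A sketch of unsharp-style sharpening and the gradient magnitude, using Sobel finite differences; the helper names are mine and scipy is assumed:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def unsharp_mask(img, sigma=1.0, amount=1.0):
    """Sharpen by adding back the high frequencies (img - blurred), scaled."""
    blurred = gaussian_filter(img, sigma)
    return img + amount * (img - blurred)

def gradient_magnitude(img):
    """||grad f|| = sqrt(f_x^2 + f_y^2) via Sobel finite differences."""
    fx = sobel(img, axis=1)  # horizontal derivative
    fy = sobel(img, axis=0)  # vertical derivative
    return np.sqrt(fx**2 + fy**2)

# A vertical step edge: the gradient magnitude peaks along the edge,
# and sharpening overshoots on either side of it.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
mag = gradient_magnitude(img)
sharp = unsharp_mask(img)
```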
transform domain
- high frequency -> rapidly changing intensities across pixels
- low frequency -> large scale image structures
- we process images in the frequency domain by first applying the Fourier transform
Fourier Transform
- interpretations:
- frequencies correspond to patterns
- $F(0,0)$ is the total intensity over all pixels of the image
- noise (typically) corresponds to fluctuations in the highest frequencies
- notation:
- $f(x)$ is the spatial input function
- $F(u)$ is the Fourier transform
- $e^{i\omega x} = \cos(\omega x) + i\sin(\omega x) $
- $\omega = 2\pi u$ is radial frequency
- $u$ is spatial frequency
- forward fourier transform \[F(u) = \int^\infty_{-\infty} f(x)\; e^{\displaystyle -i 2\pi u x}\,\mathrm{d}x\]
- inverse fourier transform \[f(x) = \int^\infty_{-\infty} F(u)\; e^{\displaystyle i 2\pi u x}\,\mathrm{d}u\]
- properties:
Property | Spatial | Frequency |
---|---|---|
Superposition | $f_1(x) + f_2(x)$ | $F_1(u) + F_2(u)$ |
Translation | $f(x-\Delta x)$ | $F(u)e^{-i 2\pi u\Delta x}$ |
Convolution | $f(x)*h(x)$ | $F(u)H(u)$ |
Correlation | $f(x) \otimes h(x)$ | $F(u)H^*(u)$ |
Multiplication | $f(x)h(x)$ | $F(u)*H(u)$ |
Scaling | $f(ax)$ | $F(u/a)/a$ |
Differentiation | $f^{(n)}(x)$ | $(i2\pi u)^n F(u)$ |
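Two of the properties above (convolution and translation) can be checked numerically with the discrete transform; a sketch using numpy's FFT:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal(64)
h = rng.standard_normal(64)

# Convolution property: f * h in space corresponds to F(u) H(u) in frequency.
# For the DFT the convolution is circular (indices wrap around modulo N).
conv_via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)))
conv_direct = np.array(
    [sum(f[m] * h[(n - m) % 64] for m in range(64)) for n in range(64)]
)

# Translation property: shifting f by dx multiplies F(u) by exp(-i 2 pi u dx / N).
dx = 3
shifted = np.roll(f, dx)  # shifted[n] = f[(n - dx) mod N]
phase = np.exp(-2j * np.pi * np.arange(64) * dx / 64)
```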
- 2D:
- forward fourier transform \[F(u,v) = \int^\infty_{-\infty}\int^\infty_{-\infty} f(x,y)\; e^{\displaystyle -i 2\pi (ux+vy)}\;\mathrm{d}x\,\mathrm{d}y\]
- inverse fourier transform \[f(x,y) = \int^\infty_{-\infty}\int^\infty_{-\infty} F(u,v)\; e^{\displaystyle i 2\pi (ux+vy)}\;\mathrm{d}u\,\mathrm{d}v\]
- $f \leftrightarrow F$: fourier transform pair
- $F = R + i I$: real plus imaginary part
- $|F| = \sqrt{R^2 + I^2}$: Magnitude
- $\phi = \arctan(\frac{I}{R})$: Phase
- Discrete:
- forward \[F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\;e^{\displaystyle -i 2 \pi (\frac{ux}{M} + \frac{vy}{N})} \] for $u=0... M-1$ and $v = 0... N -1$
- inverse \[f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v)\;e^{\displaystyle i 2\pi (\frac{ux}{M} + \frac{vy}{N})} \] for $x=0... M-1$ and $y = 0... N -1$
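The discrete formulas above are what `np.fft.fft2` computes; evaluating the sum directly on a tiny image checks both the definition and the interpretation that \(F(0,0)\) is the total intensity:

```python
import numpy as np

# Evaluate the 2D DFT sum directly and compare against np.fft.fft2.
rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(4, 4)).astype(float)
M, N = img.shape

x = np.arange(M)[:, None]  # spatial row index, broadcast over columns
y = np.arange(N)[None, :]  # spatial column index, broadcast over rows
F = np.empty((M, N), dtype=complex)
for u in range(M):
    for v in range(N):
        F[u, v] = np.sum(img * np.exp(-2j * np.pi * (u * x / M + v * y / N)))
```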
filtering
- procedure:
- multiply the input image $f(x,y)$ by $(-1)^{x+y}$ to centre $F(u,v)$
- compute the transform $F(u,v)$ from the image $f(x,y)$ using the 2D DFT
- multiply $F(u,v)$ by a centred filter $H(u,v)$ to obtain the result $G(u,v)$
- compute the inverse 2D DFT of $G(u,v)$ to obtain the spatial result $g(x,y)$
- take the real component of $g(x,y)$ (the imaginary component is zero)
- multiply the result by $(-1)^{x+y}$ to remove the pattern introduced in step 1
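The six steps above can be sketched as follows (the function name is mine); an all-pass filter (H = 1 everywhere) must return the image unchanged:

```python
import numpy as np

def freq_filter(img, H):
    """Frequency-domain filtering following the six steps above.

    H must be a centred filter (DC component at the array centre) with
    the same shape as img.
    """
    x, y = np.meshgrid(np.arange(img.shape[1]), np.arange(img.shape[0]))
    centring = (-1.0) ** (x + y)      # step 1: centre F(u,v)
    F = np.fft.fft2(img * centring)   # step 2: forward 2D DFT
    G = F * H                         # step 3: apply centred filter
    g = np.fft.ifft2(G)               # step 4: inverse 2D DFT
    return np.real(g) * centring      # steps 5-6: real part, undo centring

img = np.arange(16.0).reshape(4, 4)
out = freq_filter(img, np.ones((4, 4)))  # all-pass: out == img
```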
convolution theorem (how does this relate to convolution?)
- filtering in the frequency domain can be computationally more efficient
- more intuitive in the frequency domain, e.g.:
- low-pass = keep low frequencies, attenuate high frequencies
- high-pass = keep high frequencies, reduce low frequencies
- band-pass = keep frequencies in a given band, attenuate the rest
- take the inverse transform to get the corresponding spatial filter
- notch filtering = opposite of band-pass; attenuates a given range
- difference of Gaussians is a band-pass filter (difference of two low-pass filters)
- gaussian filter = low-pass
- image pyramids are used for multi-resolution processing
- approximation = repeatedly smooth (low-pass) and downsample (Gaussian pyramid)
- reconstruction = upsample and add back the detail stored at each level (Laplacian pyramid)
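A minimal sketch of the approximation side of an image pyramid (smooth, then downsample), assuming scipy; the helper name is mine:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pyramid_down(img, sigma=1.0):
    """One approximation step: low-pass smooth, then keep every second pixel.

    Smoothing before downsampling suppresses the high frequencies that
    would otherwise alias at the lower resolution.
    """
    return gaussian_filter(img, sigma)[::2, ::2]

# Each level halves the resolution, keeping only large-scale structure.
img = np.random.default_rng(3).random((16, 16))
levels = [img]
while levels[-1].shape[0] > 2:
    levels.append(pyramid_down(levels[-1]))
```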
Feature Representation
Pattern Recognition
Image Segmentation
Deep Learning
Motion and Tracking
homework
- explain the transforms based on the slides.