Feedforward Deep Neural Networks

This page collects my chapter notes for the book Neural Networks and Deep Learning by Michael Nielsen.

chapter 4: a visual proof that neural nets can compute any function

  • one of the most striking facts about neural networks is that they can compute any function.
  • more precisely, for any desired accuracy \(\epsilon > 0\) we can find a network whose approximation error is smaller than \(\epsilon\)
  • what's even crazier is that this universality holds even if we restrict our networks to a single hidden layer between the input and output neurons (a sketch of the idea follows this list)
  • one of the original papers publishing this result leveraged the Hahn-Banach Theorem, the Riesz Representation theorem and some Fourier Analysis!
  • realise that really complicated tasks are actually just functions we want to compute
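
a minimal numpy sketch of the chapter's construction (my own illustration, not Nielsen's code): a hidden sigmoid neuron with a very large weight behaves like a step function at \(s = -b/w\), a pair of opposing steps makes a "bump", and a single hidden layer of such pairs can approximate a wiggly one-dimensional function:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# a hidden sigmoid neuron with a huge weight w acts like a step at s = -b/w
def step(x, s, w=500.0):
    return sigmoid(w * (x - s))

# a pair of opposing steps makes a "bump" of height h on [s1, s2)
def bump(x, s1, s2, h):
    return h * (step(x, s1) - step(x, s2))

x = np.linspace(0.0, 1.0, 500)
target = 0.2 + 0.4 * x**2 + 0.3 * x * np.sin(15 * x)   # an arbitrary wiggly function

# one hidden pair of neurons per sub-interval, bump height = target value at the midpoint
edges = np.linspace(0.0, 1.0, 11)
approx = sum(bump(x, a, b, target[np.searchsorted(x, (a + b) / 2)])
             for a, b in zip(edges[:-1], edges[1:]))

print("max |target - approx| with 10 bumps:", np.abs(target - approx).max())
```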

    Read more >

chapter 5: why are deep neural networks hard to train?

  • given the findings of the previous chapter (universality), why would we concern ourselves with learning deep neural nets?

    • especially since we are guaranteed to be able to approximate any function with just a single hidden layer of neurons?

well, just because something is possible doesn't mean it's a good idea!

considering that we are using computers, it's usually a good idea to break the problem down into smaller sub-problems, solve those, and then combine the pieces to solve the main problem; deep networks mirror this decomposition, with each layer building on the sub-problems solved by the layers before it.

Read more >

chapter 6: deep learning

Read more >

chapter 3: improving the way neural networks learn

3.1 the cross-entropy cost function

  • we often learn fastest when we're badly wrong about something
  • the cross-entropy cost function is always non-negative (which is something you desire in a cost function)
\begin{equation} \label{eq:neuron_ce} C = -\frac{1}{n}\sum_x [y \ln a + (1-y)\ln(1-a)] \end{equation}
  • note that at a = 1 (or a = 0) the \(\ln\) terms blow up, and a naive implementation can return nan; we handle this in the code below
  • this cost tends towards zero as the neuron gets better at computing the desired output y
  • it also punishes confidently wrong guesses more harshly than the quadratic cost does
  • the cross-entropy is nearly always the better choice, provided the output neurons are sigmoid neurons
  • if the output neurons are linear neurons, however, the quadratic cost will not cause a learning slowdown, and you may use it
  • a rough heuristic: to get a comparable learning rate \(\eta\) for the cross-entropy (log-loss) cost, divide the \(\eta\) used with the quadratic cost by 6
  • for reference, the chapter 1 network reached 95.42 percent accuracy
  • scaling up to 100 hidden neurons \(\implies\) 96.82 percent
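
regarding the nan note above: a minimal numpy sketch (in the spirit of the book's chapter 3 code, network2.py, not a verbatim copy) of the cross-entropy cost, using np.nan_to_num to neutralise the \(0 \ln 0\) case when a = 0 or a = 1:

```python
import numpy as np

def cross_entropy_cost(a, y):
    """C = -1/n * sum_x [y ln a + (1-y) ln(1-a)] for output activations a
    and desired outputs y; here the last axis indexes the training examples
    (my own convention for this sketch)."""
    n = y.shape[-1]
    # nan_to_num maps the nan from 0 * log(0) (and any inf) to finite values
    return -np.sum(np.nan_to_num(y * np.log(a) + (1 - y) * np.log(1 - a))) / n

a = np.array([[0.999, 0.02, 1.0]])   # near-perfect outputs, including a == 1 exactly
y = np.array([[1.0,   0.0,  1.0]])
print(cross_entropy_cost(a, y))      # small and finite -- no nan despite a == 1
```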

    Read more >

chapter 1: using neural networks to recognise handwritten digits

notes

  • insight is forever
  • his code is written in python 2.7
  • emotional commitment is a key to achieving mastery
[figure /projects/ml/dl/neural-nets/ch1/primary-visual.png: The visual cortex is located in the occipital lobe]
  • primary visual cortex has 140 million neurons
  • two types of artificial neuron: perceptron, sigmoid neuron
  • perceptron takes binary inputs and produces a single binary output.
  • perceptrons should be considered as making decisions after weighing up evidence (inputs)
  • perceptrons can express NAND, and since NAND gates are universal for computation, any computation can be built out of networks of them! (see the sketch below)
[figure /projects/ml/dl/neural-nets/ch1/nand.svg]
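
a minimal sketch of the NAND point (my own code; the weights \(-2, -2\) and bias \(3\) are the ones Nielsen uses):

```python
import numpy as np

def perceptron(x, w, b):
    # the perceptron rule: output 1 if w.x + b > 0, else 0
    return 1 if np.dot(w, x) + b > 0 else 0

w, b = np.array([-2, -2]), 3                       # Nielsen's NAND weights/bias
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x), w, b))  # 1, 1, 1, 0 -- i.e. NAND
```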

sigmoid neurons

  • you want to tweak the weights and biases such that small changes in either will produce a small change in the output
  • as such we must break free from the sgn step function and introduce the sigmoid function
[figure /projects/ml/dl/neural-nets/ch1/sgn.svg: Binary, Discontinuous Sign] \(\leadsto\) [figure /projects/ml/dl/neural-nets/ch1/sig.svg: Continuous, Differentiable Sigmoid]

thus the mathematics of the sigmoid neuron becomes: \[\begin{align*} \sigma(z) &\equiv \cfrac{1}{1+e^{-z}},\\ \text{so the neuron's output is}\quad &\cfrac{1}{1+\exp\left(-\sum_j w_jx_j - b\right)} \end{align*}\]
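
a tiny numpy sketch of that formula (the weights, bias, and inputs below are hypothetical values for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(x, w, b):
    # output = sigma(sum_j w_j x_j + b)
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, 0.8])      # hypothetical inputs
w = np.array([2.0, -1.0])     # hypothetical weights
b = -0.1                      # hypothetical bias
print(sigmoid_neuron(x, w, b))                          # ~0.525
# a small nudge to a weight now gives a small, smooth change in the output
print(sigmoid_neuron(x, w + np.array([0.01, 0.0]), b))  # ~0.526
```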

Read more >

chapter 2: how the backpropagation algorithm works

  • the algorithm was introduced in the 1970s, but its importance wasn't fully appreciated until the famous 1986 paper by David Rumelhart, Geoffrey Hinton, and Ronald Williams.
  • "workhorse of learning in neural networks"
  • at the heart of it is an expression that tells us how quickly the cost function changes when we change the weights and biases.
[figure /projects/ml/dl/neural-nets/ch2/activations.svg: activation diagram of a single neuron in matrix notation]

notation

  • \(w_{jk}^l\) denotes the weight for the connection from the \(k^{th}\) neuron in layer \(l-1\) to the \(j^{th}\) neuron in layer \(l\), so that in matrix form the activations propagate as \(a^l = \sigma(w^l a^{l-1} + b^l)\) (sketched below)
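
a minimal numpy sketch of that matrix notation in action, feeding an input forward layer by layer; the sizes [784, 30, 10] match the MNIST network from chapter 1, and the random initialisation is just for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    # a^l = sigma(w^l a^(l-1) + b^l), applied layer by layer
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [784, 30, 10]                                  # the MNIST network from ch. 1
weights = [rng.standard_normal((j, k)) for k, j in zip(sizes[:-1], sizes[1:])]
biases  = [rng.standard_normal((j, 1)) for j in sizes[1:]]

x = rng.standard_normal((784, 1))                      # one input column vector
print(feedforward(x, weights, biases).shape)           # (10, 1)
```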

    Read more >

Nielsen's Figures

ntfs. (note-to-future-self)

these are all svgs created in python using matplotlib.

I could not get my dirty little paws on nielsen's tikz code that he used to produce the neural net diagrams. he compiled them as pngs on his own site. he also used mathjax to typeset his mathematics.

relu

[figure /projects/ml/dl/neural-nets/fig/relu.svg: relu]

code
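
the actual script sits behind the link below; as a placeholder, a minimal matplotlib sketch (not the original code) of how a relu.svg figure could be produced:

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-5, 5, 200)
relu = np.maximum(0, z)            # ReLU: max(0, z)

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(z, relu)
ax.set_xlabel("z")
ax.set_ylabel("max(0, z)")
ax.set_title("relu")
fig.tight_layout()
fig.savefig("relu.svg")            # saved as an svg, as mentioned above
```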

Read more >