Feedforward Deep Neural Networks

This page{{< mnote "in the Hugo sense of the word" >}} contains my chapter notes for Michael Nielsen's book, *Neural Networks and Deep Learning*.

chapter 4: a visual proof that neural nets can compute any function

  • one of the most striking facts about neural networks is that they can compute any function.
  • we will always be able to do better than any given error tolerance \(\epsilon\)
  • what’s even crazier is that this universality holds even if we restrict our networks to a single hidden layer between the input and output neurons
  • one of the original papers publishing this result leveraged the Hahn-Banach theorem, the Riesz representation theorem, and some Fourier analysis!

    Read more >

chapter 5: why are deep neural networks hard to train?

  • given the findings of the previous chapter (universality), why would we concern ourselves with learning deep neural nets?
    • especially given that we are guaranteed to be able to approximate any function with just a single layer of hidden neurons?

well, just because something is possible doesn’t mean it’s a good idea!

considering that we are using computers, it’s usually a good idea to break a problem down into smaller sub-problems, solve those, and then combine the solutions to solve the main problem; the layers of a deep network let us do exactly that.

Read more >

chapter 6: deep learning

{{< tikz >}}

\begin{tikzpicture}[x=1.6cm,y=1.1cm]
  \large
  \message{^^JDeep convolution neural network}

  % Define colors
  \colorlet{myred}{red!80!black}
  \colorlet{myblue}{blue!80!black}
  \colorlet{mygreen}{green!60!black}
  \colorlet{myorange}{orange!70!red!60!black}
  \colorlet{mydarkred}{red!30!black}
  \colorlet{mydarkblue}{blue!40!black}
  \colorlet{mydarkgreen}{green!30!black}

  % Define TikZ styles
  \tikzset{
    >=latex, % for default LaTeX arrow head
    node/.style={thick,circle,draw=myblue,minimum size=22,inner sep=0.5,outer sep=0.6},
    node in/.style={node,green!20!black,draw=mygreen!30!black,fill=mygreen!25},
    node hidden/.style={node,blue!20!black,draw=myblue!30!black,fill=myblue!20},
    node convol/.style={node,orange!20!black,draw=myorange!30!black,fill=myorange!20},
    node out/.style={node,red!20!black,draw=myred!30!black,fill=myred!20},
    connect/.style={thick,mydarkblue}, %,line cap=round
    connect arrow/.style={-{Latex[length=4,width=3.5]},thick,mydarkblue,shorten <=0.5,shorten >=1}
  }

  % Define layers and nodes
  \def\layerNodes{{5,5,4,3,2,4,4,3}} % Number of nodes in each layer
  \def\NC{6}          % number of convolutional layers
  \def\totalLayers{8} % total number of layers

Read more >

chapter 3: improving the way neural networks learn

3.1 the cross-entropy cost function

  • we often learn fastest when we’re badly wrong about something
  • the cross-entropy cost function is always non-negative (which is something you desire for a cost function)

\begin{equation} \label{eq:neuron_ce} C = -\frac{1}{n}\sum_x [y \ln a + (1-y)\ln(1-a)] \end{equation}

  • note here that at \(a = 1\) naive evaluation produces nan; this is handled in the sketch after this list
  • this cost tends towards zero as the neuron gets better at computing the desired output y
  • it also punishes bad guesses more harshly.
  • the cross-entropy is nearly always the better choice, provided the output neurons are sigmoid neurons
  • if the output neurons are linear neurons, however, the quadratic cost does not cause learning slowdown, so you may use it
  • to pick a comparable learning rate \(\eta\) for the cross-entropy (log-reg) cost, you can divide the rate used with the quadratic (lin-reg) cost by 6
  • the chapter 1 network reached 95.42% accuracy
  • 100 hidden neurons \(\implies\) 96.82 percent.
    • eliminated one in fourteen errors; pretty good!
  • neuron saturation is an important problem in neural nets.
  • cross-entropy is a measure of surprise
    • see ch. 5 of Cover & Thomas, Elements of Information Theory
  • a softmax output layer with log-likelihood cost is quite similar to a sigmoid output layer with cross-entropy cost.
  • softmax plus log-likelihood is worth using whenever you want to interpret the output activations as probabilities.
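A minimal sketch of the cost above for a single sigmoid neuron, with the nan handled via np.nan_to_num (the function name and array shapes here are my own, not the book's code):

```python
import numpy as np

def cross_entropy_cost(a, y):
    """Cross-entropy cost for one sigmoid neuron over n training inputs.

    C = -(1/n) * sum_x [ y ln a + (1 - y) ln(1 - a) ]

    np.nan_to_num turns the nan produced by 0 * log(0)
    (e.g. a = 1 when y = 1) into 0, so a saturated but
    correct output contributes zero cost.
    """
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)
    return -np.sum(np.nan_to_num(y * np.log(a) + (1 - y) * np.log(1 - a))) / len(y)

print(cross_entropy_cost([0.9999, 0.001, 1.0], [1, 0, 1]))  # small cost, no nan
```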

3.2 overfitting and regularisation

chapter 2: how the backpropagation algorithm works

  • the algorithm was introduced in the 1970s, but its importance wasn’t fully appreciated until the famous 1986 paper by David Rumelhart, Geoffrey Hinton, and Ronald Williams.
  • “workhorse of learning in neural networks”
  • at the heart of it is an expression that tells us how quickly the cost function changes when we change the weights and biases (see the equations below)
activation diagram of a single neuron in matrix notation
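For reference, that expression comes packaged as the four fundamental equations of backpropagation (BP1–BP4 in the book), where \(\delta^l\) denotes the error in layer \(l\):

\begin{align}
  \delta^L &= \nabla_a C \odot \sigma'(z^L) \\
  \delta^l &= \big((w^{l+1})^T \delta^{l+1}\big) \odot \sigma'(z^l) \\
  \frac{\partial C}{\partial b^l_j} &= \delta^l_j \\
  \frac{\partial C}{\partial w^l_{jk}} &= a^{l-1}_k \, \delta^l_j
\end{align}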

Read more >

chapter 1: using neural networks to recognise handwritten digits

notes
  • insight is forever
  • his code is written in python 2.7
  • emotional commitment is a key to achieving mastery
The visual cortex is located in the occipital lobe

  • primary visual cortex has 140 million neurons
  • two types of artificial neuron: perceptron, sigmoid neuron
  • perceptron takes binary inputs and produces a single binary output.
  • perceptrons should be considered as making decisions after weighing up evidence (inputs)
  • perceptrons can compute NAND, and NAND gates are universal for computation, so any computation can be built from networks of them (see the sketch below)!
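The book's NAND example uses a two-input perceptron with weights −2, −2 and a bias of 3; a quick sketch of that decision rule (the helper name is mine):

```python
def perceptron(x1, x2, w1=-2, w2=-2, b=3):
    """Perceptron rule: output 1 if w . x + b > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

# With weights -2, -2 and bias 3 the perceptron behaves as a NAND gate:
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron(x1, x2))  # outputs 1, 1, 1, 0
```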
sigmoid neurons
  • you want to tweak the weights and biases such that small changes in either produce only a small change in the output
  • as such we must break free from the sgn step function and introduce the sigmoid function (see the sketch below)
Binary, Discontinuous Sign
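A small sketch of mine (assuming NumPy) contrasting the discontinuous step with the sigmoid \(\sigma(z) = 1/(1+e^{-z})\): a tiny change in \(z\) can flip the step output, but only nudges the sigmoid output:

```python
import numpy as np

def step(z):
    """Binary, discontinuous step: 0 for z <= 0, 1 for z > 0."""
    return np.where(z > 0, 1.0, 0.0)

def sigmoid(z):
    """Smooth sigmoid: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-0.01, 0.01])   # a tiny nudge across zero
print(step(z))                # [0. 1.]         -- the output flips
print(sigmoid(z))             # [0.4975 0.5025] -- the output barely moves
```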

Read more >