Neural Networks 101
===================
<div style="text-align: center;">
<img src="machine_learning.png" title="ml" alt="ml" height=400 />
</div>
Nomenclature and Definitions
----------------------------
First, we need to clarify a few terms: **artificial intelligence**, **machine learning** and **neural network**.
Everybody categorizes them differently, but we look at it this way:
<br/>
<div style="text-align: center;">
<img src="venn.png" title="venn" alt="venn" height=300 />
</div>
<br/>
Here we focus on neural networks as a special model class used for function approximation in regression or classification tasks.
To be more precise, we will rely on the following definition.
> **Definition** (Neural Network):
> For any $L\in\mathbb{N}$ and $d=(d_0,\dots,d_L)\in\mathbb{N}^{L+1}$, a non-linear map $\Psi\colon\mathbb{R}^{d_0}\to\mathbb{R}^{d_L}$ of the form
> $$
> \Psi(x) = \bigl[\varphi_L\circ (W_L\bullet + b_L)\circ\varphi_{L-1}\circ\dots\circ(W_2\bullet + b_2)\circ\varphi_1\circ (W_1\bullet + b_1)\bigr](x)
> $$
> is called a _fully connected feed-forward neural network_.
Typically, we use the following nomenclature:
- $L$ is called the _depth_ of the network with layers $\ell=0,\dots,L$.
- $d$ is called the _width_ of the network, where $d_\ell$ is the width of layer $\ell$.
- $W_\ell\in\mathbb{R}^{d_\ell\times d_{\ell-1}}$ is the _weight matrix_ of layer $\ell$.
- $b_\ell\in\mathbb{R}^{d_\ell}$ is the _bias vector_ of layer $\ell$.
- $\vartheta=(W_1,b_1,\dots,W_L,b_L)$ are the _free parameters_ of the neural network.
Sometimes we write $\Psi_\vartheta$ or $\Psi(x; \vartheta)$ to indicate the dependence of $\Psi$ on the parameters $\vartheta$.
- $\varphi_\ell$ is the _activation function_ of layer $\ell$.
Note that the $\varphi_\ell$ have to be non-linear (otherwise $\Psi$ collapses to an affine map) and are typically chosen to be monotone increasing.
Additionally, there exist the following conventions:
- $x^{(0)}:=x$ is called the _input (layer)_ of the neural network $\Psi$.
- $x^{(L)}:=\Psi(x)$ is called the _output (layer)_ of the neural network $\Psi$.
- Intermediate results $x^{(\ell)} = \varphi_\ell(W_\ell\, x^{(\ell-1)} + b_\ell)$ are called _hidden layers_.
- (debatable) A neural network is called _shallow_ if it has only one hidden layer ($L=2$) and _deep_ otherwise.
**Example:**
Let $L=3$, $d=(6, 10, 10, 3)$ and $\varphi_1=\varphi_2=\varphi_3=\mathrm{ReLU}$.
Then the neural network is given by the composition
$$
\Psi\colon \mathbb{R}^6\to\mathbb{R}^3,
\qquad
\Psi(x) = \varphi_3\Bigl(W_3 \Bigl(\underbrace{\varphi_2\bigl(W_2 \bigl(\underbrace{\varphi_1(W_1 x + b_1)}_{x^{(1)}}\bigr) + b_2\bigr)}_{x^{(2)}}\Bigr) + b_3\Bigr).
$$
A typical graphical representation of the neural network looks like this:
<br/>
<div style="text-align: center;">
<img src="nn_fc_example.png" title="ml" alt="ml" width=400 />
</div>
<br/>
The entries of $W_\ell$, $\ell=1,2,3$, are depicted as lines connecting nodes in one layer to the subsequent one.
The color indicates the sign of the entries (blue = "+", magenta = "-") and the opacity represents their absolute value (magnitude).
Note that neither the employed activation functions $\varphi_\ell$ nor the biases $b_\ell$ are represented in this graph.
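To make the definition concrete, the forward pass of this example network can be sketched in a few lines of NumPy (a minimal sketch; the weights below are random placeholders, whereas in practice they are determined by training):

```python
import numpy as np

def relu(x):
    # ReLU activation, applied element-wise.
    return np.maximum(x, 0)

# Widths of the example network: d = (6, 10, 10, 3).
d = [6, 10, 10, 3]
rng = np.random.default_rng(0)

# Placeholder parameters theta = (W_1, b_1, ..., W_3, b_3);
# in practice these are obtained by training, not drawn at random.
weights = [rng.standard_normal((d[l + 1], d[l])) for l in range(3)]
biases = [rng.standard_normal(d[l + 1]) for l in range(3)]

def psi(x):
    # Forward pass x^(l) = phi_l(W_l x^(l-1) + b_l) for l = 1, 2, 3.
    for W, b in zip(weights, biases):
        x = relu(W @ x + b)
    return x

y = psi(rng.standard_normal(6))  # maps R^6 to R^3
```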
Activation Functions
--------------------
Activation functions can, in principle, be arbitrary non-linear maps.
The important part is the non-linearity, as otherwise the neural network would simply be an affine function.
Typical examples of continuous activation functions applied in the context of function approximation or regression are:
ReLU | Leaky ReLU | Tanh
- | - | -
<img src="relu.png" title="ReLU" alt="ReLU" width=300 /> | <img src="leaky_relu.png" title="leaky ReLU" alt="leaky ReLU" width=300 /> | <img src="tanh.png" title="tanh" alt="tanh" width=300 />
For classification tasks, such as image recognition, so-called convolutional neural networks (CNNs) are employed.
Typically, these networks additionally use discrete activation functions, for example:
Argmax | Softmax | Max-Pooling
- | - | -
<img src="argmax.png" title="argmax" alt="argmax" width=300 /> | <img src="softmax.png" title="softmax" alt="softmax" width=300 /> | <img src="maxpool.png" title="maxpool" alt="maxpool" width=300 />
More information on CNNs follows below.
Training
--------
Types of Neural Networks
------------------------
Fully Connected Neural Network | Convolutional Neural Network
- | -
![fully connected](nn_fc.png) | ![convolutional](nn_conv.png)
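As a minimal sketch of the fully connected case, the example network from above could be written with PyTorch (one of the libraries listed under Further Reading); using `nn.Sequential` here is only one of several possible ways to define it:

```python
import torch
import torch.nn as nn

# Fully connected feed-forward network from the example above:
# L = 3, d = (6, 10, 10, 3), ReLU activation in every layer.
model = nn.Sequential(
    nn.Linear(6, 10), nn.ReLU(),
    nn.Linear(10, 10), nn.ReLU(),
    nn.Linear(10, 3), nn.ReLU(),
)

x = torch.randn(5, 6)  # batch of 5 inputs with d_0 = 6 features
y = model(x)           # output has shape (5, 3)
```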
Further Reading
---------------
- Python: PyTorch, TensorFlow, scikit-learn
- MATLAB: Deep Learning Toolbox