Neural Networks 101
===================
<div style="text-align: center;">
<img src="machine_learning.png" title="ml" alt="ml" height=400 />
</div>
Nomenclature and Definitions
----------------------------
First, we need to clarify a few terms: **artificial intelligence**, **machine learning** and **neural network**.
Everybody categorizes them differently, but we look at it this way:
<br/>
<div style="text-align: center;">
<img src="venn.png" title="venn" alt="venn" height=300 />
</div>
<br/>
Here we focus on neural networks as a special model class used for function approximation in regression or classification tasks.
To be more precise, we will rely on the following definition.
> **Definition** (Neural Network):
> For any $L\in\mathbb{N}$ and $d=(d_0,\dots,d_L)\in\mathbb{N}^{L+1}$, a non-linear map $\Psi\colon\mathbb{R}^{d_0}\to\mathbb{R}^{d_L}$ of the form
> $$
> \Psi(x) = \bigl[\varphi_L\circ (W_L\bullet + b_L)\circ\varphi_{L-1}\circ\dots\circ(W_2\bullet + b_2)\circ\varphi_1\circ (W_1\bullet + b_1)\bigr](x)
> $$
> is called a _fully connected feed-forward neural network_.
Typically, we use the following nomenclature:
- $L$ is called the _depth_ of the network with layers $\ell=0,\dots,L$.
- $d$ is called the _width_ of the network, where $d_\ell$ is the width of layer $\ell$.
- $W_\ell\in\mathbb{R}^{d_\ell\times d_{\ell-1}}$ is the _weight matrix_ of layer $\ell$.
- $b_\ell\in\mathbb{R}^{d_\ell}$ is the _bias vector_ of layer $\ell$.
- $\vartheta=(W_1,b_1,\dots,W_L,b_L)$ are the _free parameters_ of the neural network.
Sometimes we write $\Psi_\vartheta$ or $\Psi(x; \vartheta)$ to indicate the dependence of $\Psi$ on the parameters $\vartheta$.
- $\varphi_\ell$ is the _activation function_ of layer $\ell$.
Note that the $\varphi_\ell$ have to be non-linear (otherwise $\Psi$ collapses to an affine map) and are typically chosen to be monotone increasing.
Additionally, there exist the following conventions:
- $x^{(0)}:=x$ is called the _input (layer)_ of the neural network $\Psi$.
- $x^{(L)}:=\Psi(x)$ is called the _output (layer)_ of the neural network $\Psi$.
- Intermediate results $x^{(\ell)} = \varphi_\ell(W_\ell\, x^{(\ell-1)} + b_\ell)$ are called _hidden layers_.
- (debatable) A neural network is called _shallow_ if it has only one hidden layer ($L=2$) and _deep_ otherwise.
**Example:**
Let $L=3$, $d=(6, 10, 10, 3)$ and $\varphi_1=\varphi_2=\varphi_3=\mathrm{ReLU}$.
Then the neural network is given by the composition
$$
\Psi\colon \mathbb{R}^6\to\mathbb{R}^3,
\qquad
\Psi(x) = \varphi_3\Bigl(W_3 \Bigl(\underbrace{\varphi_2\bigl(W_2 \bigl(\underbrace{\varphi_1(W_1 x + b_1)}_{x^{(1)}}\bigr) + b_2\bigr)}_{x^{(2)}}\Bigr) + b_3\Bigr).
$$
A typical graphical representation of the neural network looks like this:
<br/>
<div style="text-align: center;">
<img src="nn_fc_example.png" title="ml" alt="ml" width=400 />
</div>
<br/>
The entries of $W_\ell$, $\ell=1,2,3$, are depicted as lines connecting nodes in one layer to the subsequent one.
The color indicates the sign of the entries (blue = "+", magenta = "-") and the opacity represents their absolute value (magnitude).
Note that neither the employed activation functions $\varphi_\ell$ nor the biases $b_\ell$ are represented in this graph.
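To make the definition concrete, the forward pass of this example network can be sketched in a few lines of NumPy (a minimal sketch; the weights below are random placeholders, whereas in practice they are determined by training):

```python
import numpy as np

def relu(x):
    # ReLU activation, applied element-wise.
    return np.maximum(x, 0)

# Widths of the example network: d = (6, 10, 10, 3).
d = [6, 10, 10, 3]
rng = np.random.default_rng(0)

# Placeholder parameters theta = (W_1, b_1, ..., W_3, b_3);
# in practice these are obtained by training, not drawn at random.
weights = [rng.standard_normal((d[l + 1], d[l])) for l in range(3)]
biases = [rng.standard_normal(d[l + 1]) for l in range(3)]

def psi(x):
    # Forward pass x^(l) = phi_l(W_l x^(l-1) + b_l) for l = 1, 2, 3.
    for W, b in zip(weights, biases):
        x = relu(W @ x + b)
    return x

y = psi(rng.standard_normal(6))  # maps R^6 to R^3
```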
Activation Functions
--------------------
Activation functions can, in principle, be arbitrary non-linear maps.
The important part is the non-linearity, as otherwise the neural network would simply be an affine function.
Typical examples of continuous activation functions applied in the context of function approximation or regression are:
ReLU | Leaky ReLU | Tanh
- | - | -
<img src="relu.png" title="ReLU" alt="ReLU" width=300 /> | <img src="leaky_relu.png" title="leaky ReLU" alt="leaky ReLU" width=300 /> | <img src="tanh.png" title="tanh" alt="tanh" width=300 />
For classification tasks, such as image recognition, so-called convolutional neural networks (CNNs) are employed.
Typically, these networks additionally use discrete activation functions, for example:
Argmax | Softmax | Max-Pooling
- | - | -
<img src="argmax.png" title="argmax" alt="argmax" width=300 /> | <img src="softmax.png" title="softmax" alt="softmax" width=300 /> | <img src="maxpool.png" title="maxpool" alt="maxpool" width=300 />
More information on CNNs follows below.
Training
--------
Types of Neural Networks
------------------------
Fully Connected Neural Network | Convolutional Neural Network
- | -
![fully connected](nn_fc.png) | ![convolutional](nn_conv.png)
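As a minimal sketch of the fully connected case, the example network from above could be written with PyTorch (one of the libraries listed under Further Reading); using `nn.Sequential` here is only one of several possible ways to define it:

```python
import torch
import torch.nn as nn

# Fully connected feed-forward network from the example above:
# L = 3, d = (6, 10, 10, 3), ReLU activation in every layer.
model = nn.Sequential(
    nn.Linear(6, 10), nn.ReLU(),
    nn.Linear(10, 10), nn.ReLU(),
    nn.Linear(10, 3), nn.ReLU(),
)

x = torch.randn(5, 6)  # batch of 5 inputs with d_0 = 6 features
y = model(x)           # output has shape (5, 3)
```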
Further Reading
---------------
- Python: PyTorch, TensorFlow, scikit-learn
- MATLAB: Deep Learning Toolbox