diff --git a/doc/argmax.png b/doc/argmax.png
new file mode 100644
index 0000000000000000000000000000000000000000..39b540783076a4b9817d258a3f04bc8f00760030
Binary files /dev/null and b/doc/argmax.png differ
diff --git a/doc/basics.md b/doc/basics.md
new file mode 100644
index 0000000000000000000000000000000000000000..e6688f48a747c03cc79065db34dd7c6268b1826a
--- /dev/null
+++ b/doc/basics.md
@@ -0,0 +1,102 @@
+Neural Networks 101
+===================
+
+<div style="text-align: center;">
+  <img src="machine_learning.png" title="ml" alt="ml" height=400 />
+</div>
+
+Nomenclature and Definitions
+----------------------------
+
+First, we need to clarify a few terms: **artificial intelligence**, **machine learning** and **neural network**.
+Everybody categorizes them differently, but we look at it this way:
+
+<br/>
+<div style="text-align: center;">
+  <img src="venn.png" title="venn" alt="venn" height=300 />
+</div>
+<br/>
+
+Here we focus on neural networks as a special model class used for function approximation in regression or classification tasks.
+To be more precise, we will rely on the following definition.
+
+> **Definition** (Neural Network):
+> For any $L\in\mathbb{N}$ and $d=(d_0,\dots,d_L)\in\mathbb{N}^{L+1}$, a non-linear map $\Psi\colon\mathbb{R}^{d_0}\to\mathbb{R}^{d_L}$ of the form
+> $$
+> \Psi(x) = \bigl[\varphi_L\circ (W_L\bullet + b_L)\circ\varphi_{L-1}\circ\dots\circ(W_2\bullet + b_2)\circ\varphi_1\circ (W_1\bullet + b_1)\bigr](x)
+> $$
+> is called a _fully connected feed-forward neural network_.
+
+Typically, we use the following nomenclature:
+- $L$ is called the _depth_ of the network with layers $\ell=0,\dots,L$.
+- $d$ is called the _width_ of the network, where $d_\ell$ is the width of layer $\ell$.
+- $W_\ell\in\mathbb{R}^{d_\ell\times d_{\ell-1}}$ are the _weights_ of layer $\ell$.
+- $b_\ell\in\mathbb{R}^{d_\ell}$ are the _biases_ of layer $\ell$.
+- $\vartheta=(W_1,b_1,\dots,W_L,b_L)$ are the _free parameters_ of the neural network.
+  Sometimes we write $\Psi_\vartheta$ or $\Psi(x; \vartheta)$ to indicate the dependence of $\Psi$ on the parameters $\vartheta$.
+- $\varphi_\ell$ is the _activation function_ of layer $\ell$.
+  Note that $\varphi_\ell$ has to be non-linear and is typically chosen to be monotone increasing.
+
+Additionally, there exist the following conventions:
+- $x^{(0)}:=x$ is called the _input (layer)_ of the neural network $\Psi$.
+- $x^{(L)}:=\Psi(x)$ is called the _output (layer)_ of the neural network $\Psi$.
+- Intermediate results $x^{(\ell)} = \varphi_\ell(W_\ell\, x^{(\ell-1)} + b_\ell)$ are called _hidden layers_.
+- (debatable) A neural network is called _shallow_ if it has only one hidden layer ($L=2$) and _deep_ otherwise.
+
+**Example:**
+Let $L=3$, $d=(6, 10, 10, 3)$ and $\varphi_1=\varphi_2=\varphi_3=\mathrm{ReLU}$.
+Then the neural network is given by the concatenation
+$$
+\Psi\colon \mathbb{R}^6\to\mathbb{R}^3,
+\qquad
+\Psi(x) = \varphi_3\Bigl(W_3 \Bigl(\underbrace{\varphi_2\bigl(W_2 \bigl(\underbrace{\varphi_1(W_1 x + b_1)}_{x^{(1)}}\bigr) + b_2\bigr)}_{x^{(2)}}\Bigr) + b_3\Bigr).
+$$
+A typical graphical representation of the neural network looks like this:
+
+<br/>
+<div style="text-align: center;">
+  <img src="nn_fc_example.png" title="ml" alt="ml" width=400 />
+</div>
+<br/>
+
+The entries of $W_\ell$, $\ell=1,2,3$, are depicted as lines connecting nodes in one layer to the subsequent one.
+The color indicates the sign of the entries (blue = "+", magenta = "-") and the opacity represents their absolute value (magnitude).
+Note that neither the employed activation functions $\varphi_\ell$ nor the biases $b_\ell$ are represented in this graph.
+
+Activation Functions
+--------------------
+
+Activation functions can, in principle, be arbitrary non-linear maps.
+The important part is the non-linearity, as otherwise the neural network would simply be an affine function.
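The definition above translates almost literally into code. The following is a minimal NumPy sketch of the forward pass for the example network with $d=(6,10,10,3)$ and ReLU activations; the function names `relu` and `forward` and the random initialization are purely illustrative, not part of any particular library.

```python
import numpy as np

def relu(x):
    """ReLU activation: applied element-wise, max(x, 0)."""
    return np.maximum(x, 0.0)

def forward(x, params):
    """Evaluate Psi(x; theta) for a fully connected feed-forward network.

    params is a list of (W_l, b_l) pairs, where W_l has shape
    (d_l, d_{l-1}) so that layer l maps R^{d_{l-1}} to R^{d_l}.
    """
    for W, b in params:
        x = relu(W @ x + b)
    return x

# The example network: depth L = 3, widths d = (6, 10, 10, 3).
rng = np.random.default_rng(0)
d = (6, 10, 10, 3)
params = [(rng.standard_normal((d[l], d[l - 1])), np.zeros(d[l]))
          for l in range(1, len(d))]

y = forward(rng.standard_normal(6), params)
print(y.shape)  # (3,)
```

Note that using ReLU in the final layer (as in the example) constrains the output to be non-negative; in practice the last activation is often the identity or a task-specific map such as softmax.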
+
+Typical examples of continuous activation functions applied in the context of function approximation or regression are:
+
+ReLU | Leaky ReLU | Tanh
+- | - | -
+<img src="relu.png" title="ReLU" alt="ReLU" width=300 /> | <img src="leaky_relu.png" title="leaky ReLU" alt="leaky ReLU" width=300 /> | <img src="tanh.png" title="tanh" alt="tanh" width=300 />
+
+For classification tasks, such as image recognition, so-called convolutional neural networks (CNNs) are employed.
+Typically, these networks use different types of activation functions, such as:
+
+**Examples of discrete activation functions:**
+Argmax | Softmax | Max-Pooling
+- | - | -
+<img src="argmax.png" title="argmax" alt="argmax" width=300 /> | <img src="softmax.png" title="softmax" alt="softmax" width=300 /> | <img src="maxpool.png" title="maxpool" alt="maxpool" width=300 />
+
+More information on CNNs follows below.
+
+Training
+--------
+
+Types of Neural Networks
+------------------------
+
+Fully Connected Neural Network | Convolutional Neural Network
+- | -
+<img src="nn_fc.png" title="fully connected" alt="fully connected" width=300 /> | <img src="nn_conv.png" title="convolutional" alt="convolutional" width=300 />
+
+Further Reading
+---------------
+
+- Python: PyTorch, TensorFlow, Scikit-learn
+- Matlab: Deep Learning Toolbox
diff --git a/doc/leaky_relu.png b/doc/leaky_relu.png
new file mode 100644
index 0000000000000000000000000000000000000000..1e6dd0a96f1e9e480751aa4f96263f7b2dd60455
Binary files /dev/null and b/doc/leaky_relu.png differ
diff --git a/doc/machine_learning.png b/doc/machine_learning.png
new file mode 100644
index 0000000000000000000000000000000000000000..ee18d536945aeffa4065a1467b0b842115a062f4
Binary files /dev/null and b/doc/machine_learning.png differ
diff --git a/doc/maxpool.png b/doc/maxpool.png
new file mode 100644
index 0000000000000000000000000000000000000000..6de8134d816577305eef08d3ac8e1db6a469fb28
Binary files /dev/null and b/doc/maxpool.png differ
diff --git a/doc/nn_conv.png b/doc/nn_conv.png
new file mode 100644
index 0000000000000000000000000000000000000000..da778da67906c626bfd1e2ff17948564ba3de774
Binary files /dev/null
and b/doc/nn_conv.png differ
diff --git a/doc/nn_fc.png b/doc/nn_fc.png
new file mode 100644
index 0000000000000000000000000000000000000000..e36e749e08e8eee066927472e7e00ab7a32a8378
Binary files /dev/null and b/doc/nn_fc.png differ
diff --git a/doc/nn_fc_example.png b/doc/nn_fc_example.png
new file mode 100644
index 0000000000000000000000000000000000000000..4ba4f7f6180cd129334c5c859d9a0c587d406efb
Binary files /dev/null and b/doc/nn_fc_example.png differ
diff --git a/doc/relu.png b/doc/relu.png
new file mode 100644
index 0000000000000000000000000000000000000000..823848cf072c2f840820f030bab7444d7daa2312
Binary files /dev/null and b/doc/relu.png differ
diff --git a/doc/softmax.png b/doc/softmax.png
new file mode 100644
index 0000000000000000000000000000000000000000..1efdf56dd12e1f07a1ad742d5f015eebbf3b96db
Binary files /dev/null and b/doc/softmax.png differ
diff --git a/doc/tanh.png b/doc/tanh.png
new file mode 100644
index 0000000000000000000000000000000000000000..8938bf3a29b768a8df06747127e3d906ba20ca24
Binary files /dev/null and b/doc/tanh.png differ
diff --git a/doc/venn.png b/doc/venn.png
new file mode 100644
index 0000000000000000000000000000000000000000..a55784aa794b708de9c1ac07b62097ad9b2ee34a
Binary files /dev/null and b/doc/venn.png differ
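As a small supplement to the classification activations listed in `doc/basics.md` (argmax, softmax): softmax turns raw scores into a probability distribution, and argmax then selects the predicted class. A minimal NumPy sketch, with the standard max-shift for numerical stability (the helper name `softmax` is illustrative):

```python
import numpy as np

def softmax(z):
    # Shifting by the maximum avoids overflow in exp(); the result is
    # unchanged because softmax is invariant under adding a constant.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)      # non-negative entries summing to 1
pred = int(np.argmax(probs)) # 0: argmax picks the largest score
```

Unlike argmax, softmax is differentiable, which is why it is used in the output layer during training while argmax is applied only at prediction time.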