diff --git a/doc/argmax.png b/doc/argmax.png
new file mode 100644
index 0000000000000000000000000000000000000000..39b540783076a4b9817d258a3f04bc8f00760030
Binary files /dev/null and b/doc/argmax.png differ
diff --git a/doc/basics.md b/doc/basics.md
new file mode 100644
index 0000000000000000000000000000000000000000..e6688f48a747c03cc79065db34dd7c6268b1826a
--- /dev/null
+++ b/doc/basics.md
@@ -0,0 +1,102 @@
+Neural Networks 101
+===================
+
+<div style="text-align: center;">
+    <img src="machine_learning.png" title="ml" alt="ml" height=400 />
+</div>
+
+Nomenclature and Definitions
+----------------------------
+
+First, we need to clarify a few terms: **artificial intelligence**, **machine learning** and **neural network**.
+Everybody categorizes them differently, but we look at it this way:
+
+<br/>
+<div style="text-align: center;">
+    <img src="venn.png" title="venn" alt="venn" height=300 />
+</div>
+<br/>
+
+Here we focus on neural networks as a special model class used for function approximation in regression or classification tasks.
+To be more precise, we will rely on the following definition.
+
+> **Definition** (Neural Network):
+> For any $L\in\mathbb{N}$ and $d=(d_0,\dots,d_L)\in\mathbb{N}^{L+1}$ a non-linear map $\Psi\colon\mathbb{R}^{d_0}\to\mathbb{R}^{d_L}$ of the form
+> $$
+> \Psi(x) = \bigl[\varphi_L\circ (W_L\bullet  + b_L)\circ\varphi_{L-1}\circ\dots\circ(W_2\bullet  + b_2)\circ\varphi_1\circ (W_1\bullet  + b_1)\bigr](x)
+> $$
+> is called a _fully connected feed-forward neural network_.
+
+Typically, we use the following nomenclature:
+- $L$ is called the _depth_ of the network with layers $\ell=0,\dots,L$.
+- $d$ is called the _width_ of the network, where $d_\ell$ is the width of layer $\ell$.
+- $W_\ell\in\mathbb{R}^{d_\ell\times d_{\ell-1}}$ are the _weights_ of layer $\ell$, so that $W_\ell$ maps $\mathbb{R}^{d_{\ell-1}}$ to $\mathbb{R}^{d_\ell}$.
+- $b_\ell\in\mathbb{R}^{d_\ell}$ are the _biases_ of layer $\ell$.
+- $\vartheta=(W_1,b_1,\dots,W_L,b_L)$ are the _free parameters_ of the neural network.
+  Sometimes we write $\Psi_\vartheta$ or $\Psi(x; \vartheta)$ to indicate the dependence of $\Psi$ on the parameters $\vartheta$.
+- $\varphi_\ell$ is the _activation function_ of layer $\ell$.
+  Note that $\varphi_\ell$ has to be non-linear; typical choices are additionally monotone increasing.
+
+Additionally, there exist the following conventions:
+- $x^{(0)}:=x$ is called the _input (layer)_ of the neural network $\Psi$.
+- $x^{(L)}:=\Psi(x)$ is called the _output (layer)_ of the neural network $\Psi$.
+- Intermediate results $x^{(\ell)} = \varphi_\ell(W_\ell\, x^{(\ell-1)} + b_\ell)$ are called _hidden layers_.
+- (debatable) A neural network is called _shallow_ if it has only one hidden layer ($L=2$) and _deep_ otherwise.
+
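The layer-wise recursion $x^{(\ell)} = \varphi_\ell(W_\ell\, x^{(\ell-1)} + b_\ell)$ can be sketched in a few lines of NumPy (an illustrative sketch; the function names `relu` and `forward` are our own):

```python
import numpy as np

def relu(z):
    # element-wise ReLU activation
    return np.maximum(z, 0.0)

def forward(x, params, activations):
    """Evaluate Psi(x) for params = [(W_1, b_1), ..., (W_L, b_L)]."""
    for (W, b), phi in zip(params, activations):
        x = phi(W @ x + b)  # x^(l) = phi_l(W_l x^(l-1) + b_l)
    return x
```

Each `W` has shape `(d_l, d_{l-1})`, so `W @ x` maps layer $\ell-1$ to layer $\ell$.
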
+**Example:**
+Let $L=3$, $d=(6, 10, 10, 3)$ and $\varphi_1=\varphi_2=\varphi_3=\mathrm{ReLU}$.
+Then the neural network is given by the concatenation
+$$
+\Psi\colon \mathbb{R}^6\to\mathbb{R}^3,
+\qquad
+\Psi(x) = \varphi_3\Bigl(W_3 \Bigl(\underbrace{\varphi_2\bigl(W_2 \bigl(\underbrace{\varphi_1(W_1 x + b_1)}_{x^{(1)}}\bigr) + b_2\bigr)}_{x^{(2)}}\Bigr) + b_3\Bigr).
+$$
+A typical graphical representation of the neural network looks like this:
+
+<br/>
+<div style="text-align: center;">
+  <img src="nn_fc_example.png" title="ml" alt="ml" width=400 />
+</div>
+<br/>
+
+The entries of $W_\ell$, $\ell=1,2,3$, are depicted as lines connecting nodes in one layer to the subsequent one.
+The color indicates the sign of the entries (blue = "+", magenta = "-") and the opacity represents their absolute value (magnitude).
+Note that neither the employed activation functions $\varphi_\ell$ nor the biases $b_\ell$ are represented in this graph.
+
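With randomly initialized parameters (purely for illustration), the shapes in this example can be checked in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
d = (6, 10, 10, 3)  # widths d_0, ..., d_3

# theta = (W_1, b_1, W_2, b_2, W_3, b_3), randomly initialized
params = [(rng.standard_normal((d[l], d[l - 1])), np.zeros(d[l]))
          for l in range(1, len(d))]

x = rng.standard_normal(d[0])           # input x^(0) in R^6
for W, b in params:
    x = np.maximum(W @ x + b, 0.0)      # x^(l) = ReLU(W_l x^(l-1) + b_l)

assert x.shape == (3,)                  # output Psi(x) in R^3
```
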
+Activation Functions
+--------------------
+
+Activation functions can, in principle, be arbitrary non-linear maps.
+The important part is the non-linearity: a composition of affine maps is again affine, so without it the neural network would simply collapse to a single affine function.
+
+Typical examples of continuous activation functions applied in the context of function approximation or regression are:
+
+ReLU | Leaky ReLU | Tanh
+- | - | -
+<img src="relu.png" title="ReLU" alt="ReLU" width=300 /> | <img src="leaky_relu.png" title="leaky ReLU" alt="leaky ReLU" width=300 /> | <img src="tanh.png" title="tanh" alt="tanh" width=300 />
+
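These three functions are one-liners in NumPy (a sketch; the leaky-ReLU slope `alpha = 0.01` is a common but arbitrary choice):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def leaky_relu(z, alpha=0.01):
    # small negative slope avoids exactly-zero gradients for z < 0
    return np.where(z >= 0.0, z, alpha * z)

def tanh(z):
    return np.tanh(z)
```
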
+For classification tasks, such as image recognition, so-called convolutional neural networks (CNNs) are employed.
+Typically, these networks use different types of activation functions, such as:
+
+**Examples of discrete activation functions:**
+Argmax | Softmax | Max-Pooling
+- | - | -
+<img src="argmax.png" title="argmax" alt="argmax" width=300 /> | <img src="softmax.png" title="softmax" alt="softmax" width=300 /> | <img src="maxpool.png" title="maxpool" alt="maxpool" width=300 />
+
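Hypothetical NumPy sketches of these three operations (function names are our own):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by max for numerical stability
    return e / e.sum()         # non-negative entries summing to 1

def argmax(z):
    return int(np.argmax(z))   # hard decision: index of the largest entry

def max_pool_1d(z, width=2):
    # keep only the maximum of each non-overlapping window
    z = np.asarray(z, dtype=float)
    return z[: len(z) // width * width].reshape(-1, width).max(axis=1)
```
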
+More information on CNNs follows below.
+
+Training
+--------
+
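In a nutshell, training means fitting the free parameters $\vartheta$ to data by minimizing a loss function, typically with (stochastic) gradient descent and backpropagation. A minimal full-batch sketch in NumPy (architecture, learning rate, and target function are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 1))
Y = np.abs(X)  # target function to approximate

# one hidden ReLU layer: d = (1, 8, 1)
W1, b1 = rng.standard_normal((8, 1)), np.zeros((8, 1))
W2, b2 = rng.standard_normal((1, 8)), np.zeros((1, 1))
lr = 0.1  # learning rate (step size)

def mse():
    H = np.maximum(W1 @ X.T + b1, 0.0)
    return float(((W2 @ H + b2 - Y.T) ** 2).mean())

mse_before = mse()
for step in range(500):
    # forward pass
    Z = W1 @ X.T + b1
    H = np.maximum(Z, 0.0)
    err = W2 @ H + b2 - Y.T
    # backpropagation: chain rule through both layers
    gW2 = err @ H.T / len(X)
    gb2 = err.mean(axis=1, keepdims=True)
    dH = (W2.T @ err) * (Z > 0.0)
    gW1 = dH @ X / len(X)
    gb1 = dH.mean(axis=1, keepdims=True)
    # gradient step on theta
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse_after = mse()  # should be well below mse_before
```

In practice one would use the automatic differentiation and optimizers of a library such as PyTorch instead of hand-coded gradients.
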
+Types of Neural Networks
+------------------------
+
+Fully Connected Neural Network| Convolutional Neural Network
+- | -
+![fully connected](nn_fc.png) | ![convolutional](nn_conv.png)
+
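The key structural difference: a convolutional layer slides one small kernel of shared weights over the input instead of connecting every input to every output. A minimal "valid" 2-D convolution (more precisely, cross-correlation) sketch:

```python
import numpy as np

def conv2d(image, kernel):
    # "valid" 2-D cross-correlation: the same small kernel
    # (shared weights) is applied at every spatial position
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```
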
+Further Reading
+---------------
+
+- Python: PyTorch, TensorFlow, scikit-learn
+- MATLAB: Deep Learning Toolbox
diff --git a/doc/leaky_relu.png b/doc/leaky_relu.png
new file mode 100644
index 0000000000000000000000000000000000000000..1e6dd0a96f1e9e480751aa4f96263f7b2dd60455
Binary files /dev/null and b/doc/leaky_relu.png differ
diff --git a/doc/machine_learning.png b/doc/machine_learning.png
new file mode 100644
index 0000000000000000000000000000000000000000..ee18d536945aeffa4065a1467b0b842115a062f4
Binary files /dev/null and b/doc/machine_learning.png differ
diff --git a/doc/maxpool.png b/doc/maxpool.png
new file mode 100644
index 0000000000000000000000000000000000000000..6de8134d816577305eef08d3ac8e1db6a469fb28
Binary files /dev/null and b/doc/maxpool.png differ
diff --git a/doc/nn_conv.png b/doc/nn_conv.png
new file mode 100644
index 0000000000000000000000000000000000000000..da778da67906c626bfd1e2ff17948564ba3de774
Binary files /dev/null and b/doc/nn_conv.png differ
diff --git a/doc/nn_fc.png b/doc/nn_fc.png
new file mode 100644
index 0000000000000000000000000000000000000000..e36e749e08e8eee066927472e7e00ab7a32a8378
Binary files /dev/null and b/doc/nn_fc.png differ
diff --git a/doc/nn_fc_example.png b/doc/nn_fc_example.png
new file mode 100644
index 0000000000000000000000000000000000000000..4ba4f7f6180cd129334c5c859d9a0c587d406efb
Binary files /dev/null and b/doc/nn_fc_example.png differ
diff --git a/doc/relu.png b/doc/relu.png
new file mode 100644
index 0000000000000000000000000000000000000000..823848cf072c2f840820f030bab7444d7daa2312
Binary files /dev/null and b/doc/relu.png differ
diff --git a/doc/softmax.png b/doc/softmax.png
new file mode 100644
index 0000000000000000000000000000000000000000..1efdf56dd12e1f07a1ad742d5f015eebbf3b96db
Binary files /dev/null and b/doc/softmax.png differ
diff --git a/doc/tanh.png b/doc/tanh.png
new file mode 100644
index 0000000000000000000000000000000000000000..8938bf3a29b768a8df06747127e3d906ba20ca24
Binary files /dev/null and b/doc/tanh.png differ
diff --git a/doc/venn.png b/doc/venn.png
new file mode 100644
index 0000000000000000000000000000000000000000..a55784aa794b708de9c1ac07b62097ad9b2ee34a
Binary files /dev/null and b/doc/venn.png differ