The empirical regression problem then reads

```math
\text{Find}\qquad \Psi_\vartheta
= \operatorname*{arg\, min}_{\Psi_\theta\in\mathcal{M}_{d,\varphi}} \frac{1}{N} \sum_{i=1}^N \bigl(f^{(i)} - \Psi_\theta(x^{(i)})\bigr)^2
=: \operatorname*{arg\, min}_{\Psi_\theta\in\mathcal{M}_{d,\varphi}} \mathcal{L}_N(\Psi_\theta).
```

> **Definition** (loss function):
> A _loss function_ is any function that measures how well a neural network approximates the target values.

Typical loss functions for regression and classification tasks are
  - mean squared error (MSE, the standard $`L^2`$-error; see the sketch below)
  - weighted $`L^p`$- or $`H^k`$-norms (solutions of PDEs)
  - cross-entropy (difference between distributions)
  - Kullback-Leibler divergence, Hellinger distance, Wasserstein metrics
  - hinge loss (SVM)
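
As a concrete reference point, here is a minimal NumPy sketch of the empirical MSE loss $`\mathcal{L}_N`$; the names `mse_loss`, `predictions`, and `targets` are illustrative choices, not part of any fixed API.

```python
import numpy as np

def mse_loss(predictions: np.ndarray, targets: np.ndarray) -> float:
    """Empirical MSE loss L_N = 1/N * sum_i (f^(i) - Psi(x^(i)))^2."""
    residuals = targets - predictions
    return float(np.mean(residuals**2))

# Toy usage with target values f^(i) and network outputs Psi(x^(i)).
targets = np.array([1.0, 2.0, 3.0])
predictions = np.array([1.1, 2.1, 2.8])
print(mse_loss(predictions, targets))  # approximately 0.02
```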

To find a minimizer of our loss function $`\mathcal{L}_N`$, we want to use the first-order optimality criterion

```math
0
= \operatorname{\nabla}_\vartheta \mathcal{L}_N(\Psi_\vartheta)
= -\frac{2}{N} \sum_{i=1}^N \bigl(f^{(i)} - \Psi_\vartheta(x^{(i)})\bigr) \operatorname{\nabla}_\vartheta \Psi_\vartheta(x^{(i)}).
```

Solving this equation requires the evaluation of the Jacobian (gradient) of the neural network $`\Psi_\vartheta`$ with respect to the network parameters $`\vartheta`$.
Since $`\vartheta\in\mathbb{R}^M`$ with $`M\gg1`$ (up to millions of degrees of freedom), computing the gradient with respect to all parameters for every training data point and solving the resulting system directly is infeasible.

Optimization (Training)
-----------------------

Instead of solving the minimization problem explicitly, we can use iterative schemes to approximate the solution.
The simplest and best-known approach is gradient descent (Euler's method), i.e.

```math
\vartheta^{(j+1)} = \vartheta^{(j)} - \eta \operatorname{\nabla}_{\vartheta}\mathcal{L}_N(\Psi_{\vartheta^{(j)}}),
\qquad j=0, 1, 2, \dots
```

where the step size $`\eta>0`$ is typically called the _learning rate_ and $`\vartheta^{(0)}`$ is a random initialization of the weights and biases.
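
To make the update rule concrete, here is a minimal sketch of gradient descent for a toy problem in which $`\Psi_\vartheta`$ is replaced by a plain linear model, so the gradient of the MSE loss can be written down by hand; the data sizes, `eta`, and the number of steps are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # training inputs x^(i)
f = X @ np.array([1.0, -2.0, 0.5]) + 0.3        # target values f^(i)

W, b = rng.normal(size=3), 0.0                  # random initialization theta^(0)
eta = 0.1                                       # learning rate

for j in range(500):
    residual = f - (X @ W + b)                  # f^(i) - Psi_theta(x^(i))
    grad_W = -2.0 / len(X) * X.T @ residual     # gradient of L_N w.r.t. W
    grad_b = -2.0 / len(X) * residual.sum()     # gradient of L_N w.r.t. b
    W, b = W - eta * grad_W, b - eta * grad_b   # theta^(j+1) = theta^(j) - eta * grad
```

For an actual neural network, $`\operatorname{\nabla}_\vartheta\mathcal{L}_N`$ is no longer available in closed form; how to compute it is discussed below.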

The key reason why gradient descent is more promising than the first-order optimality criterion is its iterative character.
In particular, we can use the law of large numbers and restrict the number of summands in $`\mathcal{L}_N`$ to a random subset of fixed size in each iteration step, which is called _stochastic gradient descent_ (SGD).
Convergence of SGD can be shown via convex minimization and stochastic approximation theory and only requires that the learning rate $`\eta`$ decays at an appropriate rate.
**(see ?? for more information)**
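
Continuing the toy example above, the only change for SGD is that each step evaluates the gradient on a random subset $`\Gamma_j`$ of the data; the batch size is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
f = X @ np.array([1.0, -2.0, 0.5]) + 0.3

W, b = rng.normal(size=3), 0.0
eta, batch_size = 0.05, 32

for j in range(2000):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # random subset Gamma_j
    Xb, fb = X[idx], f[idx]
    residual = fb - (Xb @ W + b)
    grad_W = -2.0 / batch_size * Xb.T @ residual  # mini-batch gradient w.r.t. W
    grad_b = -2.0 / batch_size * residual.sum()   # mini-batch gradient w.r.t. b
    W, b = W - eta * grad_W, b - eta * grad_b
```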

Here, however, I want to focus more on the difference between "normal" GD and SGD (on an intuitive level).
In principle, SGD trades the gradient computation over a large number of terms against the convergence rate of the algorithm.
The best metaphor I know of to remember the difference is the following:

> **Metaphor (SGD):**
> Assume you and a friend of yours have had a party on the top of a mountain.
> As the party has come to an end, you both want to get back home somewhere in the valley.
> You, scientist that you are, plan the most direct way down the mountain, following the steepest descent and planning each step carefully, as the terrain is very rough.
> Your friend, however, drank a little too much and is not capable of planning anymore.
> So they stagger down the mountain in a more or less random direction.
> Each individual step takes little thought, but overall it takes them a long time to get back home (or at least close to it).
>
> <img src="sgd.png" title="sgd" alt="sgd" height=400 />

What remains is the computation of $`\operatorname{\nabla}_\vartheta\Psi_{\vartheta^{(j)}}(x^{(i)})`$ for $`i\in\Gamma_j\subset\{1,\dots,N\}`$ in each step.
Luckily for us, we know that $`\Psi_\vartheta`$ is a simple composition of activation functions $`\varphi_\ell`$ and affine maps $`A^{(\ell)}(x^{(\ell-1)}) = W^{(\ell)} x^{(\ell-1)} + b^{(\ell)}`$ with derivatives

```math
\partial_{W^{(m)}_{\alpha,\beta}} A^{(\ell)}(x^{(\ell-1)}) =
\begin{cases}
x^{(\ell-1)}_{\beta}\, e_\alpha & \text{if }m=\ell,\\
0 & \text{if }m\neq\ell,
\end{cases}
\qquad\text{and}\qquad
\partial_{b^{(m)}_{\alpha}} A^{(\ell)}(x^{(\ell-1)}) =
\begin{cases}
e_{\alpha} & \text{if }m=\ell,\\
0 & \text{if }m\neq\ell.
\end{cases}
```

Here $`e_\alpha`$ denotes the $`\alpha`$-th canonical unit vector. The gradient $`\operatorname{\nabla}_\vartheta\Psi_{\vartheta^{(j)}}(x^{(i)})`$ can then be computed using the chain rule due to the compositional structure of the neural network.
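
As an illustration, assume the usual feed-forward structure $`\Psi_\vartheta = A^{(L)}\circ\varphi_{L-1}\circ A^{(L-1)}\circ\dots\circ\varphi_1\circ A^{(1)}`$ with layer outputs $`x^{(\ell)} = \varphi_\ell(A^{(\ell)}(x^{(\ell-1)}))`$ and $`x^{(0)} = x`$ (a sketch of the bookkeeping under this assumption, not a statement about a particular architecture). For the weights of a layer $`\ell<L`$ the chain rule then gives

```math
\partial_{W^{(\ell)}_{\alpha,\beta}} \Psi_\vartheta(x)
= W^{(L)} D^{(L-1)} W^{(L-1)} \cdots W^{(\ell+1)} D^{(\ell)}\, x^{(\ell-1)}_\beta e_\alpha,
\qquad
D^{(k)} := \operatorname{diag}\Bigl(\varphi_k'\bigl(A^{(k)}(x^{(k-1)})\bigr)\Bigr).
```

Note how the product $`W^{(L)} D^{(L-1)} \cdots D^{(\ell)}`$ is shared between all entries $`(\alpha,\beta)`$ of $`W^{(\ell)}`$ and reappears as a prefix in the corresponding products for the layers below; this redundancy is exactly what the next paragraph is about.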
Computing the gradient through the chain rule is still very inefficient, and most probably infeasible, if done in a naive fashion.
The so-called _backpropagation_ is essentially a way to compute the partial derivatives layer-wise, storing only the necessary information to prevent repetitive computations, which renders the computation manageable.
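
In practice one rarely implements backpropagation by hand; automatic differentiation frameworks such as PyTorch (see Further Reading below) do it for us. A minimal sketch with a small fully connected network and random toy data, where all layer sizes and names are illustrative:

```python
import torch

# Toy data standing in for the pairs (x^(i), f^(i)).
X = torch.randn(128, 3)
f = torch.randn(128, 1)

# A small fully connected network Psi_theta.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, 1),
)
loss_fn = torch.nn.MSELoss()                               # empirical loss L_N
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)   # learning rate eta

for step in range(100):
    optimizer.zero_grad()          # reset accumulated gradients
    loss = loss_fn(model(X), f)    # forward pass: evaluate L_N
    loss.backward()                # backpropagation: compute all partial derivatives
    optimizer.step()               # gradient descent update of the parameters
```

Here `loss.backward()` performs the backpropagation pass, and `optimizer.step()` applies the gradient descent update from above.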

Types of Neural Networks
------------------------

| Name | Graph |
| --- | --- |
| Fully Connected Neural Network | <img src="nn_fc.png" title="nn_fc" alt="nn_fc" height=250 /> |
| Convolutional Neural Network | <img src="nn_conv.png" title="nn_conv" alt="nn_conv" height=250/> |
| U-Net | <img src="u_net.png" title="u_net" alt="u_net" height=250/> |
| Residual Neural Network | <img src="res_net.png" title="res_net" alt="res_net" height=250/> |
| Invertible Neural Network | <img src="inn.png" title="inn" alt="inn" height=250/> |

Further Reading
---------------

- Python: PyTorch, TensorFlow, scikit-learn
- MATLAB: Deep Learning Toolbox