The empirical regression problem then reads

```math
\text{Find}\qquad \Psi_\vartheta
= \operatorname*{arg\, min}_{\Psi_\theta\in\mathcal{M}_{d,\varphi}} \frac{1}{N} \sum_{i=1}^N \bigl(f^{(i)} - \Psi_\theta(x^{(i)})\bigr)^2
=: \operatorname*{arg\, min}_{\Psi_\theta\in\mathcal{M}_{d,\varphi}} \mathcal{L}_N(\Psi_\theta)
```

> **Definition** (loss function):
> A _loss function_ is any function that measures how well a neural network approximates the target values.

Typical loss functions for regression and classification tasks are

- mean-square error (MSE, standard $`L^2`$-error; see the sketch after this list)
- weighted $`L^p`$- or $`H^k`$-norms (solutions of PDEs)
- cross-entropy (difference between distributions)
- Kullback-Leibler divergence, Hellinger distance, Wasserstein metrics
- hinge loss (SVM)

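As a concrete illustration, here is a minimal sketch of the empirical MSE loss $`\mathcal{L}_N`$ from above in plain NumPy; the cubic model `psi` and the toy data are hypothetical stand-ins for a neural network $`\Psi_\theta`$ and real training data.

```python
import numpy as np

# Hypothetical toy data: N = 100 noisy samples (x^(i), f^(i))
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
f = np.sin(np.pi * x) + 0.1 * rng.normal(size=100)

def psi(theta, x):
    """Stand-in for the model Psi_theta: a cubic polynomial in x."""
    return theta[0] + theta[1] * x + theta[2] * x**2 + theta[3] * x**3

def empirical_loss(theta, x, f):
    """Empirical MSE loss L_N = 1/N sum_i (f^(i) - Psi_theta(x^(i)))^2."""
    return np.mean((f - psi(theta, x)) ** 2)
```
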
To find a minimizer of our loss function $`\mathcal{L}_N`$, we want to use the first-order optimality criterion

```math
0
= \operatorname{\nabla}_\vartheta \mathcal{L}_N(\Psi_\vartheta)
= -\frac{2}{N} \sum_{i=1}^N \bigl(f^{(i)} - \Psi_\vartheta(x^{(i)})\bigr) \operatorname{\nabla}_\vartheta \Psi_\vartheta(x^{(i)}).
```

Solving this equation requires the evaluation of the Jacobian (gradient) of the neural network $`\Psi_\vartheta`$ with respect to the network parameters $`\vartheta`$.
As $`\vartheta\in\mathbb{R}^M`$ with $`M\gg1`$ (millions of degrees of freedom), computation of the gradient w.r.t. all parameters for each training data point is infeasible.

Optimization (Training)
-----------------------

Instead of solving the minimization problem explicitly, we can use iterative schemes to approximate the solution.
The simplest and best-known approach is gradient descent (Euler's method), i.e.,

```math
\vartheta^{(j+1)} = \vartheta^{(j)} - \eta \operatorname{\nabla}_{\vartheta}\mathcal{L}_N(\Psi_{\vartheta^{(j)}}),
\qquad j=0, 1, 2, \dots
```

where the step size $`\eta>0`$ is typically called the _learning rate_ and $`\vartheta^{(0)}`$ is a random initialization of the weights and biases.

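To make the update rule concrete, here is a minimal sketch of plain gradient descent for the hypothetical cubic model from the loss example above (the learning rate and iteration count are arbitrary illustrative choices):

```python
def loss_gradient(theta, x, f):
    """Gradient of the empirical MSE loss w.r.t. theta for the cubic model."""
    residual = f - psi(theta, x)                        # f^(i) - Psi_theta(x^(i))
    basis = np.stack([np.ones_like(x), x, x**2, x**3])  # d psi / d theta_k
    return -2.0 / x.size * basis @ residual

theta = rng.normal(size=4)  # random initialization theta^(0)
eta = 0.1                   # learning rate
for _ in range(1000):
    theta = theta - eta * loss_gradient(theta, x, f)
```
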
The key reason why gradient descent is more promising than the first-order optimality criterion is its iterative character.
In particular, we can use the law of large numbers and restrict the number of summands in $`\mathcal{L}_N`$ to a random subset of fixed size in each iteration step, which is called _stochastic gradient descent_ (SGD).
Convergence of SGD can be shown by convex minimization and stochastic approximation theory and only requires that the learning rate $`\eta`$ decays at an appropriate rate.
**(see ?? for more information)**
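
Continuing the sketch above, the SGD variant only replaces the full sum by a random batch $`\Gamma_j`$ of fixed size in each step (batch size and step count are again arbitrary choices):

```python
theta = rng.normal(size=4)  # fresh random initialization
eta, batch_size = 0.1, 10
for _ in range(5000):
    # Random subset Gamma_j of {1, ..., N} of fixed size
    idx = rng.choice(x.size, size=batch_size, replace=False)
    theta = theta - eta * loss_gradient(theta, x[idx], f[idx])
```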

Here, however, I want to focus more on the difference between "normal" GD and SGD (on an intuitive level).
In principle, SGD trades the gradient computation over a large number of terms for a slower convergence rate of the algorithm.
The best metaphor I know of to remember the difference is the following:

> **Metaphor (SGD):**
> Assume you and a friend of yours have had a party on the top of a mountain.
> As the party has come to an end, you both want to get back home somewhere in the valley.
> You, scientist that you are, plan the most direct way down the mountain, following the steepest descent and planning each step carefully, as the terrain is very rough.
> Your friend, however, drank a little too much and is not capable of planning anymore.
> So they stagger down the mountain in a more or less random direction.
> Each step they take requires little thought, but it takes them a long time overall to get back home (or at least close to it).
>
> <img src="sgd.png" title="sgd" alt="sgd" height=400 />

What remains is the computation of $`\operatorname{\nabla}_\vartheta\Psi_{\vartheta^{(j)}}(x^{(i)})`$ for $`i\in\Gamma_j\subset\{1,\dots,N\}`$ in each step.
Luckily for us, we know that $`\Psi_\vartheta`$ is a simple composition of activation functions $`\varphi_\ell`$ and affine maps $`A^{(\ell)}(x^{(\ell-1)}) = W^{(\ell)} x^{(\ell-1)} + b^{(\ell)}`$ with derivatives

```math
\partial_{W^{(m)}_{\alpha,\beta}} A^{(\ell)}_\gamma =
\begin{cases}
\delta_{\alpha,\gamma}\, x^{(\ell-1)}_\beta & \text{if }m=\ell,\\
0 & \text{if }m\neq\ell,
\end{cases}
\qquad\text{and}\qquad
\partial_{b^{(m)}_{\alpha}} A^{(\ell)}_\gamma =
\begin{cases}
\delta_{\alpha,\gamma} & \text{if }m=\ell,\\
0 & \text{if }m\neq\ell,
\end{cases}
```

where $`\delta_{\alpha,\gamma}`$ denotes the Kronecker delta.

The gradient $`\operatorname{\nabla}_\vartheta\Psi_{\vartheta^{(j)}}`$ can then be computed using the chain rule due to the compositional structure of the neural network.
Computing the gradient through the chain rule is still very inefficient and most probably infeasible if done in a naive fashion.
The so-called _backpropagation_ is essentially a way to compute the partial derivatives layer by layer, storing only the necessary information to prevent repetitive computations, rendering the computation manageable.

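The following is a minimal, self-contained sketch of this idea for a hypothetical two-layer network with tanh activation (not any particular library's API); the forward pass caches the intermediate values that the backward pass reuses:

```python
import numpy as np

def forward(params, x):
    """Forward pass; cache intermediates needed for the backward pass."""
    (W1, b1), (W2, b2) = params
    z1 = W1 @ x + b1    # affine map A^(1)
    a1 = np.tanh(z1)    # activation phi_1
    out = W2 @ a1 + b2  # affine map A^(2) (linear output layer)
    return out, (x, a1)

def backward(params, cache, grad_out):
    """Backward pass: apply the chain rule layer by layer, reusing the cache."""
    (W1, b1), (W2, b2) = params
    x, a1 = cache
    dW2 = np.outer(grad_out, a1)        # gradient w.r.t. W^(2)
    db2 = grad_out                      # gradient w.r.t. b^(2)
    da1 = W2.T @ grad_out               # propagate to hidden layer
    dz1 = da1 * (1.0 - a1 ** 2)         # tanh'(z) = 1 - tanh(z)^2
    dW1 = np.outer(dz1, x)              # gradient w.r.t. W^(1)
    db1 = dz1                           # gradient w.r.t. b^(1)
    return (dW1, db1), (dW2, db2)

# Hypothetical usage: input dim 3, hidden dim 5, scalar output
rng = np.random.default_rng(1)
params = ((rng.normal(size=(5, 3)), np.zeros(5)),
          (rng.normal(size=(1, 5)), np.zeros(1)))
out, cache = forward(params, np.ones(3))
grads = backward(params, cache, grad_out=np.ones(1))
```
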
Types of Neural Networks
------------------------

| Name | Graph |
| --- | --- |
| Fully Connected Neural Network | <img src="nn_fc.png" title="nn_fc" alt="nn_fc" height=250 /> |
| Convolutional Neural Network | <img src="nn_conv.png" title="nn_conv" alt="nn_conv" height=250 /> |
| U-Net | <img src="u_net.png" title="u_net" alt="u_net" height=250 /> |
| Residual Neural Network | <img src="res_net.png" title="res_net" alt="res_net" height=250 /> |
| Invertible Neural Network | <img src="inn.png" title="inn" alt="inn" height=250 /> |

Further Reading
---------------

- Python: PyTorch, TensorFlow, scikit-learn
- Matlab: Deep Learning Toolbox