From b6db7c0571ef624a087f8423128bd428e3a16fb6 Mon Sep 17 00:00:00 2001
From: Nando Farchmin <nando.farchmin@gmail.com>
Date: Mon, 4 Jul 2022 11:35:50 +0200
Subject: [PATCH] Test markdown math display

---
 doc/basics.md | 6 +++---
 doc/tmp.md    | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/basics.md b/doc/basics.md
index f507a5f..4c9c977 100644
--- a/doc/basics.md
+++ b/doc/basics.md
@@ -162,7 +162,7 @@ The easiest and most well known approach is gradient descent (Euler's method), i
 
 where the step size $`\eta>0`$ is typically called the _learning rate_ and $`\vartheta^{(0)}`$ is a random initialization of the weights and biases.
 The key why gradient descent is more promising then first-order optimality criterion is the iterative character.
-In particular, we can use the law of large numbers and restrict the number of summands in $\mathcal{L}_N$ to a random subset of fixed size in each iteration step, which is called _stochastic gradient descent_ (SGD).
+In particular, we can use the law of large numbers and restrict the number of summands in $`\mathcal{L}_N`$ to a random subset of fixed size in each iteration step, which is called _stochastic gradient descent_ (SGD).
 Convergence of SGD can be shown by convex minimization and stochastic approximation theory and only requires that the learning rate $`\eta`$ with an appropriate rate.
 **(see ?? for mor information)**
 
@@ -180,10 +180,10 @@ The best metaphor to remember the difference (I know of) is the following:
 >
 > <img src="sgd.png" title="sgd" alt="sgd" height=400 />
 
-What remains is the computation of $`\operatorname{\nabla}_\vartheta\Psi_{\vartheta^{(i)}}`$ for $`i`\in\Gamma_j\subset\{1,\dots,N\}$ in each step.
+What remains is the computation of $`\operatorname{\nabla}_\vartheta\Psi_{\vartheta^{(i)}}`$ for $`i\in\Gamma_j\subset\{1,\dots,N\}`$ in each step.
 Lucky for us, we know that $`\Psi_\vartheta`$ is a simple concatenation of activation functions $`\varphi_\ell`$ and affine maps $`A_\ell(x^{(\ell-1)}) = W_\ell x^{(\ell-1)} + b_\ell`$ with derivative
 
-```
+```math
 \partial_{W^{(m)}_{\alpha,\beta}} A^{(\ell)} =
 \begin{cases}
 W^{(\ell)}_{\alpha,\beta} & \text{if }m=\ell,\\
diff --git a/doc/tmp.md b/doc/tmp.md
index 74aac62..fd0183e 100644
--- a/doc/tmp.md
+++ b/doc/tmp.md
@@ -43,7 +43,7 @@ The easiest and most well known approach is gradient descent (Euler's method), i
 
 where the step size $`\eta>0`$ is typically called the _learning rate_ and $`\vartheta^{(0)}`$ is a random initialization of the weights and biases.
 The key why gradient descent is more promising then first-order optimality criterion is the iterative character.
-In particular, we can use the law of large numbers and restrict the number of summands in $\mathcal{L}_N$ to a random subset of fixed size in each iteration step, which is called _stochastic gradient descent_ (SGD).
+In particular, we can use the law of large numbers and restrict the number of summands in $`\mathcal{L}_N`$ to a random subset of fixed size in each iteration step, which is called _stochastic gradient descent_ (SGD).
 Convergence of SGD can be shown by convex minimization and stochastic approximation theory and only requires that the learning rate $`\eta`$ with an appropriate rate.
 **(see ?? for mor information)**
 
@@ -61,10 +61,10 @@ The best metaphor to remember the difference (I know of) is the following:
 >
 > <img src="sgd.png" title="sgd" alt="sgd" height=400 />
 
-What remains is the computation of $`\operatorname{\nabla}_\vartheta\Psi_{\vartheta^{(i)}}`$ for $`i`\in\Gamma_j\subset\{1,\dots,N\}$ in each step.
+What remains is the computation of $`\operatorname{\nabla}_\vartheta\Psi_{\vartheta^{(i)}}`$ for $`i\in\Gamma_j\subset\{1,\dots,N\}`$ in each step.
 Lucky for us, we know that $`\Psi_\vartheta`$ is a simple concatenation of activation functions $`\varphi_\ell`$ and affine maps $`A_\ell(x^{(\ell-1)}) = W_\ell x^{(\ell-1)} + b_\ell`$ with derivative
 
-```
+```math
 \partial_{W^{(m)}_{\alpha,\beta}} A^{(\ell)} =
 \begin{cases}
 W^{(\ell)}_{\alpha,\beta} & \text{if }m=\ell,\\
-- 
GitLab
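
A note for reviewers on the syntax this patch standardizes on: GitLab Flavored Markdown renders inline math written as LaTeX wrapped in backticks inside dollar signs, and display math written in a fenced block tagged `math` (both rendered with KaTeX). A minimal sketch of both forms, reusing symbols from the patched docs; the gradient-descent update rule below is illustrative sample content, not quoted from them:

````markdown
The step size $`\eta > 0`$ controls how far each update moves:

```math
\vartheta^{(j+1)} = \vartheta^{(j)} - \eta \operatorname{\nabla}_\vartheta \mathcal{L}_N(\vartheta^{(j)})
```
````

Bare `$...$` delimiters and untagged code fences, as in the removed lines above, render as literal text and a plain code block respectively, which is why both files need the same two substitutions.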