Here we focus on neural networks as a special model class used for function approximation in regression or classification tasks.
To be more precise, we will rely on the following definition.
> **Definition** (Neural Network):
> For any $`L\in\mathbb{N}`$ and $`d=(d_0,\dots,d_L)\in\mathbb{N}^{L+1}`$ a non-linear map $`\Psi\colon\mathbb{R}^{d_0}\to\mathbb{R}^{d_L}`$ of the form
If we want to use the neural network to approximate a function $`f`$, the easiest approach is a least-squares regression in an appropriate norm.
To keep the explanation simple, we assume $`f\colon \mathbb{R}^K \to \mathbb{R}`$, i.e., $`\operatorname{dim}(x^{(0)})=K`$ and $`\operatorname{dim}(x^{(L)})=1`$.
Assuming the function $`f`$ has a finite second moment, we can use a standard $`L^2`$-norm for our least-squares problem:
```math
\text{Find}\qquad \Psi_\vartheta
= \operatorname*{arg\, min}_{\Psi_\theta\in\mathcal{M}_{d,\varphi}} \Vert f - \Psi_\theta \Vert_{L^2(\pi)}^2