
# Neural Networks 101


## Nomenclature and Definitions

First, we need to clarify a few terms: artificial intelligence, machine learning and neural network. Everybody categorizes them differently, but we look at it this way:


*(Figure: Venn diagram relating artificial intelligence, machine learning, and neural networks.)*

Here we focus on neural networks as a special model class used for function approximation in regression or classification tasks. To be more precise, we will rely on the following definition.

**Definition (Neural Network):** For any $L\in\mathbb{N}$ and $d=(d_0,\dots,d_L)\in\mathbb{N}^{L+1}$, a non-linear map $\Psi\colon\mathbb{R}^{d_0}\to\mathbb{R}^{d_L}$ of the form

$$\Psi(x) = \bigl[\varphi_L\circ (W_L\bullet + b_L)\circ\varphi_{L-1}\circ\dots\circ(W_2\bullet + b_2)\circ\varphi_1\circ (W_1\bullet + b_1)\bigr](x)$$

is called a *fully connected feed-forward neural network*.

Typically, we use the following nomenclature:

- $L$ is called the depth of the network with layers $\ell=0,\dots,L$.
- $d$ is called the width of the network, where $d_\ell$ is the width of layer $\ell$.
- $W_\ell\in\mathbb{R}^{d_\ell\times d_{\ell-1}}$ are the weights of layer $\ell$.
- $b_\ell\in\mathbb{R}^{d_\ell}$ is the bias of layer $\ell$.
- $\vartheta=(W_1,b_1,\dots,W_L,b_L)$ are the free parameters of the neural network. Sometimes we write $\Psi_\vartheta$ or $\Psi(x;\vartheta)$ to indicate the dependence of $\Psi$ on the parameters $\vartheta$.
- $\varphi_\ell$ is the activation function of layer $\ell$. Note that $\varphi_\ell$ acts componentwise and is typically non-linear and monotone increasing (e.g. $\tanh$ or ReLU); if every $\varphi_\ell$ were linear, $\Psi$ would collapse to an affine map.
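
To make this concrete, here is a minimal NumPy sketch of the forward pass $\Psi(x;\vartheta)$. The function name `psi` and the parameter layout (a list of $(W_\ell, b_\ell)$ pairs) are illustrative choices of ours, not notation fixed by the definition above.

```python
import numpy as np

def psi(x, params, activations):
    """Forward pass of a fully connected feed-forward network Psi(x; theta).

    params      -- list of pairs (W_l, b_l), l = 1, ..., L, where W_l has
                   shape (d_l, d_{l-1}) and b_l has shape (d_l,)
    activations -- list of componentwise activation functions phi_1, ..., phi_L
    """
    out = x
    for (W, b), phi in zip(params, activations):
        out = phi(W @ out + b)  # x^(l) = phi_l(W_l x^(l-1) + b_l)
    return out
```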

Additionally, the following conventions exist:

- $x^{(0)}:=x$ is called the input (layer) of the neural network $\Psi$.
- $x^{(L)}:=\Psi(x)$ is called the output (layer) of the neural network $\Psi$.
- Intermediate results $x^{(\ell)} = \varphi_\ell(W_\ell\, x^{(\ell-1)} + b_\ell)$ for $\ell=1,\dots,L-1$ are called hidden layers (see the sketch after this list).
- (debatable) A neural network is called shallow if it has only one hidden layer ($L=2$) and deep otherwise.
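
Since the intermediate results are often of interest themselves, a small variant of the sketch above can record every layer. The helper name `psi_layers` is again our own choice, not part of the standard nomenclature.

```python
def psi_layers(x, params, activations):
    """Like psi above, but also record the intermediate results x^(l)."""
    layers = [x]  # x^(0), the input layer
    for (W, b), phi in zip(params, activations):
        layers.append(phi(W @ layers[-1] + b))
    # layers[1:-1] are the hidden layers, layers[-1] is the output Psi(x)
    return layers
```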

**Example:** Let $L=3$, $d=(6, 10, 10, 3)$, and $\varphi_1=\varphi_2=\varphi_3=\tanh$. Then the neural network is given by the composition

$$\Psi\colon \mathbb{R}^6\to\mathbb{R}^3,
\qquad
\Psi(x) = \varphi_3\Bigl(W_3 \Bigl(\underbrace{\varphi_2\bigl(W_2 \bigl(\underbrace{\varphi_1(W_1 x + b_1)}_{x^{(1)}}\bigr) + b_2\bigr)}_{x^{(2)}}\Bigr) + b_3\Bigr).$$
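
As a quick plausibility check, the following snippet instantiates this example with randomly drawn parameters (the text fixes no concrete values, so the Gaussian initialization below is purely an assumption) and verifies the shapes, reusing `psi` from the sketch above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
d = (6, 10, 10, 3)              # widths d_0, ..., d_3

# theta = (W_1, b_1, ..., W_3, b_3) with W_l of shape (d_l, d_{l-1});
# Gaussian weights and zero biases, purely for illustration.
params = [(rng.standard_normal((d[l], d[l - 1])), np.zeros(d[l]))
          for l in range(1, len(d))]

x = rng.standard_normal(d[0])        # some input x = x^(0) in R^6
y = psi(x, params, [np.tanh] * 3)    # Psi(x)
print(y.shape)                       # (3,), i.e. Psi(x) lies in R^3
```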

A typical graphical representation of the neural network looks like this:


*(Figure: graph of the example network with layer widths 6, 10, 10, 3.)*