A Convolutional Neural Network (CNN) is a type of neural network that applies a series of *convolutions* to an input image to produce an output image. However, contrary to more classic image-filtering techniques, the coefficients of the *filters* (or *kernels*) applied to the image can be tuned using *gradient descent* or any other optimisation algorithm.

# Explanation

To perform a convolution, a CNN slides a number of *filters* (below in yellow) over the entire input image (below in green), multiplying each pixel value by the corresponding value in the kernel and summing the results to produce a (usually) smaller image (below in pink), as shown below:

The filters are moved with a given *stride* in each direction, here 1×1, which is the classic choice, but to accelerate the convolution the stride can be 2×2, etc. This operation is repeated with different filters for each convolutional layer in the model.
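The sliding-window operation described above can be sketched in plain NumPy for a single-channel image (a minimal illustration, not Keras code; the function name and the toy mean filter are made up for the example):

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` with the given stride and no padding
    ('valid' mode: the kernel never falls over the edge)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            # one output pixel = element-wise product of patch and kernel, summed
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                    # 3x3 mean filter
print(conv2d_valid(image, kernel).shape)          # (3, 3): the output shrinks
```

With a stride of 2 the same 5×5 input would shrink to 2×2, since the kernel only visits every other position.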

For a convolutional layer with F filters of kernel shape K_1 \times K_2 applied to an input of shape (h, w, C), the number of trainable parameters N can be computed using the following formula:

N = K_1 \times K_2 \times C \times F + F
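The formula is easy to check with a one-line helper (plain Python; the function name is illustrative):

```python
def conv2d_params(k1, k2, filters, in_channels):
    # K1 * K2 * C weights per filter, plus one bias per filter
    return k1 * k2 * in_channels * filters + filters

print(conv2d_params(5, 5, 10, 3))  # 760
```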

Hence, if the RGB image in the beginning was of shape `(200, 200, 3)`, then after a convolution with 10 filters of kernel size 5×5 and stride 1, the output shape will be `(196, 196, 10)` and that layer will have 5 \times 5 \times 3 \times 10 + 10 = 760 trainable parameters (each filter has shape `(5, 5, 3)`), as demonstrated in Keras:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

Sequential([
    Conv2D(input_shape=(200, 200, 3), filters=10, kernel_size=5, strides=1)
]).summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_2 (Conv2D)            (None, 196, 196, 10)      760
=================================================================
Total params: 760
Trainable params: 760
Non-trainable params: 0
_________________________________________________________________
```

# Implementation

Using François Chollet’s *Keras* framework, a convolutional layer can be added to a model using the `Conv2D` class.

Its constructor with the most common arguments is as follows:

```python
keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', activation=None)
```

Here:

- `filters` is the number of filters to apply with this layer (i.e. F).
- `kernel_size` is either a tuple of integers or a single integer specifying the size of the kernel to slide over the image. Kernel sizes are usually odd, since the kernel’s centre then determines the position of the output value in the output image.
- `strides` is either a tuple of integers or a single integer specifying the pace at which the kernel moves.
- `padding` is a string, either `'valid'` or `'same'`, and defaults to `'valid'`. `'valid'` padding in Keras (and TensorFlow) means no padding, i.e. the kernel stops near the borders to avoid falling over the edge and having missing values. This means that the border pixels won’t have a convolution value, and hence that the output is a tad smaller in height and width (in the above example, the padding was `'valid'`, so the 200×200 image became 196×196). `'same'` padding means padding the edges with zeros so that the output image has the *same* width and height as the input image (hence the name). However, this is not the default in Keras, as it means the borders get corrupted with fake data.
- `activation` is either a string like `'tanh'`, `'relu'`, `'softmax'`, etc. (see the Keras documentation for the full list) or an object of type `keras.activations`. By default this is the linear activation function f(x) = x. Each output value of the layer is passed through this function before being passed to the next layer.