I recently became interested in modelling uncertainty in machine learning. After stumbling upon an episode about Gaussian Processes on the Talking Machines podcast, I wanted to understand this topic better. Next, I read the well-explained blog post by Katherine Bailey and a few more interesting links.

What is a Gaussian process?

A Gaussian process is a probability distribution over possible functions.
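
In the standard notation (a quick aside, not from the original post), one writes

$$
f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big),
$$

where $m$ is the mean function (zero in the example below) and $k$ is the covariance, or kernel, function. Concretely, any finite collection of function values $f(x_1), \ldots, f(x_n)$ is jointly Gaussian, with the covariance matrix built from $k$.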

Short code example

The code is adapted from https://katbailey.github.io/post/gaussian-processes-for-dummies/

Let's start by defining points at which we evaluate our function: 50 evenly spaced points from the interval [-5,5].

We define a kernel function that uses the squared exponential, aka Gaussian kernel.

This Gaussian kernel calculates the squared Euclidean distance between points and converts it into a measure of similarity: points close together get a value near 1, points far apart a value near 0. How quickly the similarity falls off is controlled by a tuning parameter (the length-scale).
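
Written out, this is the expression the kernel function below implements, with param playing the role of the squared length-scale:

$$
k(a, b) = \exp\!\left(-\frac{\lVert a - b \rVert^2}{2\,\mathrm{param}}\right)
$$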

import numpy as np
import matplotlib.pyplot as pl

# Test data
n = 50
Xtest = np.linspace(-5, 5, n).reshape(-1,1)

# Define the kernel function
def kernel(a, b, param):
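    # Pairwise squared Euclidean distances between the rows of a and b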
    sqdist = np.sum(a**2,1).reshape(-1,1) + np.sum(b**2,1) - 2*np.dot(a, b.T)
    return np.exp(-.5 * (1/param) * sqdist)

param = 0.1
K_ss = kernel(Xtest, Xtest, param)

# Get cholesky decomposition (square root) of the
# covariance matrix
L = np.linalg.cholesky(K_ss + 1e-15*np.eye(n))
# Sample 3 sets of standard normals for our test points,
# multiply them by the square root of the covariance matrix
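# (each column z ~ N(0, I) gives L.dot(z) with covariance L L^T = K_ss, i.e. a draw from the GP prior)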
f_prior = np.dot(L, np.random.normal(size=(n,3)))

# Now let's plot the 3 sampled functions.
pl.plot(Xtest, f_prior)
pl.axis([-5, 5, -3, 3])
pl.title('Three samples from the GP prior')
pl.show()

Now we’ll observe some data. The actual function generating the y values from our x values is the sine function. We generate the output at our 5 training points, do some complex mathematics (spelled out below), sample from the posterior and plot it.
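
The “complex mathematics” boils down to the standard conditioning formulas for a multivariate Gaussian. Using the same names as the code (K = kernel(Xtrain, Xtrain), K_s = kernel(Xtrain, Xtest), K_ss = kernel(Xtest, Xtest)), the posterior at the test points has mean and covariance

$$
\mu = K_s^\top K^{-1} y_{\mathrm{train}}, \qquad \Sigma = K_{ss} - K_s^\top K^{-1} K_s .
$$

The code evaluates both through the Cholesky factor L of K (via triangular solves) instead of forming K^{-1} explicitly, which is cheaper and numerically more stable.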

# Noiseless training data
Xtrain = np.array([-4, -3, -2, -1, 1]).reshape(5,1)
ytrain = np.sin(Xtrain)

# Apply the kernel function to our training points
K = kernel(Xtrain, Xtrain, param)
L = np.linalg.cholesky(K + 0.00005*np.eye(len(Xtrain)))

# Compute the mean at our test points.
K_s = kernel(Xtrain, Xtest, param)
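# Lk solves L @ Lk = K_s, so mu = Lk.T @ solve(L, ytrain) equals K_s.T @ inv(K) @ ytrain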
Lk = np.linalg.solve(L, K_s)
mu = np.dot(Lk.T, np.linalg.solve(L, ytrain)).reshape((n,))

# Compute the posterior standard deviation at each test point so we can plot the uncertainty band
s2 = np.diag(K_ss) - np.sum(Lk**2, axis=0)
stdv = np.sqrt(s2)
# Draw samples from the posterior at our test points.
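# The posterior covariance is K_ss - K_s.T @ inv(K) @ K_s = K_ss - Lk.T @ Lk;
# the small 1e-6 jitter keeps the Cholesky factorization numerically stable.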
L = np.linalg.cholesky(K_ss + 1e-6*np.eye(n) - np.dot(Lk.T, Lk))
f_post = mu.reshape(-1,1) + np.dot(L, np.random.normal(size=(n,3)))

pl.plot(Xtrain, ytrain, 'bs', ms=8)
pl.plot(Xtest, f_post)
pl.gca().fill_between(Xtest.flat, mu-2*stdv, mu+2*stdv, color="#dddddd")
pl.plot(Xtest, mu, 'r--', lw=2)
pl.axis([-5, 5, -3, 3])
pl.title('Three samples from the GP posterior')
pl.show()

Explanation:

The blue squares are the training points.

All three functions sampled from the posterior pass through (or very close to) these points, because the training data is treated as essentially noiseless; only a tiny jitter is added for numerical stability.

The dashed red line is the posterior mean, and the grey area covers two standard deviations on either side of the mean (roughly a 95% credible interval).

to be extended...