Université Paul Sabatier 2019-2020, M2 SID¶

Machine learning - Practical kernel ridge regression¶

Load the necessary functions. The function generateData simulates an independent sample from the following process:

  • $X$ is uniform on $[-1,1]$
  • $Y = \sin(5X) + \epsilon$ where $\epsilon$ is Gaussian with mean $0$ and variance $1/10$.

We will use a fixed training set which we plot.

In [1]:
%matplotlib inline
from matplotlib import pyplot
import math
import numpy as np
import scipy.misc
import numpy.random as npr
import sklearn as sk
from sklearn import neighbors
from sklearn import kernel_ridge

def generateData(n,seed=1):
    # Simulate n i.i.d. pairs: X uniform on [-1,1], Y = sin(5X) + Gaussian noise of variance 1/10.
    npr.seed(seed)
    x = npr.rand(n)*2 - 1
    y = np.sin(x*5) + npr.normal(size=n) / np.sqrt(10)
    return(np.array(x).reshape(-1,1),np.array(y).reshape(-1,1))

n=20
(xn,yn) = generateData(n,3)                              # fixed training set used throughout
xSeq = np.array(np.linspace(-1,1,1000)).reshape(-1,1)    # grid on which estimators are plotted
pyplot.plot(xn,yn,'.')
pyplot.plot(xSeq,np.sin(xSeq * 5),'--')                  # Bayes regressor sin(5x)
Out[1]:
[<matplotlib.lines.Line2D at 0x7f08580a4710>]

Nonlinear regression¶

The training set is fixed. Implement kernel linear regression with the Laplacian kernel and plot the result. Use kernel_ridge.KernelRidge with $\alpha = 0$ (why?). What is the parameter $\gamma$? Plot the result for several values of $\gamma$. What do you observe? Why is this called kernel interpolation?

In [2]:
pyplot.plot(xn,yn,'.')
pyplot.plot(xSeq,np.sin(xSeq * 5),'--', label = "Bayes")

gammas = (1,10,40)
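
Below is a minimal sketch of one way to complete the starter cell above, assuming xn, yn, xSeq and the imports from the first cell are available; the loop structure and plot labels are illustrative choices, not the required solution.

for gamma in gammas:
    # alpha=0 removes the ridge penalty, so the estimator interpolates the training points.
    # (If the linear solve complains about conditioning, a tiny alpha such as 1e-10 behaves almost identically.)
    model = kernel_ridge.KernelRidge(alpha=0, kernel='laplacian', gamma=gamma)
    model.fit(xn, yn)
    pyplot.plot(xSeq, model.predict(xSeq), label="gamma = %s" % gamma)
pyplot.legend()
pyplot.title('Kernel interpolation: varying gamma (bandwidth)')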
In [ ]:
 

Implement kernel ridge regression.¶

The training set is fixed. Draw the kernel ridge regression estimator with the 'laplacian' kernel and bandwidth parameter $\gamma$.

Use kernel_ridge.KernelRidge from scikit-learn. Plot the result for $\gamma = 1, 10, 100$. Comment on what you observe.

In [4]:
pyplot.plot(xn,yn,'.')
pyplot.plot(xSeq,np.sin(xSeq * 5),'--', label = "Bayes")

gammas = (1,10,100)
Out[4]:
Text(0.5,1,'Kernel ridge: varying gamma (bandwidth)')
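
A possible sketch continuing the starter cell above; the value $\alpha = 0.1$ is an arbitrary illustrative choice, and the title reproduces the one shown in Out[4].

for gamma in gammas:
    # Fixed, moderate regularization; only the Laplacian bandwidth varies.
    model = kernel_ridge.KernelRidge(alpha=0.1, kernel='laplacian', gamma=gamma)
    model.fit(xn, yn)
    pyplot.plot(xSeq, model.predict(xSeq), label="gamma = %s" % gamma)
pyplot.legend()
pyplot.title('Kernel ridge: varying gamma (bandwidth)')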

Same question with $\gamma = 10$ and the $\alpha$ parameter ($\lambda$ in the course) taking values $0.01, 0.1, 1, 10, 100$. Comment on what you observe.

In [5]:
pyplot.plot(xn,yn,'.')
pyplot.plot(xSeq,np.sin(xSeq * 5),'--', label = "Bayes")

alphas = (0.01,0.1,1,10,100)
Out[5]:
Text(0.5,1,'Kernel ridge: varying alpha (regularization weight)')
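
A corresponding sketch for varying $\alpha$ at fixed $\gamma = 10$, continuing the starter cell above.

for alpha in alphas:
    # Fixed bandwidth; only the regularization weight varies.
    model = kernel_ridge.KernelRidge(alpha=alpha, kernel='laplacian', gamma=10)
    model.fit(xn, yn)
    pyplot.plot(xSeq, model.predict(xSeq), label="alpha = %s" % alpha)
pyplot.legend()
pyplot.title('Kernel ridge: varying alpha (regularization weight)')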

Question 2: large sample behavior¶

Use the Laplacian kernel with $\gamma = 1, 10, 100$. For each value of $\gamma$ and for $n = 10, 100, 1000$, generate a training set of size $n$, tune the alpha parameter of sk.kernel_ridge.KernelRidge using 5-fold cross-validation, and plot the estimated regressor.

What do you observe?
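
One possible sketch, assuming GridSearchCV from sklearn.model_selection is used for the 5-fold cross-validation; the logarithmic alpha grid and the variable names xTrain, yTrain are illustrative choices.

from sklearn.model_selection import GridSearchCV

alpha_grid = {'alpha': np.logspace(-6, 2, 20)}   # illustrative grid of regularization weights
for gamma in (1, 10, 100):
    pyplot.figure()
    pyplot.plot(xSeq, np.sin(xSeq * 5), '--', label="Bayes")
    for n in (10, 100, 1000):
        (xTrain, yTrain) = generateData(n)
        # 5-fold cross-validation over alpha, Laplacian kernel with the current bandwidth
        search = GridSearchCV(kernel_ridge.KernelRidge(kernel='laplacian', gamma=gamma),
                              alpha_grid, cv=5)
        search.fit(xTrain, yTrain)
        pyplot.plot(xSeq, search.predict(xSeq), label="n = %s" % n)
    pyplot.legend()
    pyplot.title('Laplacian kernel, gamma = %s, alpha tuned by 5-fold CV' % gamma)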

In [6]:
 
In [ ]:
 

Question 3: different kernels.¶

Perform the same experiments with the other kernels.
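
A sketch of one way to rerun the bandwidth experiment with other kernels accepted by kernel_ridge.KernelRidge; the kernels listed and the value $\alpha = 0.1$ are illustrative choices.

for kernel in ('rbf', 'polynomial', 'laplacian'):
    pyplot.figure()
    pyplot.plot(xn, yn, '.')
    pyplot.plot(xSeq, np.sin(xSeq * 5), '--', label="Bayes")
    for gamma in (1, 10, 100):
        # 'rbf' and 'laplacian' use gamma as a bandwidth; 'polynomial' combines it with degree and coef0.
        model = kernel_ridge.KernelRidge(alpha=0.1, kernel=kernel, gamma=gamma)
        model.fit(xn, yn)
        pyplot.plot(xSeq, model.predict(xSeq), label="gamma = %s" % gamma)
    pyplot.legend()
    pyplot.title('Kernel ridge, %s kernel: varying gamma' % kernel)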

In [8]:
 

Question 4: pure linear algebra.¶

Perform the same experiments by computing the kernel matrices yourself and using only linear algebra for training and prediction. Check that you obtain the same results as scikit-learn.

In [9]:
n=20
(xn,yn) = generateData(n,3)
pyplot.plot(xn,yn,'.')
pyplot.plot(xSeq,np.sin(xSeq * 5),'--', label = "Bayes")
Out[9]:
Text(0.5,1,'Kernel ridge: varying alpha (regularization weight)')
In [10]:
 
Out[10]:
Text(0.5,1,'Kernel ridge: varying gamma (bandwidth)')
In [ ]:
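
A minimal sketch of the pure linear algebra version for the Laplacian kernel, checked against scikit-learn on the fixed training set. The dual coefficients solve $(K + \alpha I)\,c = y$ and predictions are $K_{\mathrm{test}}\,c$; the helpers laplacianKernel and krrFitPredict and the values $\alpha = 0.1$, $\gamma = 10$ are introduced here only for illustration.

def laplacianKernel(x1, x2, gamma):
    # Pairwise Laplacian kernel matrix for 1-D inputs stored as columns: exp(-gamma * |x1_i - x2_j|).
    return np.exp(-gamma * np.abs(x1 - x2.T))

def krrFitPredict(xTrain, yTrain, xTest, alpha, gamma):
    # Dual solution of kernel ridge regression: c = (K + alpha*I)^{-1} y, prediction = K_test c.
    K = laplacianKernel(xTrain, xTrain, gamma)
    c = np.linalg.solve(K + alpha * np.eye(xTrain.shape[0]), yTrain)
    return laplacianKernel(xTest, xTrain, gamma).dot(c)

alpha, gamma = 0.1, 10          # illustrative values
yHand = krrFitPredict(xn, yn, xSeq, alpha, gamma)
ySk = kernel_ridge.KernelRidge(alpha=alpha, kernel='laplacian', gamma=gamma).fit(xn, yn).predict(xSeq)
print(np.max(np.abs(yHand - ySk)))    # should be numerically negligible

pyplot.plot(xn, yn, '.')
pyplot.plot(xSeq, np.sin(xSeq * 5), '--', label="Bayes")
pyplot.plot(xSeq, yHand, label="linear algebra")
pyplot.legend()
pyplot.title('Kernel ridge by hand (Laplacian kernel)')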