
cv_demo.m

From A First Course in Machine Learning, Chapter 1. Simon Rogers, 31/10/11 [simon.rogers@glasgow.ac.uk].

Demonstration of cross-validation for model selection.

clear all;close all;

Generate some data

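Generate N = 100 inputs x uniformly between -5 and 5. The targets t come from a cubic polynomial plus Gaussian noise with standard deviation 150. A large, evenly spaced grid of test inputs, with targets drawn from the same model, serves as an independent test set.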

N = 100;
x = 10*rand(N,1) - 5;
t = 5*x.^3  - x.^2 + x + 150*randn(size(x));
testx = (-5:0.01:5)'; % Large, independent test set
testt = 5*testx.^3 - testx.^2 + testx + 150*randn(size(testx));
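Written out, the data-generating model above is as follows. The test set uses the same model on the grid $x = -5, -4.99, \ldots, 5$. The notation is ours and is not taken from the script.

```latex
x_n \sim \mathcal{U}(-5, 5), \qquad
t_n = 5x_n^3 - x_n^2 + x_n + \epsilon_n, \qquad
\epsilon_n \sim \mathcal{N}(0, 150^2), \qquad n = 1, \ldots, 100.
```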

Run a cross-validation over model orders
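For reference, the loop below computes the following quantities. The symbols are our own shorthand:

* $D$ is the model order (`k` in the code).
* $\mathcal{F}_f$ is the set of objects in fold $f$, and $N_f$ is its size.
* The subscript $-f$ denotes the training data with fold $f$ removed.

```latex
\begin{aligned}
\mathbf{x}_n &= [1, x_n, x_n^2, \ldots, x_n^D]^\top,\\
\widehat{\mathbf{w}}_{-f} &= \left(\mathbf{X}_{-f}^\top \mathbf{X}_{-f}\right)^{-1}\mathbf{X}_{-f}^\top \mathbf{t}_{-f},\\
\mathcal{L}^{\mathrm{CV}}_D &= \frac{1}{K}\sum_{f=1}^{K}\frac{1}{N_f}\sum_{n\in\mathcal{F}_f}\left(t_n - \widehat{\mathbf{w}}_{-f}^\top\mathbf{x}_n\right)^2.
\end{aligned}
```

The training and independent-test losses are the same fold-averaged mean squared error. They are evaluated on the training folds and on the test grid, respectively, instead of on the held-out fold.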

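maxorder = 7;
X = [];     % Training design matrix: column k+1 holds x.^k
testX = []; % Test design matrix, built the same way
K = 10 % Number of folds for K-fold CV
% Fold sizes: floor(N/K) objects per fold, with any remainder added to the last fold
sizes = repmat(floor(N/K),1,K);
sizes(end) = sizes(end) + N - sum(sizes);
csizes = [0 cumsum(sizes)]; % Fold f contains objects csizes(f)+1 to csizes(f+1)
% Preallocate the loss matrices (rows: folds, columns: model orders 0 to maxorder)
cv_loss = zeros(K,maxorder+1);
train_loss = zeros(K,maxorder+1);
ind_loss = zeros(K,maxorder+1);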
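The sketch below makes the fold boundaries concrete. It applies the same size rule to a hypothetical small case, N = 23 and K = 5, with values chosen only for illustration. The last fold absorbs the whole remainder, so it can hold up to K - 1 more objects than the others.

```matlab
% Hypothetical small example: N = 23 objects split into K = 5 folds,
% using the same rule as the demo (floor(N/K) per fold, remainder to the last fold)
N = 23;
K = 5;
sizes = repmat(floor(N/K),1,K);           % [4 4 4 4 4]
sizes(end) = sizes(end) + N - sum(sizes); % last fold absorbs the remainder: [4 4 4 4 7]
csizes = [0 cumsum(sizes)];               % [0 4 8 12 16 23]
for fold = 1:K
    % Fold 'fold' holds objects csizes(fold)+1 to csizes(fold+1)
    idx = csizes(fold)+1:csizes(fold+1);
    fprintf('Fold %d: objects %d to %d (%d objects)\n', fold, idx(1), idx(end), numel(idx));
end
```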

% Note that it is often sensible to randomly permute the data objects
% before performing CV, so that each fold is a representative sample. It is
% not necessary here because x was generated randomly. If it were
% necessary, the following code (run before the loop) would work:
% order = randperm(N);
% x = x(order);  % or X = X(order,:) if the inputs are multi-dimensional
% t = t(order);
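Shuffling matters when the objects are ordered, for example sorted by x. In that case each contiguous fold would cover only part of the input range. The key point is to apply one permutation to the inputs and the targets alike. The following is a minimal sketch on hypothetical, sorted, noise-free toy data.

```matlab
% Hypothetical toy data, sorted by x so that contiguous folds would be biased
N = 10;
x = linspace(-5,5,N)';
t = 5*x.^3 - x.^2 + x;   % noise-free targets, for illustration only
order = randperm(N);     % random permutation of 1..N
x = x(order);            % apply the SAME permutation to the inputs...
t = t(order);            % ...and to the targets, so each (x_n, t_n) pair stays together
```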

for k = 0:maxorder
    X = [X x.^k];
    testX = [testX testx.^k];
    for fold = 1:K
        % Partition the data
        % foldX contains the data for just one fold
        % trainX contains all other data

        foldX = X(csizes(fold)+1:csizes(fold+1),:);
        trainX = X;
        trainX(csizes(fold)+1:csizes(fold+1),:) = [];
        foldt = t(csizes(fold)+1:csizes(fold+1));
        traint = t;
        traint(csizes(fold)+1:csizes(fold+1)) = [];

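        % Least-squares fit on the training folds. The backslash operator solves
        % the same problem as inv(trainX'*trainX)*trainX'*traint but is
        % numerically more stable for high model orders.
        w = trainX\traint;
        % Squared-error loss on the held-out fold (CV loss)
        fold_pred = foldX*w;
        cv_loss(fold,k+1) = mean((fold_pred-foldt).^2);
        % Loss on the large independent test set
        ind_pred = testX*w;
        ind_loss(fold,k+1) = mean((ind_pred - testt).^2);
        % Loss on the training data used to fit w
        train_pred = trainX*w;
        train_loss(fold,k+1) = mean((train_pred - traint).^2);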
    end
end
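The loop above builds the design matrix one column at a time and removes the held-out rows with `[]`. Below is a more compact sketch of the same K-fold procedure for a single model order. It is illustrative only and makes several assumptions that are not in the original:

* the data are noise-free toy data;
* folds are assigned in rotation with `mod`, not as contiguous blocks;
* logical masks replace row deletion;
* the design matrix is built with the implicit expansion `x.^(0:order)`, which needs MATLAB R2016b or later, or Octave;
* backslash replaces `inv`.

The names `fold_id`, `test_mask` and `mean_cv_loss` are hypothetical.

```matlab
% Hypothetical noise-free data from a cubic, so an order-3 model can fit it exactly
N = 40;
x = linspace(-5,5,N)';
t = 5*x.^3 - x.^2 + x;
K = 5;
order = 3;
X = x.^(0:order);                 % design matrix [1 x x.^2 x.^3] via implicit expansion
fold_id = mod((0:N-1)', K) + 1;   % assign objects to folds 1..K in rotation
cv_loss = zeros(K,1);
for fold = 1:K
    test_mask = (fold_id == fold);              % objects held out in this fold
    w = X(~test_mask,:) \ t(~test_mask);        % least-squares fit on the other K-1 folds
    pred = X(test_mask,:) * w;                  % predictions for the held-out objects
    cv_loss(fold) = mean((pred - t(test_mask)).^2);
end
mean_cv_loss = mean(cv_loss);     % average squared error over the K folds
```

Because the targets are an exact cubic, the order-3 model should reproduce them. `mean_cv_loss` should therefore be essentially zero, up to rounding error.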

Plot the results

figure(1);
subplot(131)
plot(0:maxorder,mean(cv_loss,1),'linewidth',2)
xlabel('Model Order');
ylabel('Loss');
title('CV Loss');
subplot(132)
plot(0:maxorder,mean(train_loss,1),'linewidth',2)
xlabel('Model Order');
ylabel('Loss');
title('Train Loss');
subplot(133)
plot(0:maxorder,mean(ind_loss,1),'linewidth',2)
xlabel('Model Order');
ylabel('Loss');
title('Independent Test Loss');
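The plots are read to choose a model order. The choice can also be made programmatically by taking the order with the lowest mean CV loss. The sketch below uses a small hypothetical `cv_loss` matrix laid out as in the script: rows are folds and column `k+1` is order `k`. The numbers are made up for illustration and are not results from the demo.

```matlab
% Hypothetical per-fold CV losses (rows: folds, columns: model orders 0 to 3)
cv_loss = [9 5 2 3;
           8 4 1 2;
           10 6 3 4];
mean_loss = mean(cv_loss,1);      % average over folds: [9 5 2 3]
[~, best_col] = min(mean_loss);   % column with the lowest mean CV loss
best_order = best_col - 1;        % column k+1 corresponds to model order k
fprintf('Selected model order: %d\n', best_order);
```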