Neural Computing 4
Lab 6 – Transformations of the input space
This lab will be ASSESSED, and will form 20% of the assessed work. Points which must be included in the hand-in are numbered and highlighted in italics. Make sure you also have the accompanying extract on RBF-nets from Bishop's book. This work should be handed in by the end of week 20.
This lab will examine models which first perform a transformation of the input space and then apply a linear weighting to the transformed data. We will use these models on regression problems, where we are given (x, y) pairs in which y is defined on a continuous space, rather than the discrete one we have used in classification training sets.
As described in your lecture notes, when an optimisation problem is linear in the parameters we can use the pseudoinverse to find the optimal (in a mean squared error sense) parameters

W = X⁺Y, where X⁺ = (XᵀX)⁻¹Xᵀ,

where X is the matrix of training inputs and Y is the vector of training outputs (we are only considering the single-output case here, but the method does carry over to multi-output cases where Y is also a matrix). See how this is implemented in MATLAB using the pinv() command to calculate the pseudoinverse.
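For example, a minimal sketch of this pseudoinverse fit in MATLAB (the sizes, inputs and weight values here are made up for illustration, not taken from the lab files):

X = [ones(100,1) linspace(-1,1,100)'];   % design matrix: a bias column plus one input
Y = X*[2; -3];                           % outputs generated by known weights [2; -3]
W = pinv(X)*Y                            % recovers [2; -3] up to rounding error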
Download the files trans_demo.m and transform.m from the Lab web-page and experiment with the possible transformations. For the quadratic transformation we can actually fit the exact model which generated the training data. Try adding noise to the training outputs before identification and see how the parameters W move away from the optimal values. Note that in many of these transformations the inputs are expanded into a higher-dimensional space – even though only a linear weighting follows, this expansion lets us produce quite powerful models.
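As a rough illustration of that noise experiment (the exact contents of trans_demo.m may differ; the quadratic below is an arbitrary example):

x = linspace(-2,2,50)';
Phi = [ones(size(x)) x x.^2];                   % quadratic expansion of the input
Y = 1 + 0.5*x - 2*x.^2;                         % data generated by a known quadratic
W_exact = pinv(Phi)*Y                           % recovers [1; 0.5; -2]
W_noisy = pinv(Phi)*(Y + 0.2*randn(size(Y)))    % perturbed away from the optimum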
1. Write a version of trans_demo.m & transform.m which includes basis functions which provide extra inputs in the form of polar coordinates (r, θ) (assuming an original 2-dimensional input of x and y as Cartesian coordinates, where x = r cos θ & y = r sin θ). Include your code in the report. Give an example of an application where this new basis space would be useful for modelling or classification purposes (i.e. how does it make the learning task easier?).
Once you feel you understand what is happening, move on to the radial basis function example. Some models of neurons are localised: that is, they respond most strongly around a centre point, and their response decreases as you move away from this centre (a point in the input space). A function used to describe this behaviour is the standard Gaussian function, y = exp(-d²/w²), where d = x − c for an input x and centre c, and w describes the width of the basis function (playing the role of the standard deviation in statistical language). In multi-input cases c and w are vectors (although a simplification is to have a scalar w, which gives a spherical Gaussian as opposed to an ellipsoidal one). More details are given in the hand-out (parts of Ch. 5 of C. Bishop, Neural Networks for Pattern Recognition).
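A small sketch of a single such basis function over a 2-dimensional input space (the centre, width and grid are arbitrary choices for illustration):

c = [0 0];                            % centre of the basis function
w = 0.5;                              % width (a scalar, so a spherical Gaussian)
[x1,x2] = meshgrid(-2:0.1:2);         % grid over the input space
d2 = (x1-c(1)).^2 + (x2-c(2)).^2;     % squared distance to the centre
y = exp(-d2/w^2);                     % response peaks at c and decays away from it
surf(x1,x2,y)                         % a localised 'bump' in the input space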
Load the example files rbf_demo.m and rbftransform.m from the web-page. Try varying the density of the grid of basis function centres.
2. i) What happens to the accuracy of the model fit to the data, and to the time taken to calculate the weights as the number of radial basis functions increases?
ii) What happens outside the area of input space populated by training data? Why do you think this happens?
Create a training set using the file getdata.m we used last week for the classification example, and use it as training data for the RBF-network. Plot the results. We have a grid of basis functions here, but that is not particularly efficient if the data are not spaced on a grid, or if the function is very complicated in some parts of the space but relatively simple in others. One approach is to use a k-means clustering algorithm to allocate centres around the area covered by the training data (see page 188 of the extra handout).
3. Write a routine which does the following, and include the code in your report:
function [C,J] = kmeansiteration(X,Cinit)
% performs one iteration of k-means clustering on a data matrix X (N x d),
% starting from the current centres Cinit (k x d, one centre per row),
% and returns C, a k x d matrix of the updated centres
% (k points in the input space).
% J is the value of the k-means cost function.
This can then be used to cluster the centres of the basis functions so that they cover the data more efficiently. A useful approach when testing your k-means algorithm is to plot the results, so you might want to do something like this:
X = [randn(500,2); [5+randn(300,1) 2+randn(300,1)]];  % two clusters of 2-d data
k = 15;
C = X(1:k,:);        % start the centres off as the first k points
Jold = Inf;
while 1
    [C,J] = kmeansiteration(X,C);
    plot(X(:,1),X(:,2),'.',C(:,1),C(:,2),'o');  % data as dots, centres as circles
    drawnow                                     % render each iteration as it happens
    if J == Jold     % cost unchanged, so the assignments have converged
        break
    else
        Jold = J;
    end
end
Once you have implemented the k-means clustering algorithm, you can use it to cluster RBF centres for the test examples. Describe situations where this learning algorithm will not find the best model for a given training set.
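The intended pipeline then looks something like the sketch below. The basis-activation step is written out explicitly rather than guessing the exact signature of rbftransform.m, and Xtrain, Ytrain and the shared width w are assumed to exist already (e.g. from getdata.m); substitute the real calls from the lab files.

% assumes: Xtrain (N x d), Ytrain (N x 1), and centres C (k x d) from k-means
w = 1;                                % one shared width, for simplicity
N = size(Xtrain,1); k = size(C,1);
Phi = zeros(N,k);
for j = 1:k
    d2 = sum((Xtrain - repmat(C(j,:),N,1)).^2, 2);   % squared distances to centre j
    Phi(:,j) = exp(-d2/w^2);          % activations of basis function j
end
W = pinv([Phi ones(N,1)])*Ytrain;     % linear weights (plus a bias term)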
Now note that we might also want to adapt the widths of the basis functions, so that sparsely populated areas of the input space can be covered by a small number of large basis functions, while other areas have a higher density of small basis functions. One simple rule is to make the width of each basis function proportional to its distance to the k nearest neighbouring RBF centres.
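One possible realisation of that width rule, as a sketch (p, used here instead of the overloaded k, and the choice of the mean are arbitrary, not prescribed by the lab):

p = 3;                                % number of nearest centres to consider
k = size(C,1);
widths = zeros(k,1);
for j = 1:k
    d = sort(sqrt(sum((C - repmat(C(j,:),k,1)).^2, 2)));  % distances to all centres, sorted
    widths(j) = mean(d(2:p+1));       % mean distance to the p nearest centres
end                                   % (d(1) is the zero distance to itself)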
4. Using the k-means clustering of centres, and nearest-neighbour width fitting, fit an RBF-network to the following data set. Plot the network response, and the basis functions, using the same sort of code as in the rbf_demo.m file.
Direct any queries or difficulties to me at rod@dcs.gla.ac.uk – I’ll be pleased to help. Note also that frequently asked questions will appear on the NC4 labs web-page, so check there first.
http://www.dcs.gla.ac.uk/~rod/NC4/index.htm
Roderick Murray-Smith