Neural Computing 4

Lab 7 – Unsupervised learning

This lab will be ASSESSED, and will form 10% of the marks for practical work. The work should be handed in by the end of Week 21 at the latest, although earlier submission is recommended, in your own interest.

Up to now you have been dealing with systems which learned a mapping from an input space to an output space. The applications looked at were regression and classification, but the basic idea was the same. A 'teacher' specified what outputs should be associated with inputs, and the learning algorithm and network representation had to come up with a 'reasonable' model of this. This is called supervised learning. This lab deals with situations in which we do not know the answer, but want the learning system to self-organise in such a way that something 'useful' happens - unsupervised learning. Note that much of the research in this area involves finding mathematical models of what 'reasonable' models are, and how to define 'useful' for different problems.

We saw in Lab 6 that self-organisation can be used as part of supervised learning. The weights to the outputs were explicitly optimising the fit to the input-output pairs, while the position of the basis functions was dependent only on the inputs. There are many different views about what counts as self-organisation/unsupervised learning. One approach is to view it as implementing probability density functions which can be used for a number of purposes. In this lab we will look at Self-Organising Maps (SOM), an approach developed by T. Kohonen (the following is taken from his group's web-site). The SOM is an algorithm used to visualize and interpret large high-dimensional data sets. Typical applications are visualization of process states or financial results by representing the central dependencies within the data on the map.

The map consists of a regular grid of processing units, "neurons". A model of some multidimensional observation, typically a vector of features, is associated with each unit. The map attempts to represent all the available observations with optimal accuracy using a restricted set of models. At the same time the models become ordered on the grid so that similar models are close to each other and dissimilar models far from each other.

Fitting of the model vectors is usually carried out by a sequential regression process, where t = 1,2,... is the step index: for each sample x(t) in the training data, first the 'winner' index c (best match) is identified by the condition

  ||x(t) - m_c(t)|| <= ||x(t) - m_i(t)||   for all i.

After that, all model vectors, or a subset of them belonging to nodes centred around node c = c(x), are updated as

  m_i(t+1) = m_i(t) + h_c,i(t) [x(t) - m_i(t)].

Here h_c,i(t) is the 'neighbourhood function', a decreasing function of the distance between the ith and cth nodes on the map grid; see Kohonen's review paper for examples of such functions and for more detailed information. This regression is usually reiterated over the available samples. An implementation of this simple form of the algorithm is provided in the file SOMdemo.m.
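The two steps above (winner selection, then neighbourhood update) make up the whole training loop. SOMdemo.m implements this in MATLAB; the following is a hypothetical Python/NumPy sketch of the same sequential regression, with an assumed Gaussian neighbourhood function and linearly decaying learning rate and neighbourhood width (the grid size and parameter values here are illustrative, not taken from the lab files):

```python
import numpy as np

rng = np.random.default_rng(0)

rows, cols, dim = 5, 5, 2                  # map grid size and input dimension
models = rng.random((rows * cols, dim))    # one model vector m_i per grid node

# Grid coordinates of each node, used to measure map distances for h_c,i
coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

def train(x_data, models, coords, n_epochs=20, alpha=0.5, sigma=2.0):
    for t in range(n_epochs):
        # Shrink the learning rate and neighbourhood width over time
        a = alpha * (1 - t / n_epochs)
        s = max(sigma * (1 - t / n_epochs), 0.5)
        for x in x_data:
            # Winner c: node whose model vector is closest to the sample x
            c = np.argmin(np.sum((models - x) ** 2, axis=1))
            # Gaussian neighbourhood h_c,i, decreasing with map-grid distance
            d2 = np.sum((coords - coords[c]) ** 2, axis=1)
            h = a * np.exp(-d2 / (2 * s ** 2))
            # Update rule: m_i <- m_i + h_c,i (x - m_i), applied to all nodes
            models += h[:, None] * (x - models)
    return models

data = rng.random((200, dim))              # stand-in for the lab's test data
models = train(data, models, coords)
```

Because each update pulls a whole neighbourhood of nodes towards the same sample, nearby grid nodes end up with similar model vectors, which is what produces the topological ordering described above.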

Background info on Self-Organising Maps can be found here:

http://www.cis.hut.fi/research/som-research/

This includes an on-line demo of SOM of Web-based information:

http://websom.hut.fi/websom/stt/doc/eng/

Loading each of the test data sets test1d.mat, test2d.mat, and test3d.mat reads in a matrix called inputs.

  1. Run the algorithm on the data set test2d.mat. Repeat this 3 times, starting from new random weights each time (but on the same data). Plot the final result of each run and include it in the report. You can use the getinput.m command to create your own test data sets. What do you notice about the final solutions reached by the self-organising map?
  2. What behaviour would you expect to see if one input dimension was 10 times larger than the others? How could you get round this problem?

  3. Adapt the program SOMdemo.m to find 1D & 3D self-organising maps, include your code in the report, and run the algorithm on the following data sets: test1d.mat & test1d1.mat for the 1D case, and test3d.mat & test1d3d.mat for the 3D case. (You will have to use the gplot3.m command, which is not a standard part of MATLAB, or the mesh or surf commands.) How will the current implementation of SOMdemo.m scale with increasingly large map sizes? How would you change the implementation? (You do not need to implement this.)

We will look at the use of higher dimensional data sets for SOMs next time.

Direct any queries or difficulties to me (rod@dcs.gla.ac.uk) - I'll be pleased to help. Note also that frequently asked questions will appear on the NC4 labs web-page, so check there first.

http://www.dcs.gla.ac.uk/~rod/NC4/index.htm

Roderick Murray-Smith