Data Structures & Algorithms 2

Neural Computing 4

Lab 8 – Applications
This lab will be ASSESSED., and will form 30% of the marks for practical work. The work should be handed in by the end of Week 22, at the latest, although we recommend submission earlier, in your own interest.

This assignment brings together the earlier labs and applies the tools developed to some practical problems. You will be expected to apply the tools to the problem, present the results and provide reasoned assessment of the reliability of your predicted success rate. You may use your own software if you wish, but you might find that the following implementation of the MLP is more useful.

Classification problem

The post office is running a competition to see who produce the best classification on the following data set for character recognition. The details of this data are given overleaf. Use a multi-layer perceptron to classify the data. Data is in Dataset.data.

What do you notice about the progress of training and test set errors if you use your own implementation of the MLP (or this one here: classdemo.m,mlpforward.m, mlpbackward.m, confusionmulti.m)? Describe what happens and experiment with the training time, number of hidden units and size of training set.

You can change the loop so that you include some convergence criterion, rather than running until max_iters iterations. Rather than choosing the network that achieves best classification on the training set, we recommend separating some of the data to keep as a separate test-set. Discuss how the test behaviour changes during learning relative to the training error. You might also want to monitor the percentage of correctly classified cases - why might this not change proportional to the sum of squared errors? Which letters does the classifier tend to make errors on? Why do you think this is? You can either use a single neural network with 26 outputs on this problem, or 26 networks with a single output. What are the pros and cons of these options? (You don't have to implement both methods)

Note that after some initial experimentation it might be better to write some m-files to run a number of tests, as they are likely to be time consuming.

Write a brief (no more than 2 pages + figures) showing how your experiments with the data worked, and tabulate results of different numbers of hidden units, different convergence criteria/number of trianing iterations, and the expected classification performance of the network in practice on training and test sets. What factors would affect the validity of your predictions? Why is the class distribution interesting (see end of data description).

The description of the data is as follows:

1. Title: Letter Image Recognition Data

2. Source Information

-- Creator: David J. Slate

-- Odesta Corporation; 1890 Maple Ave; Suite 115; Evanston, IL 60201

-- Donor: David J. Slate (dave@math.nwu.edu) (708) 491-3867

-- Date: January, 1991

3. Past Usage:

-- P. W. Frey and D. J. Slate (Machine Learning Vol 6 #2 March 91):

"Letter Recognition Using Holland-style Adaptive Classifiers".

The research for this article investigated the ability of several

variations of Holland-style adaptive classifier systems to learn to

correctly guess the letter categories associated with vectors of 16

integer attributes extracted from raster scan images of the letters.

The best accuracy obtained was a little over 80%. It would be

interesting to see how well other methods do with the same data.

4. Relevant Information:

The objective is to identify each of a large number of black-and-white

rectangular pixel displays as one of the 26 capital letters in the English

alphabet. The character images were based on 20 different fonts and each

letter within these 20 fonts was randomly distorted to produce a file of

20,000 unique stimuli. Each stimulus was converted into 16 primitive

numerical attributes (statistical moments and edge counts) which were then

scaled to fit into a range of integer values from 0 through 15. We

typically train on the first 16000 items and then use the resulting model

to predict the letter category for the remaining 4000. See the article

cited above for more details.

5. Number of Instances: 20000

6. Number of Attributes: 17 (Letter category and 16 numeric features)

7. Attribute Information:

1. lettr capital letter (26 values from A to Z)

2. x-box horizontal position of box (integer)

3. y-box vertical position of box (integer)

4. width width of box (integer)

5. high height of box (integer)

6. onpix total # on pixels (integer)

7. x-bar mean x of on pixels in box (integer)

8. y-bar mean y of on pixels in box (integer)

9. x2bar mean x variance (integer)

10. y2bar mean y variance (integer)

11. xybar mean x y correlation (integer)

12. x2ybr mean of x * x * y (integer)

13. xy2br mean of x * y * y (integer)

14. x-ege mean edge count left to right (integer)

15. xegvy correlation of x-ege with y (integer)

16. y-ege mean edge count bottom to top (integer)

17. yegvx correlation of y-ege with x (integer)

8. Missing Attribute Values: None

9. Class Distribution:

789 A 766 B 736 C 805 D 768 E 775 F 773 G

734 H 755 I 747 J 739 K 761 L 792 M 783 N

753 O 803 P 783 Q 758 R 748 S 796 T 813 U

764 V 752 W 787 X 786 Y 734 Z

Direct any queries or difficulties to me rod@dcs.gla.ac.uk - I’ll be pleased to help. Note also that frequently asked questions will appear on the NC4 labs web-page, so check there first.

http://www.dcs.gla.ac.uk/~rod/NC4/index.htm

Roderick Murray-Smith