Contents

newspred.m

From A First Course in Machine Learning, Chapter 5. Simon Rogers, 01/11/11 [simon.rogers@glasgow.ac.uk]
Naive Bayes classifier on the 20 Newsgroups data.

clear all; close all;

Load the data

load ../data/newsgroups
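
The script relies on this .mat file providing the variables used below. A quick check of what was loaded (the variable roles here are inferred from their use in the rest of the script):

% Expected contents of the newsgroups .mat file (roles inferred from usage below):
%   X     - training word-count matrix (documents x vocabulary terms)
%   t     - training class labels, integers 1..20
%   Xt    - test word-count matrix
%   testt - test class labels
whos X t Xt testt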

Compute the class-conditional q parameters

alpha = 2; % Dirichlet smoothing parameter (alpha = 2 gives add-one smoothing)
M = size(X,2); % Vocabulary size
q = zeros(20,M);

for c = 1:20
    pos = find(t==c);
    % Smoothed (MAP) estimate of the word probabilities for class c
    q(c,:) = (alpha - 1 + sum(X(pos,:),1))./(M*(alpha-1) + sum(sum(X(pos,:))));
end
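
For reference, the line inside the loop is the smoothed (MAP) estimate of the multinomial word probabilities. Writing $n_{mc}$ for the total count of word $m$ across the class-$c$ training documents, it computes

$$ q_{mc} = \frac{\alpha - 1 + n_{mc}}{M(\alpha - 1) + \sum_{m'=1}^{M} n_{m'c}}, $$

so with alpha = 2 every word receives a pseudo-count of one and no q can be exactly zero.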

Compute the test probabilities

Work in log space for numerical stability: the raw likelihoods are products of many small probabilities and would underflow to zero. Note: this loop takes quite a long time!

Nt = size(Xt,1);
testP = zeros(Nt,20);
for c = 1:20
    fprintf('\nClass %g',c);
    % Log-likelihood of each test document under class c:
    % the word counts weighted by the log word probabilities
    testP(:,c) = sum(Xt.*log(repmat(q(c,:),Nt,1)),2);
end
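Most of the cost above comes from building the repmat inside the loop. A minimal vectorised sketch that computes the same Nt-by-20 matrix of log-likelihoods with a single matrix multiply (equivalent up to floating-point summation order):

% Vectorised alternative: row i, column c is sum over words of
% Xt(i,m) * log(q(c,m)), exactly the quantity the loop accumulates
testP = Xt * log(q)';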

Normalise

C = 20;
prior = repmat(1/C,1,C); % Uniform prior class probabilities
testP = testP + repmat(log(prior),Nt,1); % Add the log prior to the log-likelihoods
% Subtract each row's maximum before exponentiating to avoid underflow
testP = exp(testP - repmat(max(testP,[],2),1,C));
testP = testP./repmat(sum(testP,2),1,C); % Each row is now a posterior over classes
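
A quick sanity check (not in the original script) that each row is now a valid distribution:

% Every row of testP should sum to one, up to rounding error
assert(all(abs(sum(testP,2) - 1) < 1e-10));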

Visualise the probabilities

imagesc(testP);
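
Each row of the image is one test document's posterior over the 20 classes. A couple of optional labelling lines (an addition, not in the original) make the plot easier to read:

xlabel('Class'); ylabel('Test document');
colorbar; % Posterior probability scale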

Make the confusion matrix

Assign to max probability

% Mark, for each test document, the class with the highest posterior
% (assumes a unique maximum in each row)
assignments = (testP == repmat(max(testP,[],2),1,C));
[r,c] = find(assignments);
[r,I] = sort(r); % Re-order so c(i) is the predicted class for document i
c = c(I);
confusion = zeros(C);
for predicted = 1:C
    for true = 1:C
        confusion(predicted,true) = sum(testt==true & c==predicted);
    end
end

imagesc(confusion);
xlabel('True class');
ylabel('Predicted class');
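
As a follow-up, the predictions can also be obtained directly with max, and the overall accuracy read off the confusion matrix. A minimal sketch, equivalent to the assignment logic above whenever each row of testP has a unique maximum:

% Argmax predictions and overall test accuracy
[~,pred] = max(testP,[],2);           % Predicted class per test document
accuracy = mean(pred(:)==testt(:));   % Fraction correctly classified
% Equivalently: accuracy = sum(diag(confusion))/sum(confusion(:));
fprintf('\nAccuracy: %g\n',accuracy);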