04 Sep 2018

## Machine Learning – Kernel Methods: Implementation of Support Vector Machines (SVM) for Classification (Linear)

Implementing the soft-margin SVM algorithm with a linear kernel for classification of handwritten digits (3 and 8), with detailed MATLAB code.

## Contents

```
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SVM Linear Kernel                                                                          %
% Step by step detailed implementation for soft margin SVM                                   %
% Santosh Tirunagari                                                                         %
%                                                                                            %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
```

## Parameters

The soft-margin regularization parameter is varied over $C = \{0.0001, 0.001, 0.01, 0.1, 1\}$.

## Datasets

The handwritten-digit datasets used are a training set and a test set, each with 500 samples and 256 features. Class labels for the training and test sets are encoded as $+1$ and $-1$ for the two digit classes. There are 250 samples each of handwritten 3s and 8s in both the training and test sets. Download the dataset files:

- usps_3_vs_8_train_y
- usps_3_vs_8_train_X
- usps_3_vs_8_test_X
- usps_3_vs_8_test_y

Load the data, then z-score normalize both the train and test feature matrices using MATLAB's `zscore` command.

```
load usps_3_vs_8_train_X.txt;
load usps_3_vs_8_train_y.txt;
load usps_3_vs_8_test_X.txt;
load usps_3_vs_8_test_y.txt;
```

## Z-score normalization

```
Y = usps_3_vs_8_train_y;
X = zscore(usps_3_vs_8_train_X);
X_t = zscore(usps_3_vs_8_test_X);
Y_t = usps_3_vs_8_test_y;
```
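MATLAB's `zscore` standardizes each column to zero mean and unit sample standard deviation (normalizing by N-1). A minimal NumPy sketch of the same operation, assuming a plain 2-D array (the function name `zscore_cols` is illustrative):

```python
import numpy as np

def zscore_cols(X):
    """Standardize each column: subtract the mean, divide by the sample std
    (ddof=1), matching MATLAB's zscore default."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)  # MATLAB zscore normalizes by N-1
    return (X - mu) / sigma

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = zscore_cols(X)
print(Z[:, 0])  # -> [-1.  0.  1.]
```

Note that the code above normalizes the test set with its own statistics, as in the MATLAB snippet; an alternative is to reuse the training-set mean and standard deviation for the test data.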

## Initialization of vectors

```
result_train =[];
result_test = [];
support = [];

warning off;
%
for k = 1:1:5
```
```
    C = [0.0001,0.001,0.01,0.1,1]; % candidate soft-margin parameters
K = X*X'; % Calculate linear kernel function
H = (Y*Y').*K;
A = [];
Aeq = Y';
l = zeros(500,1);
c = -1*ones(500,1);
b = [];
beq = 0;
u = C(k)*ones(500, 1);
options = optimset('Algorithm','interior-point-convex');
```
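The `quadprog` arguments above encode the soft-margin SVM dual in standard QP form. Writing $K_{ij} = x_i^{\top} x_j$ and $H_{ij} = y_i y_j K_{ij}$, the problem being solved is:

```latex
\min_{\alpha}\; \tfrac{1}{2}\,\alpha^{\top} H \alpha + c^{\top}\alpha,
\qquad c = -\mathbf{1},
\qquad \text{subject to}\quad y^{\top}\alpha = 0,\quad 0 \le \alpha_i \le C,
```

which is the negation of the usual SVM dual $\max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K_{ij}$, so the minimizer returned by `quadprog` is the vector of dual coefficients.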

## Calculate alphas

```
    % Solve the SVM dual optimization problem in standard QP form using quadprog.
alpha = quadprog(H, c, A, b, Aeq, beq, l, u,[],options);
```
`quadprog` reports:

```
Minimum found that satisfies the constraints.
```
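For intuition, the dual can be solved by hand in the simplest case of two points, one per class. With $y_1 = +1$, $y_2 = -1$, the equality constraint forces $\alpha_1 = \alpha_2 = \alpha$, and maximizing $2\alpha - \tfrac{1}{2}\alpha^2 \lVert x_1 - x_2 \rVert^2$ gives $\alpha^* = 2/\lVert x_1 - x_2 \rVert^2$. A NumPy check of this worked example (toy points, not the USPS data):

```python
import numpy as np

# Two training points, one per class: the dual reduces to a 1-D problem.
x1, y1 = np.array([1.0, 1.0]), 1.0
x2, y2 = np.array([-1.0, -1.0]), -1.0

# Closed-form dual solution: alpha* = 2 / ||x1 - x2||^2
# (assuming C is large enough that the box constraint is inactive).
alpha = 2.0 / np.sum((x1 - x2) ** 2)

# Recover the primal weight vector and bias from the support vectors.
w = alpha * (y1 * x1 + y2 * x2)
b = y1 - w @ x1  # x1 lies exactly on the margin

print(alpha, w, b)  # -> 0.25 [0.5 0.5] 0.0
```

Both points sit on their margins ($w^{\top}x_1 + b = 1$, $w^{\top}x_2 + b = -1$), which is exactly the condition `quadprog` enforces at scale.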

## Compute the near boundary coefficients

```
    alpha(alpha < C(k)*0.001) = 0; % zero out near-zero alphas
    alpha(alpha > C(k)*0.999999999999) = C(k); % snap near-boundary alphas to the upper bound
```
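Numerically, the QP solver returns alphas that are only approximately 0 or C, so the code snaps near-boundary values to the box boundaries before identifying support vectors. The same clean-up in NumPy, with the thresholds taken from the MATLAB code above (the function name `snap_alphas` is illustrative):

```python
import numpy as np

def snap_alphas(alpha, C, lo_tol=1e-3, hi_tol=1e-12):
    """Snap near-boundary dual variables to the box boundaries {0, C}."""
    alpha = alpha.copy()
    alpha[alpha < C * lo_tol] = 0.0        # effectively inactive constraints
    alpha[alpha > C * (1.0 - hi_tol)] = C  # constraints active at the upper bound
    return alpha

a = np.array([1e-7, 0.3, 0.999999999999999])
print(snap_alphas(a, C=1.0))  # -> [0.  0.3 1. ]
```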

## Calculate Support Vectors

```
    sv = find(alpha > 0 & alpha < C(k)); % indices of margin support vectors (0 < alpha < C)
sv_one = zeros(500,1);
sv_one(sv,1) = 1;
```

## Compute bias parameter b

```
    b = sv_one'*(Y-((alpha.*Y)'*K')')/sum(sv_one); % average over margin support vectors
s = length(sv);
s=(s/500)*100;
support = [support;s];
```
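The bias is the average, over the margin support vectors (those with $0 < \alpha_i < C$), of $y_i - \sum_j \alpha_j y_j K_{ij}$, and the support-vector percentage is the fraction of such points. A NumPy sketch on the two-point toy problem (variable names illustrative):

```python
import numpy as np

X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
C = 1.0

K = X @ X.T                            # linear kernel matrix
margin_sv = (alpha > 0) & (alpha < C)  # support vectors strictly inside the box
f_no_bias = K @ (alpha * y)            # decision values without bias
b = np.mean(y[margin_sv] - f_no_bias[margin_sv])

sv_percent = 100.0 * margin_sv.sum() / len(y)
print(b, sv_percent)  # -> 0.0 100.0
```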

## Compute the decision function on training set

```
    Ki = X(sv,:)*X'; % linear kernel between support vectors and all training points
temp = bsxfun(@plus,Ki'*(alpha(sv,:).*Y(sv,:)),b);
res = temp;
res(res>=0) = 1;
res(res<0) = -1;
r = sum(res~=Y);
r=(r/500)*100;
result_train = [result_train;r];
```
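The decision function needs only the support vectors: $f(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + b$, with predictions $\operatorname{sign}(f(x))$ and the misclassification rate given by the fraction of sign mismatches. A NumPy sketch continuing the toy problem (names illustrative):

```python
import numpy as np

# Toy model from the two-point problem: alphas, labels, and bias already known.
X_sv = np.array([[1.0, 1.0], [-1.0, -1.0]])  # support vectors
y_sv = np.array([1.0, -1.0])
alpha_sv = np.array([0.25, 0.25])
b = 0.0

def predict(X_new):
    """sign( K(X_new, X_sv) @ (alpha_sv * y_sv) + b ), with sign(0) mapped to +1."""
    f = X_new @ X_sv.T @ (alpha_sv * y_sv) + b
    return np.where(f >= 0, 1.0, -1.0)

X_test = np.array([[2.0, 0.5], [-0.5, -2.0]])
y_test = np.array([1.0, -1.0])
pred = predict(X_test)
error_pct = 100.0 * np.mean(pred != y_test)
print(pred, error_pct)  # -> [ 1. -1.] 0.0
```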

## Compute the decision function on test set

```
    Ki = X(sv,:)*X_t'; % linear kernel between support vectors and test points
temp = bsxfun(@plus,Ki'*(alpha(sv,:).*Y(sv,:)),b);
res = temp;
```

## Threshold the decisions

```
    res(res>=0) = 1;
res(res<0) = -1;
```

## Calculate the misclassification errors

```
    r = sum(res~=Y_t);
r=(r/500)*100;
result_test = [result_test;r];
```
```
end
```

## Plot the graphs of misclassification errors

```
set(0,'DefaultAxesFontWeight','bold')
set(0,'DefaultAxesFontSize',16) % value was missing in the original; 16 is an assumed choice
set(0,'DefaultTextFontSize',16) % assumed to match the axes font size
h = figure;
hut = log10(C);
plot(hut,result_train,'k-*','LineWidth',5,'MarkerSize',10);
hold on
plot(hut,result_test,'r-o','LineWidth',5,'MarkerSize',10);
set(gca,'YTick',[0 5 10])
set(gca,'XTick',log10(C))
xlabel('log10(C)');
ylabel('Mis-Classification Error');
hleg1 = legend('Train','Test');
title('Linear Kernel');
set(hleg1,'Location','NorthEast')
set(hleg1,'Interpreter','none')
saveas(h,'lk_mce','png')
```

## Plot the support vector count percentage

```
h = figure;
hut = log10(C);
plot(hut,support,'k-*','LineWidth',5,'MarkerSize',10);
set(gca,'YTick',[0 50 100])
set(gca,'XTick',log10(C))
xlabel('log10(C)');
ylabel('Support Vector');
title('Linear Kernel');
saveas(h,'lk_sv','png')
```

| C      | Misclassification error on training set (%) | Misclassification error on test set (%) | Support vectors (%) |
|--------|-----|-----|------|
| 0.0001 | 5.2 | 8   | 100  |
| 0.001  | 0.8 | 2   | 57   |
| 0.01   | 0.2 | 3.6 | 22.6 |
| 0.1    | 0   | 3.6 | 14.8 |
| 1      | 0   | 3.6 | 12.6 |
machine learning • matlab • teaching