SV classifiers

This recipe explains how yaplf works with support vector classifiers. A basic knowledge of the Python programming language is required, as well as of the concepts behind SV classification.

A light gray cell denotes one or more Python statements, while a subsequent dark gray cell contains the expected output of the above statements. Statements can be executed either in a Python or in a Sage shell. For the sake of visualization this document assumes that statements are executed in a Sage notebook, so that graphics are shown right after the cell generating them. The execution in a pure Python environment works in the same way, the only difference being that graphic functions return a matplotlib object that can be dealt with as usual.

The labeled sample fed to a SV classification algorithm is a list or tuple of LabeledExample instances whose labels are set either to 1 or to -1; once this sample is available, it is possible to create an instance of the SVMClassificationAlgorithm class:

from yaplf.data import LabeledExample
from yaplf.algorithms.svm.classification import SVMClassificationAlgorithm
and_sample = [LabeledExample((1., 1.), 1.), LabeledExample((0., 0.), -1.),
  LabeledExample((0, 1), -1.), LabeledExample((1, 0), -1.)]
alg = SVMClassificationAlgorithm(and_sample)

The learning algorithm is then executed, precisely as with the other learning algorithms in yaplf, by invoking the run function, which in this case takes no arguments. Subsequently, the inferred model is available in the model field, in the form of a SVMClassifier instance, which can then be used, for instance, to draw its decision function:

alg.run()
model = alg.model
print model
model.plot((0., 1.7), (0., 1.7), separator_color = 'red', margin = True)
SVMClassifier([3.000000000475791, 1.0000000001901006,
1.0000000001901004], -3.00000000038, [LabeledExample((1.00000000000000,
1.00000000000000), 1.0), LabeledExample((0, 1), -1.0),
LabeledExample((1, 0), -1.0)], LinearKernel())
[graphics: decision function of the AND classifier, with red separating line and margin region]

The above code shows how a SVMClassifier instance has its own plot function, accepting special named arguments in order to deal with the peculiarities of SV classifiers. In particular, margin is a boolean flag triggering the visualization of the margin region, while separator_color allows the customization of the colour used to draw the separating surface.
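
Besides being plotted, the inferred model can also be queried directly. The following sketch assumes that SVMClassifier, like the other models in yaplf, exposes a compute method returning the decision function value for a given pattern, and that LabeledExample instances expose a pattern field alongside the label field used later in this recipe:

# Sketch under the above assumptions: the sign of the returned value
# corresponds to the predicted class.
for example in and_sample:
    print example.pattern, model.compute(example.pattern)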

Even when running yaplf within Sage, it is possible to use matplotlib in order to render graphic objects. In this way one can obtain graphic features not (yet?) implemented in Sage, such as dotted or dashed lines. To attain this behaviour it is necessary to set the plotter named argument of plot to a newly created instance of the MatplotlibPlotter class in the yaplf.graph package:

from yaplf.graph import MatplotlibPlotter
fig = model.plot((0., 1.7), (0., 1.7), plotter = MatplotlibPlotter(),
  separator_color = 'red', separator_style = '--', margin = True,
  margin_width = 3, color_bar = True)
fig.savefig('classifier.png')
[matplotlib figure saved as classifier.png: decision function with dashed red separator, margin and color bar]

The use of kernels in order to obtain nonlinear separating surfaces requires specifying the kernel named argument, setting its value to an instance of a Kernel subclass defined in the yaplf.models.kernel package. For instance, the next cell learns a SV classifier for the binary XOR function exploiting a polynomial kernel, prints it and shows a graph of its decision function:

from yaplf.models.kernel import PolynomialKernel
xor_sample = [LabeledExample((1., 1.), -1.), LabeledExample((0., 0.), -1.),
  LabeledExample((0, 1), 1.), LabeledExample((1, 0), 1.)]
alg = SVMClassificationAlgorithm(xor_sample, kernel = PolynomialKernel(2))
alg.run()
model = alg.model
print model
model.plot((0., 2.), (0., 2.), margin = True)
SVMClassifier([1.9956464353279999, 3.3260773922133327, 2.6594107255466666,
2.6623131019946662], -0.997823217664, [LabeledExample((1.00000000000000,
1.00000000000000), -1.0), LabeledExample((0.000000000000000,
0.000000000000000), -1.0), LabeledExample((0, 1), 1.0),
LabeledExample((1, 0), 1.0)], kernel = PolynomialKernel(2))
[graphics: decision function of the XOR classifier inferred with a polynomial kernel, with margin region]
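
Any other Kernel subclass can be plugged in the same way. As a sketch, the next cell assumes that yaplf.models.kernel also provides a GaussianKernel class taking the kernel width as its argument:

from yaplf.models.kernel import GaussianKernel  # assumed class name and signature
alg_gauss = SVMClassificationAlgorithm(xor_sample, kernel = GaussianKernel(0.5))
alg_gauss.run()
alg_gauss.model.plot((0., 2.), (0., 2.), margin = True)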

The decision function graph of an inferred SV classifier can be customized. For instance, the next cell modifies the colour of the margin region, superimposing the decision function graph on the training set scatter plot:

from yaplf.data import classification_data_plot

cf = lambda x: ('white' if x.label == 1 else 'red')
fig_xor_sample = classification_data_plot(xor_sample, color_function = cf)

alg = SVMClassificationAlgorithm(xor_sample, kernel = PolynomialKernel(2))
alg.run()
model = alg.model
fig_xor_model = model.plot((-1, 2), (-1, 2), margin = True, separator = True,
  shading = True, margin_color = 'red')
fig_xor_model + fig_xor_sample
[graphics: XOR decision function with red margin region, superimposed on the training set scatter plot]
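
When working outside the notebook, the combined graphics can also be saved to a file; the following line relies on the standard save method of Sage graphics objects, to whose class the sum of two plots belongs:

(fig_xor_model + fig_xor_sample).save('svm-xor.png')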

The next cell shows another graphic customization, concerning the colormap used to render the gradient that visualizes the decision function (the cell also sets the named argument c, whose meaning is explained at the end of this recipe):

from matplotlib.cm import Blues
alg = SVMClassificationAlgorithm(xor_sample, kernel = PolynomialKernel(2),
  c = 100)
alg.run()
model = alg.model
fig_xor_model_color = model.plot((-.9, 1.9), (-.9, 1.9), margin = True,
  separator = True, shading = True, margin_color = 'red', margin_width = 1,
  margin_style='--', shading_color=Blues)
fig_xor_model_color + fig_xor_sample
[graphics: XOR decision function shaded through the Blues colormap, with dashed red margin]

As shown above, the same effect can be obtained with matplotlib, exploiting some of its peculiarities such as the possibility of adding a color bar to the whole picture:

from yaplf.graph import MatplotlibPlotter
fig_mpl_2 = classification_data_plot(xor_sample, color_function = cf,
  plotter = MatplotlibPlotter())
fig_mpl_2 = model.plot((-.9, 1.9), (-.9, 1.9), margin = True,
  separator = True, shading = True, margin_color = 'red', margin_width=1,
  margin_style='--', shading_color=Blues, plotter = MatplotlibPlotter(),
  base = fig_mpl_2, color_bar = True)
fig_mpl_2.savefig('mpl-svm-xor-color.png')
[matplotlib figure saved as mpl-svm-xor-color.png: as above, with an added color bar]

Finally, the decision function visualization can also be applied to SV classifiers having three inputs. In this case the produced 3D graphics are rendered through Jmol, which allows the user to dynamically rotate and zoom them:

td_sample = [LabeledExample((1., 1., 1.), 1),
  LabeledExample((0., 0., 1.), -1), LabeledExample((0., 1, 0.), -1),
  LabeledExample((1., 0., 0.), -1), LabeledExample((0., 0., 0.), 1)]
p0 = classification_data_plot(td_sample,
  color_function=lambda x: ('green' if x.label==1 else 'yellow'))
alg = SVMClassificationAlgorithm(td_sample, c = 100,
  kernel = PolynomialKernel(3))
alg.run()
print alg.model
SVMClassifier([0.14243861607142858, 0.42689732142857145, 0.42815290178571436,
0.42689732142857145, 1.1395089285714286], 0.996484375,
[LabeledExample((1.00000000000000, 1.00000000000000, 1.00000000000000), 1.0),
LabeledExample((0.000000000000000, 0.000000000000000, 1.00000000000000),
-1.0), LabeledExample((0.000000000000000, 1, 0.000000000000000), -1.0),
LabeledExample((1.00000000000000, 0.000000000000000, 0.000000000000000),
-1.0), LabeledExample((0.000000000000000, 0.000000000000000,
0.000000000000000), 1.0)], kernel = PolynomialKernel(3))
p1 = alg.model.plot((-.5, 3.), (-1., 3.), (-1., 3.))
p0 + p1

The previous cell also shows how a particular value of the tradeoff constant C, which balances accuracy on the training set against the steepness of the decision function, can be set through the named argument c.
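
The effect of this constant can be appreciated by inferring two classifiers differing only in its value; in the following sketch the two values of c are arbitrary, chosen merely to make the difference between the corresponding margin regions visible:

# Sketch: a smaller c softens the margin, allowing more examples to fall
# within the margin region, while a larger c penalizes such violations.
alg_low_c = SVMClassificationAlgorithm(xor_sample, kernel = PolynomialKernel(2), c = 1)
alg_low_c.run()
alg_low_c.model.plot((-.9, 1.9), (-.9, 1.9), margin = True)
alg_high_c = SVMClassificationAlgorithm(xor_sample, kernel = PolynomialKernel(2), c = 100)
alg_high_c.run()
alg_high_c.model.plot((-.9, 1.9), (-.9, 1.9), margin = True)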