Model-independent cross validation

This recipe explains how yaplf can perform k-fold cross validation independently of the chosen learning algorithm. A basic knowledge of the python programming language is required, as well as familiarity with the basic concepts related to multilayer perceptrons and the corresponding learning algorithms.

A light gray cell denotes one or more python statements, while a subsequent dark gray cell contains the expected output of the statements above it. Statements can be executed either in a python or in a sage shell. For the sake of visualization, this document assumes that statements are executed in a sage notebook, so that graphics are shown right after the cell generating them. Execution in a pure python environment works in the same way, the only difference being that graphic functions return a matplotlib object that can be dealt with as usual.

Cross validation

When faced with a learning algorithm depending on one or more parameters, the corresponding values should not be fixed before the learning process starts. Rather, they should be chosen among a set of candidates with reference to the learning process itself. The typical technique used in this case is called k-fold cross validation, and it works as follows: for each candidate parameter selection, the available set of examples is partitioned into k subsets, k-1 of which are used for training while the remaining one is used for testing the inferred model. The process is repeated k times, each time changing the subset used for testing purposes, and the obtained results are averaged. At the end, the architecture scoring the lowest average error is selected.
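Before turning to yaplf, the procedure can be summarized through the following minimal sketch in plain python; train, error and trainer_for are hypothetical stand-ins for a learning procedure, an error measure and a factory of trainers, and are not part of yaplf, which encapsulates the whole loop in the function described below.

def k_fold_error(train, error, examples, k):
    """Return the average test error of train over k folds of examples."""
    folds = [examples[i::k] for i in range(k)]   # interleaved partition in k subsets
    total = 0.0
    for i in range(k):
        test_set = folds[i]                      # i-th subset held out for testing
        train_set = [e for j, fold in enumerate(folds)
                     if j != i for e in fold]    # remaining k-1 subsets for training
        model = train(train_set)
        total += error(model, test_set)
    return total / k

# The candidate scoring the lowest average error gets selected, e.g.:
# best = min(candidates, key=lambda c: k_fold_error(trainer_for(c), error, sample, k))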

Cross validation in yaplf

The function cross_validation defined in package yaplf.utility provides k-fold cross validation, for a chosen value of k, independently of the chosen model and learning algorithm: the only requirement is that the parameters affecting the model architecture be specified as named arguments in the corresponding learning algorithm constructor. For instance, the following code automatically selects the best sigmoid activation function among four candidates in order to learn the TC sample introduced in the recipe dealing with multilayer perceptrons:

from yaplf.utility import cross_validation, SigmoidActivationFunction
from yaplf.utility import FixedIterationsStoppingCriterion
from yaplf.algorithms.neural import BackpropagationAlgorithm
from yaplf.data import LabeledExample
tc_sample = (LabeledExample((0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.9), (0.1,)),
             LabeledExample((0.9, 0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.9, 0.1), (0.9,)),
             LabeledExample((0.9, 0.9, 0.9, 0.9, 0.1, 0.9, 0.9, 0.1, 0.9), (0.1,)),
             LabeledExample((0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.9), (0.9,)),
             LabeledExample((0.9, 0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9), (0.1,)),
             LabeledExample((0.1, 0.9, 0.1, 0.1, 0.9, 0.1, 0.9, 0.9, 0.9), (0.9,)),
             LabeledExample((0.9, 0.1, 0.9, 0.9, 0.1, 0.9, 0.9, 0.9, 0.9), (0.1,)),
             LabeledExample((0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1), (0.9,)))
   
p = cross_validation(BackpropagationAlgorithm, tc_sample,
    (('activations',
      (SigmoidActivationFunction(2), SigmoidActivationFunction(3),
       SigmoidActivationFunction(5), SigmoidActivationFunction(10))),),
    fixed_parameters = {'dimensions': (9, 2, 1)},
    run_parameters = {'stopping_criterion': FixedIterationsStoppingCriterion(1000),
                      'learning_rate': 1},
    num_folds = 4, verbose = True)
Errors: [0.030319163781913711, 0.060571552036313515,
0.029535618302241346, 0.4099711142423762]
Minimum error in position 2
Selected parameters (SigmoidActivationFunction(5),)

The cross_validation function requires the specification of a learning algorithm class and of a labeled sample, followed by a list or tuple whose elements are in turn lists or tuples containing:

- the name of a constructor argument of the learning algorithm, corresponding to a parameter to be cross-validated;
- a list or tuple gathering the candidate values for that parameter.

For instance, in the previous cell only the activations named argument of the BackpropagationAlgorithm class is to be cross-validated, considering four different sigmoidal activation functions.
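Since the elements of this list or tuple can be more than one, several parameters can presumably be cross-validated at once, with cross_validation scanning the candidate combinations. The following hypothetical call is a sketch under that assumption; the candidate values for the dimensions argument are invented for illustration purposes:

# Hypothetical call cross-validating two constructor arguments at once;
# the candidate values for 'dimensions' are made up for illustration.
p = cross_validation(BackpropagationAlgorithm, tc_sample,
    (('activations',
      (SigmoidActivationFunction(2), SigmoidActivationFunction(10))),
     ('dimensions',
      ((9, 2, 1), (9, 3, 1)))),
    run_parameters = {'stopping_criterion': FixedIterationsStoppingCriterion(1000),
                      'learning_rate': 1},
    num_folds = 4)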

The cross_validation function also accepts several named arguments affecting how the cross validation procedure is performed:

- fixed_parameters, a dictionary of named arguments passed as-is to the learning algorithm constructor and not subject to cross validation;
- run_parameters, a dictionary of named arguments forwarded to the learning algorithm when it is run;
- num_folds, the number k of subsets in which the sample is partitioned;
- verbose, a flag triggering verbose output.

For instance, in the previous cell all inferred multilayer perceptrons had nine inputs, one output, and a hidden layer composed of two units, and each time the backpropagation algorithm was run for a thousand iterations with unit learning rate. Moreover, for each candidate parameter value the sample was partitioned into four subsets. Finally, note that cross_validation returns a new model, inferred using the combination of parameters scoring the lowest error. As such, the returned value can be used, for instance, in order to quantitatively test it against the original sample:

p.test(tc_sample, verbose = True)
(0.900000000000000, 0.900000000000000, 0.900000000000000,
0.900000000000000, 0.100000000000000, 0.100000000000000,
0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to
0.0987184591155, label is (0.100000000000000,), error [  1.64234704e-06]
(0.900000000000000, 0.900000000000000, 0.900000000000000,
0.100000000000000, 0.900000000000000, 0.100000000000000,
0.100000000000000, 0.900000000000000, 0.100000000000000) mapped to
0.995638751026, label is (0.900000000000000,), error [ 0.00914677]
(0.900000000000000, 0.900000000000000, 0.900000000000000,
0.900000000000000, 0.100000000000000, 0.900000000000000,
0.900000000000000, 0.100000000000000, 0.900000000000000) mapped to
0.0987060693926, label is (0.100000000000000,), error [  1.67425642e-06]
(0.100000000000000, 0.100000000000000, 0.900000000000000,
0.900000000000000, 0.900000000000000, 0.900000000000000,
0.100000000000000, 0.100000000000000, 0.900000000000000) mapped to
0.900933293696, label is (0.900000000000000,), error [  8.71037123e-07]
(0.900000000000000, 0.900000000000000, 0.900000000000000,
0.100000000000000, 0.100000000000000, 0.900000000000000,
0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to
0.0991757359563, label is (0.100000000000000,), error [  6.79411214e-07]
(0.100000000000000, 0.900000000000000, 0.100000000000000,
0.100000000000000, 0.900000000000000, 0.100000000000000,
0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to
0.995353479277, label is (0.900000000000000,), error [ 0.00909229]
(0.900000000000000, 0.100000000000000, 0.900000000000000,
0.900000000000000, 0.100000000000000, 0.900000000000000,
0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to
0.0987035440608, label is (0.100000000000000,), error [  1.68079800e-06]
(0.900000000000000, 0.100000000000000, 0.100000000000000,
0.900000000000000, 0.900000000000000, 0.900000000000000,
0.900000000000000, 0.100000000000000, 0.100000000000000) mapped to
0.899575874803, label is (0.900000000000000,), error [  1.79882183e-07]
MSE 0.002280723055
0.002280723055
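The returned model can also be evaluated on individual patterns. Assuming the inferred perceptron exposes the compute method described in the recipe dealing with multilayer perceptrons, a single pattern of the sample can be fed to it as follows; the output should lie close to the 0.9 label of the corresponding example:

# Assuming the model exposes the compute method of the multilayer
# perceptron recipe, map the second 'T' pattern to the model output.
p.compute((0.9, 0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.9, 0.1))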

Finally, it is worth noting that: