This recipe explains how yaplf works with perceptrons, both in the original version and in the more flexible multilayer extension. A basic knowledge of the python programming language is required, as well as the comprehension of the basic concepts related to perceptrons, multilayer perceptrons and their corresponding learning algorithms.
A light gray cell denotes one or more python statements, while a subsequent dark gray cell contains the expected output of those statements. Statements can be executed either in a python or in a sage shell. For the sake of visualization this document assumes that statements are executed in a sage notebook, so that graphics are shown right after the cell generating them. Execution in a pure python environment works in the same way, the only difference being that graphic functions return a matplotlib object that can be dealt with as usual.
Perceptrons are obtained through instances of the Perceptron
class (in package yaplf.models.neural
). The corresponding constructor needs a sequence in order to set the perceptron weights: this sequence should contain a sequence of numeric values for each output of the perceptron, and all such sequences must have the same length, identifying in turn the number of inputs. Optionally, the threshold
named argument can be used in order to add thresholds to the perceptron. In this case the argument must be assigned to a sequence of numeric values whose length equals the number of outputs. When this argument is not set, the perceptron will have null thresholds. Finally, the activation
named argument allows selecting a particular activation function: activation functions are defined through appropriate strategy classes in package yaplf.utility
, which includes:
HeavisideActivationFunction, for the step activation function, used as default when the activation argument is missing;
SigmoidActivationFunction, for the sigmoidal activation function whose steepness parameter can be set through the named argument beta of the constructor, 1 being its default value;
HyperbolicTangentActivationFunction, for the hyperbolic tangent activation function whose steepness parameter can be set through the named argument beta of the constructor, 1 being its default value;
LinearActivationFunction, for the linear activation function whose steepness parameter can be set through the named argument beta of the constructor, 1 being its default value.
For instance, the next cell creates a thresholded perceptron with two inputs and one output:
from yaplf.models.neural import Perceptron
from yaplf.utility import SigmoidActivationFunction
p = Perceptron(((.3, 9.56),), threshold=(1.7,), activation=SigmoidActivationFunction(beta=.1))
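As a rough illustration of the four activation families listed above, the following sketch writes them as plain python functions. These are not the yaplf strategy classes: the function names are hypothetical, the formulas are the usual textbook ones, and the value of the step function at exactly zero is an assumption.

```python
import math

# Illustrative stand-ins for the four activation families listed above.
# These are NOT yaplf classes; beta plays the role of the steepness parameter.

def heaviside(x):
    # Step function; the value returned at exactly x == 0 is an assumption.
    return 1.0 if x >= 0 else 0.0

def sigmoid(x, beta=1.0):
    return 1.0 / (1.0 + math.exp(-beta * x))

def hyperbolic_tangent(x, beta=1.0):
    return math.tanh(beta * x)

def linear(x, beta=1.0):
    return beta * x

# A larger beta makes the sigmoid switch more abruptly around 0.
for beta in (0.1, 1.0, 10.0):
    print(beta, sigmoid(0.5, beta))
```

With a small beta, as in the perceptron created above, the sigmoid changes slowly and the transition between low and high outputs is spread over a wide range of inputs.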
The behaviour of a perceptron having two inputs can be inspected graphically invoking its plot
method, specifying a set of ranges for the input values. Moreover, the graph can be enriched through the following named arguments:
contours, to be set to a sequence of values for the perceptron outputs to be highlighted through contours;
contour_color, to be set to a color value or to a sequence of color values to be applied to the contours specified through the contours argument (if only one value is specified, it applies to all contours);
shading, to be set either to True or False in order to trigger a density plot of the perceptron output as a function of the input values;
plot_points, in order to tune the accuracy of the above mentioned density plot.
For instance, the following cell generates the decision function plot for the previously created perceptron when both inputs vary between -5 and 5, highlighting in different colours all zones where the perceptron outputs 0.1, 0.5 and 0.9:
p.plot((-5, 5), (-5, 5), plot_points = 100, contours=(0.1, 0.5, 0.9), contour_color=('red', 'green', 'blue'), shading = True)
The obtained graph shows that as the second input value increases, almost regardless of the first input, the perceptron output rises from 0 to 1. This can be checked, for instance, through explicit computation of the decision function:
p.compute((0,4))
0.974765873307
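The value above can be reproduced by hand. The sketch below assumes that the decision function has the form sigmoid(beta * (w . x - threshold)); this form is inferred from the printed output rather than taken from the yaplf source, and the helper name is hypothetical.

```python
import math

def sigmoid_perceptron_output(weights, threshold, beta, inputs):
    # Assumed form of the decision function: sigmoid(beta * (w . x - threshold)).
    net = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return 1.0 / (1.0 + math.exp(-beta * net))

# Same weights, threshold and steepness as the perceptron p defined earlier.
output = sigmoid_perceptron_output((0.3, 9.56), 1.7, 0.1, (0, 4))
print(output)  # approximately 0.97477
```

The large weight on the second input (9.56 against 0.3) is what makes the output depend almost exclusively on that input.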
The decision function graph can be obtained also for perceptrons having three inputs. In this case the only options affecting the plot are those related to contours:
pp = Perceptron(((.3, 9.56, .2),), threshold=(1.7,), activation=SigmoidActivationFunction(beta=.1))
pp.plot((-5,5), (-5,5), (-5,5), plot_points = 20, contours=(0.1, 0.5, 0.9), contour_color=('red', 'green', 'blue'), shading = True)
Given a sample, it is possible to train a perceptron through a suitable learning algorithm. Such algorithms reside in package yaplf.algorithms
or in suitable subpackages of it. For instance the following cell declares a variable and_sample
containing the bitwise AND sample (the four binary input pairs, each labeled with the conjunction of its components) and creates an instance of the class GradientPerceptronAlgorithm
for it, specifying the named arguments threshold, weight_bound and beta:
from yaplf.data import LabeledExample
from yaplf.algorithms.neural import GradientPerceptronAlgorithm
and_sample = [LabeledExample((1., 1.), (1,)),
    LabeledExample((0., 0.), (0,)),
    LabeledExample((0, 1), (0,)),
    LabeledExample((1, 0), (0,))]
alg = GradientPerceptronAlgorithm(and_sample, threshold = True, weight_bound = 0.1, beta = 0.8)
It is worth stressing that labels are expressed as tuples (or lists) even when they consist of only one element.
To obtain a perceptron that describes the provided sample with good accuracy, the algorithm must be run. Typically this phase involves the invocation of the run
function, specifying a proper set of parameters driving the search for a good approximation. In the following cell, the algorithm is run specifying the following options:
learning_rate;
batch;
stopping_criterion, set to an instance of the FixedIterationsStoppingCriterion strategy class.
from yaplf.utility import FixedIterationsStoppingCriterion
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(5000), batch = False, learning_rate = .1)
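To give an idea of what such a run involves, here is a minimal sketch of online gradient descent for a one-output sigmoid perceptron on the AND sample. It does not use yaplf: the function name, the plain (pattern, label) pairs, the seed and the update rule (the textbook delta rule on the squared error) are all assumptions of this sketch, and "epochs" here means full passes over the sample, which may differ from what yaplf counts as an iteration.

```python
import math
import random

def sigmoid(x, beta):
    return 1.0 / (1.0 + math.exp(-beta * x))

def train_sigmoid_perceptron(sample, beta=0.8, weight_bound=0.1,
                             learning_rate=0.1, epochs=5000, seed=0):
    """Online gradient descent on the squared error of a one-output perceptron.

    sample is a list of (pattern, label) pairs; the update rule is the
    textbook delta rule, assumed here rather than taken from yaplf.
    """
    rng = random.Random(seed)
    n_inputs = len(sample[0][0])
    # Random initial weights and threshold in [-weight_bound, weight_bound].
    weights = [rng.uniform(-weight_bound, weight_bound) for _ in range(n_inputs)]
    threshold = rng.uniform(-weight_bound, weight_bound)
    for _ in range(epochs):
        for pattern, label in sample:
            net = sum(w * x for w, x in zip(weights, pattern)) - threshold
            out = sigmoid(net, beta)
            # Negated derivative of 0.5 * (label - out)**2 with respect to net.
            delta = (label - out) * beta * out * (1.0 - out)
            weights = [w + learning_rate * delta * x
                       for w, x in zip(weights, pattern)]
            threshold -= learning_rate * delta
    return weights, threshold

and_pairs = [((1, 1), 1), ((0, 0), 0), ((0, 1), 0), ((1, 0), 0)]
weights, threshold = train_sigmoid_perceptron(and_pairs)
for pattern, label in and_pairs:
    net = sum(w * x for w, x in zip(weights, pattern)) - threshold
    print(pattern, label, sigmoid(net, 0.8))
```

Updating after every example corresponds to the non-batch mode; a batch variant would accumulate the updates over the whole sample before applying them.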
Once the learning algorithm has been run, the model
field of the corresponding object contains a Perceptron
object:
alg.model
Perceptron([array([ 4.01468351, 4.01077867])], threshold = [6.1645005716113204], activation = SigmoidActivationFunction(0.800000000000000))
The performance of the obtained perceptron can be inspected by testing it on the data used for training. This obviously gives an estimate strongly biased towards the data used for training the perceptron, and is used here only for demonstration purposes, also noting that all possible examples have been used during the learning phase. In a more realistic scenario, data is available in the form of a higher number of examples, part of which is retained for testing the learning outcome. Testing a model on a given sample requires the invocation of the test function, specifying an error criterion as second argument. This is accomplished through instantiation of a strategy class in package yaplf.utility
. In particular, MSE
corresponds to the mean square error between labels in examples and outputs of the perceptron when the corresponding patterns are used as inputs.
from yaplf.utility import MSE
alg.model.test(and_sample, MSE())
0.0199914315765
The result is encouraging, as the mean square error is roughly two orders of magnitude smaller than the range of the label values. To have a more detailed view, the verbose
argument prints out also the error for each example:
alg.model.test(and_sample, MSE(), verbose = True)
(1.00000000000000, 1.00000000000000) mapped to 0.815893477793, label is (1,), error [ 0.03389521]
(0.000000000000000, 0.000000000000000) mapped to 0.00716326425116, label is (0,), error [ 5.13123547e-05]
(0, 1) mapped to 0.151488037962, label is (0,), error [ 0.02294863]
(1, 0) mapped to 0.151890015428, label is (0,), error [ 0.02307058]
MSE 0.0199914315765
0.0199914315765
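The figures above can be cross-checked with a few lines of plain python. The sketch copies the weights, threshold and steepness printed for alg.model and assumes that the decision function is sigmoid(beta * (w . x - threshold)) and that the error criterion is the mean of the squared differences between labels and outputs; both forms are assumptions, although they agree with the printed values.

```python
import math

# Weights, threshold and steepness copied from the printed alg.model above.
weights = (4.01468351, 4.01077867)
threshold = 6.1645005716113204
beta = 0.8

and_pairs = [((1, 1), 1), ((0, 0), 0), ((0, 1), 0), ((1, 0), 0)]

def output(pattern):
    # Assumed decision function: sigmoid(beta * (w . x - threshold)).
    net = sum(w * x for w, x in zip(weights, pattern)) - threshold
    return 1.0 / (1.0 + math.exp(-beta * net))

squared_errors = [(label - output(pattern)) ** 2 for pattern, label in and_pairs]
mse = sum(squared_errors) / len(squared_errors)
print(squared_errors)
print(mse)  # close to the 0.0199914 reported by test
```

Most of the error comes from the three examples other than (0, 0): the output for (1, 1) is still about 0.18 away from its label.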
Another way to get information about the learning process is that of using a trajectory class able to collect information on the fly, querying the algorithm object as learning evolves. All trajectory classes must be instantiated specifying a learning algorithm as argument, subsequently running the algorithm and finally invoking the get_trajectory
function on the trajectory. For instance the ErrorTrajectory
can be used in order to obtain a graph showing how the perceptron error evolves as a function of the number of iterations:
from yaplf.utility import ErrorTrajectory
alg = GradientPerceptronAlgorithm(and_sample, threshold = True, weight_bound = 0.1, beta = 0.8)
errObs = ErrorTrajectory(alg)
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(5000), batch = False, learning_rate = .1)
errObs.get_trajectory(color='red', joined = True)
Of course, the model
field of the GradientPerceptronAlgorithm
object can be used as previously described, that is, computing its decision function or plotting it:
alg.model.plot((0, 1), (0, 1), contours=(0.5,), shading=True)
There are also other ways of observing a learning algorithm run: for instance, the class PerceptronWeightTrajectory
outputs the trajectory described by the perceptron weights (seen as a point in 2D or 3D space) during the learning phase:
from yaplf.utility import PerceptronWeightTrajectory
alg = GradientPerceptronAlgorithm(and_sample, threshold = True, weight_bound = 0.1, beta = 0.8)
weightObs = PerceptronWeightTrajectory(alg)
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(5000), batch = False, learning_rate = .1)
weightObs.get_trajectory(joined = True)
Multilayer perceptrons are made up of a set of regular perceptrons stacked in layers. In other words, a set of inputs is fed into an initial perceptron, whose outputs are taken as inputs to a second perceptron, and so on until a last perceptron, whose outputs are also the outputs of the multilayer perceptron. A multilayer perceptron is said to be composed of several layers: the first and last ones are called input and output layers for obvious reasons, while the remaining ones are called hidden layers. Each layer is composed of a specific number of units.
Thus, when no hidden layers exist the whole structure collapses to a regular perceptron. Usually the units of each layer can have their own activation function.
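The layered composition just described can be sketched in a few lines of plain python. The sketch does not use yaplf: function names are hypothetical, each unit is assumed to compute a step function of (w . x - threshold), and the weights are chosen by hand so that the hidden units compute OR and NAND while the output unit computes their AND, which yields XOR.

```python
def heaviside(x):
    # Step activation; returning 1 at x == 0 is an assumption of this sketch.
    return 1 if x >= 0 else 0

def layer_output(weights, thresholds, inputs):
    # One perceptron layer: each unit computes heaviside(w . x - threshold).
    return [heaviside(sum(w * x for w, x in zip(unit, inputs)) - t)
            for unit, t in zip(weights, thresholds)]

def multilayer_output(layers, inputs):
    # Feed the outputs of each layer into the next one.
    values = list(inputs)
    for weights, thresholds in layers:
        values = layer_output(weights, thresholds, values)
    return values

# Hypothetical weights: the hidden units compute OR and NAND, the output
# unit computes their AND, which yields XOR.
xor_layers = [
    ([(1, 1), (-1, -1)], (0.5, -1.5)),
    ([(1, 1)], (1.5,)),
]

for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pattern, multilayer_output(xor_layers, pattern))
```

With a single entry in the list of layers, multilayer_output reduces to the computation of a regular perceptron.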
Multilayer perceptrons are defined in yaplf through the class MultilayerPerceptron
in package yaplf.models.neural
. Its constructor takes the layer dimensions and the connection weights as positional arguments (in the example below, [2, 2, 1] and a nested list of weight tuples), together with the following optional named arguments:
thresholds, containing the threshold values for each unit in the multilayer perceptron; when this argument is not specified all thresholds are zeroed, that is, the multilayer perceptron has no thresholds at all;
activations, containing a list of activation functions, each to be associated to a particular layer; alternatively, a single activation function can be given, to be associated to each unit in the multilayer perceptron.
For instance, the next cell creates a multilayer perceptron computing the binary XOR function:
from yaplf.models.neural import MultilayerPerceptron
p = MultilayerPerceptron([2, 2, 1], [[(1, -1), (-1, 1)], [(1, 1)]], thresholds = [(-1, -1), (-1,)])
The fact that this perceptron computes the binary XOR can be shown by plotting the corresponding decision function. This can be done precisely in the same way as for regular perceptrons, that is by calling the function plot
:
p.plot((-0.1, 1.1), (-0.1, 1.1), plot_points = 100, shading = True)
The above plot reveals a crisp change between two zones in the input space: this is due to the default activation function, which is the Heaviside one. The function plot
has the same signature as with regular perceptrons, as shown in the next cell where another XOR perceptron, with smooth decision function, is graphed, highlighting in red the zones where its decision function assumes the value 0.2
:
from yaplf.utility import SigmoidActivationFunction
p = MultilayerPerceptron([2, 2, 1], [[(1, -1), (-1, 1)], [(1, 1)]], thresholds = [(-1, -1), (-1,)], activations = SigmoidActivationFunction(beta = 2))
p.plot((-0.1, 1.1), (-0.1, 1.1), shading=True, contours = (0.2, ), contour_color = ('red',))
The function plot
works also for multilayer perceptrons having three inputs; its signature does not change, the only difference consists in the (obvious) requirement of three independent variables' ranges:
p = MultilayerPerceptron([3, 2, 1], [[(1, -1, -.5), (-1, 1, 1.3)], [(1, 1)]], thresholds = [(-1, -1), (-1,)], activations = SigmoidActivationFunction(beta = 2))
p.plot((-0.1, 1.1), (-0.1, 1.1), (0, 1), contours=(0.1, 0.5, 0.9), contour_color=('red', 'green', 'blue'))
When executed within a sage notebook, the above graph is displayed through jmol and thus allows a dynamic interaction with the user who can rotate and zoom the whole picture.
Multilayer perceptrons are seldom directly defined as in previous cells. As customary in machine learning, they are typically induced from examples. In order to show how a typical learning process evolves, consider the following T-C sample:
from yaplf.data import LabeledExample
tc_sample = (
    LabeledExample((0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.9), (0.1,)),
    LabeledExample((0.9, 0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.9, 0.1), (0.9,)),
    LabeledExample((0.9, 0.9, 0.9, 0.9, 0.1, 0.9, 0.9, 0.1, 0.9), (0.1,)),
    LabeledExample((0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.9), (0.9,)),
    LabeledExample((0.9, 0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9), (0.1,)),
    LabeledExample((0.1, 0.9, 0.1, 0.1, 0.9, 0.1, 0.9, 0.9, 0.9), (0.9,)),
    LabeledExample((0.9, 0.1, 0.9, 0.9, 0.1, 0.9, 0.9, 0.9, 0.9), (0.1,)),
    LabeledExample((0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1), (0.9,))
)
It is easy to see that patterns in this sample describe the 'T' and 'C' shapes in a 3x3 grid, in every possible rotation. The package yaplf.algorithms.neural
contains the BackpropagationAlgorithm
class, implementing the well known error backpropagation learning algorithm. When instantiated, this class requires the specification of a labeled sample, followed by a list or tuple of numerical values describing the number of layers and the number of units therein, including the input layer. This argument is actually not mandatory: when it is not specified, the input and output layers are sized according to the dimension of patterns and labels in the labeled sample, and a hidden layer is added, whose size is automatically set to half the maximum between the number of input and output units. The class constructor accepts many other optional named arguments allowing the customization of the backpropagation algorithm. Among these:
dimensions should contain a list or tuple of numerical values describing the number of layers and the number of units therein, including the input layer. When not specified, the input and output layers are sized according to the dimension of patterns and labels in the labeled sample, and a hidden layer is added, whose size is automatically set to half the maximum between the number of input and output units;
threshold should contain a boolean flag setting the use of thresholded perceptrons;
activations should contain an ActivationFunction instance or a list/tuple of ActivationFunction instances, to be used as activation functions. When a single value is specified, it applies to all units. When a list/tuple is specified, each element corresponds to all units in a perceptron layer.
Thus the following cell creates a backpropagation algorithm for the previously defined T-C labeled sample, using input and output layers consisting respectively of 9 and 1 units (as required by the labeled sample), and one hidden layer with two units; moreover, in the inferred multilayer perceptron all units rely on a sigmoidal activation function and no thresholds are used:
from yaplf.algorithms.neural import BackpropagationAlgorithm
from yaplf.utility import SigmoidActivationFunction
alg = BackpropagationAlgorithm(tc_sample, (9, 2, 1), threshold = False, activations = SigmoidActivationFunction(1))
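The default architecture described above (used when no dimensions are given) can be sketched as follows. The helper name is hypothetical, the sample is represented as plain (pattern, label) pairs, and rounding down when the maximum is odd is an assumption, since the text does not say how the halving is rounded.

```python
def default_dimensions(sample):
    # sample is a list of (pattern, label) pairs.
    n_inputs = len(sample[0][0])
    n_outputs = len(sample[0][1])
    # Rounding down for odd sizes is an assumption of this sketch.
    n_hidden = max(n_inputs, n_outputs) // 2
    return (n_inputs, n_hidden, n_outputs)

example = [((0.9,) * 9, (0.1,)), ((0.1,) * 9, (0.9,))]
print(default_dimensions(example))  # (9, 4, 1)
```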
This learning algorithm is run precisely as with the previously described algorithms for regular perceptrons (as well as any learning algorithm in yaplf), that is invoking the run
function and subsequently accessing the model
field of the learning algorithm instance. Since the error backpropagation algorithm runs iteratively, also in this case it is possible to fix beforehand the number of iterations to be run and possibly monitor how learning evolves through a graphic observer such as ErrorTrajectory
:
from yaplf.utility import FixedIterationsStoppingCriterion
from yaplf.utility import ErrorTrajectory
errObs = ErrorTrajectory(alg)
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(7000))
errObs.get_trajectory(color='red', joined = True)
Testing the inferred model against the sample used in order to obtain it requires again to act precisely as with regular perceptrons, that is calling the test
function:
alg.model.test(tc_sample, verbose = True)
(0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to 0.124228179725, label is (0.100000000000000,), error [ 0.000587]
(0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000, 0.100000000000000) mapped to 0.87545728224, label is (0.900000000000000,), error [ 0.00060234]
(0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.900000000000000) mapped to 0.126027624963, label is (0.100000000000000,), error [ 0.00067744]
(0.100000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000) mapped to 0.875275157516, label is (0.900000000000000,), error [ 0.00061132]
(0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to 0.125884390249, label is (0.100000000000000,), error [ 0.00067]
(0.100000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to 0.876301374618, label is (0.900000000000000,), error [ 0.00056162]
(0.900000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to 0.124042458871, label is (0.100000000000000,), error [ 0.00057804]
(0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000) mapped to 0.876338859643, label is (0.900000000000000,), error [ 0.00055985]
MSE 0.000605952585007
0.000605952585007
It is worth stressing that the common functionalities of perceptrons and multilayer perceptrons, as well as those of the corresponding learning algorithms, are actually basic features of any model and any learning algorithm implemented in yaplf.
The various backpropagation variants can be obtained through specification of particular named arguments when instantiating the BackpropagationAlgorithm
class. For instance, learning_rate
and momentum_term
obviously set the algorithm learning rate and momentum term.
alg = BackpropagationAlgorithm(tc_sample, (9, 2, 1), threshold = True, learning_rate = 1, momentum_term = 0.2, activations = SigmoidActivationFunction(3))
errObs = ErrorTrajectory(alg)
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(1000))
errObs.get_trajectory(color='red', joined = True)
Testing this newly inferred multilayer perceptron shows how the introduction of a momentum term (when the corresponding named argument is missing, no momentum is used) attained a better performance, as also perceivable by visual inspection of the previous graph:
alg.model.test(tc_sample)
9.80614378187e-05
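The roles of the learning rate and of the momentum term can be illustrated outside yaplf on a toy problem. The sketch below applies the classical momentum update (each step is the plain gradient step plus a fraction of the previous step) to a one-dimensional quadratic error rather than to a network; the function name and the toy error function are assumptions made only for illustration.

```python
def gradient_descent_with_momentum(gradient, start, learning_rate, momentum_term, steps):
    # Heavy-ball update: each step blends the previous step with the new gradient.
    w = start
    previous_step = 0.0
    for _ in range(steps):
        step = -learning_rate * gradient(w) + momentum_term * previous_step
        w += step
        previous_step = step
    return w

# Toy error function E(w) = 0.5 * (w - 3)**2, whose gradient is w - 3.
gradient = lambda w: w - 3.0

plain = gradient_descent_with_momentum(gradient, 0.0, 0.1, 0.0, 20)
with_momentum = gradient_descent_with_momentum(gradient, 0.0, 0.1, 0.5, 20)
print(abs(plain - 3.0), abs(with_momentum - 3.0))
```

On this toy problem the run with momentum ends closer to the minimum after the same number of steps. On a real error surface the benefit is not guaranteed, and too large a momentum term can make the search overshoot.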
Actually, the T-C sample is fairly simple to learn, as the central pixel in the 3x3 grid could be used alone in order to discriminate Ts from Cs. A more complex learning task, well suited for the backpropagation algorithm, is that concerning the XOR binary function, which is not linearly separable and cannot be expressed by a regular perceptron:
xor_sample = [LabeledExample((0.1, 0.1), (0.1,)),
    LabeledExample((0.1, 0.9), (0.9,)),
    LabeledExample((0.9, 0.1), (0.9,)),
    LabeledExample((0.9, 0.9), (0.1,))]
alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1), threshold = True, learning_rate = .1, momentum_term = 0.9)
errObs = ErrorTrajectory(alg)
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(1000))
errObs.get_trajectory(color='red', joined = True)
Although the error trajectory was decreasing, it converged to too high a value. Indeed, explicitly testing with the original sample produces the following result:
p = alg.model
p.test(xor_sample, verbose = True)
(0.100000000000000, 0.100000000000000) mapped to 0.499876038427, label is (0.100000000000000,), error [ 0.15990085]
(0.100000000000000, 0.900000000000000) mapped to 0.499904660513, label is (0.900000000000000,), error [ 0.16007628]
(0.900000000000000, 0.100000000000000) mapped to 0.500053062319, label is (0.900000000000000,), error [ 0.15995755]
(0.900000000000000, 0.900000000000000) mapped to 0.500081767769, label is (0.100000000000000,), error [ 0.16006542]
MSE 0.160000025162
0.160000025162
In other words, all patterns are mapped to values close to 0.5. This is confirmed by the decision function plot of the inferred model, which exhibits a linear discrimination behaviour:
p.plot((0, 1), (0, 1), shading = True)
Getting back to the simpler backpropagation variant (i.e. the one not using momentum) and using a steeper activation function attains better performance:
alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1), learning_rate = .1, activations = SigmoidActivationFunction(10))
errObs = ErrorTrajectory(alg)
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(1000))
errObs.get_trajectory(color='red', joined = True)
alg.model.test(xor_sample, verbose = True)
(0.100000000000000, 0.100000000000000) mapped to 0.107281883481, label is (0.100000000000000,), error [ 5.30258270e-05]
(0.100000000000000, 0.900000000000000) mapped to 0.889216991161, label is (0.900000000000000,), error [ 0.00011627]
(0.900000000000000, 0.100000000000000) mapped to 0.350484814876, label is (0.900000000000000,), error [ 0.30196694]
(0.900000000000000, 0.900000000000000) mapped to 0.353978869018, label is (0.100000000000000,), error [ 0.06450527]
MSE 0.0916603759241
0.0916603759241
Now at least two out of the four examples are learnt. Hoping that further iterations of the learning algorithm lead to a better performance, the run
function can be invoked again: this will have the effect of continuing the learning process:
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(1000))
errObs.get_trajectory(color='red', joined = True)
alg.model.test(xor_sample)
0.0916130351364
As the result is not satisfactory, the process should be restarted, hoping that the random initialization helps escape this local minimum of the error function. This can be accomplished either through a new instantiation of the BackpropagationAlgorithm
class or invoking the reset
function:
alg.reset()
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(1000))
errObs.get_trajectory(color='red', joined = True)
The effect of this action is twofold: on the one hand, the graph returned by the ErrorTrajectory
instance shows that learning was reset; on the other hand, the last invocation of run
produced a better result, as can be quantitatively shown testing the inferred multilayer perceptron:
alg.model.test(xor_sample)
1.01130579975e-05
Another variant of the backpropagation algorithm is obtained through specification of the min_error
named argument in the class constructor. This has the effect of no longer considering patterns that are already correctly classified up to a threshold error (whose value is that of this argument):
alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1), threshold = True, learning_rate = .3, momentum_term = .8, min_error = 0.1, activations = SigmoidActivationFunction(10))
errObs = ErrorTrajectory(alg)
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(1000))
errObs.get_trajectory(color='red', joined = True)
This time learning was particularly effective, as can be shown either by testing the inferred multilayer perceptron or by plotting its decision function:
alg.model.test(xor_sample, verbose = True)
(0.100000000000000, 0.100000000000000) mapped to 0.100065252639, label is (0.100000000000000,), error [ 4.25790691e-09]
(0.100000000000000, 0.900000000000000) mapped to 0.899884058108, label is (0.900000000000000,), error [ 1.34425222e-08]
(0.900000000000000, 0.100000000000000) mapped to 0.899802877389, label is (0.900000000000000,), error [ 3.88573239e-08]
(0.900000000000000, 0.900000000000000) mapped to 0.100302095653, label is (0.100000000000000,), error [ 9.12617837e-08]
MSE 3.69548841712e-08
3.69548841712e-08
alg.model.plot((0, 1), (0, 1), shading = True)
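The idea behind min_error, namely skipping the patterns that are already learnt well enough, can be sketched independently of yaplf. In the sketch the helper name is hypothetical, the model is any callable returning a tuple of outputs, and the per-pattern error is assumed to be the squared difference between label and output; how yaplf actually measures this error is not stated here.

```python
def patterns_to_update(sample, model, min_error):
    # Keep only the examples whose squared error still exceeds min_error;
    # the others are treated as already learnt and skipped in this pass.
    selected = []
    for pattern, label in sample:
        outputs = model(pattern)
        error = sum((l - o) ** 2 for l, o in zip(label, outputs))
        if error > min_error:
            selected.append((pattern, label))
    return selected

# Hypothetical model that always answers 0.5, applied to the XOR sample.
constant_model = lambda pattern: (0.5,)
xor_pairs = [((0.1, 0.1), (0.1,)), ((0.1, 0.9), (0.9,)),
             ((0.9, 0.1), (0.9,)), ((0.9, 0.9), (0.1,))]

print(len(patterns_to_update(xor_pairs, constant_model, 0.1)))  # 4
print(len(patterns_to_update(xor_pairs, constant_model, 0.2)))  # 0
```

A model stuck at 0.5, like the one obtained earlier with momentum alone, has an error of about 0.16 on every pattern, so with a threshold of 0.1 all four patterns keep contributing to the weight updates.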
Of course, different variants of the backpropagation algorithm can produce similar results:
alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1), threshold = True, learning_rate = .1, momentum_term = .3, min_error = 0.1, activations = SigmoidActivationFunction(7), batch = True)
errObs = ErrorTrajectory(alg)
alg.run(stopping_criterion = FixedIterationsStoppingCriterion(1500))
errObs.get_trajectory(color='red', joined = True)
alg.model.plot((0, 1), (0, 1), shading = True)
Instead of running the backpropagation algorithm for a fixed number of iterations and subsequently testing it against the labeled sample used in the training phase, it is possible to use a different stopping criterion, namely an instance of class TrainErrorStoppingCriterion
. In this case, the learning algorithm is run until the error of the inferred model on its training set is not higher than a threshold specified when instantiating the stopping criterion. For instance, the following code runs until the binary XOR sample is learnt with error not higher than 0.01:
from yaplf.utility import TrainErrorStoppingCriterion
alg = BackpropagationAlgorithm(xor_sample, (2, 2, 1), learning_rate = .1, activations = SigmoidActivationFunction(10))
errObs = ErrorTrajectory(alg)
alg.run(stopping_criterion = TrainErrorStoppingCriterion(0.01))
errObs.get_trajectory(color='red', joined = True)
Moreover, the class TestErrorStoppingCriterion
has a behaviour similar to that of TrainErrorStoppingCriterion
, but in this case error is computed on another set of examples, specified during class instantiation.
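The way a stopping criterion drives a learning loop can be sketched with two small strategy classes. The class names, the stop method and the toy learner below are hypothetical and do not reproduce the yaplf interface; they only show the difference between stopping on an iteration count and stopping on an error threshold.

```python
class FixedIterations:
    # Stop after a fixed number of iterations, whatever the error is.
    def __init__(self, max_iterations):
        self.max_iterations = max_iterations

    def stop(self, iteration, train_error):
        return iteration >= self.max_iterations

class TrainErrorBelow:
    # Stop as soon as the training error is not higher than the threshold.
    def __init__(self, threshold):
        self.threshold = threshold

    def stop(self, iteration, train_error):
        return train_error <= self.threshold

def run(learning_step, initial_error, criterion):
    # learning_step maps the current error to the error after one iteration.
    iteration, error = 0, initial_error
    while not criterion.stop(iteration, error):
        error = learning_step(error)
        iteration += 1
    return iteration, error

# Toy learner whose error halves at every iteration.
halve = lambda error: error / 2.0

print(run(halve, 1.0, FixedIterations(5)))     # (5, 0.03125)
print(run(halve, 1.0, TrainErrorBelow(0.01)))  # (7, 0.0078125)
```

A criterion based only on the error never stops if learning gets trapped above the threshold, as happened in some of the XOR runs above, so in practice it is reasonable to pair it with an upper bound on the number of iterations.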
Finally, it is worth mentioning that the architecture of a multilayer perceptron should not be fixed before the learning process starts. Rather, it should be chosen among a set of possible candidates with reference to the learning process itself. The typical technique used in this case is called k-fold cross validation, and works as follows: for each candidate architecture, the available set of examples is partitioned in k subsets, k-1 of which are used for training while the remaining one is used for testing the inferred model (as in the previously mentioned TestErrorStoppingCriterion
class). The process is repeated k times, changing each time the subset used for testing purposes, and averaging the obtained results. At the end, the architecture scoring the lowest average error is selected. The function cross_validation
defined in package yaplf.utility
provides this particular functionality independently of the chosen model and learning algorithm: the only requirement is that the parameters affecting the model architecture need to be specified as named arguments in the corresponding learning algorithm constructor. For instance, the following code automatically selects the best sigmoid activation function among a set of four in order to learn the T-C sample:
from yaplf.utility import cross_validation
p = cross_validation(BackpropagationAlgorithm, tc_sample,
    (('activations', (SigmoidActivationFunction(2), SigmoidActivationFunction(3),
        SigmoidActivationFunction(5), SigmoidActivationFunction(10))),),
    fixed_parameters = {'dimensions': (9, 2, 1)},
    run_parameters = {'stopping_criterion': FixedIterationsStoppingCriterion(1000), 'learning_rate': 1},
    num_folds = 4, verbose = True)
Errors: [0.030319163781913711, 0.060571552036313515, 0.029535618302241346, 0.4099711142423762]
Minimum error in position 2
Selected parameters (SigmoidActivationFunction(5),)
p.test(tc_sample, verbose = True)
(0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to 0.0987184591155, label is (0.100000000000000,), error [ 1.64234704e-06]
(0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000, 0.100000000000000) mapped to 0.995638751026, label is (0.900000000000000,), error [ 0.00914677]
(0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.900000000000000) mapped to 0.0987060693926, label is (0.100000000000000,), error [ 1.67425642e-06]
(0.100000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000) mapped to 0.900933293696, label is (0.900000000000000,), error [ 8.71037123e-07]
(0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to 0.0991757359563, label is (0.100000000000000,), error [ 6.79411214e-07]
(0.100000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to 0.995353479277, label is (0.900000000000000,), error [ 0.00909229]
(0.900000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000) mapped to 0.0987035440608, label is (0.100000000000000,), error [ 1.68079800e-06]
(0.900000000000000, 0.100000000000000, 0.100000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.900000000000000, 0.100000000000000, 0.100000000000000) mapped to 0.899575874803, label is (0.900000000000000,), error [ 1.79882183e-07]
MSE 0.002280723055
0.002280723055