# Research areas

A learning algorithm for fuzzy sets, processing data labeled with
their membership degrees, has been proposed in
[Malchiodi and Pedrycz, 2013; Malchiodi, 2019a]. This algorithm has
been applied to axiom mining within the semantic Web
[Malchiodi and Tettamanzi, 2018] and to the selection of negative
examples in bioinformatics
[Frasca and Malchiodi, 2017; Frasca and Malchiodi, 2016]. The
approach has been extended in [Cermenati et al., 2020] to the
simultaneous induction of several fuzzy sets, and in
[Malchiodi and Zanaboni, 2019] to shadowed sets.

in collaboration with
Prof. Zanaboni (Università degli Studi di Milano),
Prof. Pedrycz (University of Alberta)
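
As a rough illustration of the shadowed-set extension mentioned above:
a shadowed set coarsens a fuzzy set into three regions, namely
elements that are excluded, elements in the core, and a shadow where
membership is left unspecified. The sketch below assumes a Gaussian
membership function and a fixed threshold `alpha`; both are
illustrative choices and not the learning procedure of the cited
papers.

```python
import math

def gaussian_membership(x, center=0.0, width=1.0):
    """Hypothetical fuzzy membership function: 1 at the center, decaying to 0."""
    return math.exp(-((x - center) / width) ** 2)

def to_shadowed(mu, alpha=0.3):
    """Map a membership degree to one of the three shadowed-set regions.

    Degrees <= alpha are reduced to 0 (excluded), degrees >= 1 - alpha are
    elevated to 1 (core), and everything in between falls in the shadow.
    """
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 0.5)")
    if mu <= alpha:
        return "excluded"
    if mu >= 1 - alpha:
        return "core"
    return "shadow"

points = [0.0, 0.5, 1.0, 2.0]
regions = [to_shadowed(gaussian_membership(x)) for x in points]
print(regions)
```

Choosing `alpha` trades off the size of the shadow against the amount
of membership that is rounded away; here it is simply fixed by hand.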

Knowledge induced via machine learning techniques is often
encoded and stored in a distributed fashion within models learnt
from data. It can therefore be difficult to give a qualitative
interpretation of the obtained results. Moreover, this typically
results in bandwidth and storage capacity issues when resources
are limited. A possible solution to these problems consists in
reducing the amount of space needed to store such models after
they have been trained. Some compression techniques for neural
networks obtained via deep learning are currently under
investigation within the research project
*Multicriteria Data Structures and Algorithms: from compressed to learned indexes, and beyond*,
funded by the Italian Ministry of Education and Research under the
PRIN initiative [Marinò et al., 2021]. Their implementation is
described in [Marinò et al., 2021].

in collaboration with
Prof. Frasca (Università degli Studi di Milano)
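
To make the idea of shrinking a trained model concrete, the sketch
below applies two generic compression steps to a flat list of weights:
magnitude pruning followed by uniform quantization with a small
codebook. The function names, the threshold and the number of levels
are assumptions made for illustration, and these are not necessarily
the techniques studied in the cited work.

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def quantize(weights, levels):
    """Uniform quantization: replace each weight by the index of the nearest
    of `levels` evenly spaced values between the minimum and maximum weight."""
    lo, hi = min(weights), max(weights)
    if levels < 2 or hi == lo:
        raise ValueError("need at least two levels and two distinct weights")
    step = (hi - lo) / (levels - 1)
    codebook = [lo + i * step for i in range(levels)]
    indices = [round((w - lo) / step) for w in weights]
    return indices, codebook

weights = [-1.0, -0.04, 0.02, 0.5, 1.0]
pruned = prune(weights, threshold=0.1)
indices, codebook = quantize(pruned, levels=5)
# Only the small integer indices and the short codebook need to be stored.
restored = [codebook[i] for i in indices]
print(indices, restored)
```

After these steps the model is described by small integers plus a
short codebook, which is what makes subsequent compact storage
possible.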

Searching for potential axioms within a set of formulas is a
particularly demanding problem from a computational viewpoint. The
induction of such axioms from formulas labeled via a precomputed
fitness measure, obtained by processing a knowledge base from the
semantic Web field, has been studied using learning algorithms for
fuzzy sets [Malchiodi and Tettamanzi, 2018] and kernel-based
regression techniques [Malchiodi et al., 2018]. The dependency of
the problem on the learning algorithm used and on the dimensionality
reduction technique employed to encode axioms as numerical vectors
has been investigated in [Malchiodi et al., 2020].

in collaboration with
Prof. Da Costa Pereira, Prof. Tettamanzi
(Université de la Côte d'Azur)

The application of supervised machine learning methods in
bioinformatics requires selecting, among non-positively labeled
data, the items that represent reliable negative examples, that is,
excluding entities on which no experiments have been conducted.
In [Frasca and Malchiodi, 2017; Frasca and Malchiodi, 2016] this
negative selection problem has been tackled using a ranking based
on membership functions of fuzzy sets, while
[Frasca et al., 2017; Boldi et al., 2018] propose an encoding of
the available data that promotes the negative selection process in
the problem of protein function prediction. Finally, a similar
procedure has been proposed in [Frasca et al., 2019] for the
problem of gene prioritization.

in collaboration with
Prof. Frasca (Università degli Studi di Milano)
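
The ranking-based selection described above can be sketched as
follows, assuming that each non-positive item already comes with a
membership degree to the positive class and that a lower degree
indicates a more reliable negative. The item names, the scores and the
cut-off `k` are hypothetical.

```python
def select_negatives(scores, k):
    """Return the k items with the lowest membership to the positive class."""
    ranked = sorted(scores, key=scores.get)
    return ranked[:k]

# Hypothetical membership degrees of unlabeled items to the positive class.
scores = {"geneA": 0.82, "geneB": 0.05, "geneC": 0.40, "geneD": 0.11}
negatives = select_negatives(scores, k=2)
print(negatives)
```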

[Casiraghi et al., 2020] describes the application of machine
learning techniques to the problem of predicting the severity of
COVID-19 in patients entering emergency departments (EDs).

in collaboration with
Prof. Valentini (Università degli Studi di Milano)
Prof. Casiraghi (Università degli Studi di Milano)
Prof. Frasca (Università degli Studi di Milano)

Some machine learning and statistical data analysis techniques have
been adapted to deal with problems in the veterinary and forensic
fields. In particular, [Galizzi et al., 2021] describes the
application of statistical methods to classify the incidence of
cardiovascular factors in the death of dogs undergoing specific
therapy.

in collaboration with
Prof. Zanaboni (Università degli Studi di Milano)

Machine learning models typically start from a labeled sample
whose elements are processed homogeneously (that is, each element
has the same importance). In [Malchiodi, 2008] the general model of
data quality-based learning was proposed. In this model it is
possible to associate with each of the available data items a
numerical quantification of its importance relative to the
remaining data. This model was applied to the problem of
classification through Support Vector Machines, both in its linear
[Apolloni and Malchiodi, 2006] and kernel-based version
[Apolloni et al., 2007]. A first analysis of the performance of
these applications has been undertaken both theoretically
[Apolloni et al., 2007] and experimentally [Malchiodi, 2009]. Some
preliminary applications in the bioinformatics field are described
in [Malchiodi et al., 2010]. A similar approach has also been
applied to the regression problem in
[Apolloni et al., 2010; Malchiodi et al., 2009; Apolloni et al., 2005]
and to unbalanced learning in [Malchiodi, 2013b].
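
One simple way to picture quality-based learning is a linear
classifier whose hinge loss scales each example by its quality. The
sketch below uses plain subgradient descent; the function names, the
toy data, the quality values and the hyperparameters are all
assumptions made for illustration, and the formulation in the cited
papers may differ.

```python
def train_weighted_svm(points, labels, quality, lam=0.01, lr=0.1, epochs=100):
    """Subgradient descent on an L2-regularized hinge loss in which the
    contribution of example i is multiplied by quality[i]."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y, q in zip(points, labels, quality):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                # Margin violated: move towards the example, scaled by its quality.
                w = [wi + lr * q * y * xi for wi, xi in zip(w, x)]
                b += lr * q * y
    return w, b

def predict(w, b, x):
    """Sign of the linear score, with ties assigned to the positive class."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Four reliable examples plus one doubtful label with very low quality.
points = [(2.0, 2.0), (3.0, 1.0), (-2.0, -1.0), (-1.0, -3.0), (2.5, 2.0)]
labels = [1, 1, -1, -1, -1]
quality = [1.0, 1.0, 1.0, 1.0, 0.05]
w, b = train_weighted_svm(points, labels, quality)
clean_predictions = [predict(w, b, x) for x in points[:4]]
print(clean_predictions)
```

Setting every quality to the same value recovers the homogeneous
treatment of the sample described at the start of the paragraph.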

Several types of learning algorithms have been designed, implemented
and analyzed. In particular, [Malchiodi and Legnani, 2014] proposes
an improvement of support vector-based classification algorithms
dealing both with partially labeled data and with uncertain labels,
while [Malchiodi and Pedrycz, 2013] introduces a learning algorithm
for membership functions of fuzzy sets. The latter approach has been
extended in [Malchiodi and Zanaboni, 2019] to shadowed sets.

Concerning tertiary-level teaching, two publications have been
produced: a manual for a software system for automatic computation
and an exercise textbook on operating systems
[Malchiodi, 2007; Malchiodi, 2015]. For a wider audience,
[Monga et al., 2017] is centered around Alan Turing, and
[Malchiodi, 2019a] describes possible future evolutions of
fuzzy-based technologies.

in collaboration with
laboratorio ALaDDIn (Università degli Studi di Milano)

The algomotorial approach has been introduced in
[Bellettini et al., 2014] with the aim of teaching computing as the
science studying the automatic elaboration of information, in
contrast with the trend of tying computing to the working knowledge
of specific technological tools
[Lonati et al., 2015; Bellettini et al., 2014]. The proposed
approach has been evaluated in the realm of teaching habilitation
[Bellettini et al., 2015], with a special focus on a constructivist
perspective [Bellettini et al., 2018; Bellettini et al., 2018].
Furthermore, the relation between teaching and computational
thinking competitions was studied in [Lonati et al., 2017],
evaluating the impact of the presentation of questions on the
efficacy of the latter [Lonati et al., 2017].

in collaboration with
laboratorio ALaDDIn (Università degli Studi di Milano)

Starting from an analysis of computing education in Italian schools
[Bellettini et al., 2014] and a critique of the common
identification of computer programming with the use of a language
to encode an algorithm [Lonati et al., 2015], the teaching of
computer programming has been studied from the viewpoint of its
introduction via projects and specific tools
[Bulgheroni and Malchiodi, 2009; Paterson et al., 2015] and of an
interdisciplinary approach with musical subjects
[Ludovico et al., 2017; Baraté et al., 2017; Baratè et al., 2017],
also considering advanced aspects of the discipline
[Lonati et al., 2016; Lonati et al., 2017]. Finally,
[Monga et al., 2018; Lodi et al., 2019] analyse a constructionist
approach to computer programming.

in collaboration with
laboratorio ALaDDIn (Università degli Studi di Milano)

Within the organization of non-competitive challenges on
computational thinking at the national level
[Lissoni et al., 2012; Lissoni et al., 2013; Lissoni et al., 2014; Lissoni et al., 2015]
and the evaluation of their results
[Bellettini et al., 2015; Lonati et al., 2017], an analysis of the
possibility of exploiting these tools as a resource for learning in
primary and secondary schools has been carried out
[Lonati et al., 2017; Calcagni et al., 2017; Morpurgo et al., 2018].

in collaboration with
laboratorio ALaDDIn (Università degli Studi di Milano)

The algomotorial approach introduced in
[Bellettini et al., 2014; Bellettini et al., 2014] has been applied
to the introduction of core concepts of computing, such as
information representation
[Bellettini et al., 2012; Bellettini et al., 2013; Baraté et al., 2017],
basics of computer programming [Baratè et al., 2017], as well as
recursive and greedy strategies
[Lonati et al., 2016; Lonati et al., 2017; Lonati et al., 2017].

in collaboration with
laboratorio ALaDDIn (Università degli Studi di Milano)

The granular computing model, giving information a granular meaning
and allowing its analysis and processing at different abstraction
levels, is described in [Apolloni et al., 2008], where its links
with machine learning models are analysed. The effects of a fusion
of these two models have been studied within the general field of
regression, proposing new algorithms based on Support Vector
Machines [Apolloni et al., 2008; Apolloni et al., 2006] or on local
search techniques [Apolloni et al., 2005].

Bootstrap techniques are based on data resampling models with the aim
of approximating the distribution of a population. A specialization
of this kind of technique, initially proposed in
[Apolloni et al., 2006] and subsequently refined in
[Apolloni et al., 2009; Apolloni et al., 2007], gives as output
confidence regions for regression curves, avoiding the usual
assumptions on the distribution of measurement drifts. The use of
this technique to solve linear and nonlinear regression problems is
shown in [Apolloni et al., 2008], while [Apolloni et al., 2007]
describes some applications to the medical field.
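
For orientation, the general resampling idea can be sketched with a
plain percentile bootstrap for a least-squares line: data pairs are
resampled with replacement, a line is refitted each time, and
pointwise quantiles of the refitted lines give a band. This is the
textbook procedure rather than the specialized variant of the cited
papers, and the function names, the synthetic data and the parameters
are assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def bootstrap_band(xs, ys, grid, n_boot=500, level=0.9, seed=0):
    """Pointwise percentile band for the regression line on the given grid."""
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    while len(curves) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: no line can be fitted, draw again
        by = [ys[i] for i in idx]
        slope, intercept = fit_line(bx, by)
        curves.append([slope * g + intercept for g in grid])
    lo_rank = int((1 - level) / 2 * (n_boot - 1))
    hi_rank = int((1 + level) / 2 * (n_boot - 1))
    band = []
    for j in range(len(grid)):
        column = sorted(curve[j] for curve in curves)
        band.append((column[lo_rank], column[hi_rank]))
    return band

# Synthetic data around the line y = 2x + 1 with uniform (non-Gaussian) noise.
noise = random.Random(1)
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 + noise.uniform(-1, 1) for x in xs]
for g, (lo, hi) in zip([0.0, 4.5, 9.0], bootstrap_band(xs, ys, [0.0, 4.5, 9.0])):
    print(g, lo, hi)
```

Because only resampling is involved, no distributional assumption on
the noise enters the computation, which is the property the paragraph
above emphasizes.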

The task of integrating under a single theoretical model instances
of inference problems from statistics (point and interval estimation
of distribution parameters) and computer science (estimation of
approximation error in machine learning) is tackled in
[Apolloni et al., 2006; Apolloni et al., 2005; Apolloni et al., 2002; Apolloni et al., 2002; Apolloni and Malchiodi, 2001; Malchiodi, 2000],
building on previously obtained results on sample complexity
[Apolloni and Malchiodi, 2001] and describing the Algorithmic
Inference model. This model was used with the aim of estimating the
risk in classification problems based on Support Vector Machines
[Apolloni et al., 2007; Apolloni et al., 2005; Apolloni and Malchiodi, 2002; Apolloni and Malchiodi, 2001],
learning confidence regions for regression lines while avoiding the
typical assumption of a Gaussian drift distribution
[Apolloni et al., 2005; Apolloni et al., 2002], and learning
confidence regions for the risk function of re-occurrence
distribution times in particular cancer pathologies
[Apolloni et al., 2007; Apolloni et al., 2005; Apolloni et al., 2002].

Systems for scientific computation can be used to run simulations
and to analyze mathematical problems from an interactive and
incremental point of view. To this end, such systems offer
interesting cues for designing educational activities
[Bulgheroni and Malchiodi, 2009; Malchiodi, 2008a]. A commercial
system of this kind, thoroughly described in [Malchiodi, 2007], has
been extended so as to address purely computational aspects
associated with information encoding [Malchiodi, 2006c], remote
procedure invocation [Malchiodi, 2006b; Malchiodi, 2006], production
of scientific documentation [Malchiodi, 2011], and solutions to
optimization [Malchiodi, 2006a] and machine learning problems based
on Support Vectors [Malchiodi et al., 2009; Malchiodi et al., 2009],
as well as to perform software validation [Malchiodi, 2013a]. The
related code has been used to build the simulations in
[Apolloni et al., 2007; Apolloni and Malchiodi, 2006]. Moreover,
[Malchiodi, 2010a] describes a library handling machine learning
problems within an open source system for scientific computation.

Hybrid learning systems are typically organized by coupling
sub-symbolic modules (typically based on the neural network
paradigm) with symbolic ones (described in terms of logic circuits).
Such a system, taking as input a set of features describing the
available data and extracting their boolean independent components,
is described in [Apolloni et al., 2005; Apolloni et al., 2004].
These components, interpreted as truth values, are used to infer
logical formulas describing in a symbolic way the relations among
the original input data
[Apolloni et al., 2006; Apolloni et al., 2003; Apolloni et al., 2002; Apolloni et al., 2000].
This system is applied in [Apolloni et al., 2004] to the problem of
emotion recognition on the basis of voice signals, while
[Apolloni et al., 2004; Apolloni et al., 2004; Apolloni et al., 2003; Apolloni et al., 2003; Apolloni et al., 2003]
describe an application to the monitoring of awareness in car
driving as a function of biosignals, within the research project
IST-2000-26091 ORESTEIA (mOdular hybRid artEfactS wiTh adaptivE
functIonAlity, funded between 2001 and 2003 by the EC within the
fifth framework programme, under the IST-FET initiative). Moreover,
[Apolloni and Malchiodi, 2006; Apolloni et al., 2005] study two
hybrid systems obtained by integrating a fuzzy system for measuring
the quality of the available data with, respectively, a linear
Support Vector classifier and a linear regression model.

Within computational learning theory, the structural risk
minimization principle addresses the problem of balancing the
complexity of a model with its accuracy in describing experimental
data. This principle has been applied to classifiers based on logic
expressions built in terms of disjunctive and conjunctive boolean
normal forms. A simplification algorithm for such forms was
developed in
[Apolloni et al., 2006; Apolloni et al., 2005; Apolloni et al., 2003; Apolloni et al., 2002; Apolloni et al., 2002],
focusing on the stochastic optimization of the parameters of the
fuzzy sets describing these forms.

Within this subject the activities have focused on the problem of
modeling conflicting situations through an approach alternative to
that of classical game theory. In particular, these conflicts were
modeled in terms of approximating the solution to an NP-hard problem
[Apolloni et al., 2006; Apolloni et al., 2003; Apolloni et al., 2002; Apolloni et al., 2002],
applying the Algorithmic Inference model to assign limited
computational resources to two players, and subsequently extending
this technique to team games [Apolloni et al., 2006]. This model is
applied in [Apolloni et al., 2007; Apolloni et al., 2005] to the
biological field, while [Apolloni et al., 2010] uses this approach
with the aim of correctly dimensioning the running time of learning
algorithms based on local error minimization.

The research project ORESTEIA (mOdular hybRid artEfactS wiTh
adaptivE functIonAlity, funded between 2001 and 2003 by the EC
within the fifth framework programme, under the IST-FET initiative)
was grounded on the design, implementation and analysis of
intelligent systems for pervasive and ubiquitous computing.
These fields are characterized by highly specialized computers
devoted to executing specific tasks. Such computers can be produced
so as to significantly reduce their size and cost, which makes it
possible to embed them in an environment. Focusing specifically on
the awareness detection problem [Kasderidis et al., 2003], a
prototype for the detection of driving awareness on the basis of
biosignals
[Apolloni et al., 2004; Apolloni et al., 2004; Apolloni et al., 2003; Apolloni et al., 2003; Apolloni et al., 2003]
has been developed.

Within the research project PHYSTA (Principled Hybrid Systems:
Theory and Applications, funded between 1998 and 2000 by the EC
within the fourth framework programme, under the TMR initiative),
the Algorithmic Inference model described in
[Apolloni et al., 2006; Malchiodi, 2000] was applied to the problem
of automatic classification of emotions on the basis of vocal
signals [Apolloni et al., 2004; Apolloni et al., 2002]. The obtained
results were presented at an international school on computational
learning within the same research project.

The availability of hardware circuits able to directly process
information with the aim of synthesizing it through estimators
allows a remarkable reduction in running times. Their use implies a
set of constraints essentially linked to the architecture of the
circuits themselves. The inference-among-gossips model, developed in
[Malchiodi, 1996], has been applied within this scope with the aim
of obtaining a family of estimators for Bernoulli populations
directly implementable on pRAM boards [Apolloni et al., 1997]. The
same model has been applied in [Apolloni et al., 2013] to the study
of information exchange in social networks.