# Research areas

A learning algorithm for fuzzy sets, processing data labeled with their membership degrees, has been proposed in [Malchiodi and Pedrycz, 2013; Malchiodi, 2019a]. This algorithm has been applied to axiom mining within the Semantic Web [Malchiodi and Tettamanzi, 2018] and to the selection of negative examples in bioinformatics [Frasca and Malchiodi, 2017; Frasca and Malchiodi, 2016]. The approach has been extended in [Cermenati et al., 2020] to the simultaneous induction of several fuzzy sets, and in [Malchiodi and Zanaboni, 2019] to shadowed sets.
in collaboration with Prof. Zanaboni (Università degli Studi di Milano), Prof. Pedrycz (University of Alberta)
Knowledge induced via machine learning techniques is often encoded and stored in a distributed fashion within models learnt from data. Thus it might be difficult to give a qualitative interpretation of the obtained results. Moreover, this typically results in bandwidth and storage capacity issues when resources are limited. A possible solution to these problems consists in reducing the amount of space needed to store such models after they have been trained. Some compression techniques for neural networks obtained via deep learning are currently under investigation within the research project Multicriteria Data Structures and Algorithms: from compressed to learned indexes, and beyond, funded by the Italian Ministry of Education and Research under the PRIN initiative [Marinò et al., 2021]. Their implementation is described in [Marinò et al., 2021].
in collaboration with Prof. Frasca (Università degli Studi di Milano)
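As a purely illustrative sketch of what reducing the storage footprint of a trained model can look like, the following Python fragment applies uniform quantization to a list of weights: each weight is replaced by the index of the nearest value in a small codebook. The text above does not state which compression techniques are investigated, so the choice of uniform quantization and the names `quantize` and `dequantize` are assumptions made here for illustration only.

```python
def quantize(weights, n_levels=16):
    """Map each weight to the index of the nearest of n_levels evenly spaced values.

    Illustrative only: uniform quantization is an assumed example technique,
    not necessarily one used in the cited work.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # All weights coincide: a single codebook entry is enough.
        return [0] * len(weights), [lo]
    step = (hi - lo) / (n_levels - 1)
    codebook = [lo + k * step for k in range(n_levels)]
    # Clamp to guard against floating-point overshoot at the upper end.
    codes = [min(n_levels - 1, round((w - lo) / step)) for w in weights]
    return codes, codebook


def dequantize(codes, codebook):
    """Rebuild approximate weights from their codes."""
    return [codebook[c] for c in codes]
```

With 16 levels each weight needs only 4 bits instead of a full floating-point value, at the price of a reconstruction error of at most half the quantization step.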
Searching for potential axioms within a set of formulas is a particularly demanding problem from a computational viewpoint. The solution of inducing such axioms starting from formulas labeled via a precomputed fitness measure, obtained by processing a knowledge base from the Semantic Web field, has been studied using learning algorithms for fuzzy sets [Malchiodi and Tettamanzi, 2018] and kernel-based regression techniques [Malchiodi et al., 2018]. The dependency of the problem on the learning algorithm used and on the dimensionality reduction technique employed to encode axioms as numerical vectors has been investigated in [Malchiodi et al., 2020].
in collaboration with Prof. Da Costa Pereira, Prof. Tettamanzi (Université de la Côte d'Azur)
The application of supervised machine learning methods in bioinformatics requires selecting, among non-positively labeled data, those representing reliable negative examples, that is, excluding entities on which no experiments have been conducted. In [Frasca and Malchiodi, 2017; Frasca and Malchiodi, 2016] this negative selection problem has been tackled using a ranking based on membership functions to fuzzy sets, while [Frasca et al., 2017; Boldi et al., 2018] propose an encoding for the available data promoting the negative selection process in the problem of protein function prediction. Finally, a similar procedure has been proposed in [Frasca et al., 2019] for the problem of gene prioritization.
in collaboration with Prof. Frasca (Università degli Studi di Milano)
[Casiraghi et al., 2020] and [Esposito et al., 2021] describe the application of machine learning techniques to the problem of predicting the severity of COVID-19 in patients entering emergency departments (EDs).
in collaboration with Prof. Valentini (Università degli Studi di Milano), Prof. Casiraghi (Università degli Studi di Milano), Prof. Frasca (Università degli Studi di Milano)
Some machine learning and statistical data analysis techniques have been adapted in order to deal with problems in the veterinary and forensic fields. In particular, [Galizzi et al., 2021] and [Bagardi et al., 2021] describe the application of statistical methods in order to classify the incidence of cardiovascular factors in the death of dogs undergoing specific therapy, while [Casali et al., 2021] discusses a pilot study on the application of classification algorithms to predict the type of vehicle involved in a pedestrian hit.
in collaboration with Prof. Zanaboni (Università degli Studi di Milano)
Machine learning models have as their starting point a labeled sample whose elements are processed homogeneously (that is, each element has the same importance). In [Malchiodi, 2008] the general model of data quality-based learning was proposed. In this model it is possible to associate with each of the available data items a numerical quantification of its importance with reference to the remaining data. This model was applied to the problem of classification through Support Vector Machines, both in its linear [Apolloni and Malchiodi, 2006] and kernel-based version [Apolloni et al., 2007]. A first analysis of the performance for these applications has been undertaken both theoretically [Apolloni et al., 2007] and experimentally [Malchiodi, 2009]. Some preliminary applications in the bioinformatics field are described in [Malchiodi et al., 2010]. A similar approach has also been applied to the regression problem in [Apolloni et al., 2010; Malchiodi et al., 2009; Apolloni et al., 2005] and to unbalanced learning in [Malchiodi, 2013b].
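The idea of attaching an importance value to each data item can be illustrated with a minimal Python sketch. It trains a linear classifier by subgradient descent on a hinge loss in which the contribution of each example is multiplied by its weight. This is an assumption-laden illustration rather than the formulation of [Malchiodi, 2008]: the weighted hinge loss, the optimizer, and the names `train_weighted_linear` and `predict` are all hypothetical choices made here.

```python
def train_weighted_linear(points, labels, weights, epochs=200, lr=0.1, reg=0.01):
    """Fit w, b minimizing reg/2 * |w|^2 + sum_i weights[i] * hinge_i.

    labels are +1 or -1; weights[i] >= 0 quantifies the importance of item i.
    Illustrative sketch, not the method of the cited papers.
    """
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [reg * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y, q in zip(points, labels, weights):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:
                # Hinge subgradient, scaled by the importance of the item.
                for j in range(dim):
                    grad_w[j] -= q * y * x[j]
                grad_b -= q * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return the predicted label (+1 or -1) for point x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Setting all weights to 1 recovers the homogeneous setting, while a weight of 0 makes the corresponding item irrelevant to training, for instance when its label is deemed unreliable.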
Several types of learning algorithms have been designed, implemented and analyzed. In particular, [Malchiodi and Legnani, 2014] proposes an improvement of the support vector-based classification algorithms dealing both with partially labeled data and with uncertain labels, while [Malchiodi and Pedrycz, 2013] introduces a learning algorithm for membership functions of fuzzy sets. The latter approach has been extended in [Malchiodi and Zanaboni, 2019] to shadowed sets.
Concerning tertiary-level teaching, two publications have been produced: a manual for a software for automatic computations and an exercise textbook on operating systems [Malchiodi, 2007; Malchiodi, 2015]. For a wider audience, [Monga et al., 2017] is centered around Alan Turing, and [Malchiodi, 2019a] describes possible future evolutions of fuzzy-based technologies.
in collaboration with laboratorio ALaDDIn (Università degli Studi di Milano)
The algomotorial approach has been introduced in [Bellettini et al., 2014] with the aim of teaching computing as the science studying the automatic processing of information, in contrast with the trend of tying computing to the working knowledge of specific technological tools [Lonati et al., 2015; Bellettini et al., 2014]. The proposed approach has been evaluated in the realm of teaching habilitation [Bellettini et al., 2015], with a special focus on a constructivist perspective [Bellettini et al., 2018; Bellettini et al., 2018]. Furthermore, the relation between teaching and computational thinking competitions was studied in [Lonati et al., 2017], evaluating the impact of the presentation of questions on the efficacy of the latter [Lonati et al., 2017].
in collaboration with laboratorio ALaDDIn (Università degli Studi di Milano)
Starting from an analysis of computing education in Italian schools [Bellettini et al., 2014] and a criticism of the common identification of computer programming with the use of a language in order to encode an algorithm [Lonati et al., 2015], the field of computer programming teaching has been studied from the viewpoint of its introduction via projects and specific tools [Bulgheroni and Malchiodi, 2009; Paterson et al., 2015], and of an interdisciplinary approach with musical subjects [Ludovico et al., 2017; Baratè et al., 2017; Baratè et al., 2017], also considering advanced aspects of the discipline [Lonati et al., 2016; Lonati et al., 2017]. Finally, [Monga et al., 2018; Lodi et al., 2019] analyse a constructionist approach to computer programming.
in collaboration with laboratorio ALaDDIn (Università degli Studi di Milano)
Within the organization of non-competitive challenges on computational thinking at the national level [Lissoni et al., 2012; Lissoni et al., 2013; Lissoni et al., 2014; Lissoni et al., 2015] and the evaluation of their results [Bellettini et al., 2015; Lonati et al., 2017], an analysis of the possibility of exploiting these tools as a resource for learning in primary and secondary schools has been carried out [Lonati et al., 2017; Calcagni et al., 2017; Morpurgo et al., 2018].
in collaboration with laboratorio ALaDDIn (Università degli Studi di Milano)
The algomotorial approach introduced in [Bellettini et al., 2014; Bellettini et al., 2014] has been applied to the introduction of core concepts of computing, such as information representation [Bellettini et al., 2012; Bellettini et al., 2013; Baratè et al., 2017], basics of computer programming [Baratè et al., 2017], as well as recursive and greedy strategies [Lonati et al., 2016; Lonati et al., 2017; Lonati et al., 2017].
in collaboration with laboratorio ALaDDIn (Università degli Studi di Milano)
The granular computing model, giving information a granular meaning and allowing its analysis and its processing at different abstraction levels, is described in [Apolloni et al., 2008], where its links with machine learning models are analysed. The effects of a fusion of these two models have been studied within the general field of regression, proposing new algorithms based on Support Vector Machines [Apolloni et al., 2008; Apolloni et al., 2006] or on local search techniques [Apolloni et al., 2005].
Bootstrap techniques are based on data resampling models with the aim of approximating the distribution of a population. A specialization of this kind of technique, initially proposed in [Apolloni et al., 2006] and subsequently refined in [Apolloni et al., 2009; Apolloni et al., 2007], gives as output confidence regions for regression curves, avoiding the usual assumptions on the distribution of measurement drifts. The use of this technique to solve linear and nonlinear regression problems is shown in [Apolloni et al., 2008], while [Apolloni et al., 2007] describes some applications to the medical field.
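To give a concrete flavour of resampling-based confidence regions for regression, the sketch below implements the textbook pairs bootstrap for a straight line: the sample is resampled with replacement, a least-squares line is fitted to each resample, and pointwise percentiles of the fitted values are taken over a grid. This is the standard bootstrap and not the specialized technique of the cited papers; in particular, the band is pointwise rather than a simultaneous confidence region, and the names `fit_line` and `bootstrap_band` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def bootstrap_band(xs, ys, grid, n_boot=1000, level=0.9, seed=0):
    """Pointwise percentile band for the regression line at each grid value.

    Standard pairs bootstrap, shown for illustration only. No assumption
    is made on the noise distribution.
    """
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    while len(curves) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: the slope is undefined
        a, b = fit_line(bx, by)
        curves.append([a * g + b for g in grid])
    lo_k = int((1 - level) / 2 * n_boot)
    hi_k = n_boot - 1 - lo_k
    band = []
    for j in range(len(grid)):
        vals = sorted(curve[j] for curve in curves)
        band.append((vals[lo_k], vals[hi_k]))
    return band
```

Calling `bootstrap_band(xs, ys, grid)` returns one `(lower, upper)` pair per grid value; the band narrows where the data constrain the line well and widens towards the extremes of the sampled range.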
The task of integrating under a single theoretical model instances of inference problems from statistics (point and interval estimation of distribution parameters) and computer science (estimation of approximation error in machine learning) is tackled in [Apolloni et al., 2006; Apolloni et al., 2005; Apolloni et al., 2002; Apolloni et al., 2002; Apolloni and Malchiodi, 2001; Malchiodi, 2000], building on previously obtained results on sample complexity [Apolloni and Malchiodi, 2001] and describing the Algorithmic Inference model. This model was used with the aim of estimating the risk in classification problems based on Support Vector Machines [Apolloni et al., 2007; Apolloni et al., 2005; Apolloni and Malchiodi, 2002; Apolloni and Malchiodi, 2001], learning confidence regions for regression lines avoiding the typical assumption requiring a Gaussian drift distribution [Apolloni et al., 2005; Apolloni et al., 2002], and learning confidence regions for the risk function of re-occurrence distribution times in particular cancer pathologies [Apolloni et al., 2007; Apolloni et al., 2005; Apolloni et al., 2002].
Systems for scientific computation can be used to run simulations and to analyze mathematical problems from an interactive and incremental point of view. In this respect, such systems offer interesting cues for designing educational activities [Bulgheroni and Malchiodi, 2009; Malchiodi, 2008a]. A commercial system of this kind, thoroughly described in [Malchiodi, 2007], has been extended so as to solve purely computational aspects associated with information encoding [Malchiodi, 2006c], remote procedure invocation [Malchiodi, 2006b; Malchiodi, 2006], production of scientific documentation [Malchiodi, 2011], and solutions to optimization [Malchiodi, 2006a] and machine learning problems based on Support Vectors [Malchiodi et al., 2009; Malchiodi et al., 2009], as well as to perform software validation techniques [Malchiodi, 2013a]. The related code has been used to build the simulations in [Apolloni et al., 2007; Apolloni and Malchiodi, 2006]. Moreover, [Malchiodi, 2010a] describes a library handling machine learning problems within an open source system for scientific computation.
Hybrid learning systems are typically organized by coupling sub-symbolic modules (usually based on the neural network paradigm) with symbolic ones (described in terms of logic circuits). Such a system, having as inputs a set of features describing the available data and extracting their boolean independent components, is described in [Apolloni et al., 2005; Apolloni et al., 2004]. These components, interpreted as truth values, are used to infer logical formulas describing in a symbolic way the relations among the original input data [Apolloni et al., 2006; Apolloni et al., 2003; Apolloni et al., 2002; Apolloni et al., 2000]. This system is applied in [Apolloni et al., 2004] to the problem of emotion recognition on the basis of voice signals, while [Apolloni et al., 2004; Apolloni et al., 2004; Apolloni et al., 2003; Apolloni et al., 2003; Apolloni et al., 2003] describe an application to the monitoring of awareness in car driving as a function of biosignals, within the research project IST-2000-26091 ORESTEIA (mOdular hybRid artEfactS wiTh adaptivE functIonAlity, funded between 2001 and 2003 by the EC within the fifth framework programme, under the IST-FET initiative). Moreover, [Apolloni and Malchiodi, 2006; Apolloni et al., 2005] study two hybrid systems obtained through the integration of a fuzzy system for the measurement of quality in available data with a linear Support Vector classifier and with a linear regression model, respectively.
Within computational learning theory, the structural risk minimization principle addresses the problem of balancing the complexity of a model with its accuracy in describing experimental data. This principle has been applied to classifiers based on logic expressions built in terms of disjunctive and conjunctive boolean normal forms. A simplification algorithm for such forms was developed in [Apolloni et al., 2006; Apolloni et al., 2005; Apolloni et al., 2003; Apolloni et al., 2002; Apolloni et al., 2002], focusing on the stochastic optimization of parameters in fuzzy sets describing the above mentioned forms.
Within this subject the activities have been focused on the problem of modeling conflicting situations through an approach alternative to that of classical game theory. In particular, these conflicts were modeled in terms of approximating the solution to an NP-hard problem [Apolloni et al., 2006; Apolloni et al., 2003; Apolloni et al., 2002; Apolloni et al., 2002], applying the Algorithmic Inference model in order to assign limited computational resources to two players, and subsequently extending this technique to team games [Apolloni et al., 2006]. This model is applied in [Apolloni et al., 2007; Apolloni et al., 2005] to the biological field, while [Apolloni et al., 2010] uses this approach with the aim of correctly dimensioning the running time for learning algorithms based on local error minimization.
The research project ORESTEIA (mOdular hybRid artEfactS wiTh adaptivE functIonAlity, funded between 2001 and 2003 by the EC within the fifth framework programme, under the IST-FET initiative) was grounded on the design, implementation and analysis of intelligent systems for pervasive and ubiquitous computing. These fields are characterized by highly specialized computers devoted to executing specific tasks. These special computers can be produced so as to significantly reduce their size and cost, so that they can be embedded in an environment. Focusing specifically on the awareness detection problem [Kasderidis et al., 2003], a prototype for the detection of driving awareness on the basis of biosignals [Apolloni et al., 2004; Apolloni et al., 2004; Apolloni et al., 2003; Apolloni et al., 2003; Apolloni et al., 2003] has been developed.
Within the research project PHYSTA (Principled Hybrid Systems: Theory and Applications, funded between 1998 and 2000 by the EC within the fourth framework programme, under the TMR initiative), the Algorithmic Inference model described in [Apolloni et al., 2006; Malchiodi, 2000] was applied to the problem of automatic classification of emotions on the basis of vocal signals [Apolloni et al., 2004; Apolloni et al., 2002]. The obtained results were presented at an international school on computational learning within the same research project.
The availability of hardware circuits able to directly process information with the aim of synthesizing it through estimators allows a remarkable shortening of running times. Their use implies a set of constraints basically linked to the architecture of the circuits themselves. The inference-among-gossips model, developed in [Malchiodi, 1996], has been applied within this scope with the aim of obtaining a family of estimators for Bernoulli populations directly implementable on pRAM boards [Apolloni et al., 1997]. The same model has been applied in [Apolloni et al., 2013] to the study of information exchange in social networks.