A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting

Silva, Sara; Vanneschi, Leonardo; Cabral, Ana I.R.; Vasconcelos, Maria J.

DOI: https://doi.org/10.1016/j.swevo.2017.11.003

Use this identifier to cite or link to this record: http://hdl.handle.net/10362/151417

Full record

DC field	Value	Language
dc.contributor.author	Silva, Sara	-
dc.contributor.author	Vanneschi, Leonardo	-
dc.contributor.author	Cabral, Ana I.R.	-
dc.contributor.author	Vasconcelos, Maria J.	-
dc.date.accessioned	2023-03-30T22:09:44Z	-
dc.date.available	2024-01-27T01:32:02Z	-
dc.date.issued	2018-04-01	-
dc.identifier.issn	2210-6502	-
dc.identifier.other	PURE: 3788203	-
dc.identifier.other	PURE UUID: c66ef9e1-4d5b-4fd9-bb74-5bf996626a43	-
dc.identifier.other	Scopus: 85035221183	-
dc.identifier.other	WOS: 000428826000021	-
dc.identifier.other	ORCID: /0000-0003-4732-3328/work/151426693	-
dc.identifier.uri	http://hdl.handle.net/10362/151417	-
dc.description	Silva, S., Vanneschi, L., Cabral, A. I. R., & Vasconcelos, M. J. (2018). A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting. Swarm and Evolutionary Computation, 39(April), 323-338. DOI: 10.1016/j.swevo.2017.11.003	-
dc.description.abstract	Data gathered in the real world normally contains noise, either stemming from inaccurate experimental measurements or introduced by human errors. Our work deals with classification data where the attribute values were accurately measured, but the categories may have been mislabeled by the human in several sample points, resulting in unreliable training data. Genetic Programming (GP) compares favorably with the Classification and Regression Trees (CART) method, but it is still highly affected by these errors. Despite consistently achieving high accuracy in both training and test sets, many classification errors are found in a later validation phase, revealing a previously hidden overfitting to the erroneous data. Furthermore, the evolved models frequently output raw values that are far from the expected range. To improve the behavior of the evolved models, we extend the original training set with additional sample points where the class label is unknown, and devise a simple way for GP to use this additional information and learn in a semi-supervised manner. The results are surprisingly good. In the presence of the exact same mislabeling errors, the additional unlabeled data allowed GP to evolve models that achieved high accuracy also in the validation phase. This is a brand new approach to semi-supervised learning that opens an array of possibilities for making the most of the abundance of unlabeled data available today, in a simple and inexpensive way.	en
dc.format.extent	16	-
dc.language.iso	eng	-
dc.rights	openAccess	pt_PT
dc.subject	Classification	-
dc.subject	Data errors	-
dc.subject	Genetic Programming	-
dc.subject	Hidden overfitting	-
dc.subject	Noisy labels	-
dc.subject	Semi-supervised learning	-
dc.subject	Computer Science(all)	-
dc.subject	Mathematics(all)	-
dc.title	A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting	-
dc.type	article	-
degois.publication.firstPage	323	-
degois.publication.issue	April	-
degois.publication.lastPage	338	-
degois.publication.title	Swarm and Evolutionary Computation	-
degois.publication.volume	39	-
dc.peerreviewed	yes	-
dc.identifier.doi	https://doi.org/10.1016/j.swevo.2017.11.003	-
dc.description.version	authorsversion	-
dc.description.version	published	-
dc.contributor.institution	NOVA Information Management School (NOVA IMS)	-
dc.contributor.institution	Information Management Research Center (MagIC) - NOVA Information Management School	-
Appears in collections:	NIMS: MagIC - Peer-reviewed articles in international journals

Files in this record:

File	Description	Size	Format
semi_supervised_Genetic_Programming_method_noisy_labels_hidden_overfitting.pdf		15.02 MB	Adobe PDF
