Systematic Comparison of the Performance of Supervised Binary Classifiers
Matheus Avellar de Barros, Carlos Eduardo Pedreira, Laura de Oliveira Fernandes Moraes
Programa de Engenharia de Sistemas e Computação·2025
Classification algorithms, or classifiers, categorize data into distinct classes and play a fundamental role in machine learning. They are used in many areas, from detecting spam and financial fraud to recognizing objects in images (such as people, animals and gestures) and even detecting cancer cells. Because the classification task is not trivial, nor even deterministic, heuristics are needed to guide decisions in ambiguous cases, and providing them is the role of these algorithms. However, given the plethora of options, the practical differences between classifiers are not always clear. In this dissertation, we show that certain supervised binary classifiers widely used in the literature, namely support vector machines (SVM), gradient-boosted decision trees (GBDT), k-nearest neighbors (kNN), random forests (RF) and naive Bayes (NB), can be grouped according to their resistance to noise and/or imbalanced data, or according to their running time. These results indicate that, even though well-known algorithms perform very similarly in general, they differ in execution in ways that may have a significant impact depending on the dataset being used. The results may therefore assist in choosing a more suitable classifier.
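As an illustration only, the kind of measurement the abstract describes (performance under label noise and class imbalance, plus running time) might be sketched as below. This is not the dissertation's experimental setup. The synthetic two-blob data, the simplified pure-Python kNN and Gaussian naive Bayes, the noise and imbalance levels, the choice of balanced accuracy as the metric, and all function names (`make_data`, `evaluate`, and so on) are assumptions made for the example.

```python
import math
import random
import time


def make_data(n, pos_fraction, noise_rate, rng):
    """Two 2-D Gaussian blobs; a fraction of labels is flipped to simulate noise."""
    data = []
    for _ in range(n):
        label = 1 if rng.random() < pos_fraction else 0
        center = 2.0 if label == 1 else 0.0
        x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        if rng.random() < noise_rate:
            label = 1 - label  # corrupt the label
        data.append((x, label))
    return data


def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def nb_fit(train):
    """Per-class prior plus per-feature mean and variance (Gaussian naive Bayes)."""
    model = {}
    for c in (0, 1):
        rows = [x for x, y in train if y == c]
        stats = []
        for j in range(2):
            vals = [r[j] for r in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
            stats.append((mean, var))
        model[c] = (len(rows) / len(train), stats)
    return model


def nb_predict(model, x):
    """Pick the class with the larger log-posterior."""
    def log_post(c):
        prior, stats = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (xj - mean) ** 2 / (2 * var)
            for xj, (mean, var) in zip(x, stats)
        )
    return max((0, 1), key=log_post)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so the minority class counts equally."""
    recalls = []
    for c in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / 2


def evaluate(pos_fraction, noise_rate, seed=0):
    """Train on noisy data, test on clean data; return {name: (score, seconds)}."""
    rng = random.Random(seed)
    train = make_data(400, pos_fraction, noise_rate, rng)
    test = make_data(200, pos_fraction, 0.0, rng)
    y_true = [y for _, y in test]
    results = {}

    start = time.perf_counter()
    knn_pred = [knn_predict(train, x) for x, _ in test]
    results["kNN"] = (balanced_accuracy(y_true, knn_pred), time.perf_counter() - start)

    start = time.perf_counter()
    model = nb_fit(train)
    nb_pred = [nb_predict(model, x) for x, _ in test]
    results["NB"] = (balanced_accuracy(y_true, nb_pred), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Hypothetical conditions: balanced vs. imbalanced, clean vs. noisy labels.
    for pos_fraction, noise_rate in [(0.5, 0.0), (0.5, 0.2), (0.1, 0.0), (0.1, 0.2)]:
        for name, (score, secs) in evaluate(pos_fraction, noise_rate).items():
            print(f"pos={pos_fraction} noise={noise_rate} {name}: "
                  f"balanced_acc={score:.3f} time={secs:.4f}s")
```

Comparing each classifier's score across the four conditions gives a rough sense of its sensitivity to noise and imbalance, while the timing column hints at running-time differences. A real study would use established implementations, multiple datasets, and repeated runs.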