Title: Variable selection in highly imbalanced datasets

Abstract: Variable selection is a fundamental task in machine learning. It consists of identifying the most relevant variables in a dataset, with the aim of improving model performance. A particularly prominent approach is Recursive Feature Elimination (RFE), widely used in practice for its effectiveness and flexibility. Another significant challenge in machine learning is the handling of imbalanced datasets, in which one or more classes are underrepresented relative to the others. This imbalance can bias predictive models towards the majority classes, making them ineffective at detecting the minority classes. Traditional model training and evaluation methods are not always suitable in these scenarios, so specific techniques are needed to address class imbalance effectively. The goal of this dissertation is to comprehensively address the challenges of variable selection and imbalanced datasets. To this end, a modification of the RFE method is proposed that favours detection of the minority class, integrating the Permutation Importance technique so that importance measures appropriate to this problem can be chosen. We show that this version improves on the performance of the original algorithm on artificial datasets, and we study its application to real datasets from the search for variable stars.
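To make the general idea concrete, the following is a minimal sketch of an RFE loop driven by permutation importance and scored with a minority-class metric. It is an illustration under stated assumptions, not the method proposed in the dissertation: the nearest-centroid classifier, the choice of minority-class F1 as the score, the synthetic data generator, and all function names (`make_data`, `fit_centroids`, `minority_f1`, `permutation_importance`, `rfe_permutation`) are hypothetical and chosen only to keep the example dependency-free.

```python
import random

def make_data(n_major, n_minor, rng, n_informative=2, n_noise=3, shift=4.0):
    """Synthetic imbalanced data (assumption): only the first
    n_informative columns differ between the two classes."""
    X, y = [], []
    for label, n in ((0, n_major), (1, n_minor)):
        for _ in range(n):
            row = [rng.gauss(shift * label, 1.0) for _ in range(n_informative)]
            row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(row)
            y.append(label)
    return X, y

def fit_centroids(X, y, feats):
    """Nearest-centroid classifier (assumption) restricted to columns in feats."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return cents

def predict(cents, X, feats):
    def dist2(x, c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, cents[c]))
    return [1 if dist2(x, 1) < dist2(x, 0) else 0 for x in X]

def minority_f1(y_true, y_pred):
    """F1 score of the minority class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def permutation_importance(cents, X, y, feats, rng, n_repeats=5):
    """Mean drop in minority-class F1 when one column is shuffled."""
    base = minority_f1(y, predict(cents, X, feats))
    importance = {}
    for j in feats:
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
            drops.append(base - minority_f1(y, predict(cents, X_perm, feats)))
        importance[j] = sum(drops) / n_repeats
    return importance

def rfe_permutation(X_tr, y_tr, X_val, y_val, n_keep, rng):
    """Refit, rank features by permutation importance on held-out data,
    drop the least important one, and repeat until n_keep remain."""
    feats = list(range(len(X_tr[0])))
    while len(feats) > n_keep:
        cents = fit_centroids(X_tr, y_tr, feats)
        imp = permutation_importance(cents, X_val, y_val, feats, rng)
        feats.remove(min(feats, key=lambda j: imp[j]))
    return feats

rng = random.Random(0)
X_tr, y_tr = make_data(380, 20, rng)    # 5% minority class
X_val, y_val = make_data(760, 40, rng)  # held-out set for importances
selected = rfe_permutation(X_tr, y_tr, X_val, y_val, n_keep=2, rng=rng)
print("selected feature indices:", selected)
```

The design point this sketch tries to convey is that the importance measure is computed against a score that reflects minority-class detection, rather than overall accuracy, which a model can maximise on imbalanced data while ignoring the minority class. In practice one would likely start from an existing library implementation of RFE and permutation importance instead of hand-written loops.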