Overlapping Classes in Imbalanced Datasets
Abstract
Big data has become easily available, but there is a need to improve the usefulness
of these data, especially when a dataset is imbalanced and has data points that
overlap across two or more classes. Machine-learning algorithms have improved in recent
years, and many have been introduced to tackle data that
suffer from imbalanced classes and overlap in some features.
This is a problem when such data are used to train a classifier to decide where each
data point belongs. Such a situation often occurs when the examples that we are
interested in are far fewer than those of the other classes. Problems
of this kind arise in many fields, such as fraud detection, cancer diagnosis, oil
mining, and network intrusion. In this thesis, we will discuss the cases
of datasets that are imbalanced and overlap in some data points. The main
problem is how to make a better judgment about the gray area
between the minority class and the majority class, where the two overlap.
We will characterize imbalanced-dataset scenarios in the classification
phase and then try to provide a better solution. Then, we will discuss the cost of the
learning process together with algorithms and techniques for solving these issues.