Features Maximization Metric
Implementation of Features Maximization Metric, an unbiased metric aimed at estimating the quality of an unsupervised classification.
- GitHub repository: https://github.com/cognitivefactory/features-maximization-metric/tree/1.0.0
- Main documentation: https://cognitivefactory.github.io/features-maximization-metric/
- PyPI distribution: https://pypi.org/project/cognitivefactory-features-maximization-metric/1.0.0/
Quick description
Features Maximization (FMC) is a feature selection method described in Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7.
This metric is computed by applying the following steps:
- Compute the Features F-Measure metric (based on the Features Recall and Features Predominance metrics).
  (a) The Features Recall FR[f][c] for a given class c and a given feature f is the ratio between the sum of the vector weights of feature f for data in class c and the sum of all vector weights of feature f for all data. It answers the question: "Can the feature f distinguish the class c from the other classes c'?"
  (b) The Features Predominance FP[f][c] for a given class c and a given feature f is the ratio between the sum of the vector weights of feature f for data in class c and the sum of all vector weights of all features f' for data in class c. It answers the question: "Can the feature f identify the class c better than the other features f'?"
  (c) The Features F-Measure FM[f][c] for a given class c and a given feature f is the harmonic mean of the Features Recall (a) and the Features Predominance (b). It answers the question: "How much information does the feature f contain about the class c?"
- Compute the Features Selection (based on the F-Measure Overall Average comparison).
  (d) The F-Measure Overall Average is the average of the Features F-Measure (c) over all classes c and all features f. It answers the question: "What is the mean information contained by features across all classes?"
  (e) A feature f is Selected if and only if there exists at least one class c for which the Features F-Measure (c) FM[f][c] is bigger than the F-Measure Overall Average (d). It answers the question: "Which features contain more information than the mean information in the dataset?"
  (f) A feature f is Deleted if and only if its Features F-Measure (c) FM[f][c] is not bigger than the F-Measure Overall Average (d) for any class c. It answers the question: "Which features do not contain more information than the mean information in the dataset?"
- Compute the Features Contrast and Features Activation (based on the F-Measure Marginal Averages comparison).
  (g) The F-Measure Marginal Average for a given feature f is the average of the Features F-Measure (c) over all classes c for that feature f. It answers the question: "What is the mean information contained by the feature f across all classes?"
  (h) The Features Contrast FC[f][c] for a given class c and a given selected feature f is the ratio between the Features F-Measure (c) FM[f][c] and the F-Measure Marginal Average (g) of feature f, raised to the power of an Amplification Factor. It answers the question: "How relevant is the feature f for distinguishing the class c?"
  (i) A selected feature f is Active for a given class c if and only if its Features Contrast (h) FC[f][c] is bigger than 1.0. It answers the question: "For which classes is a selected feature f relevant?"
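The steps above can be sketched in plain Python as follows, to make the chain (a) to (i) concrete. This is a minimal illustrative sketch, not the API of the cognitivefactory-features-maximization-metric package: the function name features_maximization, the input layout (one list of non-negative feature weights per data point plus a parallel list of class labels), the default Amplification Factor of 1.0, and the choice to return 0.0 when a ratio's denominator is zero are all assumptions.

```python
def features_maximization(vectors, labels, amplification_factor=1.0):
    """Hypothetical sketch of the Features Maximization steps (a) to (i).

    vectors: one row per data point, each row a list of non-negative feature weights (assumed layout).
    labels: one class label per row.
    amplification_factor: exponent used in (h); the default of 1.0 is an assumption.
    Returns a dict of intermediate and final results, indexed as x[f][c].
    """
    n_features = len(vectors[0])
    classes = sorted(set(labels))

    # Sum of the weights of feature f over the data of class c.
    weight = [dict.fromkeys(classes, 0.0) for _ in range(n_features)]
    for row, c in zip(vectors, labels):
        for f, w in enumerate(row):
            weight[f][c] += w
    feature_total = [sum(weight[f].values()) for f in range(n_features)]
    class_total = {c: sum(weight[f][c] for f in range(n_features)) for c in classes}

    fr, fp, fm = {}, {}, {}
    for f in range(n_features):
        fr[f], fp[f], fm[f] = {}, {}, {}
        for c in classes:
            # (a) Features Recall: share of feature f's weight found in class c.
            fr[f][c] = weight[f][c] / feature_total[f] if feature_total[f] else 0.0
            # (b) Features Predominance: share of class c's weight carried by feature f.
            fp[f][c] = weight[f][c] / class_total[c] if class_total[c] else 0.0
            # (c) Features F-Measure: harmonic mean of (a) and (b).
            total = fr[f][c] + fp[f][c]
            fm[f][c] = 2 * fr[f][c] * fp[f][c] / total if total else 0.0

    # (d) F-Measure Overall Average over all (feature, class) pairs.
    values = [v for per_class in fm.values() for v in per_class.values()]
    overall_average = sum(values) / len(values)
    # (e) Selected: above the overall average for at least one class; (f) Deleted otherwise.
    selected = [f for f in fm if any(v > overall_average for v in fm[f].values())]
    deleted = [f for f in fm if f not in selected]

    contrast, active = {}, {}
    for f in selected:
        # (g) F-Measure Marginal Average of feature f over all classes
        # (strictly positive, since a selected f exceeds the non-negative overall average somewhere).
        marginal_average = sum(fm[f].values()) / len(classes)
        # (h) Features Contrast: ratio to the marginal average, raised to the amplification factor.
        contrast[f] = {c: (fm[f][c] / marginal_average) ** amplification_factor for c in classes}
        # (i) Active classes: those where the contrast is bigger than 1.0.
        active[f] = [c for c in classes if contrast[f][c] > 1.0]

    return {
        "fr": fr, "fp": fp, "fm": fm,
        "overall_average": overall_average,
        "selected": selected, "deleted": deleted,
        "contrast": contrast, "active": active,
    }


if __name__ == "__main__":
    result = features_maximization(
        [[2, 0, 1], [2, 0, 0], [0, 3, 1], [0, 1, 0]], ["A", "A", "B", "B"], amplification_factor=2
    )
    print(result["selected"], result["deleted"], result["active"])
    # prints: [0, 1] [2] {0: ['A'], 1: ['B']}
```

On this toy input, features 0 and 1 are Selected while feature 2 is Deleted (its F-Measure of 2/7 stays below the overall average), and feature 0 is Active for class A while feature 1 is Active for class B.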
This metric is an efficient method to:
- identify the relevant features of a dataset modeling;
- describe associations between vector features and data classes;
- increase the contrast between data classes.
References
Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7
How to cite
Schild, E. (2023). cognitivefactory/features-maximization-metric. Zenodo. https://doi.org/10.5281/zenodo.7646382.