- TL;DR Summary
- To each feature in a set we assign a normalized weight; to each weight we assign an entropy value, and then we use a cutoff value to decide which features remain. I am trying to get a better "feel" for the choice of the cutoff value.
Sorry for being fuzzy here; I just started reading a short paper and I am a bit confused. These are some loose notes without sources or references.
Say we start with a collection F of features that we want to trim into a smaller set F' of features through information gain and entropy (where we are using the formula ## -P_{a_i} \log_2(P_{a_i}) ##).
We start by assigning a normalized weight N_F (so that all weights are between 0 and 1) to each feature; I am not sure of the actual assignment rule. Then we assign an entropy value E_F to each N_F.
And **here** is the key to my post: the cutoff point for a feature is defined as ## 2^{E_F} ##. What does this measure? It seems to have to do with inverting the log, but how does this relate to information gain as a cutoff point?
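To make my question concrete, here is a small numerical sketch of how I currently read the procedure. The raw scores, the normalization rule (dividing by the sum), and the per-feature entropy term are all my guesses, since the paper does not spell them out:

```python
import numpy as np

# My reading of the procedure (names and the weight rule are guesses, not from the paper):
# 1. Assign each feature a normalized weight N_F in (0, 1], e.g. raw score / sum of scores.
# 2. Compute an entropy term E_F = -N_F * log2(N_F) for each weight.
# 3. Form the cutoff 2^{E_F} -- this is the quantity I don't know how to interpret.

raw_scores = np.array([5.0, 3.0, 1.5, 0.5])   # hypothetical raw feature scores
N_F = raw_scores / raw_scores.sum()           # normalized weights, sum to 1

E_F = -N_F * np.log2(N_F)                     # per-feature entropy term
cutoff = 2.0 ** E_F                           # the 2^{E_F} value from the paper

for w, e, c in zip(N_F, E_F, cutoff):
    print(f"N_F = {w:.3f}   E_F = {e:.3f}   2^E_F = {c:.3f}")
```

If my reading is wrong (for instance, if E_F is the entropy of a whole distribution rather than a single term), please correct me; that may be exactly where my confusion comes from.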