Information Gain and Mutual Information

Dan-k 2022. 7. 7. 13:18

Information Gain

- The reduction in entropy (or surprise) that results from a change to a dataset

- Commonly used as the logic for building decision trees: the information gain of each variable is computed, and the variable that maximizes it is selected

- Also used to find the best split of a dataset

Skewed Probability Distribution (unsurprising): Low entropy.
Balanced Probability Distribution (surprising): High entropy.
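
To make the contrast concrete, here is a minimal sketch that computes the entropy of a skewed and a balanced two-class distribution. The helper name entropy_bits and the example probabilities are illustrative assumptions, not part of the original example.

```python
from math import log2

# Hypothetical helper: Shannon entropy, in bits, of a discrete distribution.
# Zero-probability outcomes are skipped because they contribute nothing.
def entropy_bits(probs):
    return sum(-p * log2(p) for p in probs if p > 0)

skewed = [0.9, 0.1]     # one outcome dominates: little surprise
balanced = [0.5, 0.5]   # both outcomes equally likely: maximum surprise

skewed_entropy = entropy_bits(skewed)
balanced_entropy = entropy_bits(balanced)
print('Skewed entropy: %.3f bits' % skewed_entropy)
print('Balanced entropy: %.3f bits' % balanced_entropy)
```

The balanced case reaches the two-class maximum of 1 bit, while the skewed case stays well below it.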

- Information gain measures the effect of a change on purity: it is computed by subtracting the entropy after the change from the entropy the dataset had before the change

IG(S, a) = H(S) - H(S | a)

where IG(S, a) is the information gain for the dataset S with respect to the random variable a, H(S) is the entropy of the dataset before any change, and H(S | a) is the conditional entropy of the dataset given the variable a.

# calculate the information gain
from math import log2
 
# calculate the entropy for the split in the dataset
# (a class with probability 0 is skipped, since log2(0) is undefined)
def entropy(class0, class1):
	return sum(-p * log2(p) for p in (class0, class1) if p > 0)
 
# split of the main dataset
class0 = 13 / 20
class1 = 7 / 20
# calculate entropy before the change
s_entropy = entropy(class0, class1)
print('Dataset Entropy: %.3f bits' % s_entropy)
 
# split 1 (split via value1)
s1_class0 = 7 / 8
s1_class1 = 1 / 8
# calculate the entropy of the first group
s1_entropy = entropy(s1_class0, s1_class1)
print('Group1 Entropy: %.3f bits' % s1_entropy)
 
# split 2  (split via value2)
s2_class0 = 6 / 12
s2_class1 = 6 / 12
# calculate the entropy of the second group
s2_entropy = entropy(s2_class0, s2_class1)
print('Group2 Entropy: %.3f bits' % s2_entropy)
 
# calculate the information gain
gain = s_entropy - (8/20 * s1_entropy + 12/20 * s2_entropy)
print('Information Gain: %.3f bits' % gain)
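
The same calculation can be driven from raw labels instead of hand-computed proportions, which is closer to how a decision tree compares candidate variables. The following is a minimal sketch under stated assumptions: the function names label_entropy and information_gain and the toy columns feature_a and feature_b are hypothetical. feature_a is constructed to reproduce the 8/12 split used above, and feature_b is a made-up alternating column for comparison.

```python
from collections import Counter
from math import log2

# Entropy, in bits, of a list of class labels.
def label_entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

# IG(S, a) = H(S) - H(S | a), where `feature` holds the value of a per sample.
def information_gain(feature, labels):
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy of each group weighted by its share of samples.
    conditional = sum(len(g) / total * label_entropy(g) for g in groups.values())
    return label_entropy(labels) - conditional

# 20 samples: the value1 group has 7 of class 0 and 1 of class 1,
# the value2 group has 6 of class 0 and 6 of class 1.
labels = [0] * 7 + [1] * 1 + [0] * 6 + [1] * 6
feature_a = ['value1'] * 8 + ['value2'] * 12
feature_b = ['x', 'y'] * 10  # hypothetical second variable

gains = {
    'feature_a': information_gain(feature_a, labels),
    'feature_b': information_gain(feature_b, labels),
}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print('%s: %.3f bits' % (name, gain))
print('Best variable: %s' % best)
```

Picking the variable with the largest gain is the selection step described at the top of this post; a real tree builder would repeat it recursively on each resulting group.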

Mutual Information

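- Computed over two variables: it measures how much the uncertainty about one variable is reduced by knowing the other

- Mutual information is very similar to information gain

- It can also be computed as a KL divergence, between the joint distribution and the product of the marginals

> because the Kullback-Leibler divergence is a measure of the difference between two probability distributions

- Mutual information is always greater than or equal to 0, and it is 0 exactly when the two variables are independent.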

I(X ; Y) = H(X) - H(X | Y)
I(X ; Y) = KL(p(X, Y) || p(X) * p(Y))
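
As a check on the two formulas above, the sketch below computes I(X ; Y) both ways from a small joint probability table and compares the results. The function names and tables are illustrative assumptions. The first table encodes the 20-sample split used in the information gain code earlier in this post (rows are the two classes X, columns are the two groups Y); the second is a made-up table in which X and Y are independent.

```python
from math import log2

# Marginal distributions p(x) (row sums) and p(y) (column sums).
def marginals(joint):
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

# I(X ; Y) = H(X) - H(X | Y)
def mi_from_entropy(joint):
    px, py = marginals(joint)
    h_x = sum(-p * log2(p) for p in px if p > 0)
    # H(X | Y) = -sum over (x, y) of p(x, y) * log2(p(x, y) / p(y))
    h_x_given_y = sum(-pxy * log2(pxy / py[j])
                      for row in joint
                      for j, pxy in enumerate(row) if pxy > 0)
    return h_x - h_x_given_y

# I(X ; Y) = KL(p(X, Y) || p(X) * p(Y))
def mi_from_kl(joint):
    px, py = marginals(joint)
    return sum(pxy * log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

# joint[i][j] = p(X = class i, Y = group j)
joint = [[7 / 20, 6 / 20],
         [1 / 20, 6 / 20]]
independent = [[0.3, 0.3],
               [0.2, 0.2]]

mi_entropy = mi_from_entropy(joint)
mi_kl = mi_from_kl(joint)
mi_independent = mi_from_kl(independent)
print('H(X) - H(X | Y): %.3f bits' % mi_entropy)
print('KL form: %.3f bits' % mi_kl)
print('Independent table: %.3f bits' % abs(mi_independent))
```

Both forms give the same value for the first table, and that value matches the information gain printed by the earlier code, since the table describes the same split.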

Relationship between Information Gain and Mutual Information

- The two are in fact the same quantity

- The context or usage of the measure often gives rise to the different names.

For example:

Effect of Transforms to a Dataset (decision trees): Information Gain.
Dependence Between Variables (feature selection): Mutual Information.

sklearn.feature_selection.mutual_info_classif(X, y, *)
# Estimate mutual information for a discrete target variable.

sklearn.feature_selection.mutual_info_regression(X, y, *)
# Estimate mutual information for a continuous target variable.

References

Information Gain and Mutual Information for Machine Learning (machinelearningmastery.com)
https://machinelearningmastery.com/information-gain-and-mutual-information/