KL Divergence (Kullback–Leibler divergence), Jensen-Shannon divergence


데이터과학 삼학년


Statistical Learning


Dan-k 2020. 2. 13. 11:31

The Kullback–Leibler divergence is a way to quantitatively measure how much two probability distributions differ from each other.

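Take an ideal (true) distribution p and another distribution q that approximates it. The KL divergence measures the extra information (in bits or nats, on average) that is incurred when samples drawn from p are encoded or modeled with q instead of p. It is also called relative entropy, information gain, or information divergence.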
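Because the KL divergence is an average over samples drawn from p of log p(x) − log q(x), it can also be estimated by actually sampling from p. The sketch below assumes NumPy, a seeded random generator, and two made-up distributions `p` and `q` (not from the original post). It compares the exact value with a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(42)      # seeded so the run is reproducible
p = np.array([0.5, 0.25, 0.25])      # hypothetical "ideal" distribution
q = np.array([1/3, 1/3, 1/3])        # hypothetical approximating distribution

# Exact value: sum_x p(x) * log(p(x) / q(x)), in nats (natural log)
exact = np.sum(p * np.log(p / q))

# Monte Carlo estimate: draw outcomes x ~ p and average log p(x) - log q(x)
x = rng.choice(len(p), size=200_000, p=p)
estimate = np.mean(np.log(p[x]) - np.log(q[x]))

print(exact, estimate)  # the estimate should be close to the exact value
```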

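The KL divergence is computed by subtracting the entropy of p from the cross-entropy of p and q. In information theory, entropy is an amount of information, so the KL divergence can be read as a difference between two amounts of information.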

In other words, the smaller the KL divergence, the more similar the two distributions are.
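The decomposition into cross-entropy minus entropy, and the "smaller means more similar" reading, can be checked directly. This is a minimal sketch. It assumes NumPy and base-2 logarithms, so the results are in bits. The helper functions and the distributions `p`, `q_far`, and `q_near` are hypothetical examples written only for this illustration.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum p(x) log2 p(x); zero-probability outcomes are skipped (0 log 0 = 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def cross_entropy(p, q):
    """H(p, q) = -sum p(x) log2 q(x): average code length for data from p coded with q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(q[nz]))

def kl_from_entropies(p, q):
    """KL(p || q) in bits, computed as cross-entropy minus entropy."""
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.25, 0.25]        # "true" distribution (made-up example)
q_far = [1/3, 1/3, 1/3]      # uniform approximation
q_near = [0.45, 0.3, 0.25]   # an approximation closer to p

print(entropy(p))                    # 1.5
print(kl_from_entropies(p, q_far))   # ≈ 0.085 bits of extra code length
print(kl_from_entropies(p, q_near))  # smaller, because q_near is closer to p
```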

The formula is:

KL(p∥q) = H(p,q) − H(p)
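Expanding both terms for a discrete distribution gives the familiar summation form. This is a short derivation that uses the convention 0 log 0 = 0:

$$
\begin{aligned}
KL(p\|q) &= H(p,q) - H(p) \\
&= -\sum_x p(x)\log q(x) + \sum_x p(x)\log p(x) \\
&= \sum_x p(x)\log\frac{p(x)}{q(x)}
\end{aligned}
$$

The base of the logarithm only sets the unit: base 2 gives bits and the natural log gives nats. For continuous distributions, the sums become integrals over densities.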

The KL divergence has the following properties:

KL(p∥q) ≠ KL(q∥p) in general (non-symmetric), so it is not a true distance metric.
KL(p∥q) ≥ 0, and KL(p∥q) = 0 if and only if p = q.
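These properties can be checked numerically. This is a minimal sketch. It assumes NumPy and natural logarithms (so results are in nats). The distributions `p` and `q` are made-up examples, and the helper `kl` is hypothetical.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) in nats for discrete distributions; terms with p(x) = 0 contribute 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

p = np.array([0.9, 0.1])   # made-up example distributions
q = np.array([0.5, 0.5])

print(kl(p, q))  # ≈ 0.368
print(kl(q, p))  # ≈ 0.511: swapping the arguments changes the value (non-symmetric)
print(kl(p, p))  # 0.0: identical distributions give zero divergence

# Non-negativity checked on random pairs of distributions over 4 outcomes
rng = np.random.default_rng(0)
pairs = [(rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))) for _ in range(1000)]
print(all(kl(a, b) >= 0 for a, b in pairs))  # True
```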

So when is the KL divergence used?

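Cross-entropy is equivalent to the negative log-likelihood. If p is the empirical distribution of the data and q is the model, then H(p,q) is the average negative log-likelihood of the data under q. Minimizing cross-entropy is therefore the same as maximizing the log-likelihood. The cross-entropy of distributions p and q also decomposes as H(p,q) = H(p) + KL(p∥q), and H(p) does not depend on the model q. So minimizing the KL divergence with respect to q is, in the end, also equivalent to maximizing the log-likelihood.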
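Here is a small numerical illustration of this equivalence. It assumes a Bernoulli model q_θ, a made-up sample of coin flips, and a grid search over θ; none of these come from the original post. The average negative log-likelihood equals the cross-entropy H(p̂, q_θ), where p̂ is the empirical distribution. The cross-entropy differs from KL(p̂∥q_θ) only by the constant H(p̂). All of them are therefore minimized at the same θ, the sample mean, which is the maximum-likelihood estimate.

```python
import numpy as np

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])                  # hypothetical coin flips (6 heads, 2 tails)
p_hat = np.array([np.mean(data == 0), np.mean(data == 1)])  # empirical distribution [P(0), P(1)]
thetas = np.linspace(0.01, 0.99, 99)                        # candidate values of theta = P(heads)

def avg_nll(theta):
    """Average negative log-likelihood of the data under Bernoulli(theta)."""
    return -np.mean(data * np.log(theta) + (1 - data) * np.log(1 - theta))

def cross_entropy(theta):
    """H(p_hat, q_theta) = -sum_x p_hat(x) log q_theta(x)."""
    q = np.array([1 - theta, theta])
    return -np.sum(p_hat * np.log(q))

def kl(theta):
    """KL(p_hat || q_theta); p_hat has no zero entries in this example."""
    q = np.array([1 - theta, theta])
    return np.sum(p_hat * np.log(p_hat / q))

nll = np.array([avg_nll(t) for t in thetas])
ce = np.array([cross_entropy(t) for t in thetas])
kls = np.array([kl(t) for t in thetas])

print(np.allclose(nll, ce))                    # True: average NLL equals cross-entropy
print(np.allclose(ce - kls, ce[0] - kls[0]))   # True: they differ by the constant H(p_hat)
print(thetas[np.argmin(nll)], thetas[np.argmin(kls)])  # both ≈ 0.75, the sample mean
```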

KL-Divergence python code

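```python
import numpy as np

def kl_divergence(p, q):  # KL(p || q) in nats; terms where p(x) = 0 contribute 0 (0 log 0 = 0)
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p[p > 0] * np.log(p[p > 0] / q[p > 0]))
```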

Source: https://towardsdatascience.com/kl-divergence-python-example-b87069e4b810
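The title also mentions the Jensen-Shannon divergence, which the post does not go on to define. Under the standard definition, JSD(p∥q) = ½ KL(p∥m) + ½ KL(q∥m), where m = ½(p + q) is the mixture of the two distributions. It is symmetric and stays finite even when the supports differ. With base-2 logarithms it lies between 0 and 1. This is a minimal sketch assuming NumPy. The helpers `kl` and `js_divergence` are hypothetical, and the block defines its own KL function with a `base` parameter so that it runs on its own.

```python
import numpy as np

def kl(p, q, base=2.0):
    """KL(p || q); terms with p(x) = 0 contribute 0. base=2 gives bits."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz])) / np.log(base)

def js_divergence(p, q, base=2.0):
    """JSD(p || q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # m(x) > 0 wherever p(x) > 0 or q(x) > 0, so both KL terms stay finite
    return 0.5 * kl(p, m, base) + 0.5 * kl(q, m, base)

# KL(p || q) would be infinite here (q is 0 where p is 0.9), but JSD is finite
p = [0.9, 0.1, 0.0]
q = [0.0, 0.1, 0.9]

print(js_divergence(p, q), js_divergence(q, p))  # equal values (≈ 0.9): JSD is symmetric
print(js_divergence([1, 0], [0, 1]))             # 1.0: the maximum in bits, for disjoint supports
```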

Source: https://icim.nims.re.kr/post/easyMath/550 (Entropy, Cross-entropy, KL Divergence | Easy Industrial Mathematics | Industrial Mathematics Innovation Center)

Source: https://hyunw.kim/blog/2017/10/27/KL_divergence.html (A Beginner's Guide to Information Theory - KL Divergence Made Easy)

On entropy: https://reniew.github.io/17/ (Information Theory: Entropy, KL-Divergence)
