Jensen’s inequality
- For a convex function $f$, Jensen's inequality states that $f(\mathbb{E}[x]) \leqslant \mathbb{E}[f(x)]$; written out for a continuous variable, this reads $f\left(\int x\,p(x)\,\mathrm{d}x\right) \leqslant \int f(x)\,p(x)\,\mathrm{d}x$.
- Applying this to the KL divergence gives $\mathrm{KL}(p \| q) = -\int p(\mathbf{x}) \ln\left\{\frac{q(\mathbf{x})}{p(\mathbf{x})}\right\} \mathrm{d}\mathbf{x} \geqslant -\ln \int q(\mathbf{x})\,\mathrm{d}\mathbf{x} = 0$, with equality if and only if $p(\mathbf{x}) = q(\mathbf{x})$ everywhere; a numerical check follows this list.
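As a quick sanity check of the inequality, the sketch below compares $f(\mathbb{E}[x])$ against $\mathbb{E}[f(x)]$ for the strictly convex $f(x) = -\ln x$. The Gamma distribution standing in for $p(x)$ is an arbitrary illustrative choice; any positive-valued distribution would do:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # samples x_n ~ p(x)

f = lambda t: -np.log(t)   # strictly convex on (0, inf)
lhs = f(x.mean())          # f(E[x])
rhs = f(x).mean()          # E[f(x)]
print(f"f(E[x]) = {lhs:.4f} <= E[f(x)] = {rhs:.4f}")  # Jensen: lhs <= rhs
```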
KL divergence (also known as relative entropy)
- Definition: $\mathrm{KL}(p \| q) = -\int p(\mathbf{x}) \ln\left\{\frac{q(\mathbf{x})}{p(\mathbf{x})}\right\} \mathrm{d}\mathbf{x}$
- Since $-\ln x$ is strictly convex, applying Jensen's inequality to the expectation of $q(\mathbf{x})/p(\mathbf{x})$ under $p$ gives $\mathrm{KL}(p \| q) = -\int p(\mathbf{x}) \ln\left\{\frac{q(\mathbf{x})}{p(\mathbf{x})}\right\} \mathrm{d}\mathbf{x} \geqslant -\ln \int p(\mathbf{x})\,\frac{q(\mathbf{x})}{p(\mathbf{x})}\,\mathrm{d}\mathbf{x} = -\ln \int q(\mathbf{x})\,\mathrm{d}\mathbf{x} = 0$
- In practice, $\mathrm{KL}(p \| q) \simeq \frac{1}{N} \sum_{n=1}^{N} \left\{-\ln q(\mathbf{x}_n \mid \boldsymbol{\theta}) + \ln p(\mathbf{x}_n)\right\}$, where $q(\mathbf{x} \mid \boldsymbol{\theta})$ is a parametric approximation to the unknown $p(\mathbf{x})$ and the $\mathbf{x}_n$ are observed samples (the $\frac{1}{N}$ factor comes from the Monte Carlo approximation noted below).
- Note: from the definition above, the KL divergence is an expectation under $p(\mathbf{x})$, so when the sample points follow $p(\mathbf{x})$ the integral can be replaced by the finite sum, just as $\mathbb{E}[f] = \int f(x)\,p(x)\,\mathrm{d}x \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n)$; importance sampling and related methods rely on the same idea (see the sketch after this list).
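A minimal Monte Carlo sketch of the sample-based approximation above. Here $p$ and $q$ are assumed to be two univariate Gaussians, an illustrative choice for which the exact KL divergence is available in closed form for comparison:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m1, s1, m2, s2 = 0.0, 1.0, 1.0, 2.0   # p = N(m1, s1^2), q = N(m2, s2^2)

# Draw x_n ~ p(x) and average -ln q(x_n) + ln p(x_n).
x = rng.normal(m1, s1, size=200_000)
kl_mc = np.mean(-norm.logpdf(x, m2, s2) + norm.logpdf(x, m1, s1))

# Closed form: KL(N(m1,s1^2) || N(m2,s2^2))
#   = ln(s2/s1) + (s1^2 + (m1-m2)^2) / (2 s2^2) - 1/2
kl_exact = np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

print(f"Monte Carlo KL ~= {kl_mc:.4f}, exact KL = {kl_exact:.4f}")
```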
mutual information
1. If the variables $\mathbf{x}$ and $\mathbf{y}$ in a dataset are not independent, we can ask how well the factorized product $p(\mathbf{x})p(\mathbf{y})$ approximates the joint $p(\mathbf{x}, \mathbf{y})$; the KL divergence between the two defines the mutual information:
$$\begin{aligned} \mathrm{I}[\mathbf{x}, \mathbf{y}] & \equiv \mathrm{KL}(p(\mathbf{x}, \mathbf{y}) \,\|\, p(\mathbf{x}) p(\mathbf{y})) \\ & = -\iint p(\mathbf{x}, \mathbf{y}) \ln\left(\frac{p(\mathbf{x})\, p(\mathbf{y})}{p(\mathbf{x}, \mathbf{y})}\right) \mathrm{d}\mathbf{x}\,\mathrm{d}\mathbf{y} \end{aligned}$$
2. Using the sum and product rules of probability, the mutual information can be expressed in terms of conditional entropies:
$$\mathrm{I}[\mathbf{x}, \mathbf{y}] = \mathrm{H}[\mathbf{x}] - \mathrm{H}[\mathbf{x} \mid \mathbf{y}] = \mathrm{H}[\mathbf{y}] - \mathrm{H}[\mathbf{y} \mid \mathbf{x}]$$
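The sketch below computes the mutual information of a small discrete joint distribution directly from the KL definition above, then verifies the identity $\mathrm{I}[\mathbf{x}, \mathbf{y}] = \mathrm{H}[\mathbf{x}] - \mathrm{H}[\mathbf{x} \mid \mathbf{y}]$; the 2x2 probability table is made up for illustration:

```python
import numpy as np

# Hypothetical joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])
p_x = p_xy.sum(axis=1)   # marginal p(x), by the sum rule
p_y = p_xy.sum(axis=0)   # marginal p(y)

# Mutual information: KL between the joint and the product of marginals.
I = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

# Entropy H[x] and conditional entropy H[x|y], with p(x|y) = p(x,y)/p(y).
H_x = -np.sum(p_x * np.log(p_x))
H_x_given_y = -np.sum(p_xy * np.log(p_xy / p_y))

print(f"I[x,y] = {I:.4f}")
print(f"H[x] - H[x|y] = {H_x - H_x_given_y:.4f}")  # matches I[x,y]
```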