# NIUHE

### A Deep Dive into the EM Algorithm

In statistics, an expectation–maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.

## An Intuitive Understanding of Gaussian Mixture Models (GMM)

### Maximum Likelihood Estimation

PS: if you are already familiar with maximum likelihood estimation, feel free to skip this section :)

Taking the partial derivative of the log-likelihood with respect to $p$ and setting it to zero gives the value of $p$ that maximizes it:

$\begin{array}{lcr} \frac{\partial f(n\ |\ p)}{\partial p} = \frac{n}{p} - \frac{N-n}{1-p} = 0 \\ \Rightarrow p = \frac{n}{N} \end{array}$
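As a quick numerical sanity check of $p = n/N$ (a minimal sketch; the parameter values and variable names are illustrative), one can simulate $N$ Bernoulli trials and compare the sample proportion with the true $p$:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3              # true success probability (illustrative value)
N = 10_000                # number of trials
flips = rng.random(N) < p_true
n = int(flips.sum())      # observed number of successes

# The MLE derived above: p_hat = n / N
p_hat = n / N
```

With this many trials the sample proportion lands very close to the true probability.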

Substituting the observed sample values $x_i$ of $X_i$, we obtain the likelihood function:

$L(x) = \prod_{i=1}^n\frac{1}{\sqrt{2\pi}\sigma}\exp(-\frac{(x_i-\mu)^2}{2\sigma^2})$

Taking the logarithm turns the product into a sum:

$\begin{array}{lcr} l(x) = \log \prod_{i=1}^n\frac{1}{\sqrt{2\pi}\sigma}\exp(-\frac{(x_i-\mu)^2}{2\sigma^2}) \\ \ \ \ \ \ = \sum_{i=1}^n\log\frac{1}{\sqrt{2\pi}\sigma}\exp(-\frac{(x_i-\mu)^2}{2\sigma^2}) \\ \ \ \ \ \ = \sum_{i=1}^n \log\frac{1}{\sqrt{2\pi}\sigma} + \sum_{i=1}^n-\frac{(x_i-\mu)^2}{2\sigma^2} \\ \ \ \ \ \ = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2 \end{array}$

Setting the partial derivatives with respect to $\mu$ and $\sigma^2$ to zero yields the familiar estimates:

$\begin{cases} \mu = \frac{1}{n}\sum_{i=1}^nx_i \\ \sigma^2 = \frac{1}{n}\sum_{i=1}^n(x_i - \mu)^2 \end{cases}$
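These closed-form estimates can be checked numerically (a sketch under assumed parameters $\mu=5$, $\sigma=2$). Note the $1/n$ factor: the variance estimate is the biased MLE, not the $1/(n-1)$ sample variance:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)
n = len(x)

mu_hat = x.sum() / n                          # (1/n) * sum of x_i
sigma2_hat = ((x - mu_hat) ** 2).sum() / n    # biased 1/n MLE of sigma^2
```

For large $n$ the two variance conventions are nearly identical, but the MLE is exactly the `ddof=0` version.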

### The Problem: the Random Variable Cannot Be Directly (Fully) Observed

For a mixture of $K$ Gaussians, the log-likelihood of $N$ samples is:

$l_{\pi,\mu,\Sigma}(x) = \sum_{i=1}^N\log(\sum_{k=1}^K\pi_kN(x_i\ |\ \mu_k, \Sigma_k))$

Define the responsibility of component $k$ for sample $x_i$:

$\gamma(i, k) = \frac{\pi_kN(x_i\ |\ \mu_k, \Sigma_k)}{\sum_{j=1}^K\pi_jN(x_i\ |\ \mu_j, \Sigma_j)}$

Treating the responsibilities as fixed, the parameters can then be updated by:

$\begin{cases} N_k = \sum_{i=1}^N\gamma(i, k) \\ \mu_k = \frac{1}{N_k}\sum_{i=1}^N\gamma(i,k)x_i \\ \Sigma_k = \frac{1}{N_k}\sum_{i=1}^N\gamma(i,k)(x_i - \mu_k)(x_i-\mu_k)^T \\ \pi_k = \frac{N_k}{N} = \frac{1}{N}\sum_{i=1}^N\gamma(i, k) \end{cases}$
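These update formulas translate almost line for line into code. Below is a minimal 1-D sketch (scalar variances stand in for the full covariance matrices $\Sigma_k$, and all function and variable names are our own):

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    """Univariate normal density N(x | mu, sigma2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def em_step(x, pi, mu, sigma2):
    """One EM iteration for a 1-D, K-component Gaussian mixture.

    x: (N,) data; pi, mu, sigma2: (K,) current mixture parameters.
    """
    # E-step: responsibilities gamma(i, k)
    dens = pi * normal_pdf(x[:, None], mu, sigma2)           # (N, K)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: the closed-form updates above
    Nk = gamma.sum(axis=0)                                   # N_k
    mu_new = (gamma * x[:, None]).sum(axis=0) / Nk           # mu_k
    sigma2_new = (gamma * (x[:, None] - mu_new) ** 2).sum(axis=0) / Nk
    pi_new = Nk / len(x)                                     # pi_k
    return pi_new, mu_new, sigma2_new, gamma

# Demo: two well-separated components
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
pi, mu, sigma2 = np.array([0.5, 0.5]), np.array([0.0, 4.0]), np.array([1.0, 1.0])
for _ in range(50):
    pi, mu, sigma2, gamma = em_step(x, pi, mu, sigma2)
```

With well-separated data and a reasonable initialization, the estimated means settle near the true values 0 and 5, and the mixing weights stay a valid probability vector.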

## EM算法

### Formulating the EM Algorithm

With latent variables $z$, the log-likelihood of $m$ samples is:

$l(\theta) = \sum_{i=1}^m\log p(x\ ;\theta) = \sum_{i=1}^m\log \sum_z p(x,z;\theta)$

### Jensen's Inequality

If $f$ is a convex function, then for $\theta \in [0, 1]$: $f(\theta x + (1 - \theta) y) \leqslant \theta f(x) + (1-\theta)f(y)$

More generally, the weights $\theta_i$ can be viewed as the probabilities of the values $x_i$, which gives $f(E(x)) \leqslant E(f(x))$; for a concave function such as $\log$, the inequality is reversed.
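A quick numerical illustration of both directions (all values illustrative): for the convex $f(x)=x^2$ we expect $f(E[x]) \leqslant E[f(x)]$, while for the concave $\log$ the inequality flips, which is the direction the EM derivation relies on:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)

# Convex f(x) = x^2: f(E[x]) <= E[f(x)]
lhs_convex = x.mean() ** 2
rhs_convex = (x ** 2).mean()

# Concave log (on positive values): log(E[y]) >= E[log y]
y = np.abs(x) + 0.1
lhs_concave = np.log(y.mean())
rhs_concave = np.log(y).mean()
```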

Let $Q_i$ be some distribution over $z$, with $Q_i(z) \geqslant 0$ and $\sum_z Q_i(z) = 1$. Applying Jensen's inequality (note $\log$ is concave, hence the $\geqslant$):

$\begin{array}{lcl} l(\theta) = \sum_{i=1}^m\log \sum_z p(x,z;\theta) \\ \ \ \ \ \ \ = \sum_{i=1}^m\log \sum_{z^{(i)}} p(x^{(i)},z^{(i)};\theta) \\ \ \ \ \ \ \ = \sum_{i=1}^m\log \sum_{z^{(i)}} Q_i(z^{(i)})\frac{p(x^{(i)},z^{(i)};\theta)}{Q_i(z^{(i)})} \\ \ \ \ \ \ \ \geqslant \sum_{i=1}^m \sum_{z^{(i)}}Q_i(z^{(i)})\log\frac{p(x^{(i)},z^{(i)};\theta)}{Q_i(z^{(i)})} \end{array}$

Equality holds when the term inside the expectation is constant, i.e.

$\frac{p(x^{(i)},z^{(i)};\theta)}{Q_i(z^{(i)})} = C$

for some constant $C$ that does not depend on $z^{(i)}$.

Since $Q_i$ must sum to 1, normalizing gives:

$\begin{array}{lcl} Q_i(z^{(i)}) = \frac{p(x^{(i)},z^{(i)};\theta)}{\sum_zp(x^{(i)},z^{(i)};\theta)} \\ \ \ \ \ \ \ \ \ \ \ \ = \frac{p(x^{(i)},z^{(i)};\theta)}{p(x^{(i)};\theta)} \\ \ \ \ \ \ \ \ \ \ \ \ = p(z^{(i)}\ |\ x^{(i)}; \theta) \end{array}$

### The Overall EM Framework

In the E-step, we compute the expectation for each data point (in fact the posterior probability $Q_i$); in the M-step, we maximize the resulting lower bound with respect to the parameters; we then go back and recompute the expectation with the new parameters, and repeat until convergence. This is the expectation–maximization (EM) algorithm.
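The framework can be sketched as a single loop (here a 1-D two-component GMM stands in for the model; all names are our own). The lower-bound argument guarantees that the observed-data log-likelihood never decreases from one iteration to the next, which makes a useful correctness check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

pi = np.array([0.5, 0.5])      # mixing weights
mu = np.array([-1.0, 1.0])     # initial means
s2 = np.array([1.0, 1.0])      # initial variances

def weighted_densities(x, pi, mu, s2):
    """Per-component weighted densities pi_k * N(x_i | mu_k, s2_k), shape (N, K)."""
    return pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

def loglik(x, pi, mu, s2):
    """Observed-data log-likelihood under the current mixture."""
    return np.log(weighted_densities(x, pi, mu, s2).sum(axis=1)).sum()

history = [loglik(x, pi, mu, s2)]
for _ in range(100):
    # E-step: Q_i(z) = p(z | x_i; theta), the posterior responsibilities
    dens = weighted_densities(x, pi, mu, s2)
    w = dens / dens.sum(axis=1, keepdims=True)
    # M-step: maximize the lower bound in closed form
    Nk = w.sum(axis=0)
    mu = (w * x[:, None]).sum(axis=0) / Nk
    s2 = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(x)
    history.append(loglik(x, pi, mu, s2))
```

Monotonicity of `history` is exactly the guarantee the Jensen lower bound provides.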

## Deriving GMM from the EM Equations

### E-step

Compute the posterior probability (responsibility) of component $j$ for sample $i$:

$w_j^{(i)} = Q_i(z^{(i)} = j) = P(z^{(i)} = j\ |\ x^{(i)}; \phi, \mu, \Sigma)$

### M-step

Substituting these responsibilities into the lower bound, the objective to maximize over $\phi, \mu, \Sigma$ is:

$\begin{array}{lcl} \ \ \ \sum_{i=1}^m \sum_{z^{(i)}}Q_i(z^{(i)})\log\frac{p(x^{(i)},z^{(i)};\phi, \mu, \Sigma)}{Q_i(z^{(i)})} \\ = \sum_{i=1}^m \sum_{j=1}^k Q_i(z^{(i)} = j)\log\frac{p(x^{(i)}|z^{(i)}=j; \mu, \Sigma)p(z^{(i)}=j; \phi)}{Q_i(z^{(i)} = j)} \\ = \sum_{i=1}^m \sum_{j=1}^k w_j^{(i)}\log \frac{\frac{1}{(2\pi)^{n/2}|\Sigma_j|^{1/2}}\exp(-\frac{1}{2}(x^{(i)}-\mu_j)^T\Sigma_j^{-1}(x^{(i)}-\mu_j))\cdot \phi_j}{w_j^{(i)}} \end{array}$

$\begin{array}{lcl} \ \ \ \nabla_{\mu_l} \sum_{i=1}^m \sum_{j=1}^k w_j^{(i)}\log \frac{\frac{1}{(2\pi)^{n/2}|\Sigma_j|^{1/2}}\exp(-\frac{1}{2}(x^{(i)}-\mu_j)^T\Sigma_j^{-1}(x^{(i)}-\mu_j))\cdot \phi_j}{w_j^{(i)}} \\ = \nabla_{\mu_l} \sum_{i=1}^m \sum_{j=1}^k w_j^{(i)} (-\frac{1}{2}(x^{(i)}-\mu_j)^T\Sigma_j^{-1}(x^{(i)}-\mu_j)) \\ = \sum_{i=1}^m w_l^{(i)}(\Sigma_l^{-1}x^{(i)}-\Sigma_l^{-1}\mu_l) \end{array}$

Setting this gradient to zero and solving:

$\mu_l = \frac{\sum_{i=1}^mw^{(i)}_lx^{(i)}}{\sum_{i=1}^mw^{(i)}_l}$

Similarly, taking the partial derivative with respect to $\Sigma_j$ and setting it to zero yields:

$\Sigma_j = \frac{\sum_{i=1}^mw^{(i)}_j(x^{(i)}-\mu_j)(x^{(i)}-\mu_j)^T}{\sum_{i=1}^mw^{(i)}_j}$

For $\phi_j$, only the $\log\phi_j$ term depends on it:

$\begin{array}{lcl} \ \ \ \nabla_{\phi_j} \sum_{i=1}^m \sum_{j=1}^k w_j^{(i)}\log \frac{\frac{1}{(2\pi)^{n/2}|\Sigma_j|^{1/2}}\exp(-\frac{1}{2}(x^{(i)}-\mu_j)^T\Sigma_j^{-1}(x^{(i)}-\mu_j))\cdot \phi_j}{w_j^{(i)}} \\ = \nabla_{\phi_j} \sum_{i=1}^m \sum_{j=1}^k w_j^{(i)}\log \phi_j \end{array}$

The $\phi_j$ are constrained to sum to 1, so introduce a Lagrange multiplier $\beta$:

$L(\phi, \beta) = \sum_{i=1}^m \sum_{j=1}^k w_j^{(i)}\log \phi_j + \beta(\sum_{j=1}^k\phi_j - 1)$

$\frac{\partial}{\partial \phi_j}L(\phi,\beta) = \sum_{i=1}^m\frac{w^{(i)}_j}{\phi_j} + \beta$

Setting the derivative to zero and using $\sum_{j=1}^k \phi_j = 1$ and $\sum_{j=1}^k w_j^{(i)} = 1$:

$\begin{array}{lcl} \ \ \ \ \sum_{i=1}^m\frac{w^{(i)}_j}{\phi_j} + \beta = 0 \\ \Rightarrow \sum_{i=1}^m w^{(i)}_j + \beta\cdot \phi_j = 0 \\ \Rightarrow \sum_{j=1}^k\sum_{i=1}^m w^{(i)}_j + \sum_{j=1}^k\beta\cdot \phi_j = 0 \\ \Rightarrow \sum_{i=1}^m\sum_{j=1}^k w^{(i)}_j + \beta = 0 \\ \Rightarrow \sum_{i=1}^m1 + \beta = 0 \\ \Rightarrow \beta = -m \\ \Rightarrow \sum_{i=1}^m\frac{w^{(i)}_j}{\phi_j} -m = 0 \\ \Rightarrow \phi_j = \frac{1}{m}\sum_{i=1}^mw^{(i)}_j \end{array}$

Collecting the results, the EM derivation reproduces exactly the intuitive GMM updates from earlier, with $\gamma(i,k)$ playing the role of $w_k^{(i)}$ and $\pi_k$ of $\phi_k$:

$\begin{cases} N_k = \sum_{i=1}^N\gamma(i, k) \\ \mu_k = \frac{1}{N_k}\sum_{i=1}^N\gamma(i,k)x_i \\ \Sigma_k = \frac{1}{N_k}\sum_{i=1}^N\gamma(i,k)(x_i - \mu_k)(x_i-\mu_k)^T \\ \pi_k = \frac{N_k}{N} = \frac{1}{N}\sum_{i=1}^N\gamma(i, k) \end{cases}$