# NIUHE

## Deriving the Loss Function via Maximum Likelihood Estimation

Assume the linear hypothesis

$h_\theta(x) = \sum^n_{j=0}\theta_jx_j = \theta^Tx$

and that each target is generated as $y^{(i)} = \theta^Tx^{(i)} + \epsilon^{(i)}$, where the errors $\epsilon^{(i)}$ are i.i.d. Gaussian with mean $0$ and variance $\sigma^2$.

The probability density of $\epsilon^{(i)}$ is: $P(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(\epsilon^{(i)})^2}{2\sigma^2}}$

Substituting $\epsilon^{(i)} = y^{(i)} - \theta^Tx^{(i)}$ gives:

$P(y^{(i)}|x^{(i)};\theta) = \frac{1}{\sqrt{2\pi}\sigma}\exp(\frac{-(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2})$

Since the samples are independent and identically distributed, the likelihood function is the product of the individual sample probabilities:

$\begin{array}{lcl} L(\theta) = \prod^{m}_{i=1} P(y^{(i)}|x^{(i)};\theta) \\ = \prod^{m}_{i=1} \frac{1}{\sqrt{2\pi}\sigma}\exp(\frac{-(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2}) \end{array}$

Taking the logarithm of $L(\theta)$ gives the log-likelihood:

$\begin{array}{lcl} l(\theta) = \log L(\theta) \\ = \log \prod^{m}_{i=1} \frac{1}{\sqrt{2\pi}\sigma}\exp(\frac{-(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2}) \\ = m\log \frac{1}{\sqrt{2\pi}\sigma} + \sum^m_{i=1} \frac{-(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2} \\ = m\log \frac{1}{\sqrt{2\pi}\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum^m_{i=1} (y^{(i)} - \theta^Tx^{(i)})^2 \end{array}$
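The first term of $l(\theta)$ does not depend on $\theta$, so for a fixed $\sigma$, maximizing the log-likelihood is exactly minimizing the squared-error sum. A quick numerical sketch of this equivalence (synthetic data; variable names are my own):

```python
import numpy as np

# l(theta) = m*log(1/(sqrt(2*pi)*sigma)) - J(theta)/sigma^2, so for fixed
# sigma, a theta with lower squared error always has higher log-likelihood.
rng = np.random.default_rng(0)
m = 100
X = rng.normal(size=(m, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.5, size=m)
sigma = 0.5

def J(theta):
    r = y - X @ theta
    return 0.5 * np.sum(r ** 2)

def loglik(theta):
    return m * np.log(1.0 / (np.sqrt(2 * np.pi) * sigma)) - J(theta) / sigma ** 2

good, bad = np.array([1.0, -2.0]), np.array([0.0, 0.0])
print(J(good) < J(bad), loglik(good) > loglik(bad))  # both True
```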

## Solving for the Optimal Parameters in the Least-Squares Sense

$J(\theta) = \frac{1}{2}\sum^{m}_{i=1}(y^{(i)} - \theta^Tx^{(i)})^2 = \frac{1}{2}(X\theta - y)^T(X\theta - y)$

$\begin{array}{lcl} \nabla_\theta J(\theta) = \frac{\partial}{\partial \theta}(\frac{1}{2}(X\theta - y)^T(X\theta - y)) \\ = \frac{\partial}{\partial \theta}(\frac{1}{2}(\theta^TX^T - y^T)(X\theta - y)) \\ = \frac{\partial}{\partial \theta} (\frac{1}{2}(\theta^TX^TX\theta - \theta^TX^Ty - y^TX\theta + y^Ty) ) \\ = \frac{1}{2}(2X^TX\theta - X^Ty - (y^TX)^T) \\ = X^TX\theta - X^Ty \end{array}$

Setting $\nabla_\theta J(\theta) = 0$, i.e. $X^TX\theta - X^Ty = 0$, yields: $\theta = (X^TX)^{-1}X^Ty$

When $X^TX$ is not invertible, or to guard against overfitting, add a $\lambda$ perturbation (ridge regularization): $\theta = (X^TX + \lambda I)^{-1}X^Ty$
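The closed-form solution above can be sketched numerically with NumPy (a minimal illustration on synthetic data; the variable names and $\lambda$ value are my own choices):

```python
import numpy as np

# Solve the regularized normal equation theta = (X^T X + lambda*I)^{-1} X^T y.
# Using np.linalg.solve on the linear system avoids forming an explicit inverse.
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=m)  # small Gaussian noise

lam = 1e-3  # the lambda perturbation
theta = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
print(theta)  # close to [2, -1, 0.5]
```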

## Loss Function and Parameter Estimation for Logistic Regression

The sigmoid function:

• $g(z) = \frac{1}{1 + e^{-z}}$

The derivative of the sigmoid:

$\begin{array}{lcr} g'(z) = (\frac{1}{1 + e^{-z}})' \\ = \frac{e^{-z}}{(1 + e^{-z})^2} \\ = \frac{1}{1 + e^{-z}} - \frac{1}{(1 + e^{-z})^2} \\ = g(z)(1 - g(z)) \end{array}$
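The identity $g'(z) = g(z)(1 - g(z))$ can be verified numerically against a central finite difference (a quick sketch; function names are my own):

```python
import numpy as np

def g(z):
    # numerically stable sigmoid: avoids overflow for large |z|
    return np.where(z >= 0,
                    1.0 / (1.0 + np.exp(-np.abs(z))),
                    np.exp(-np.abs(z)) / (1.0 + np.exp(-np.abs(z))))

z = np.linspace(-5, 5, 11)
h = 1e-5
numeric = (g(z + h) - g(z - h)) / (2 * h)  # central difference approximation
analytic = g(z) * (1 - g(z))               # the closed form derived above
print(np.max(np.abs(numeric - analytic)))  # tiny (finite-difference error)
```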

• $h_\theta(x) = g(\theta^Tx) = \frac{1}{1 + e^{-\theta^Tx}}$

$P(y = 1 | x; \theta) = h_\theta(x), \quad P(y = 0 | x; \theta) = 1 - h_\theta(x)$

The two cases combine into a single expression:

$P(y^{(i)} | x^{(i)}; \theta) = h_\theta(x^{(i)})^{y^{(i)}}(1 - h_\theta(x^{(i)}))^{1 - y^{(i)}}$

Since the samples are independent and identically distributed, the likelihood function is the product of the individual sample probabilities:

$\begin{array}{lcr} L(\theta) = \prod^{m}_{i=1} P(y^{(i)}|x^{(i)};\theta) \\ = \prod^{m}_{i=1} h_\theta(x^{(i)})^{y^{(i)}}(1 - h_\theta(x^{(i)}))^{1 - y^{(i)}} \end{array}$

$\begin{array}{lcr} l(\theta) = \log L(\theta) \\ = \log \prod^{m}_{i=1} h_\theta(x^{(i)})^{y^{(i)}}(1 - h_\theta(x^{(i)}))^{1 - y^{(i)}} \\ = \sum^m_{i = 1} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)})) \\ = \sum^m_{i = 1} y^{(i)} \log(\frac{1}{1 + e^{-\theta^Tx^{(i)}}}) + (1 - y^{(i)})\log(\frac{e^{-\theta^Tx^{(i)}}}{1 + e^{-\theta^Tx^{(i)}}}) \\ = \sum^m_{i = 1} y^{(i)} \log(\frac{1}{1 + e^{-\theta^Tx^{(i)}}}) + (1 - y^{(i)})\log(\frac{1}{1 + e^{\theta^Tx^{(i)}}}) \\ \end{array}$

Relabeling the targets as $y^{(i)} \in \{-1, +1\}$ (instead of $\{0, 1\}$) lets the negative log-likelihood be written compactly as:

$J(\theta) = \sum^m_{i = 1} \log(1 + e^{-y^{(i)}\theta^Tx^{(i)}})$
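The compact form uses labels relabeled to $\{-1, +1\}$; a quick check (variable names my own) that it agrees with the $\{0, 1\}$ negative log-likelihood term by term:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_x = rng.normal(size=20)        # sampled values of theta^T x^{(i)}
y = rng.integers(0, 2, size=20)      # labels in {0, 1}
t = 2 * y - 1                        # relabeled to {-1, +1}

# negative log-likelihood in the {0,1} parameterization
h = 1.0 / (1.0 + np.exp(-theta_x))
nll = -(y * np.log(h) + (1 - y) * np.log(1 - h))

# the compact {-1,+1} form
compact = np.log(1.0 + np.exp(-t * theta_x))
print(np.max(np.abs(nll - compact)))  # ~0: the two forms coincide
```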

Differentiating $l(\theta)$, using $g'(z) = g(z)(1 - g(z))$ and $\frac{\partial}{\partial \theta_j}\theta^Tx^{(i)} = x_j^{(i)}$:

$\begin{array}{lcr} \frac{\partial}{\partial \theta_j}l(\theta) = \frac{\partial}{\partial \theta_j}\sum^m_{i = 1} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)})) \\ = \sum^m_{i = 1} y^{(i)}\frac{1}{h_\theta(x^{(i)})}\frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} + (1 - y^{(i)}) \frac{1}{1 - h_\theta(x^{(i)})}\frac{-\partial h_\theta(x^{(i)})}{\partial \theta_j} \\ = \sum^m_{i = 1} y^{(i)}(1 - h_\theta(x^{(i)}))x_j^{(i)} - (1 - y^{(i)}) h_\theta(x^{(i)})x_j^{(i)} \\ = \sum^m_{i = 1} (y^{(i)} - h_\theta(x^{(i)}))x_j^{(i)} \end{array}$

Stochastic gradient ascent on $l(\theta)$ then updates each parameter per sample as:

$\theta_j := \theta_j + \alpha(y^{(i)} - h_\theta(x^{(i)}))x_j^{(i)}$
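The update rule can be sketched end to end on synthetic data (a minimal illustration; the learning rate, epoch count, and data-generating parameters are my own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 200, 2
X = rng.normal(size=(m, n))
true_theta = np.array([3.0, -2.0])
# draw labels in {0, 1} from the logistic model itself
p = 1.0 / (1.0 + np.exp(-(X @ true_theta)))
y = (rng.random(m) < p).astype(float)

theta = np.zeros(n)
alpha = 0.1  # learning rate
for epoch in range(50):
    for i in range(m):
        h = 1.0 / (1.0 + np.exp(-(X[i] @ theta)))
        theta += alpha * (y[i] - h) * X[i]  # the update rule above

print(theta)  # signs should match true_theta; magnitudes are noisy under SGD
```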