# NIUHE

## 线性可分支持向量机

### 硬间隔最大化

$φ(x)$ 是某个确定的特征空间转换函数，它的作用是将 $x$ 映射到(更高的)维度。最简单直接的：$\Phi(x) = x$。求解分离超平面问题可以等价为求解相应的凸二次规划问题

$\arg \max_{w, b}\{\frac{1}{||w||}\min_i[y_i \cdot (w^T\cdot \Phi(x_i) + b)]\}$

$\arg \max_{w,b}\frac{1}{||w||}$ $s.t. \ y_i(w^T\cdot \Phi(x_i) + b) \geqslant 1, \ \ i = 1, 2, ... n$

$\arg \min_{w,b}\frac{1}{2}||w||^2$ $s.t. \ y_i(w^T\cdot \Phi(x_i) + b) -1 \geqslant 0, \ \ i = 1, 2, ... n$

In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equality constraints.

Lagrange函数为：

$L(w,b,\alpha) = \frac{1}{2}||w||^2- \sum_{i=1}^n\alpha_i(y_i(w^T\cdot \Phi(x_i) + b) - 1)$

$\max_\alpha\min_{w,b}L(w, b, \alpha)$

$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^n\alpha_i y_i \Phi(x_i)$ $\frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^n\alpha_i y_i = 0$

$\begin{array}{lcl} L(w, b, \alpha) = \frac{1}{2}||w||^2- \sum_{i=1}^n\alpha_i(y_i(w^T\cdot \Phi(x_i) + b) - 1) \\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ = \frac{1}{2}w^Tw - w^T\sum_{i=1}^n\alpha_iy_i\Phi(x_i) - b\sum_{i=1}^n\alpha_iy_i + \sum_{i=1}^n\alpha_i \\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ = \frac{1}{2}w^T\sum_{i=1}^n\alpha_i y_i \Phi(x_i) - w^T\sum_{i=1}^n\alpha_iy_i\Phi(x_i) - b \cdot 0 + \sum_{i=1}^n\alpha_i \\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ = \sum_{i=1}^n\alpha_i - \frac{1}{2}(\sum_{i=1}^n\alpha_i y_i \Phi(x_i))^T\sum_{i=1}^n\alpha_i y_i \Phi(x_i) \\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ = \sum_{i=1}^n\alpha_i -\frac{1}{2}\sum_{i,j=1}^n\alpha_i\alpha_jy_iy_j\Phi(x_i)\Phi(x_j) \end{array}$

$\max_\alpha \sum_{i=1}^n\alpha_i - \frac{1}{2}\sum_{i,j=1}^n\alpha_i\alpha_jy_iy_j\Phi(x_i)\Phi(x_j)$ $s.t. \sum_{i=1}^n \alpha_iy_i = 0, \alpha_i \geqslant 0, i = 1, 2, ... n$

$\min_\alpha \frac{1}{2}\sum_{i,j=1}^n\alpha_i\alpha_jy_iy_j\Phi(x_i)\Phi(x_j) - \sum_{i=1}^n\alpha_i$ $s.t. \sum_{i=1}^n \alpha_iy_i = 0, \alpha_i \geqslant 0, i = 1, 2, ... n$

$w^* = \sum_{i=1}^n\alpha_i^*y_i\Phi(x_i)$ $b^* = y_i - \sum_{i=1}^n\alpha_i^*y_i\Phi(x_i)\cdot \Phi(x_j)$ $w^*\Phi(x) + b^* = 0$

$f(x) = sign(w^*\Phi(x) + b^*)$

## 线性支持向量机

### 软间隔最大化

$\min_{w,b, \xi}\frac{1}{2}||w||^2 + C\sum_{i=1}^n\xi_i$ $s.t. y_i(w\cdot x_i + b) \geqslant 1 - \xi_i, \ \xi_i \geqslant 0, \ i = 1,2, ... n$

$w, b, \xi$ 求偏导并令其等于 $0$ 得：

$\frac{\partial L}{\partial w} = 0 \Rightarrow \sum_{i=1}^n\alpha_iy_i\Phi(x_i)$ $\frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^n\alpha_i y_i = 0$ $\frac{\partial L}{\partial \xi} = 0 \Rightarrow C - \alpha_i - \mu_i = 0$

$\min_{w,b,\xi}L(w,b,\xi,\alpha,\mu) = -\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n\alpha_i\alpha_jy_iy_j(\Phi(x_i)\cdot \Phi(x_j)) + \sum_{i=1}^n\alpha_i$

$\max_\alpha -\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n\alpha_i\alpha_jy_iy_j(\Phi(x_i)\cdot \Phi(x_j)) + \sum_{i=1}^n\alpha_i$ $s.t. \sum_{i=0}^n\alpha_iy_i = 0, \ C - \alpha_i - \mu_i = 0, \ \alpha_i \geqslant 0, \mu_i \geqslant 0, i = 1,2...n$

$\min_\alpha \sum_{i=1}^n\alpha_i-\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n\alpha_i\alpha_jy_iy_j(\Phi(x_i)\cdot \Phi(x_j))$ $s.t. \sum_{i=0}^n\alpha_iy_i = 0, \ 0 \leqslant \alpha_i \leqslant C,\ i = 1,2...n$

$w^* = \sum_{i=1}^n\alpha_i^*y_i\Phi(x_i)$ $b^* = \frac{\max_{i:y_i = -1}w^*\Phi(x_i)+\min_{i:y_i = 1}w^*\Phi(x_i)}{2}$

$f(x) = sign(w^*\Phi(x) + b^*)$

## 非线性支持向量机

### 核函数

RBF

$e^x$ 的Taylor展开：$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + ... + \frac{x^n}{n!} + R_n$

$\begin{array}{lcl} \kappa(x_1,x_2) = e^{-\frac{||x_1 - x_2||^2}{2\sigma^2}} = e^{-\frac{(x_1 - x_2)^2}{2\sigma^2}} \\ \ \ \ \ \ \ \ \ \ \ \ \ \ = e^{-\frac{x_1^2 - 2x_1x_2 + x_2^2}{2\sigma^2}} \\ \ \ \ \ \ \ \ \ \ \ \ \ \ = e^{-\frac{x_1^2 + x_2^2}{2\sigma^2}} \cdot e^{\frac{x_1x_2}{\sigma^2}} \\ \ \ \ \ \ \ \ \ \ \ \ \ \ = e^{-\frac{x_1^2 + x_2^2}{2\sigma^2}} \cdot(1 + \frac{1}{\sigma^2}\cdot\frac{x_1x_2}{1!} + (\frac{1}{\sigma^2})^2\cdot\frac{(x_1x_2)^2}{2!} + ... + (\frac{1}{\sigma^2})^n\cdot\frac{(x_1x_2)^n}{n!} + ...) \\ \ \ \ \ \ \ \ \ \ \ \ \ \ = e^{-\frac{x_1^2 + x_2^2}{2\sigma^2}} \cdot (1\cdot1 + \frac{1}{1!}\frac{x_1}{\sigma}\frac{x_2}{\sigma} + \frac{1}{2!}\frac{x_1^2}{\sigma^2}\frac{x_2^2}{\sigma^2} + ... + \frac{1}{n!}\frac{x_1^n}{\sigma^n}\frac{x_2^n}{\sigma^n}+...) \\ \ \ \ \ \ \ \ \ \ \ \ \ \ = \Phi(x_1)^T\cdot\Phi(x_2) \end{array}$

$\Phi(x)$ 结果可见高斯核函数实际把特征映射到了无穷维，然后由SVM选出几个最优维度进行分类。

## SMO

SVM中系数的求解：Sequential Minimal Optimization

## 总结思考

• SVM可以用来划分多类别吗?
• 答案是肯定的。
• 直接多分类
• 1 vs all
• SVM和Logistic回归的比较
• 经典的SVM，直接输出类别，不给出后验概率
• Logistic回归，会给出属于哪个类别的后验概率
• 二者目标函数的异同
• SVM也可以用于回归问题：SVR

## 参考文献

• 李航，统计学习方法，清华大学出版社，2012
• Charlie Frogner. Support Vector Machines. 2011
• Corinana Cortes, Vladimir Vapnik. Support-Vector Networks. Machine Learning, 20, 273-297, 1995
• John C. Platt. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. 1998
• Andrew W. Moore. Support Vector Machines, 2001